HAI AI Index Report 2024

Artificial
Intelligence
Index Report
2024
Artificial Intelligence
Index Report 2024
Introduction to the
AI Index Report 2024
Welcome to the seventh edition of the AI Index report. The 2024 Index is our most comprehensive to date and
arrives at an important moment when AI’s influence on society has never been more pronounced. This year,
we have broadened our scope to more extensively cover essential trends such as technical advancements
in AI, public perceptions of the technology, and the geopolitical dynamics surrounding its development.
Featuring more original data than ever before, this edition introduces new estimates on AI training costs,
detailed analyses of the responsible AI landscape, and an entirely new chapter dedicated to AI’s impact on
science and medicine.
The AI Index report tracks, collates, distills, and visualizes data related to artificial intelligence (AI). Our
mission is to provide unbiased, rigorously vetted, broadly sourced data in order for policymakers, researchers,
executives, journalists, and the general public to develop a more thorough and nuanced understanding of the
complex field of AI.
The AI Index is recognized globally as one of the most credible and authoritative sources for data and insights
on artificial intelligence. Previous editions have been cited in major newspapers, including the The New York
Times, Bloomberg, and The Guardian, have amassed hundreds of academic citations, and been referenced
by high-level policymakers in the United States, the United Kingdom, and the European Union, among
other places. This year’s edition surpasses all previous ones in size, scale, and scope, reflecting the growing
significance that AI is coming to hold in all of our lives.
2
Index Report 2024
Message From
the Co-directors
A decade ago, the best AI systems in the world were unable to classify objects in images at a human level. AI
struggled with language comprehension and could not solve math problems. Today, AI systems routinely exceed
human performance on standard benchmarks.
Progress accelerated in 2023. New state-of-the-art systems like GPT-4, Gemini, and Claude 3 are impressively
multimodal: They can generate fluent text in dozens of languages, process audio, and even explain memes. As AI
has improved, it has increasingly forced its way into our lives. Companies are racing to build AI-based products,
and AI is increasingly being used by the general public. But current AI technology still has significant problems. It
cannot reliably deal with facts, perform complex reasoning, or explain its conclusions.
AI faces two interrelated futures. First, technology continues to improve and is increasingly used, having major
consequences for productivity and employment. It can be put to both good and bad uses. In the second future,
the adoption of AI is constrained by the limitations of the technology. Regardless of which future unfolds,
governments are increasingly concerned. They are stepping in to encourage the upside, such as funding university
R&D and incentivizing private investment. Governments are also aiming to manage the potential downsides, such
as impacts on employment, privacy concerns, misinformation, and intellectual property rights.
As AI rapidly evolves, the AI Index aims to help the AI community, policymakers, business leaders, journalists, and
the general public navigate this complex landscape. It provides ongoing, objective snapshots tracking several
key areas: technical progress in AI capabilities, the community and investments driving AI development and
deployment, public opinion on current and potential future impacts, and policy measures taken to stimulate AI
innovation while managing its risks and challenges. By comprehensively monitoring the AI ecosystem, the Index
serves as an important resource for understanding this transformative technological force.
On the technical front, this year’s AI Index reports that the number of new large language models released
worldwide in 2023 doubled over the previous year. Two-thirds were open-source, but the highest-performing
models came from industry players with closed systems. Gemini Ultra became the first LLM to reach human-
level performance on the Massive Multitask Language Understanding (MMLU) benchmark; performance on the
benchmark has improved by 15 percentage points since last year. Additionally, GPT-4 achieved an impressive 0.96
mean win rate score on the comprehensive Holistic Evaluation of Language Models (HELM) benchmark, which
includes MMLU among other evaluations.
3
Index Report 2024
Message From the

Co-directors (cont’d)
Although global private investment in AI decreased for the second consecutive year, investment in generative
AI skyrocketed. More Fortune 500 earnings calls mentioned AI than ever before, and new studies show that AI
tangibly boosts worker productivity. On the policymaking front, global mentions of AI in legislative proceedings
have never been higher. U.S. regulators passed more AI-related regulations in 2023 than ever before. Still, many
expressed concerns about AI’s ability to generate deepfakes and impact elections. The public became more
aware of AI, and studies suggest that they responded with nervousness.
Ray Perrault and Jack Clark

Co-directors, AI Index
4
Index Report 2024
Top 10 Takeaways
1. AI beats humans on some tasks, but not on all. AI has surpassed human performance on several
benchmarks, including some in image classification, visual reasoning, and English understanding. Yet it trails
behind on more complex tasks like competition-level mathematics, visual commonsense reasoning and planning.
2. Industry continues to dominate frontier AI research. In 2023, industry produced 51 notable

machine learning models, while academia contributed only 15. There were also 21 notable models resulting from
industry-academia collaborations in 2023, a new high.
3. Frontier models get way more expensive. According to AI Index estimates, the training costs
of state-of-the-art AI models have reached unprecedented levels. For example, OpenAI’s GPT-4 used an
estimated $78 million worth of compute to train, while Google’s Gemini Ultra cost $191 million for compute.
4. The United States leads China, the EU, and the U.K. as the leading source of top AI
models. In 2023, 61 notable AI models originated from U.S.-based institutions, far outpacing the European
Union’s 21 and China’s 15.
5. Robust and standardized evaluations for LLM responsibility are seriously lacking.
New research from the AI Index reveals a significant lack of standardization in responsible AI reporting.
Leading developers, including OpenAI, Google, and Anthropic, primarily test their models against different
responsible AI benchmarks. This practice complicates efforts to systematically compare the risks and
limitations of top AI models.
6. Generative AI investment skyrockets. Despite a decline in overall AI private investment last

year, funding for generative AI surged, nearly octupling from 2022 to reach $25.2 billion. Major players in
the generative AI space, including OpenAI, Anthropic, Hugging Face, and Inflection, reported substantial
fundraising rounds.
7. The data is in: AI makes workers more productive and leads to higher quality work. In
2023, several studies assessed AI’s impact on labor, suggesting that AI enables workers to complete tasks more
quickly and to improve the quality of their output. These studies also demonstrated AI’s potential to bridge
the skill gap between low- and high-skilled workers. Still, other studies caution that using AI without proper
oversight can lead to diminished performance.
5
Index Report 2024
Top 10 Takeaways (cont’d)

8. Scientific progress accelerates even further, thanks to AI. In 2022, AI began to advance
scientific discovery. 2023, however, saw the launch of even more significant science-related AI applications—
from AlphaDev, which makes algorithmic sorting more efficient, to GNoME, which facilitates the process of
materials discovery.
9. The number of AI regulations in the United States sharply increases. The number of AI-
related regulations in the U.S. has risen significantly in the past year and over the last five years. In 2023, there
were 25 AI-related regulations, up from just one in 2016. Last year alone, the total number of AI-related regulations
grew by 56.3%.
10. People across the globe are more cognizant of AI’s potential impact—and more nervous.
A survey from Ipsos shows that, over the last year, the proportion of those who think AI will dramatically affect their
lives in the next three to five years has increased from 60% to 66%. Moreover, 52% express nervousness toward AI
products and services, marking a 13 percentage point rise from 2022. In America, Pew data suggests that 52% of
Americans report feeling more concerned than excited about AI, rising from 37% in 2022.
6
Index Report 2024
Steering Committee
Co-directors
Jack Clark, Anthropic, OECD
Raymond Perrault, SRI International
Members
Erik Brynjolfsson, Stanford University Juan Carlos Niebles, Stanford University, Salesforce
John Etchemendy, Stanford University Vanessa Parli, Stanford University
Katrina Ligett, Hebrew University Yoav Shoham, Stanford University, AI21 Labs
Terah Lyons, JPMorgan Chase & Co. Russell Wald, Stanford University
James Manyika, Google, University of Oxford
Staff and Researchers

Research Manager and Editor in Chief
Nestor Maslej
Stanford University
Research Associate
Loredana Fattorini
Stanford University
Affiliated Researchers
Elif Kiesow Cortez, Stanford Law School Research Fellow Alexandra Rome, Freelance Researcher
Anka Reuel, Stanford University Lapo Santarlasci, IMT School for
Robi Rahman, Data Scientist Advanced Studies Lucca
Graduate Researchers
James da Costa, Stanford University
Simba Jonga, Stanford University
Undergraduate Researchers
Emily Capstick, Stanford University Julia Betts Lotufo, Stanford University
Summer Flowers, Stanford University Sukrut Oak, Stanford University
Armin Hamrah, Claremont McKenna College Andrew Shi, Stanford University
Amelia Hardy, Stanford University Jason Shin, Stanford University
Mena Hassan, Stanford University Emma Williamson, Stanford University
Ethan Duncan He-Li Hellman, Stanford University Alfred Yu, Stanford University
7
Index Report 2024
How to Cite This Report

Nestor Maslej, Loredana Fattorini, Raymond Perrault, Vanessa Parli, Anka Reuel, Erik Brynjolfsson, John Etchemendy,
Katrina Ligett, Terah Lyons, James Manyika, Juan Carlos Niebles, Yoav Shoham, Russell Wald, and Jack Clark,
“The AI Index 2024 Annual Report,” AI Index Steering Committee, Institute for Human-Centered AI, Stanford
University, Stanford, CA, April 2024.
The AI Index 2024 Annual Report by Stanford University is licensed under Attribution-NoDerivatives 4.0 International.
Public Data and Tools

The AI Index 2024 Report is supplemented by raw data and an interactive tool. We invite each reader to use the
data and the tool in a way most relevant to their work and interests.
• Raw data and charts: The public data and high-resolution images of all the charts in the report are
available on Google Drive.
• Global AI Vibrancy Tool: Compare the AI ecosystems of over 30 countries. The Global AI Vibrancy tool
will be updated in the summer of 2024.
AI Index and Stanford HAI

The AI Index is an independent initiative at the Stanford Institute for Human-Centered Artificial Intelligence (HAI).
The AI Index was conceived within the One Hundred Year Study on Artificial Intelligence (AI100).
The AI Index welcomes feedback and new ideas for next year. Contact us at AI-Index-Report@stanford.edu.
The AI Index acknowledges that while authored by a team of human researchers, its writing process was aided
by AI tools. Specifically, the authors used ChatGPT and Claude to help tighten and copy edit initial drafts.
The workflow involved authors writing the original copy, then utilizing AI tools as part of the editing process.
8
Index Report 2024
Supporting Partners
Analytics and
Research Partners
9
Index Report 2024
Contributors
The AI Index wants to acknowledge the following individuals by chapter and section for their contributions of data,
analysis, advice, and expert commentary included in the AI Index 2024 Report:
Introduction
Loredana Fattorini, Nestor Maslej, Vanessa Parli, Ray Perrault
Chapter 1: Research and Development

Catherine Aiken, Terry Auricchio, Tamay Besiroglu, Rishi Bommasani, Andrew Brown, Peter Cihon, James da Costa,
Ben Cottier, James Cussens, James Dunham, Meredith Ellison, Loredana Fattorini, Enrico Gerding, Anson Ho,
Percy Liang, Nestor Maslej, Greg Mori, Tristan Naumann, Vanessa Parli, Pavlos Peppas, Ray Perrault, Robi Rahman,
Vesna Sablijakovic-Fritz, Jim Schmiedeler, Jaime Sevilla, Autumn Toney, Kevin Xu, Meg Young, Milena Zeithamlova
Chapter 2: Technical Performance

Rishi Bommasani, Emma Brunskill, Erik Brynjolfsson, Emily Capstick, Jack Clark, Loredana Fattorini, Tobi Gertsenberg,
Noah Goodman, Nicholas Haber, Sanmi Koyejo, Percy Liang, Katrina Ligett, Sasha Luccioni, Nestor Maslej,
Juan Carlos Niebles, Sukrut Oak, Vanessa Parli, Ray Perrault, Andrew Shi, Yoav Shoham, Emma Williamson
Chapter 3: Responsible AI
Jack Clark, Loredana Fattorini, Amelia Hardy, Katrina Ligett, Nestor Maslej, Vanessa Parli, Ray Perrault,
Anka Reuel, Andrew Shi
Chapter 4: Economy
Susanne Bieller, Erik Brynjolfsson, Mar Carpanelli, James da Costa, Natalia Dorogi, Heather English, Murat Erer,
Loredana Fattorini, Akash Kaura, James Manyika, Nestor Maslej, Cal McKeever, Julia Nitschke, Layla O’Kane,
Vanessa Parli, Ray Perrault, Brittany Presten, Carl Shan, Bill Valle, Casey Weston, Emma Williamson
Chapter 5: Science and Medicine

Russ Altman, Loredana Fattorini, Remi Lam, Curtis Langlotz, James Manyika, Nestor Maslej, Vanessa Parli,
Ray Perrault, Emma Williamson
10
Index Report 2024
Contributors (cont’d)
Chapter 6: Education
Betsy Bizot, John Etchemendy, Loredana Fattorini, Kirsten Feddersen, Matt Hazenbush, Nestor Maslej,
Vanessa Parli, Ray Perrault, Svetlana Tikhonenko, Laurens Vehmeijer, Hannah Weissman, Stuart Zweben
Chapter 7: Policy and Governance

Alison Boyer, Elif Kiesow Cortez, Rebecca DeCrescenzo, David Freeman Engstrom, Loredana Fattorini,
Philip de Guzman, Mena Hassan, Ethan Duncan He-Li Hellman, Daniel Ho, Simba Jonga, Rohini Kosoglu,
Mark Lemley, Julia Betts Lotufo, Nestor Maslej, Caroline Meinhardt, Julian Nyarko, Jeff Park, Vanessa Parli,
Ray Perrault, Alexandra Rome, Lapo Santarlasci, Sarah Smedley, Russell Wald, Emma Williamson, Daniel Zhang
Chapter 8: Diversity
Betsy Bizot, Loredana Fattorini, Kirsten Feddersen, Matt Hazenbush, Nestor Maslej, Vanessa Parli, Ray Perrault,
Svetlana Tikhonenko, Laurens Vehmeijer, Caroline Weis, Hannah Weissman, Stuart Zweben
Chapter 9: Public Opinion

Maggie Arai, Heather English, Loredana Fattorini, Armin Hamrah, Peter Loewen, Nestor Maslej, Vanessa Parli,
Ray Perrault, Marco Monteiro Silva, Lee Slinger, Bill Valle, Russell Wald
11
Index Report 2024
The AI Index thanks the following organizations and individuals who provided data for inclusion in this year’s report:
Organizations
Center for Research on International Federation of Robotics
Foundation Models Susanne Bieller
Rishi Bommasani, Percy Liang
Lightcast
Center for Security and Emerging Cal McKeever, Julia Nitschke, Layla O’Kane
Technology, Georgetown University
Catherine Aiken, James Dunham, Autumn Toney LinkedIn
Murat Erer, Akash Kaura, Casey Weston
Code.org
Hannah Weissman McKinsey & Company
Natalia Dorogi, Brittany Presten
Computing Research Association
Betsy Bizot, Stuart Zweben Munk School of Global Affairs and Public Policy
Peter Loewen, Lee Slinger
Epoch
Ben Cottier, Robi Rahman Quid
Heather English, Bill Valle
GitHub
Peter Cihon, Kevin Xu Schwartz Reisman Institute for Technology
and Society
Govini Maggie Arai, Marco Monteiro Silva
Alison Boyer, Rebecca DeCrescenzo,
Philip de Guzman, Jeff Park Studyportals
Kirsten Feddersen, Laurens Vehmeijer
Informatics Europe
Svetlana Tikhonenko Women in Machine Learning
Caroline Weis
The AI Index also thanks Jeanina Casusi, Nancy King, Carolyn Lehman, Shana Lynch, Jonathan Mindes, and
Michi Turner for their help in preparing this report; Joe Hinman and Nabarun Mukherjee for their help in
maintaining the AI Index website; and Annie Benisch, Marc Gough, Panos Madamopoulos-Moraris, Kaci Peel,
Drew Spence, Madeline Wright, and Daniel Zhang for their work in helping promote the report.
12
Index Report 2024
Table of Contents
Report Highlights 14
Chapter 1 Research and Development 27
Chapter 2 Technical Performance 73
Chapter 3 Responsible AI 159
Chapter 4 Economy 213
Chapter 5 Science and Medicine 296
Chapter 6 Education 325
Chapter 7 Policy and Governance 366
Chapter 8 Diversity 411
Chapter 9 Public Opinion 435
Appendix 458
ACCESS THE PUBLIC DATA
13
Index Report 2024
Report Highlights
2. More foundation models and more open foundation models. In 2023, a total of 149 foundation
models were released, more than double the amount released in 2022. Of these newly released models, 65.7%
were open-source, compared to only 44.4% in 2022 and 33.3% in 2021.
3. Frontier models get way more expensive. According to AI Index estimates, the training costs of
state-of-the-art AI models have reached unprecedented levels. For example, OpenAI’s GPT-4 used an estimated
$78 million worth of compute to train, while Google’s Gemini Ultra cost $191 million for compute.
5. The number of AI patents skyrockets. From 2021 to 2022, AI patent grants worldwide increased
sharply by 62.7%. Since 2010, the number of granted AI patents has increased more than 31 times.
6. China dominates AI patents. In 2022, China led global AI patent origins with 61.1%, significantly
outpacing the United States, which accounted for 20.9% of AI patent origins. Since 2010, the U.S. share of AI
patents has decreased from 54.1%.
7. Open-source AI research explodes. Since 2011, the number of AI-related projects on GitHub has
seen a consistent increase, growing from 845 in 2011 to approximately 1.8 million in 2023. Notably, there was a
sharp 59.3% rise in the total number of GitHub AI projects in 2023 alone. The total number of stars for AI-related
projects on GitHub also significantly increased in 2023, more than tripling from 4.0 million in 2022 to 12.2 million.
8. The number of AI publications continues to rise. Between 2010 and 2022, the total number of AI
publications nearly tripled, rising from approximately 88,000 in 2010 to more than 240,000 in 2022. The increase
over the last year was a modest 1.1%.
14
Index Report 2024
Report Highlights
benchmarks, including some in image classification, visual reasoning, and English understanding. Yet it trails
behind on more complex tasks like competition-level mathematics, visual commonsense reasoning and planning.
2. Here comes multimodal AI. Traditionally AI systems have been limited in scope, with language models
excelling in text comprehension but faltering in image processing, and vice versa. However, recent advancements
have led to the development of strong multimodal models, such as Google’s Gemini and OpenAI’s GPT-4. These
models demonstrate flexibility and are capable of handling images and text and, in some instances, can even
process audio.
3. Harder benchmarks emerge. AI models have reached performance saturation on established

benchmarks such as ImageNet, SQuAD, and SuperGLUE, prompting researchers to develop more challenging
ones. In 2023, several challenging new benchmarks emerged, including SWE-bench for coding, HEIM for image
generation, MMMU for general reasoning, MoCa for moral reasoning, AgentBench for agent-based behavior, and
HaluEval for hallucinations.
4. Better AI means better data which means … even better AI. New AI models such as
SegmentAnything and Skoltech are being used to generate specialized data for tasks like image segmentation and
3D reconstruction. Data is vital for AI technical improvements. The use of AI to create more data enhances current
capabilities and paves the way for future algorithmic improvements, especially on harder tasks.
5. Human evaluation is in. With generative models producing high-quality text, images, and more,
benchmarking has slowly started shifting toward incorporating human evaluations like the Chatbot Arena
Leaderboard rather than computerized rankings like ImageNet or SQuAD. Public sentiment about AI is becoming
an increasingly important consideration in tracking AI progress.
6. Thanks to LLMs, robots have become more flexible. The fusion of language modeling with
robotics has given rise to more flexible robotic systems like PaLM-E and RT-2. Beyond their improved robotic
capabilities, these models can ask questions, which marks a significant step toward robots that can interact more
effectively with the real world.
15
Index Report 2024
Chapter 2: Technical Performance (cont’d)

7. More technical research in agentic AI. Creating AI agents, systems capable of autonomous operation
in specific environments, has long challenged computer scientists. However, emerging research suggests that
the performance of autonomous AI agents is improving. Current agents can now master complex games like
Minecraft and effectively tackle real-world tasks, such as online shopping and research assistance.
8. Closed LLMs significantly outperform open ones. On 10 select AI benchmarks, closed models
outperformed open ones, with a median performance advantage of 24.2%. Differences in the performance of
closed and open models carry important implications for AI policy debates.
16
Index Report 2024
Report Highlights
New research from the AI Index reveals a significant lack of standardization in responsible AI reporting. Leading
developers, including OpenAI, Google, and Anthropic, primarily test their models against different responsible AI
benchmarks. This practice complicates efforts to systematically compare the risks and limitations of top AI models.
2. Political deepfakes are easy to generate and difficult to detect. Political deepfakes are already
affecting elections across the world, with recent research suggesting that existing AI deepfake methods perform
with varying levels of accuracy. In addition, new projects like CounterCloud demonstrate how easily AI can create
and disseminate fake content.
3. Researchers discover more complex vulnerabilities in LLMs. Previously, most efforts to

red team AI models focused on testing adversarial prompts that intuitively made sense to humans. This year,
researchers found less obvious strategies to get LLMs to exhibit harmful behavior, like asking the models to
infinitely repeat random words.
4. Risks from AI are becoming a concern for businesses across the globe. A global survey on
responsible AI highlights that companies’ top AI-related concerns include privacy, data security, and reliability.
The survey shows that organizations are beginning to take steps to mitigate these risks. Globally, however, most
companies have so far only mitigated a small portion of these risks.
5. LLMs can output copyrighted material. Multiple researchers have shown that the generative outputs
of popular LLMs may contain copyrighted material, such as excerpts from The New York Times or scenes from
movies. Whether such output constitutes copyright violations is becoming a central legal question.
6. AI developers score low on transparency, with consequences for research. The newly
introduced Foundation Model Transparency Index shows that AI developers lack transparency, especially
regarding the disclosure of training data and methodologies. This lack of openness hinders efforts to further
understand the robustness and safety of AI systems.
17
Index Report 2024
Chapter 3: Responsible AI (cont’d)

7. Extreme AI risks are difficult to analyze. Over the past year, a substantial debate has emerged among
AI scholars and practitioners regarding the focus on immediate model risks, like algorithmic discrimination, versus
potential long-term existential threats. It has become challenging to distinguish which claims are scientifically
founded and should inform policymaking. This difficulty is compounded by the tangible nature of already present
short-term risks in contrast with the theoretical nature of existential threats.
8. The number of AI incidents continues to rise. According to the AI Incident Database, which tracks
incidents related to the misuse of AI, 123 incidents were reported in 2023, a 32.3 percentage point increase from
2022. Since 2013, AI incidents have grown by over twentyfold. A notable example includes AI-generated, sexually
explicit deepfakes of Taylor Swift that were widely shared online.
9. ChatGPT is politically biased. Researchers find a significant bias in ChatGPT toward Democrats in the
United States and the Labour Party in the U.K. This finding raises concerns about the tool’s potential to influence
users’ political views, particularly in a year marked by major global elections.
18
Index Report 2024
Report Highlights
Chapter 4: Economy
1. Generative AI investment skyrockets. Despite a decline in overall AI private investment last year,
funding for generative AI surged, nearly octupling from 2022 to reach $25.2 billion. Major players in the generative
AI space, including OpenAI, Anthropic, Hugging Face, and Inflection, reported substantial fundraising rounds.
2. Already a leader, the United States pulls even further ahead in AI private investment.
In 2023, the United States saw AI investments reach $67.2 billion, nearly 8.7 times more than China, the next
highest investor. While private AI investment in China and the European Union, including the United Kingdom,
declined by 44.2% and 14.1%, respectively, since 2022, the United States experienced a notable increase of 22.1%
in the same time frame.
3. Fewer AI jobs in the United States and across the globe. In 2022, AI-related positions made
up 2.0% of all job postings in America, a figure that decreased to 1.6% in 2023. This decline in AI job listings is
attributed to fewer postings from leading AI firms and a reduced proportion of tech roles within these companies.
4. AI decreases costs and increases revenues. A new McKinsey survey reveals that 42% of surveyed
organizations report cost reductions from implementing AI (including generative AI), and 59% report revenue
increases. Compared to the previous year, there was a 10 percentage point increase in respondents reporting
decreased costs, suggesting AI is driving significant business efficiency gains.
5. Total AI private investment declines again, while the number of newly funded AI
companies increases. Global private AI investment has fallen for the second year in a row, though less than
the sharp decrease from 2021 to 2022. The count of newly funded AI companies spiked to 1,812, up 40.6% from
the previous year.
6. AI organizational adoption ticks up. A 2023 McKinsey report reveals that 55% of organizations now
use AI (including generative AI) in at least one business unit or function, up from 50% in 2022 and 20% in 2017.
7. China dominates industrial robotics. Since surpassing Japan in 2013 as the leading installer of
industrial robots, China has significantly widened the gap with the nearest competitor nation. In 2013, China’s
installations accounted for 20.8% of the global total, a share that rose to 52.4% by 2022.
19
Index Report 2024
Chapter 4: Economy (cont’d)

8. Greater diversity in robot installations. In 2017, collaborative robots represented a mere 2.8% of all
new industrial robot installations, a figure that climbed to 9.9% by 2022. Similarly, 2022 saw a rise in service robot
installations across all application categories, except for medical robotics. This trend indicates not just an overall
increase in robot installations but also a growing emphasis on deploying robots for human-facing roles.
9. The data is in: AI makes workers more productive and leads to higher quality work.
In 2023, several studies assessed AI’s impact on labor, suggesting that AI enables workers to complete tasks more
quickly and to improve the quality of their output. These studies also demonstrated AI’s potential to bridge the skill
gap between low- and high-skilled workers. Still, other studies caution that using AI without proper oversight can
lead to diminished performance.
10. Fortune 500 companies start talking a lot about AI, especially generative AI. In 2023,
AI was mentioned in 394 earnings calls (nearly 80% of all Fortune 500 companies), a notable increase from
266 mentions in 2022. Since 2018, mentions of AI in Fortune 500 earnings calls have nearly doubled. The most
frequently cited theme, appearing in 19.7% of all earnings calls, was generative AI.
20
Index Report 2024
Report Highlights
2. AI helps medicine take significant strides forward. In 2023, several significant medical systems
were launched, including EVEscape, which enhances pandemic prediction, and AlphaMissence, which assists in
AI-driven mutation classification. AI is increasingly being utilized to propel medical advancements.
3. Highly knowledgeable medical AI has arrived. Over the past few years, AI systems have shown
remarkable improvement on the MedQA benchmark, a key test for assessing AI’s clinical knowledge. The
standout model of 2023, GPT-4 Medprompt, reached an accuracy rate of 90.2%, marking a 22.6 percentage
point increase from the highest score in 2022. Since the benchmark’s introduction in 2019, AI performance on
MedQA has nearly tripled.
4. The FDA approves more and more AI-related medical devices. In 2022, the FDA approved 139
AI-related medical devices, a 12.1% increase from 2021. Since 2012, the number of FDA-approved AI-related medical
devices has increased by more than 45-fold. AI is increasingly being used for real-world medical purposes.
21
Index Report 2024
Report Highlights
1. The number of American and Canadian CS bachelor’s graduates continues to rise, new
CS master’s graduates stay relatively flat, and PhD graduates modestly grow. While the
number of new American and Canadian bachelor’s graduates has consistently risen for more than a decade, the
number of students opting for graduate education in CS has flattened. Since 2018, the number of CS master’s and
PhD graduates has slightly declined.
2. The migration of AI PhDs to industry continues at an accelerating pace. In 2011, roughly

equal percentages of new AI PhDs took jobs in industry (40.9%) and academia (41.6%). However, by 2022, a
significantly larger proportion (70.7%) joined industry after graduation compared to those entering academia
(20.0%). Over the past year alone, the share of industry-bound AI PhDs has risen by 5.3 percentage points,
indicating an intensifying brain drain from universities into industry.
3. Less transition of academic talent from industry to academia. In 2019, 13% of new AI faculty
in the United States and Canada were from industry. By 2021, this figure had declined to 11%, and in 2022, it
further dropped to 7%. This trend indicates a progressively lower migration of high-level AI talent from industry
into academia.
4. CS education in the United States and Canada becomes less international. Proportionally
fewer international CS bachelor’s, master’s, and PhDs graduated in 2022 than in 2021. The drop in international
students in the master’s category was especially pronounced.
5. More American high school students take CS courses, but access problems remain.
In 2022, 201,000 AP CS exams were administered. Since 2007, the number of students taking these exams has
increased more than tenfold. However, recent evidence indicates that students in larger high schools and those in
suburban areas are more likely to have access to CS courses.
6. AI-related degree programs are on the rise internationally. The number of English-language,
AI-related postsecondary degree programs has tripled since 2017, showing a steady annual increase over the past
five years. Universities worldwide are offering more AI-focused degree programs.
22
Index Report 2024
Chapter 6: Education (cont’d)

7. The United Kingdom and Germany lead in European informatics, CS, CE, and IT
graduate production. The United Kingdom and Germany lead Europe in producing the highest number
of new informatics, CS, CE, and information bachelor’s, master’s, and PhD graduates. On a per capita basis,
Finland leads in the production of both bachelor’s and PhD graduates, while Ireland leads in the production of
master’s graduates.
23
Index Report 2024
Report Highlights
1. The number of AI regulations in the United States sharply increases. The number of AI-related
regulations has risen significantly in the past year and over the last five years. In 2023, there were 25 AI-related
regulations, up from just one in 2016. Last year alone, the total number of AI-related regulations grew by 56.3%.
2. The United States and the European Union advance landmark AI policy action. In 2023,
policymakers on both sides of the Atlantic put forth substantial proposals for advancing AI regulation The
European Union reached a deal on the terms of the AI Act, a landmark piece of legislation enacted in 2024.
Meanwhile, President Biden signed an Executive Order on AI, the most notable AI policy initiative in the United
States that year.
3. AI captures U.S. policymaker attention. The year 2023 witnessed a remarkable increase in AI-related
legislation at the federal level, with 181 bills proposed, more than double the 88 proposed in 2022.
4. Policymakers across the globe cannot stop talking about AI. Mentions of AI in legislative
proceedings across the globe have nearly doubled, rising from 1,247 in 2022 to 2,175 in 2023. AI was mentioned in
the legislative proceedings of 49 countries in 2023. Moreover, at least one country from every continent discussed
AI in 2023, underscoring the truly global reach of AI policy discourse.
5. More regulatory agencies turn their attention toward AI. The number of U.S. regulatory agencies
issuing AI regulations increased to 21 in 2023 from 17 in 2022, indicating a growing concern over AI regulation
among a broader array of American regulatory bodies. Some of the new regulatory agencies that enacted AI-
related regulations for the first time in 2023 include the Department of Transportation, the Department of Energy,
and the Occupational Safety and Health Administration.
24
Index Report 2024
Report Highlights
1. U.S. and Canadian bachelor’s, master’s, and PhD CS students continue to grow more
ethnically diverse. While white students continue to be the most represented ethnicity among new resident
graduates at all three levels, the representation from other ethnic groups, such as Asian, Hispanic, and Black or
African American students, continues to grow. For instance, since 2011, the proportion of Asian CS bachelor’s
degree graduates has increased by 19.8 percentage points, and the proportion of Hispanic CS bachelor’s degree
graduates has grown by 5.2 percentage points.
2. Substantial gender gaps persist in European informatics, CS, CE, and IT graduates at
all educational levels. Every surveyed European country reported more male than female graduates in
bachelor’s, master’s, and PhD programs for informatics, CS, CE, and IT. While the gender gaps have narrowed in
most countries over the last decade, the rate of this narrowing has been slow.
3. U.S. K–12 CS education is growing more diverse, reflecting changes in both gender and
ethnic representation. The proportion of AP CS exams taken by female students rose from 16.8% in 2007 to
30.5% in 2022. Similarly, the participation of Asian, Hispanic/Latino/Latina, and Black/African American students
in AP CS has consistently increased year over year.
25
Index Report 2024
Report Highlights
A survey from Ipsos shows that, over the last year, the proportion of those who think AI will dramatically affect
their lives in the next three to five years has increased from 60% to 66%. Moreover, 52% express nervousness
toward AI products and services, marking a 13 percentage point rise from 2022. In America, Pew data suggests
that 52% of Americans report feeling more concerned than excited about AI, rising from 38% in 2022.
2. AI sentiment in Western nations continues to be low, but is slowly improving. In 2022,

several developed Western nations, including Germany, the Netherlands, Australia, Belgium, Canada, and
the United States, were among the least positive about AI products and services. Since then, each of these
countries has seen a rise in the proportion of respondents acknowledging the benefits of AI, with the Netherlands
experiencing the most significant shift.
3. The public is pessimistic about AI’s economic impact. In an Ipsos survey, only 37% of
respondents feel AI will improve their job. Only 34% anticipate AI will boost the economy, and 32% believe it will
enhance the job market.
4. Demographic differences emerge regarding AI optimism. Significant demographic

differences exist in perceptions of AI’s potential to enhance livelihoods, with younger generations generally
more optimistic. For instance, 59% of Gen Z respondents believe AI will improve entertainment options,
versus only 40% of baby boomers. Additionally, individuals with higher incomes and education levels are more
optimistic about AI’s positive impacts on entertainment, health, and the economy than their lower-income and
less-educated counterparts.
5. ChatGPT is widely known and widely used. An international survey from the University of Toronto
suggests that 63% of respondents are aware of ChatGPT. Of those aware, around half report using ChatGPT at
least once weekly.
26
CHAPTER 1:
Index Report 2024
Research and
Development
CHAPTER 1:
Index Report 2024 Research and
Development
Preview
Overview 29 1.4 AI Conferences 66
Chapter Highlights 30 Conference Attendance 66
1.1 Publications 31 1.5 Open-Source AI Software 69

Overview 31 Projects 69
Total Number of AI Publications 31 Stars 71
By Type of Publication 32
By Field of Study 33
By Sector 34
AI Journal Publications 36
AI Conference Publications 37
1.2 Patents 38
AI Patents 38
Overview 38
By Filing Status and Region 39
1.3 Frontier AI Research 45

General Machine Learning Models 45
Overview 45
Sector Analysis 46
National Affiliation 47
Parameter Trends 49
Compute Trends 50
Highlight: Will Models Run Out of Data? 52
Foundation Models 56
Model Release 56
Organizational Affiliation 58
National Affiliation 61
Training Cost 63
Table of Contents 28
CHAPTER 1:
Development
Overview
This chapter studies trends in AI research and development. It begins by examining
trends in AI publications and patents, and then examines trends in notable AI systems and
foundation models. It concludes by analyzing AI conference attendance and open-source
AI software projects.
CHAPTER 1:
Development
Chapter Highlights
2. More foundation models and more open foundation models. In 2023, a total of 149 foundation
models were released, more than double the amount released in 2022. Of these newly released models, 65.7%
were open-source, compared to only 44.4% in 2022 and 33.3% in 2021.
3. Frontier models get way more expensive. According to AI Index estimates, the training costs of
state-of-the-art AI models have reached unprecedented levels. For example, OpenAI’s GPT-4 used an estimated
$78 million worth of compute to train, while Google’s Gemini Ultra cost $191 million for compute.
5. The number of AI patents skyrockets. From 2021 to 2022, AI patent grants worldwide increased
sharply by 62.7%. Since 2010, the number of granted AI patents has increased more than 31 times.
6. China dominates AI patents. In 2022, China led global AI patent origins with 61.1%, significantly
outpacing the United States, which accounted for 20.9% of AI patent origins. Since 2010, the U.S. share of AI
patents has decreased from 54.1%.
7. Open-source AI research explodes. Since 2011, the number of AI-related projects on GitHub has
seen a consistent increase, growing from 845 in 2011 to approximately 1.8 million in 2023. Notably, there was a
sharp 59.3% rise in the total number of GitHub AI projects in 2023 alone. The total number of stars for AI-related
projects on GitHub also significantly increased in 2023, more than tripling from 4.0 million in 2022 to 12.2 million.
8. The number of AI publications continues to rise. Between 2010 and 2022, the total number of AI
publications nearly tripled, rising from approximately 88,000 in 2010 to more than 240,000 in 2022. The increase
over the last year was a modest 1.1%.
Artificial Intelligence Chapter 1: Research and Development
Index Report 2024 1.1 Publications
1.1 Publications
Overview Total Number of AI Publications1
The figures below present the global count of Figure 1.1.1 displays the global count of AI publications.
English- and Chinese-language AI publications from Between 2010 and 2022, the total number of AI
2010 to 2022, categorized by type of affiliation and publications nearly tripled, rising from approximately
cross-sector collaborations. Additionally, this section 88,000 in 2010 to more than 240,000 in 2022. The
details publication data for AI journal articles and increase over the last year was a modest 1.1%.
conference papers.
Number of AI publications in the world, 2010–22

Source: Center for Security and Emerging Technology, 2023 | Chart: 2024 AI Index report
250 242.29
200
Number of AI publications (in thousands)
150
100
50
0
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 1.1.1
1 The data on publications presented this year is sourced from CSET. Both the methodology and data sources used by CSET to classify AI publications have changed since their data was last
featured in the AI Index (2023). As a result, the numbers reported in this year’s section differ slightly from those reported in last year’s edition. Moreover, the AI-related publication data is fully
available only up to 2022 due to a significant lag in updating publication data. Readers are advised to approach publication figures with appropriate caution.
Table of Contents Chapter 1 Preview 31

By Type of Publication
Figure 1.1.2 illustrates the distribution of AI publication journal and conference publications have increased
types globally over time. In 2022, there were roughly at comparable rates. In 2022, there were 2.6 times as
230,000 AI journal articles compared to roughly many conference publications and 2.4 times as many
42,000 conference submissions. Since 2015, AI journal publications as there were in 2015.
Number of AI publications by type, 2010–22

232.67, Journal
200
150
100
41.17, Conference
12.88, Book chapter
50 5.07, Preprint
1.49, Article
0.79, Unknown
0.70, Dissertation
0.57, Book
0.12, Other
0 0.05, Clinical trial
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 1.1.22
2 It is possible for an AI publication to be mapped to more than one publication type, so the totals in Figure 1.1.2 do not completely align with those in Figure 1.1.1.

By Field of Study
Figure 1.1.3 examines the total number of AI sevenfold since 2015. Following machine learning, the
publications by field of study since 2010. Machine most published AI fields in 2022 were computer vision
learning publications have seen the most rapid (21,309 publications), pattern recognition (19,841), and
growth over the past decade, increasing nearly process management (12,052).
Number of AI publications by eld of study (excluding Other AI), 2010–22

72.23, Machine learning

70
60
50
40
30
21.31, Computer vision
19.84, Pattern recognition
20 12.05, Process management
10.39, Computer network
9.17, Control theory
8.31, Algorithm
10 7.18, Linguistics
6.83, Mathematical optimization
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 1.1.3

By Sector
This section presents the distribution of AI publications (81.1%), maintaining its position as the
publications by sector—education, government, leading global source of AI research over the past
industry, nonprofit, and other—globally and then decade across all regions (Figure 1.1.4 and Figure 1.1.5).
specifically within the United States, China, and the Industry participation is most significant in the United
European Union plus the United Kingdom. In 2022, States, followed by the European Union plus the United
the academic sector contributed the majority of AI Kingdom, and China (Figure 1.1.5).
AI publications (% of total) by sector, 2010–22

80% 81.07%, Education
70%
60%
AI publications (% of total)
50%
40%
30%
20%
10% 7.89%, Industry

6.97%, Government
2.62%, Nonpro t
0% 1.46%, Other
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 1.1.4

AI publications (% of total) by sector and geographic area, 2022

75.48%
Education 75.63%
81.75%
14.06%
Industry 9.47%
7.39%
5.60%
Government 9.28%
10.05%
4.87%
United States
Nonpro t 5.62% European Union and United Kingdom
0.80% China
0% 10% 20% 30% 40% 50% 60% 70% 80%

AI publications (% of total)
Figure 1.1.5

AI Journal Publications
Figure 1.1.6 illustrates the total number of AI journal publications from 2010 to 2022. The number of AI journal
publications experienced modest growth from 2010 to 2015 but grew approximately 2.4 times since 2015.
Between 2021 and 2022, AI journal publications saw a 4.5% increase.
Number of AI journal publications, 2010–22

232.67
200
150
100
50
0
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 1.1.6

AI Conference Publications years, climbing from 22,727 in 2020 to 31,629 in

2021, and reaching 41,174 in 2022. Over the last year
Figure 1.1.7 visualizes the total number of AI conference alone, there was a 30.2% increase in AI conference
publications since 2010. The number of AI conference publications. Since 2010, the number of AI
publications has seen a notable rise in the past two conference publications has more than doubled.
Number of AI conference publications, 2010–22

41.17
40
Number of AI conference publications (in thousands)
35
30
25
20
15
10
0
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 1.1.7

Index Report 2024 1.2 Patents
This section examines trends over time in global AI patents, which can reveal important insights into the evolution of
innovation, research, and development within AI. Additionally, analyzing AI patents can reveal how these advancements
are distributed globally. Similar to the publications data, there is a noticeable delay in AI patent data availability, with
2022 being the most recent year for which data is accessible. The data in this section comes from CSET.
1.2 Patents
AI Patents
Overview
Figure 1.2.1 examines the global growth in granted years. For instance, between 2010 and 2014, the total
AI patents from 2010 to 2022. Over the last decade, growth in granted AI patents was 56.1%. However,
there has been a significant rise in the number of AI from 2021 to 2022 alone, the number of AI patents
patents, with a particularly sharp increase in recent increased by 62.7%.
Number of AI patents granted, 2010–22

62.26
60
50
Number of AI patents (in thousands)
40
30
20
10
0
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 1.2.1

By Filing Status and Region

(62,264). Over time, the landscape of AI patent
The following section disaggregates AI patents by
approvals has shifted markedly. Until 2015, a larger
their filing status (whether they were granted or not
proportion of filed AI patents were granted. However,
granted), as well as the region of their publication.
since then, the majority of AI patent filings have not
Figure 1.2.2 compares global AI patents by application been granted, with the gap widening significantly. For
status. In 2022, the number of ungranted AI patents instance, in 2015, 42.2% of all filed AI patents were not
(128,952) was more than double the amount granted granted. By 2022, this figure had risen to 67.4%.
AI patents by application status, 2010–22

128.95, not granted
120
100
Number of AI patents (in thousands)
80
60 62.26, granted
40
20
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 1.2.2

The gap between granted and not granted AI (Figure 1.2.3). In recent years, all three geographic
patents is evident across all major patent-originating areas have experienced an increase in both the total
geographic areas, including China, the European number of AI patent filings and the number of
Union and United Kingdom, and the United States patents granted.
AI patents by application status by geographic area, 2010–22

China European Union and United Kingdom United States
Not granted 80.46

80 80 80
Granted
70 70 70
Number of AI patent lings (in thousands)
60 60 60
50 50 50
40 40 40
35.31
30 30 30
20 20 20
15.11
12.08
10 10 10
2.17
0 0 1.17 0
2010 2012 2014 2016 2018 2020 2022 2010 2012 2014 2016 2018 2020 2022 2010 2012 2014 2016 2018 2020 2022
Figure 1.2.3

Figure 1.2.4 showcases the regional breakdown North America led in the number of global AI patents.
of granted AI patents. As of 2022, the bulk of the However, since then, there has been a significant
world’s granted AI patents (75.2%) originated from shift toward an increasing proportion of AI patents
East Asia and the Pacific, with North America being originating from East Asia and the Pacific.
the next largest contributor at 21.2%. Up until 2011,
Granted AI patents (% of world total) by region, 2010–22

80%
75.20%, East Asia and Paci c
70%
Granted AI patents (% of world total)
60%
50%
40%
30%
20% 21.21%, North America
2.33%, Europe and Central Asia

0.68%, Rest of the world
10% 0.23%, South Asia
0.21%, Latin America and the Caribbean
0.12%, Sub-Saharan Africa
0% 0.03%, Middle East and North Africa
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 1.2.4

Disaggregated by geographic area, the majority of the world’s granted AI patents are from China (61.1%) and the
United States (20.9%) (Figure 1.2.5). The share of AI patents originating from the United States has declined from
54.1% in 2010.
Granted AI patents (% of world total) by geographic area, 2010–22

61.13%, China
60%
50%
Granted AI patents (% of world total)
40%
30%
20% 20.90%, United States
10%
2.03%, European Union and United Kingdom

0% 0.23%, India
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 1.2.5

Figure 1.2.6 and Figure 1.2.7 document which (Figure 1.2.6). Figure 1.2.7 highlights the change in
countries lead in AI patents per capita. In 2022, the granted AI patents per capita from 2012 to 2022.
country with the most granted AI patents per 100,000 Singapore, South Korea, and China experienced the
inhabitants was South Korea (10.3), followed by greatest increase in AI patenting per capita during
Luxembourg (8.8) and the United States (4.2) that time period.
Granted AI patents per 100,000 inhabitants by country, 2022

South Korea 10.26
Luxembourg 8.73
United States 4.23
Japan 2.53
China 2.51
Singapore 2.06
Australia 1.91
Canada 1.25
Germany 0.66
Denmark 0.56
Finland 0.56
United Kingdom 0.42
New Zealand 0.33
France 0.33
Lithuania 0.28
0 1 2 3 4 5 6 7 8 9 10
Granted AI patents (per 100,000 inhabitants) Figure 1.2.6

Percentage change of granted AI patents per 100,000 inhabitants by country, 2012 vs. 2022
Singapore 5,366%
South Korea 3,801%
China 3,569%
Denmark 1,463%
United States 1,299%
United Kingdom 1,246%
Japan 1,137%
France 1,086%
Germany 961%
Australia 908%
Finland 907%
Canada 803%
New Zealand 387%
0% 500% 1,000% 1,500% 2,000% 2,500% 3,000% 3,500% 4,000% 4,500% 5,000% 5,500%
% change of granted AI patents (per 100,000 inhabitants)
Figure 1.2.7

Index Report 2024 1.3 Frontier AI Research
This section explores the frontier of AI research. While many new AI models are introduced annually, only a small
sample represents the most advanced research. Admittedly what constitutes advanced or frontier research is
somewhat subjective. Frontier research could reflect a model posting a new state-of-the-art result on a benchmark,
introducing a meaningful new architecture, or exercising some impressive new capabilities.
The AI Index studies trends in two types of frontier AI models: “notable models” and foundation models.3 Epoch, an
AI Index data provider, uses the term “notable machine learning models” to designate noteworthy models handpicked
as being particularly influential within the AI/machine learning ecosystem. In contrast, foundation models are
exceptionally large AI models trained on massive datasets, capable of performing a multitude of downstream tasks.
Examples of foundation models include GPT-4, Claude 3, and Gemini. While many foundation models may qualify as
notable models, not all notable models are foundation models.
Within this section, the AI Index explores trends in notable models and foundation models from various perspectives,
including originating organization, country of origin, parameter count, and compute usage. The analysis concludes
with an examination of machine learning training costs.
1.3 Frontier AI Research

General Machine Learning entries based on criteria such as state-of-the-
art advancements, historical significance, or high
Models citation rates. Analyzing these models provides a
Overview comprehensive overview of the machine learning
Epoch AI is a group of researchers dedicated to landscape’s evolution, both in recent years and over
studying and predicting the evolution of advanced the past few decades.4 Some models may be missing
AI. They maintain a database of AI and machine from the dataset; however, the dataset can reveal
learning models released since the 1950s, selecting trends in relative terms.
3 “AI system” refers to a computer program or product based on AI, such as ChatGPT. “AI model” refers to a collection of parameters whose values are learned during training, such as GPT-4.
4 New and historic models are continually added to the Epoch database, so the total year-by-year counts of models included in this year’s AI Index might not exactly match those published in
last year’s report.

Sector Analysis
Until 2014, academia led in the release of machine Creating cutting-edge AI models now demands a
learning models. Since then, industry has taken substantial amount of data, computing power, and
the lead. In 2023, there were 51 notable machine financial resources that are not available in academia.
learning models produced by industry compared to This shift toward increased industrial dominance in
just 15 from academia (Figure 1.3.1). Significantly, 21 leading AI models was first highlighted in last year’s
notable models resulted from industry/academic AI Index report. Although this year the gap has slightly
collaborations in 2023, a new high. narrowed, the trend largely persists.
Number of notable machine learning models by sector, 2003–23

Source: Epoch, 2023 | Chart: 2024 AI Index report
51, Industry
50
Number of notable machine learning models
40
30
21, Industry-academia collaboration

20
15, Academia
10
2, Government
0, Industry–research collective collaboration
0, Research collective
0 0, Academia-government collaboration
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
Figure 1.3.1

National Affiliation Number of notable machine learning models by

geographic area, 2023
To illustrate the evolving geopolitical landscape of Source: Epoch, 2023 | Chart: 2024 AI Index report
AI, the AI Index research team analyzed the country United States 61
of origin of notable models. China 15
Figure 1.3.2 displays the total number of notable France 8
machine learning models attributed to the location Germany 5
of researchers’ affiliated institutions. 5

Canada 4
In 2023, the United States led with 61 notable Israel 4
machine learning models, followed by China with United Kingdom 4
15, and France with 8. For the first time since 2019, Singapore 3
the European Union and the United Kingdom
United Arab Emirates 3
together have surpassed China in the number of
Egypt 2
notable AI models produced (Figure 1.3.3). Since
0 5 10 15 20 25 30 35 40 45 50 55 60
2003, the United States has produced more models Number of notable machine learning models
than other major geographic regions such as the

Figure 1.3.2
United Kingdom, China, and Canada (Figure 1.3.4).
Number of notable machine learning models by

select geographic area, 2003–23
60 61, United States

Number of notable machine learning models
50
40
30
25, European Union and
United Kingdom
20
15, China
10
0
2003
2005
2007
2009
2011
2013
2015
2017
2019
2021
2023
Figure 1.3.3
5 A machine learning model is considered associated with a specific country if at least one author of the paper introducing it has an affiliation with an institution based in that country. In cases
where a model’s authors come from several countries, double counting can occur.

Number of notable machine learning models by geographic area, 2003–23 (sum)

1–10
11–20
21–60
61–100
101–430
Figure 1.3.4

Parameter Trends
Parameters in machine learning models are numerical originate. Parameter counts have risen sharply since
values learned during training that determine how a the early 2010s, reflecting the growing complexity
model interprets input data and makes predictions. of tasks AI models are designed for, the greater
Models trained on more data will usually have more availability of data, improvements in hardware, and
parameters than those trained on less data. Likewise, proven efficacy of larger models. High-parameter
models with more parameters typically outperform models are particularly notable in the industry sector,
those with fewer parameters. underscoring the capacity of companies like OpenAI,
Anthropic, and Google to bear the computational
Figure 1.3.5 demonstrates the parameter count of
costs of training on vast volumes of data.
machine learning models in the Epoch dataset,
categorized by the sector from which the models
Number of parameters of notable machine learning models by sector, 2003–23

Academia Industry Industry-academia Research collective

1T Academia-government Industry–research collective Government
10B
Number of parameters (log scale)
100M
1M
10K
100
2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023
Publication date
Figure 1.3.5

Compute Trends for notable machine learning models in the last

The term “compute” in AI models denotes the 20 years. Recently, the compute usage of notable
computational resources required to train and operate AI models has increased exponentially.6 This
a machine learning model. Generally, the complexity trend has been especially pronounced in the last
of the model and the size of the training dataset five years. This rapid rise in compute demand
directly influence the amount of compute needed. has critical implications. For instance, models
The more complex a model is, and the larger the requiring more computation often have larger
underlying training data, the greater the amount of environmental footprints, and companies typically
compute required for training. have more access to computational resources
Figure 1.3.6 visualizes the training compute required than academic institutions.
Training compute of notable machine learning models by sector, 2003–23

Academia Industry Industry-academia Academia-government

Industry–research collective Government Research collective
10B
Training compute (petaFLOP - log scale)
100M
1M
10K
100
0.01
2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023
Publication date
Figure 1.3.6
6 FLOP stands for “floating-point operation.” A floating-point operation is a single arithmetic operation involving floating-point numbers, such as addition, subtraction, multiplication, or
division. The number of FLOPs a processor or computer can perform per second is an indicator of its computational power. The higher the FLOP rate, the more powerful the computer is.
An AI model with a higher FLOP rate reflects its requirement for more computational resources during training.

Figure 1.3.7 highlights the training compute of notable The original Transformer, released in 2017, required
machine learning models since 2012. For example, around 7,400 petaFLOPs. Google’s Gemini Ultra, one
AlexNet, one of the papers that popularized the now of the current state-of-the-art foundation models,
standard practice of using GPUs to improve AI models, required 50 billion petaFLOPs.
required an estimated 470 petaFLOPs for training.
Training compute of notable machine learning models by domain, 2012–23

100B Language Vision Multimodal Gemini Ultra
GPT-4
10B Claude 2
PaLM (540B)
Llama 2-70B
Training compute (petaFLOP - log scale)
1B Megatron-Turing NLG 530B

GPT-3 175B (davinci)
100M
10M RoBERTa Large
1M
BERT-Large
100K
Transformer
10K
1000 AlexNet
2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023
Publication date
Figure 1.3.7

Highlight:
Will Models Run Out of Data?

As illustrated above, a significant proportion of Projections of ML data exhaustion by stock type:
median and 90% CI dates
recent algorithmic progress, including progress Source: Epoch, 2023 | Table: 2024 AI Index report
behind powerful LLMs, has been achieved by Stock type Historical projection Compute projection
training models on increasingly larger amounts of Low-quality 2032.4 [2028.4; 2039.2] 2040.5 [2034.6; 2048.9]
language stock
data. As noted recently by Anthropic cofounder
High-quality 2024.5 [2023.5; 2025.7] 2024.1 [2023.2; 2025.3]
and AI Index Steering Committee member Jack language stock
Clark, foundation models have been trained on Image stock 2046 [2037; 2062.8] 2038.8 [2032; 2049.8]
meaningful percentages of all the data that has Figure 1.3.8

ever existed on the internet.
The growing data dependency of AI models data, which is data generated by AI models
has led to concerns that future generations of themselves. For example, it is possible to use
computer scientists will run out of data to further text produced by one LLM to train another LLM.
scale and improve their systems. Research from The use of synthetic data for training AI systems
Epoch suggests that these concerns are somewhat is particularly attractive, not only as a solution
warranted. Epoch researchers have generated for potential data depletion but also because
historical and compute-based projections for generative AI systems could, in principle, generate
when AI researchers might expect to run out data in instances where naturally occurring data
of data. The historical projections are based on is sparse—for example, data for rare diseases or
observed growth rates in the sizes of data used to underrepresented populations. Until recently, the
train foundation models. The compute projections feasibility and effectiveness of using synthetic
adjust the historical growth rate based on data for training generative AI systems were not
projections of compute availability. well understood. However, research this year has
For instance, the researchers estimate that suggested that there are limitations associated with
computer scientists could deplete the stock of training models on synthetic data.
high-quality language data by 2024, exhaust low- For instance, a team of British and Canadian
quality language data within two decades, and researchers discovered that models predominantly
use up image data by the late 2030s to mid-2040s trained on synthetic data experience model
(Figure 1.3.8). collapse, a phenomenon where, over time, they
Theoretically, the challenge of limited data lose the ability to remember true underlying data
availability can be addressed by using synthetic distributions and start producing a narrow range of

Highlight:
Will Models Run Out of Data? (cont’d)

outputs. Figure 1.3.9 demonstrates the process of over time, the generations of models trained
model collapse in a variational autoencoder (VAE) predominantly on synthetic data become less
model, a widely used generative AI architecture. varied and are not as widely distributed.
With each subsequent generation trained on
The authors demonstrate that this phenomenon
additional synthetic data, the model produces an
occurs across various model types, including
increasingly limited set of outputs. As illustrated
Gaussian Mixture Models and LLMs. This research
in Figure 1.3.10, in statistical terms, as the number
underscores the continued importance of human-
of synthetic generations increases, the tails of the
generated data for training capable LLMs that can
distributions vanish, and the generation density
produce a diverse array of content.
shifts toward the mean.7 This pattern means that
A demonstration of model collapse in a VAE

Source: Shumailov et al., 2023
Figure 1.3.9
7 In the context of generative models, density refers to the level of complexity and variation in the outputs produced by an AI model. Models that have a higher generation density
produce a wider range of higher-quality outputs. Models with low generation density produce a narrower range of more simplistic outputs.

Highlight:

Convergence of generated data densities in descendant models
Source: Shumailov et al., 2023 | Chart: 2024 AI Index report
1.60
1.40
1.20
1.00
Density
Generation 0
0.80
Generation 1
Generation 2
0.60 Generation 3
Generation 4
Generation 5
0.40
Generation 6
Generation 7
0.20 Generation 8
Generation 9
0.00
−3 −2 −1 0 1 2 3
Figure 1.3.10
In a similar study published in 2023 on the use generated images declines. Figure 1.3.11 highlights
of synthetic data in generative imaging models, the degraded image generations of models that are
researchers found that generative image models augmented with synthetic data; for example, the
trained solely on synthetic data cycles—or with faces generated in steps 7 and 9 increasingly display
insufficient real human data—experience a strange-looking hash marks. From a statistical
significant drop in output quality. The authors perspective, images generated with both synthetic
label this phenomenon Model Autophagy Disorder data and synthetic augmentation loops have higher
(MAD), in reference to mad cow disease. FID scores (indicating less similarity to real images),
lower precision scores (signifying reduced realism
The study examines two types of training processes:
or quality), and lower recall scores (suggesting
fully synthetic, where models are trained exclusively
decreased diversity) (Figure 1.3.12). While synthetic
on synthetic data, and synthetic augmentation,
augmentation loops, which incorporate some real
where models are trained on a mix of synthetic
data, show less degradation than fully synthetic
and real data. In both scenarios, as the number of
loops, both methods exhibit diminishing returns with
training generations increases, the quality of the
further training.

Highlight:

An example of MAD in image-generation models
Source: Alemohammad et al., 2023
Figure 1.3.11
Assessing FFHQ syntheses: FID, precision, and recall in synthetic and mixed-data training loops
Source: Alemohammad et al., 2023 | Chart: 2024 AI Index report
Fully synthetic loop Synthetic augmentation loop
0.70 0.40
20
0.60 0.35
0.30
0.50
15
0.25
0.40
Precision
Recall
FID
0.20
10
0.30
0.15
0.20
0.10
5
0.10 0.05
0 0.00 0.00
0 2 4 6 0 2 4 6 0 2 4 6
Generations Generations Generations
Figure 1.3.12

Model Release
Foundation Models Foundation models can be accessed in different
Foundation models represent a rapidly evolving ways. No access models, like Google’s PaLM-E, are
and popular category of AI models. Trained on vast only accessible to their developers. Limited access
datasets, they are versatile and suitable for numerous models, like OpenAI’s GPT-4, offer limited access to
downstream applications. Foundation models such as the models, often through a public API. Open models,
GPT-4, Claude 3, and Llama 2 showcase remarkable like Meta’s Llama 2, fully release model weights, which
abilities and are increasingly being deployed in real- means the models can be modified and freely used.
world scenarios.
Figure 1.3.13 visualizes the total number of foundation
Introduced in 2023, the Ecosystem Graphs is a new models by access type since 2019. In recent years, the
community resource from Stanford that tracks the number of foundation models has risen sharply, more
foundation model ecosystem, including datasets, than doubling since 2022 and growing by a factor of
models, and applications. This section uses data from nearly 38 since 2019. Of the 149 foundation models
the Ecosystem Graphs to study trends in foundation released in 2023, 98 were open, 23 limited and 28
models over time. 8
no access.
Foundation models by access type, 2019–23

Source: Bommasani et al., 2023 | Chart: 2024 AI Index report
160
Open Limited No access
149
140
120
100 98
Foundation models
80
72
60
32
40 23
12
27
20 9
28 28
4 2 10
0
2019 2020 2021 2022 2023
Figure 1.3.13
8 The Ecosystem Graphs make efforts to survey the global AI ecosystem, but it is possible that they underreport models from certain nations like South Korea and China.

In 2023, the majority of foundation models were released as open access (65.8%), with 18.8% having no access
and 15.4% limited access (Figure 1.3.14). Since 2021, there has been a significant increase in the proportion of
models released with open access.
Foundation models (% of total) by access type, 2019–23

70%
65.77%, Open
60%
Foundation models (% of total)
50%
40%
30%
20%
18.79%, No access
15.44%, Limited
10%
0%
2019 2020 2021 2022 2023
Figure 1.3.14

Organizational Affiliation from industry. Only 18.8% of foundation models in

Figure 1.3.15 plots the sector from which foundation 2023 originated from academia. Since 2019, an ever
models have originated since 2019. In 2023, the larger number of foundation models are coming
majority of foundation models (72.5%) originated from industry.
Number of foundation models by sector, 2019–23

108, Industry
100
80
Number of foundation models
60
40
28, Academia
20
9, Industry-academia collaboration
4, Government
0 0, Industry-government collaboration
2019 2020 2021 2022 2023
Figure 1.3.15

Figure 1.3.16 highlights the source of various foundation models that were released in 2023. Google introduced
the most models (18), followed by Meta (11), and Microsoft (9). The academic institution that released the most
foundation models in 2023 was UC Berkeley (3).
Number of foundation models by organization, 2023

Google 18
Meta 11
Microsoft 9
OpenAI 7
Together 5
Hugging Face 4
Anthropic 4
AI2 4
Stability AI 3
Cerebras 3
Shanghai AI Laboratory 3
Adobe 3
Industry
UC Berkeley 3
Academia
DeepMind 2 Nonpro t
Stanford University 2
0 2 4 6 8 10 12 14 16 18
Figure 1.3.16

Since 2019, Google has led in releasing the most foundation models, with a total of 40, followed by OpenAI with
20 (Figure 1.3.17). Tsinghua University stands out as the top non-Western institution, with seven foundation model
releases, while Stanford University is the leading American academic institution, with five releases.
Number of foundation models by organization, 2019–23 (sum)

Google 40
OpenAI 20
Meta 19
Microsoft 18
DeepMind 15
Tsinghua University 7
EleutherAI 6
Together 6
Cohere 5
Stanford University 5
Hugging Face 5
Anthropic 5
Industry
AI2 4
Academia
BigScience 4 Nonpro t
Shanghai AI Laboratory 4
0 4 8 12 16 20 24 28 32 36 40
Figure 1.3.17

National Affiliation Number of foundation models by geographic area,

2023
Given that foundation models are fairly Source: Bommasani et al., 2023 | Chart: 2024 AI Index report
representative of frontier AI research, from United States 109
a geopolitical perspective, it is important to China 20

United Kingdom 8
understand their national affiliations. Figures 1.3.18,
United Arab Emirates 4
1.3.19, and 1.3.20 visualize the national affiliations Canada 3
of various foundation models. As with the notable Singapore 2
model analysis presented earlier in the chapter, Israel 2

Germany 2
a model is deemed affiliated with a country if a
Finland 2
researcher contributing to that model is affiliated
Taiwan 1
with an institution headquartered in that country. Switzerland 1
Sweden 1
In 2023, most of the world’s foundation models
Spain 1
originated from the United States (109), followed by France 1
China (20), and the United Kingdom (Figure 1.3.18). 0 10 20 30 40 50 60 70 80 90 100 110
Since 2019, the United States has consistently led
in originating the majority of foundation models Figure 1.3.18
(Figure 1.3.19).
Number of foundation models by select geographic

area, 2019–23
109, United States

100
80
60
40
20, China
20 15, European Union and
United Kingdom
2019 2020 2021 2022 2023
Figure 1.3.19

Figure 1.3.20 depicts the cumulative count of foundation models released and attributed to respective countries
since 2019. The country with the greatest number of foundation models released since 2019 is the United States
(182), followed by China (30), and the United Kingdom (21).
Number of foundation models by geographic area, 2019–23 (sum)

1–10
11–30
31–182
Figure 1.3.20

Training Cost models in last year’s publication. This year, the AI

Index has collaborated with Epoch AI, an AI research
A prominent topic in discussions about foundation institute, to substantially enhance and solidify the
models is their speculated costs. While AI robustness of its AI training cost estimates.9 To
companies seldom reveal the expenses involved estimate the cost of cutting-edge models, the Epoch
in training their models, it is widely believed that team analyzed training duration, as well as the type,
these costs run into millions of dollars and are quantity, and utilization rate of the training hardware,
rising. For instance, OpenAI’s CEO, Sam Altman, using information from publications, press releases, or
mentioned that the training cost for GPT-4 was over technical reports related to the models.10
$100 million. This escalation in training expenses
Figure 1.3.21 visualizes the estimated training cost
has effectively excluded universities, traditionally
associated with select AI models, based on cloud
centers of AI research, from developing their own
compute rental prices. AI Index estimates validate
leading-edge foundation models. In response, policy
suspicions that in recent years model training costs
initiatives, such as President Biden’s Executive Order
have significantly increased. For example, in 2017,
on AI, have sought to level the playing field between
the original Transformer model, which introduced the
industry and academia by creating a National AI
architecture that underpins virtually every modern
Research Resource, which would grant nonindustry
LLM, cost around $900 to train.11 RoBERTa Large,
actors the compute and data needed to do higher
released in 2019, which achieved state-of-the-art
level AI-research.
results on many canonical comprehension benchmarks
Understanding the cost of training AI models is like SQuAD and GLUE, cost around $160,000 to train.
important, yet detailed information on these costs Fast-forward to 2023, and training costs for OpenAI’s
remains scarce. The AI Index was among the first to GPT-4 and Google’s Gemini Ultra are estimated to be
offer estimates on the training costs of foundation around $78 million and $191 million, respectively.
9 Ben Cottier and Robi Rahman led research at Epoch AI into model training cost.
10 A detailed description of the estimation methodology is provided in the Appendix.
11 The cost figures reported in this section are inflation-adjusted.

Estimated training cost of select AI models, 2017–23

200M 191,400,000
Training cost (in U.S. dollars)
150M
100M
78,352,034
50M
12,389,056
4,324,883 6,405,653 1,319,586 3,931,897
930 3,288 160,018
0
Transformer
BERT-Large
RoBERTa Large
GPT-3 175B (davinci)
Megatron-Turing NLG 530B
LaMDA
PaLM (540B)
GPT-4
Gemini Ultra
Llama 2 70B
2017 2018 2019 2020 2021 2022 2023
Figure 1.3.21
Figure 1.3.22 visualizes the training cost of all AI models for which the AI Index has estimates. As the figure shows,
model training costs have sharply increased over time.
Estimated training cost of select AI models, 2016–23

Gemini Ultra
GPT-4
100M
Falcon 180B
PaLM (540B)
Training cost (in U.S. dollars - log scale)
10M
GPT-3 175B (davinci) Llama 2 70B
BLOOM-176B
Megatron-BERT HyperCLOVA LLaMA-65B
1M Switch Flamingo
AlphaStar Meta Pseudo Labels PaLI StarCoder
GNMT
RoBERTa Large
100K T0-XXL
JFT Imagen
Xception
BigGAN-deep 512×512
10K Big Transformer for Back-Translation
BERT-Large
Transformer
1000 SciBERT
IMPALA
100
2016 2017 2018 2019 2020 2021 2022 2023

Publication date Figure 1.3.22

As established in previous AI Index reports, there is a direct correlation between the training costs of AI models
and their computational requirements. As illustrated in Figure 1.3.23, models with greater computational training
needs cost substantially more to train.
Estimated training cost and compute of select AI models

Gemini Ultra
GPT-4
100M
PaLM (540B)
Training cost (in U.S. dollars - log scale)
10M
GPT-3 175B (davinci) Megatron-Turing NLG 530B
Llama 2 70B
LaMDA
1M
RoBERTa Large
100K
10K
BERT-Large
Transformer
1000
10K 100K 1M 10M 100M 1B 10B 100B

Training compute (petaFLOP - log scale) Figure 1.3.23

Index Report 2024 1.4 AI Conferences
AI conferences serve as essential platforms for researchers to present their findings and network with peers and
collaborators. Over the past two decades, these conferences have expanded in scale, quantity, and prestige.
This section explores trends in attendance at major AI conferences.
1.4 AI Conferences
Conference Attendance
Figure 1.4.1 graphs attendance at a selection of Specifically, there was a 6.7% rise in total attendance
AI conferences since 2010. Following a decline in over the last year. Since 2015, the annual number of
attendance, likely due to the shift back to exclusively attendees has risen by around 50,000, reflecting not
in-person formats, the AI Index reports an increase just a growing interest in AI research but also the
in conference attendance from 2022 to 2023.12 emergence of new AI conferences.
Attendance at select AI conferences, 2010–23

Source: AI Index, 2023 | Chart: 2024 AI Index report
80
70
Number of attendees (in thousands)
63.29
60
50
40
30
20
10
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023
Figure 1.4.1
12 This data should be interpreted with caution given that many conferences in the last few years have had virtual or hybrid formats. Conference organizers report that measuring the exact
attendance numbers at virtual conferences is difficult, as virtual conferences allow for higher attendance of researchers from around the world. The conferences for which the AI Index tracked
data include NeurIPS, CVPR, ICML, ICCV, ICRA, AAAI, ICLR, IROS, IJCAI, AAMAS, FAccT, UAI, ICAPS, and KR.

Neural Information Processing Systems (NeurIPS) AI conferences, NeurIPS, ICML, ICCV, and AAAI
remains one of the most attended AI conferences, experienced year-over-year increases in attendance.
attracting approximately 16,380 participants in 2023 However, in the past year, CVPR, ICRA, ICLR, and IROS
(Figure 1.4.2 and Figure 1.4.3). Among the major observed slight declines in their attendance figures.
Attendance at large conferences, 2010–23

30
25
20
16.38, NeurIPS
15
10 8.34, CVPR
7.92, ICML
7.33, ICCV
6.60, ICRA
4.47, AAAI
5 3.76, ICLR
3.65, IROS
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023
Figure 1.4.2

Attendance at small conferences, 2010–23

3.50
3.00
2.50
2.00 1.99, IJCAI
1.50
1.00 0.97, AAMAS

0.83, FAccT
0.50 0.48, UAI

0.31, ICAPS
0.25, KR
0.00
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023
Figure 1.4.3

Index Report 2024 1.5 Open-Source AI Software
GitHub is a web-based platform that enables individuals and teams to host, review, and collaborate on code
repositories. Widely used by software developers, GitHub facilitates code management, project collaboration,
and open-source software support. This section draws on data from GitHub providing insights into broader trends in
open-source AI software development not reflected in academic publication data.
1.5 Open-Source AI Software

Projects GitHub AI projects over time. Since 2011, the number
of AI-related GitHub projects has seen a consistent
A GitHub project comprises a collection of files, increase, growing from 845 in 2011 to approximately
including source code, documentation, configuration 1.8 million in 2023.13 Notably, there was a sharp 59.3%
files, and images, that together make up a software rise in the total number of GitHub AI projects in the
project. Figure 1.5.1 looks at the total number of last year alone.
Number of GitHub AI projects, 2011–23

Source: GitHub, 2023 | Chart: 2024 AI Index report
1.81
1.50
Number of AI projects (in millions)
1.00
0.50
0.00
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023
Figure 1.5.1
13 GitHub’s methodology for identifying AI-related projects has evolved over the past year. For classifying AI projects, GitHub has started incorporating generative AI keywords from a
recently published research paper, a shift from the previously detailed methodology in an earlier paper. This edition of the AI Index is the first to adopt this updated approach. Moreover, the
previous edition of the AI Index utilized country-level mapping of GitHub AI projects conducted by the OECD, which depended on self-reported data—a method experiencing a decline in
coverage over time. This year, the AI Index has adopted geographic mapping from GitHub, leveraging server-side data for broader coverage. Consequently, the data presented here may not
align perfectly with data in earlier versions of the report.

Figure 1.5.2 reports GitHub AI projects by geographic followed closely by the European Union and the
area since 2011. As of 2023, a significant share United Kingdom at 17.9%. Notably, the proportion of AI
of GitHub AI projects were located in the United projects from developers located in the United States
States, accounting for 22.9% of contributions. India on GitHub has been on a steady decline since 2016.
was the second-largest contributor with 19.0%,
GitHub AI projects (% of total) by geographic area, 2011–23

60%
50%
AI projects (% of total)
40%
30%
22.93%, United States

20% 19.01%, India
17.93%, European Union and United Kingdom
10%
3.04%, China
0%
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023
Figure 1.5.2

Stars
a platform that offers a variety of tools for computer
GitHub users can show their interest in a repository
vision, such as object detection and feature extraction.
by “starring” it, a feature similar to liking a post on
social media, which signifies support for an open- The total number of stars for AI-related projects on
source project. Among the most starred repositories GitHub saw a significant increase in the last year, more
are libraries such as TensorFlow, OpenCV, Keras, and than tripling from 4.0 million in 2022 to 12.2 million in
PyTorch, which enjoy widespread popularity among 2023 (Figure 1.5.3). This sharp increase in GitHub stars,
software developers in the AI coding community. For along with the previously reported rise in projects,
example, TensorFlow is a popular library for building underscores the accelerating growth of open-source
and deploying machine learning models. OpenCV is AI software development.
Number of GitHub stars in AI projects, 2011–23

12 12.21
10
Number of GitHub stars (in millions)
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023
Figure 1.5.3

In 2023, the United States led in receiving the China, and India, saw a year-over-year increase in
highest number of GitHub stars, totaling 10.5 million the total number of GitHub stars awarded to projects
(Figure 1.5.4). All major geographic regions sampled, located in their countries.
including the European Union and United Kingdom,
Number of GitHub stars by geographic area, 2011–23

10.45, United States

10
Number of cumulative GitHub stars (in millions)
8 7.86, Rest of the world
4.53, European Union and United Kingdom

4
2.12, China
2 1.92, India
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023
Figure 1.5.4

Artificial Intelligence CHAPTER 2:
Index Report 2024
Technical
Performance
CHAPTER 2:
Index Report 2024 Technical
Performance
Preview
Overview 76 Editing 100
Chapter Highlights 77 EditVal 100
Highlighted Research: ControlNet 101
2.1 Overview of AI in 2023 78 Highlighted Research: Instruct-NeRF2NeRF 103
Timeline: Significant Model Releases 78 Segmentation 105
State of AI Performance 81 Highlighted Research: Segment Anything 105
AI Index Benchmarks 82 3D Reconstruction From Images 107
Highlighted Research: Skoltech3D 107
2.2 Language 85 Highlighted Research: RealFusion 108
Understanding 86
HELM: Holistic Evaluation of 2.5 Video Computer Vision and
Language Models 86 Video Generation 109
MMLU: Massive Multitask Generation 109
Language Understanding 87 UCF101 109
Generation 88 Highlighted Research: Align Your Latents 110
Chatbot Arena Leaderboard 88 Highlighted Research: Emu Video 111
Factuality and Truthfulness 90
TruthfulQA 90 2.6 Reasoning 112
HaluEval 92 General Reasoning 112
MMMU: A Massive Multi-discipline
2.3 Coding 94 Multimodal Understanding and Reasoning
Generation 94 Benchmark for Expert AGI 112
HumanEval 94 GPQA: A Graduate-Level Google-Proof

Q&A Benchmark 115
SWE-Bench 95
Highlighted Research: Comparing Humans,
GPT-4, and GPT-4V on Abstraction and
2.4 Image Computer Vision and Reasoning Tasks 116
Image Generation 96 Mathematical Reasoning 117
Generation 96 GSM8K 117
HEIM: Holistic Evaluation of MATH 119
Text-to-Image Models 97
PlanBench 120
Highlighted Research: MVDream 98
Visual Reasoning 121
Instruction-Following 99
Visual Commonsense Reasoning (VCR) 121
VisIT-Bench 99
CHAPTER 2:
Performance
Preview (cont’d)
Moral Reasoning 122 2.11 Properties of LLMs 141
MoCa 122 Highlighted Research:
Causal Reasoning 124 Challenging the Notion of Emergent Behavior 141
BigToM 124 Highlighted Research:

Changes in LLM Performance Over Time 143
Highlighted Research:
Tübingen Cause-Effect Pairs 126 Highlighted Research:
LLMs Are Poor Self-Correctors 145
2.7 Audio 127 Closed vs. Open Model Performance 146
Generation 127
Highlighted Research: UniAudio 128 2.12 Techniques for LLM Improvement 148
Highlighted Research: Prompting 148
MusicGEN and MusicLM 129 Highlighted Research:
Graph of Thoughts Prompting 148
2.8 Agents 131 Highlighted Research:

Optimization by PROmpting (OPRO) 150
General Agents 131
Fine-Tuning 151
AgentBench 131
Highlighted Research: QLoRA 151
Highlighted Research: Voyageur 133
Attention 152
Task-Specific Agents 134
Highlighted Research: Flash-Decoding 152
MLAgentBench 134
2.13 Environmental Impact of AI Systems 154

2.9 Robotics 135
General Environmental Impact 154
Highlighted Research: PaLM-E 135
Training 154
Highlighted Research: RT-2 137
Inference 156
Positive Use Cases 157
2.10 Reinforcement Learning 138
Reinforcement Learning from Human Feedback 138
Highlighted Research: RLAIF 139
Direct Preference Optimization 140
CHAPTER 2:
Performance
Overview
The technical performance section of this year’s AI Index offers a comprehensive overview
of AI advancements in 2023. It starts with a high-level overview of AI technical performance,
tracing its broad evolution over time. The chapter then examines the current state of a wide
range of AI capabilities, including language processing, coding, computer vision (image and
video analysis), reasoning, audio processing, autonomous agents, robotics, and reinforcement
learning. It also shines a spotlight on notable AI research breakthroughs from the past year,
exploring methods for improving LLMs through prompting, optimization, and fine-tuning, and
wraps up with an exploration of AI systems’ environmental footprint.
CHAPTER 2:
Performance
Chapter Highlights
benchmarks, including some in image classification, visual reasoning, and English understanding. Yet it trails behind on
more complex tasks like competition-level mathematics, visual commonsense reasoning and planning.
2. Here comes multimodal AI. Traditionally AI systems have been limited in scope, with language models
excelling in text comprehension but faltering in image processing, and vice versa. However, recent advancements have
led to the development of strong multimodal models, such as Google’s Gemini and OpenAI’s GPT-4. These models
demonstrate flexibility and are capable of handling images and text and, in some instances, can even process audio.
3. Harder benchmarks emerge. AI models have reached performance saturation on established benchmarks
such as ImageNet, SQuAD, and SuperGLUE, prompting researchers to develop more challenging ones. In 2023, several
challenging new benchmarks emerged, including SWE-bench for coding, HEIM for image generation, MMMU for general
reasoning, MoCa for moral reasoning, AgentBench for agent-based behavior, and HaluEval for hallucinations.
4. Better AI means better data which means … even better AI. New AI models such as SegmentAnything
and Skoltech are being used to generate specialized data for tasks like image segmentation and 3D reconstruction. Data is
vital for AI technical improvements. The use of AI to create more data enhances current capabilities and paves the way for
future algorithmic improvements, especially on harder tasks.
5. Human evaluation is in. With generative models producing high-quality text, images, and more, benchmarking
has slowly started shifting toward incorporating human evaluations like the Chatbot Arena Leaderboard rather than
computerized rankings like ImageNet or SQuAD. Public feeling about AI is becoming an increasingly important
consideration in tracking AI progress.
6. Thanks to LLMs, robots have become more flexible. The fusion of language modeling with robotics has
given rise to more flexible robotic systems like PaLM-E and RT-2. Beyond their improved robotic capabilities, these models
can ask questions, which marks a significant step toward robots that can interact more effectively with the real world.
7. More technical research in agentic AI. Creating AI agents, systems capable of autonomous operation
in specific environments, has long challenged computer scientists. However, emerging research suggests that the
performance of autonomous AI agents is improving. Current agents can now master complex games like Minecraft and
effectively tackle real-world tasks, such as online shopping and research assistance.
8. Closed LLMs significantly outperform open ones. On 10 select AI benchmarks, closed models
outperformed open ones, with a median performance advantage of 24.2%. Differences in the performance of closed and
open models carry important implications for AI policy debates.
Artificial Intelligence Chapter 2: Technical Performance
Index Report 2024 2.1 Overview of AI in 2023
The technical performance chapter begins with a high-level overview of significant model releases in 2023 and reviews
the current state of AI technical performance.
2.1 Overview of AI in 2023

Timeline: Significant Model Releases
As chosen by the AI Index Steering Committee, here are some of the most notable model releases of 2023.
Date Model Type Creator(s) Significance Image

Mar. 14, 2023 Claude Large language Anthropic Claude is the first
model publicly released LLM
from Anthropic, one of
OpenAI’s main rivals.
Claude is designed to be
Figure 2.1.1
as helpful, honest, and Source: Anthropic, 2023
harmless as possible.
Mar. 14, 2023 GPT-4 Large language OpenAI GPT-4, improving

model over GPT-3, is among
the most powerful
and capable LLMs to
date and surpasses Figure 2.1.2
Source: Medium, 2023
human performance on
numerous benchmarks.
Mar. 23, 2023 Stable Text-to-image Stability AI Stable Diffusion v2 is an

Diffusion v2 model upgrade of Stability AI’s
existing text-to-image
model and produces
higher-resolution,
superior-quality images.
Figure 2.1.3
Source: Stability AI, 2023
Apr. 5, 2023 Segment Image Meta Segment Anything is

Anything segmentation an AI model capable
of isolating objects in
images using zero-shot
generalization.
Figure 2.1.4
Source: Meta, 2023


Jul. 18, 2023 Llama 2 Large language Meta Llama 2, an updated
model version of Meta’s flagship
LLM, is open-source. Its
smaller variants (7B and
13B) deliver relatively
Figure 2.1.5
high performance for Source: Meta, 2023
their size.
Aug. 20, 2023 DALL-E 3 Image generation OpenAI DALL-E 3 is an improved
version of OpenAI’s
existing text-to-vision
model DALL-E.
Figure 2.1.6
Source: OpenAI, 2023
Aug. 29, 2023 SynthID Watermarking Google, SynthID is a tool for

DeepMind watermarking AI-
generated music and
images. Its watermarks
remain detectable even
after image alterations.
Figure 2.1.7
Source: DeepMind, 2023
Sep. 27, 2023 Mistral 7B Large language Mistral AI Mistral 7B, launched
model by French AI company
Mistral, is a compact 7
billion parameter model
that surpasses Llama Figure 2.1.8
Source: Mistral AI, 2023
2 13B in performance,
ranking it top in its class
for size.
Oct. 27, 2023 Ernie 4.0 Large language Baidu Baidu, a multinational
model Chinese technology
company, has launched
Ernie 4.0, which is
among the highest- Figure 2.1.9
performing Chinese Source: PR Newswire, 2023
LLMs to date.
Nov. 6, 2023 GPT-4 Turbo Large language OpenAI GPT-4 Turbo is an

model upgraded large
language model
boasting a 128K context
window and reduced
Figure 2.1.10
pricing. Source: Tech.co, 2023


Nov. 6, 2023 Whisper v3 Speech-to-text OpenAI Whisper v3 is an open-
source speech-to-text
model known for its
increased accuracy
and extended language
Figure 2.1.11
support. Source: AI Business, 2023
Nov. 21, 2023 Claude 2.1 Large language Anthropic Anthropic’s latest LLM,
model Claude 2.1, features an
industry-leading 200K
context window, which
enhances its capacity
Figure 2.1.12
to process extensive Source: Medium, 2023
content such as lengthy
literary works.
Nov. 22, 2023 Inflection-2 Large language Inflection Inflection-2 is the

model second LLM from the
new startup Inflection,
founded by DeepMind’s
Mustafa Suleyman.
Inflection-2’s launch
underscores the Figure 2.1.13
Source: Inflection, 2023
intensifying competition
in the LLM arena.
Dec. 6, 2023 Gemini Large language Google Gemini emerges as a

formidable competitor
model
to GPT-4, with one of its
variants, Gemini Ultra,
outshining GPT-4 on
Figure 2.1.14
numerous benchmarks. Source: Medium, 2023
Dec. 21, 2023 Midjourney Text-to-image Midjourney Midjourney’s latest

v6 model update enhances user
experience with more
intuitive prompts and
superior image quality.
Figure 2.1.15
Source: Bootcamp, 2023

State of AI Performance Over the years, AI has surpassed human baselines on

a handful of benchmarks, such as image classification
As of 2023, AI has achieved levels of performance
in 2015, basic reading comprehension in 2017, visual
that surpass human capabilities across a range of
reasoning in 2020, and natural language inference in
tasks. Figure 2.1.16 illustrates the progress of AI
2021. As of 2023, there are still some task categories
systems relative to human baselines for nine AI
where AI fails to exceed human ability. These tend
benchmarks corresponding to nine tasks (e.g., image
to be more complex cognitive tasks, such as visual
classification or basic-level reading comprehension).1
commonsense reasoning and advanced-level
The AI Index team selected one benchmark to
mathematical problem-solving (competition-level
represent each task.
math problems).
Select AI Index technical performance benchmarks vs. human performance

120%
Human baseline
100%
Performance relative to the human baseline (%)
80%
60%
40%
20%
0%
2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023
Image classi cation (ImageNet Top-5) Visual reasoning (VQA)

Visual commonsense reasoning (VCR) English language understanding (SuperGLUE)
Natural language inference (aNLI) Basic-level reading comprehension (SQuAD 1.1)
Medium-level reading comprehension (SQuAD 2.0) Competition-level mathematics (MATH)
Multitask language understanding (MMLU)
Figure 2.1.162
1 An AI benchmark is a standardized test used to evaluate the performance and capabilities of AI systems on specific tasks. For example, ImageNet is a canonical AI benchmark that features
a large collection of labeled images, and AI systems are tasked with classifying these images accurately. Tracking progress on benchmarks has been a standard way for the AI community to
monitor the advancement of AI systems.
2 In Figure 2.1.16, the values are scaled to establish a standard metric for comparing different benchmarks. The scaling function is calibrated such that the performance of the best model for
each year is measured as a percentage of the human baseline for a given task. A value of 105% indicates, for example, that a model performs 5% better than the human baseline.

AI Index Benchmarks Due to saturation, several benchmarks featured

in the 2023 AI Index have been omitted from this
An emerging theme in AI technical performance,
year’s report. Figure 2.1.17 highlights a selection of
as emphasized in last year’s report, is the observed
benchmarks that were included in the 2023 edition
saturation on many benchmarks, such as ImageNet,
but not featured in this year’s report.4 It also shows
used to assess the proficiency of AI models.
the improvement on these benchmarks since 2022.
Performance on these benchmarks has stagnated
“NA” indicates no improvement was noted.
in recent years, indicating either a plateau in AI
capabilities or a shift among researchers toward more
complex research challenges.3
A selection of deprecated benchmarks from the 2023 AI Index report

Source: AI Index, 2024
Benchmark Task category Year introduced Improvement from 2022
Abductive Natural Language Inference (aNLI) Natural language inference 2019 NA

arXiv Text summarization 2003 NA
Cityscapes Challenge Semantic segmentation 2016 0.23%
ImageNet Image classi cation 2009 1.54%
Kinetics-400 Activity recognition 2017 NA
Kvasir-SEG Medical image segmentation 2019 1.90%
MPII Human pose estimation 2014 NA
PubMed Text summarization 2008 NA
SST-5 Fine-Grained Classi cation Sentiment analysis 2013 NA
STL-10 Image generation 2011 NA
SuperGLUE English language understanding 2019 NA
Visual Question Answering Challenge (VQA) Visual reasoning 2017 NA
VoxCeleb Speech recognition 2017 NA
Figure 2.1.17
3 Benchmarks can also saturate or see limited improvement because the problem created is hard and the corresponding performance fails to improve. The issue of benchmark saturation
discussed in this section refers more to benchmarks where performance reaches a close-to-perfection level on which it is difficult to improve.
4 For brevity, Figure 2.1.17 highlights a selection of deprecated benchmarks. Additional benchmarks that were deprecated either because there was saturation, no new state-of-the-art score
was documented, or research focus shifted away from the benchmark include: Celeb-DF (deepfake detection), CIFAR-10 (image classification), NIST FRVT (facial recognition), and Procgen
(reinforcement learning).

Figure 2.1.18 illustrates the year-over-year increases relatively soon after they are introduced,
improvement, in percent, on a selection of then the improvement slows. In the last few years,
benchmarks featured in the 2023 AI Index report. many of these benchmarks have shown little or no
Most benchmarks see significant performance improvement.
Year-over-year improvement over time on select AI Index technical performance benchmarks

25%
20%
Year-over-year improvement (%)
15%
10%
5% 4.47%, Kinetics-400
3.02%, COCO
1.84%, VQA
1.54%, ImageNet Top 1
0.23%, Cityscapes
0% 0.11%, SuperGLUE
2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023
Figure 2.1.18

In response to benchmark saturation, AI researchers including those for tasks in coding, advanced
are pivoting away from traditional benchmarks and reasoning, and agentic behavior—areas that were
testing AI on more difficult challenges. The 2024 AI underrepresented in previous versions of the report
Index tracks progress on several new benchmarks (Figure 2.1.19).5
New benchmarks featured in the 2024 AI Index report

Benchmark Task category Year introduced
AgentBench Agent-based behavior 2023

BigTom Causal reasoning 2023
Chatbot Arena Leaderboard General language 2023
EditVal Image editing 2023
GPQA General reasoning 2023
GSM8K Mathematical reasoning 2021
HEIM Image generation 2023
HELM General language 2021
HaluEval Factuality 2023
HumanEval Coding 2021
MATH Mathematical reasoning 2021
MLAgentBench Agent-based behavior 2023
MMMU General reasoning 2023
MoCa Moral reasoning 2023
PlanBench Planning 2023
SWE-bench Coding 2023
TruthfulQA Factuality 2021
VisIT-Bench Image instruction-following 2023
Figure 2.1.19
5 This report includes an Appendix with details regarding the sourcing of new benchmarks featured in this chapter.

Index Report 2024 2.2 Language
2.2 Language A sample output from GPT-4

Natural language processing (NLP) enables

computers to understand, interpret,
generate, and transform text. Current state-
of-the-art models, such as OpenAI’s GPT-4
and Google’s Gemini, are able to generate
fluent and coherent prose and display high
levels of language understanding ability
(Figure 2.2.1). Many of these models can also
now handle different input forms, such as
images and audio (Figure 2.2.2).
Figure 2.2.1
Gemini handling image and audio inputs

Source: Google, 2024
Figure 2.2.2

Understanding In 2022, Stanford researchers introduced HELM

(Holistic Evaluation of Language Models), designed
English language understanding challenges AI systems to evaluate LLMs across diverse scenarios,
to understand the English language in various ways including reading comprehension, language
such as reading comprehension and logical reasoning. understanding, and mathematical reasoning.6
HELM assesses models from several leading
HELM: Holistic Evaluation of Language Models
companies like Anthropic, Google, Meta, and
As illustrated above, in recent years LLMs have
OpenAI, and uses a “mean win rate” to track
surpassed human performance on traditional English-
average performance across all scenarios. As of
language benchmarks, such as SQuAD (question
January 2024, GPT-4 leads the aggregate HELM
answering) and SuperGLUE (language understanding).
leaderboard with a mean win rate of 0.96 (Figure
This rapid advancement has led to the need for more
2.2.3); however, different models top different task
comprehensive benchmarks.
categories (Figure 2.2.4).7
HELM: mean win rate Leaders on individual HELM sub-benchmarks

Source: CRFM, 2023 | Chart: 2024 AI Index report Source: CRFM, 2023 | Table: 2024 AI Index report
1.00 0.96
Task Leading model Score
0.83 0.82
0.78 0.78 0.77 GSM8K - EM GPT-4 (0613) 0.93
0.80 0.73 0.72
0.69 0.68 LegalBench - EM GPT-4 (0613) 0.71
Mean win rate
0.60 MATH - Equivalent (CoT) GPT-4 Turbo (1106 preview) 0.86

MMLU - EM GPT-4 (0613) 0.74
0.40 MedQA - EM GPT-4 Turbo (1106 preview) 0.82

NarrativeQA - F1 Yi (34B) 0.78
0.20 NaturalQuestions (closed-book) - Llama 2 (70B) 0.46
F1
0.00 NaturalQuestions (open-book) - F1 PaLM-2 (Bison) 0.81
GPT-4 (0613)
GPT-4 Turbo (1106 preview)
Palmyra X V3 (72B)
Palmyra X V2 (33B)
PaLM-2 (Unicorn)
Yi (34B)
Mixtral (8×7B 32K seqlen)
Anthropic Claude v1.3
PaLM-2 (Bison)
Anthropic Claude 2.0
OpenbookQA - EM GPT-4 (0613) 0.96

WMT 2014 - BLEU-4 Palmyra X V3 (72B) 0.26
Figure 2.2.4
Figure 2.2.3
6 HELM evaluates 10 scenarios: (1) NarrativeQA (reading comprehension), (2) Natural Questions (closed-book) (closed-book short-answer question answering), (3) Natural Questions
(open-book) (open-book short-answer question answering), (4) OpenBookQA (commonsense question answering), (5) MMLU (multisubject understanding), (6) GSM8K (grade school
math), (7) MATH (competition math), (8) LegalBench (legal reasoning), (9) MedQA (medical knowledge), and (10) WMT 2014 (machine translation).
7 There are several versions of HELM. This section reports the score on HELM Lite, Release v1.0.0 (2023-12-19), with the data having been collected in January 2024.

MMLU: Massive Multitask Language In early 2023, GPT-4 posted a state-of-the-art score
Understanding on MMLU, later surpassed by Google’s Gemini Ultra.
The Massive Multitask Language Understanding Figure 2.2.6 highlights the top model scores on the
(MMLU) benchmark assesses model performance in MMLU benchmark in different years. The scores
zero-shot or few-shot scenarios across 57 subjects, reported are the averages across the test set. As of
including the humanities, STEM, and social sciences January 2024, Gemini Ultra holds the top score of
(Figure 2.2.5). MMLU has emerged as a premier 90.0%, marking a 14.8 percentage point improvement
benchmark for assessing LLM capabilities: Many state- since 2022 and a 57.6 percentage point increase since
of-the-art models like GPT-4, Claude 2, and Gemini have MMLU’s inception in 2019. Gemini Ultra’s score was
been evaluated against MMLU. the first to surpass MMLU’s human baseline of 89.8%.
A sample question from MMLU

Source: Hendrycks et al., 2021
Figure 2.2.5
MMLU: average accuracy

Source: Papers With Code, 2023 | Chart: 2024 AI Index report
89.8%, human baseline

90% 90.04%
80%
70%
Average accuracy (%)
60%
50%
40%
30%
2019 2020 2021 2022 2023
Figure 2.2.6

Generation the Chatbot Arena Leaderboard is one of the

In generation tasks, AI models are tested on their ability first comprehensive evaluations of public LLM
to produce fluent and practical language responses. preference. The leaderboard allows users to query
two anonymous models and vote for the preferred
Chatbot Arena Leaderboard generations (Figure 2.2.7). As of early 2024, the
The rise of capable LLMs has made it increasingly platform has garnered over 200,000 votes, and
important to understand which models are users ranked OpenAI’s GPT-4 Turbo as the most
preferred by the general public. Launched in 2023, preferred model (Figure 2.2.8).
A sample model response on the Chatbot Arena Leaderboard

Source: Chatbot Arena Leaderboard, 2024
Figure 2.2.7

LMSYS Chatbot Arena for LLMs: Elo rating

Source: Hugging Face, 2024 | Chart: 2024 AI Index report
1,260
1,240
1,220
1,200
Elo rating
1,180
1,160
1,140
1,120
1,100
GP Cla Ge Mi Cla Cla Mi GP GP GP
T-3 ud mi xtr ud ud str T-4 T-4 T-4
.5- e -2. ni al- e -2. e-1 al -06 -03 -Tu
Tu 1 Pro 8× 0 Me 13 14 rb
rb (De 7b diu o
o-0 -In m
613 v) str
uc
t-v
0.1
Model Figure 2.2.8

Factuality and Truthfulness Sample TruthfulQA questions

Source: Lin, Hilton, and Evans, 2022
Despite remarkable achievements, LLMs remain

susceptible to factual inaccuracies and content
hallucination—creating seemingly realistic, yet false,
information. The presence of real-world instances
where LLMs have produced hallucinations—in
court cases, for example—underscores the growing
necessity of closely monitoring trends in LLM
factuality.
TruthfulQA
Introduced at ACL 2022, TruthfulQA is a benchmark
designed to evaluate the truthfulness of LLMs in
generating answers to questions. This benchmark
comprises approximately 800 questions across 38
categories, including health, politics, and finance.
Figure 2.2.9
Many questions are crafted to challenge commonly
held misconceptions, which typically lead humans to
answer incorrectly (Figure 2.2.9). Although one of the
observations of the paper is that larger models tend to
be less truthful, GPT-4 (RLHF) released in early 2024,
has achieved the highest performance thus far on the
TruthfulQA benchmark, with a score of 0.6 (Figure
2.2.10). This score is nearly three times higher than that
of a GPT-2-based model tested in 2021, indicating that
LLMs are becoming progressively better at providing
truthful answers.

Multiple-choice task on TruthfulQA: MC1

Source: Papers with Code, 2023 | Chart: 2024 AI Index report
0.60
0.59
0.55
0.50
MC1 ↑
0.45
0.40
0.35
0.30
2021 2022 2023
Figure 2.2.10

HaluEval A generated hallucinated QA example and a

human-labeled ChatGPT response for a user query
As previously mentioned, LLMs are prone to Source: Li et al., 2023
hallucinations, a concerning trait given their

widespread deployment in critical fields such as law
and medicine. While existing research has aimed to
understand the causes of hallucinations, less effort has
been directed toward assessing the frequency of LLM
hallucinations and identifying specific content areas
where they are especially vulnerable.
HaluEval, introduced in 2023, is a new benchmark

designed to assess hallucinations in LLMs. It includes
over 35,000 samples, both hallucinated and normal,
for analysis and evaluation by LLMs (Figure 2.2.11).
The research indicates that ChatGPT fabricates
unverifiable information in approximately 19.5%
of its responses, with these fabrications spanning
a variety of topics such as language, climate, and
technology. Furthermore, the study examines how
well current LLMs can detect hallucinations. Figure
2.2.12 illustrates the performance of leading LLMs
Figure 2.2.11
in identifying hallucinations across various tasks,
including question answering, knowledge-grounded
dialogue, and text summarization. The findings reveal
that many LLMs struggle with these tasks, highlighting
that hallucination is a significant ongoing issue.

HaluEval hallucination classi cation accuracy

Source: Li et al., 2023 | Table: 2024 AI Index report
Models QA Dialogue Summarization General
ChatGPT (2022) 62.59% 72.40% 58.53% 79.44%

Claude 2 (2023) 69.78% 64.73% 57.75% 75.00%
Claude (2023) 67.60% 64.83% 53.76% 73.88%
Davinci002 (2022) 60.05% 60.81% 47.77% 80.42%
Davinci003 (2022) 49.65% 68.37% 48.07% 80.40%
GPT-3 (2020) 49.21% 50.02% 51.23% 72.72%
Llama 2 (2023) 49.60% 43.99% 49.55% 20.46%
ChatGLM (2023) 47.93% 44.41% 48.57% 30.92%
Falcon (2023) 39.66% 29.08% 42.71% 18.98%
Vicuna (2023) 60.34% 46.35% 45.62% 19.48%
Alpaca (2023) 6.68% 17.55% 20.63% 9.54%
Figure 2.2.12

Index Report 2024 2.3 Coding
Coding involves the generation of instructions that computers can follow to perform tasks. Recently, LLMs have
become proficient coders, serving as valuable assistants to computer scientists. There is also increasing evidence that
many coders find AI coding assistants highly useful.
2.3 Coding
Generation in 2022 (Figure 2.3.2). Since 2021, performance on
HumanEval has increased 64.1 percentage points.
On many coding tasks, AI models are challenged
to generate usable code or to solve computer
science problems.
Sample HumanEval problem

HumanEval Source: Chen et al., 2023
HumanEval, a benchmark for evaluating AI systems’

coding ability, was introduced by OpenAI researchers
in 2021. It consists of 164 challenging handwritten
programming problems (Figure 2.3.1). A GPT-4
model variant (AgentCoder) currently leads in
HumanEval performance, scoring 96.3%, which is a
11.2 percentage point increase from the highest score
Figure 2.3.1
HumanEval: Pass@1
100%
96.30%
80%
60%
Pass@1
40%
20%
0%
2021 2022 2023
Figure 2.3.2

Index Report 2024 2.3 Coding
SWE-bench
As AI systems’ coding capabilities improve, it has multiple functions, interact with various execution
become increasingly important to benchmark environments, and perform complex reasoning.
models on more challenging tasks. In October 2023,
researchers introduced SWE-bench, a dataset Even state-of-the-art LLMs face significant
comprising 2,294 software engineering problems challenges with SWE-bench. Claude 2, the
sourced from real GitHub issues and popular best-performing model, solved only 4.8% of the
Python repositories (Figure 2.3.3). SWE-bench dataset’s problems (Figure 2.3.4).8 In 2023, the top-
presents a tougher test for AI coding proficiency, performing model on SWE-bench surpassed the
demanding that systems coordinate changes across best model from 2022 by 4.3 percentage points.
A sample model input from SWE-bench SWE-bench: percent resolved

Source: Jimenez et al., 2023 Source: SWE-bench Leaderboard, 2023 | Chart: 2024 AI Index report
6%
Unassisted Assisted
4.80%
Percent resolved
3.97%
4%
3.01%
1.74% 1.96%
2%
0.52% 0.70% 0.70%
0.20% 0%
0%
ChatGPT-3.5 + BM25 Retrieval
ChatGPT-3.5
GPT-4 + BM25 Retrieval
SWE-Llama 13B + BM25 Retrieval
SWE-Llama 7B + BM25 Retrieval
GPT-4
Claude 2 + BM25 Retrieval
SWE-Llama 7B
SWE-Llama 13B
2022 2023 Claude 2

Model
Figure 2.3.4
Figure 2.3.3
8 According to the SWE-bench leaderboard, unassisted systems have no assistance in finding the relevant files in the repository. Assisted systems operate under the “oracle” retrieval setting,
which means the systems are provided with the list of files that were modified in the pull request.

Index Report 2024 2.4 Image Computer Vision and Image Generation
Computer vision allows machines to understand images and videos and create realistic visuals from textual prompts
or other inputs. This technology is widely used in fields such as autonomous driving, medical imaging, and video
game development.
2.4 Image Computer Vision

and Image Generation
Generation Which face is real?
Source: Which Face Is Real, 2023
Image generation is the task of generating
images that are indistinguishable from real ones.
Today’s image generators are so advanced that
most people struggle to differentiate between
AI-generated images and actual images of
human faces (Figure 2.4.1). Figure 2.4.2 highlights
several generations from various Midjourney
model variants from 2022 to 2024 for the
prompt “a hyper-realistic image of Harry Potter.”
The progression demonstrates the significant Figure 2.4.1
improvement in Midjourney’s ability to generate

hyper-realistic images over a two-year period.
In 2022, the model produced cartoonish and
inaccurate renderings of Harry Potter, but by 2024,
it could create startlingly realistic depictions.
Midjourney generations over time:

“a hyper-realistic image of Harry Potter”
Source: Midjourney, 2023
V1, February V2, April 2022

2022 V3, July 2022 V4, November 2022
V5, March 2023 V5.1, March 2023
V5.2, June 2023
V6, December 2023
Figure 2.4.2

HEIM: Holistic Evaluation of Image-text alignment: human evaluation

Source: CRFM, 2023 | Chart: 2024 AI Index report
Text-to-Image Models 1.00 0.94
The rapid progress of AI text-to-image systems has 0.75
0.80 0.72
Mean win rate

prompted the development of more sophisticated 0.63 0.60 0.59 0.56
0.60 0.55 0.54 0.54
evaluation methods. In 2023, Stanford researchers
0.40
introduced the Holistic Evaluation of Text-to-
0.20
Image Models (HEIM), a benchmark designed to
0.00
comprehensively assess image generators across
DALL-E 2 3.5B
Dreamlike Photoreal v2.0 1B
Stable Di usion v1.4 1B
Safe Stable Di usion strong 1B
DALL-E mega 2.6B
Openjourney v1 1B
Dreamlike Di usion v1.0 1B
Promptist + Stable Di usion v1.4 1B
MultiFusion 13B
DeepFloyd IF X-Large 4.3B

12 key aspects crucial for real-world deployment,
such as image-text alignment, image quality, and
aesthetics.9 Human evaluators are used to rate the
models, a crucial feature since many automated
metrics struggle to accurately assess various
aspects of images.
Model/adapter and number of parameters
HEIM’s findings indicate that no single model
Figure 2.4.3
excels in all criteria. For human evaluation of
image-to-text alignment (assessing how well
the generated image matches the input text), Model leaders on select HEIM sub-benchmarks
Source: CRFM, 2023 | Table: 2024 AI Index report
OpenAI’s DALL-E 2 scores highest (Figure
Task Leading model Score
2.4.3). In terms of image quality (gauging if the
Image-text-alignment DALL-E 2 (3.5B) 0.94
images resemble real photographs), aesthetics Quality Dreamlike Photoreal v2.0 (1B) 0.92
(evaluating the visual appeal), and originality Aesthetics Dreamlike Photoreal v2.0 (1B) 0.87
(a measure of novel image generation and Originality Dreamlike Photoreal v2.0 (1B) 0.98
avoidance of copyright infringement), the Stable

Diffusion–based Dreamlike Photoreal model Figure 2.4.4
ranks highest (Figure 2.4.4).
9 The 12 evaluation aspects of HEIM are: (1) Alignment: How closely does the image align with the given text? (2) Quality: What is the quality of the produced image? (3) Aesthetic:
How aesthetically pleasing is the generated image? (4) Originality: How original is the image? (5) Reasoning: Does the model understand objects, counts, and spatial relations? (6)
Knowledge: Does the model have knowledge about the world? (7) Bias: Are the generated images biased? (8) Toxicity: Are the generated images toxic or inappropriate? (9) Fairness:
Do the generated images exhibit performance disparities? (10) Robust: Is the model robust to input perturbations? (11) Multilinguality: Does the model support non-English languages?
(12) Efficiency: How fast is model inference?

MVDream
Creating 3D geometries or models from Sample generations from MVDream
Source: Shi et al., 2023
text prompts has been a significant
challenge for AI researchers, with existing
models struggling with problems such
as multiface Janus issue (inaccurately
regenerating context described by text
prompts) and content drift (inconsistency
across different 3D views). MVDream is a
new 3D generation system developed by
ByteDance and University of California, San
Diego researchers that overcomes some of
these hurdles (Figure 2.4.5). In quantitative
evaluations, MVDream’s generated models
achieve Inception Score (IS) and CLIP Figure 2.4.5
scores comparable to those in the training
set, indicating the high quality of the
generated images (Figure 2.4.6). MVDream
has major implications, especially for
creative industries where 3D content
creation is traditionally time-consuming
and labor-intensive.
Quantitative evaluation on image synthesis quality

Source: Shi et al., 2023 | Table: 2024 AI Index report
Model Batch size FID↓ IS↑ CLIP↑
Training data N/A N/A 14.75 ± 0.81 31.31 ± 3.34

Multi-view Di usion - no 2D data 256 33.41 12.76 ± 0.70 30.60 ± 3.14
Multi-view Di usion - proposed 256 32.57 13.72 ± 0.91 31.40 ± 3.05
Multi-view Di usion - proposed 1024 32.06 13.68 ± 0.41 31.31 ± 3.12
Figure 2.4.6

Instruction-Following VisIT-Bench
In computer vision, instruction-following is the In 2023, a team of industry and academic
capacity of vision-language models to interpret researchers introduced VisIT-Bench, a benchmark
text-based directives related to images. For instance, consisting of 592 challenging vision-language
an AI system could be given an image of various instructions across about 70 instruction categories,
ingredients and tasked with suggesting how to use such as plot analysis, art knowledge, and location
them to prepare a healthy meal. Capable instruction- understanding (Figure 2.4.8). As of January 2024, the
following vision-language models are necessary for leading model on VisIT-Bench is GPT-4V, the vision-
developing advanced AI assistants. enabled variant of GPT-4 Turbo, with an Elo score
of 1,349, marginally surpassing the human reference
score for VisIT-Bench (Figure 2.4.9).
A sample VisIT-Bench instruction set
Source: Bitton et al., 2023
Figure 2.4.8
VisIT-Bench: Elo rating
Source: Hugging Face, 2024 | Chart: 2024 AI Index report
1,338, human baseline 1,349
1,280
1,120
960
800
Elo rating
640
480
320
160
0
MM Pa Op Mi Vis Oc Ot Ins Ly ide mP Lla LL LL GP
GP nda en niG ua top ter tru nx( cs9 LU ma aV aV T-4
GP Fla PT lG us c tBL 8B G Ad A1 A- V
T T1 mi -4 PT V2 ) b -O a 3B Plu
3b ng IP wl pte s
o r-v
2
Model Figure 2.4.9

Editing A sample VisIT-Bench instruction set

Source: Bitton et al., 2023
Image editing involves using AI to modify

images based on text prompts. This AI-
assisted approach has broad real-world
applications in fields such as engineering,
industrial design, and filmmaking.
EditVal
Despite the promise of text-guided image
editing, few robust methods can evaluate
how accurately AI image editors adhere to
editing prompts. EditVal, a new benchmark
for assessing text-guided image editing,
includes over 13 edit types, such as adding
objects or changing their positions,
across 19 object classes (Figure 2.4.10).
The benchmark was applied to evaluate
eight leading text-guided image editing
methods including SINE and Null-text.
Figure 2.4.10
Performance improvements since 2021 on
a variety of the benchmark’s editing tasks,
EditVal automatic evaluation: editing accuracy
are shown in Figure 2.4.11. Source: EditVal Leaderboard, 2024 | Chart: 2024 AI Index report
0.60 0.59, Object replacement
0.52, Size
0.50
0.47, Object addition
0.40
Editing accuracy
0.34, Alter parts

0.33, Average
0.30
0.25, Positional addition
0.20
0.10 0.11, Position replacement
0.00
2021 2022
Figure 2.4.11

ControlNet
Conditioning inputs or performing conditional In 2023, researchers from Stanford introduced
control refers to the process of guiding the output a new model, ControlNet, that improves
created by an image generator by specifying conditional control editing for large text-
certain conditions that a generated image must to-image diffusion models (Figure 2.4.12).
meet. Existing text-to-image models often lack ControlNet stands out for its ability to handle
precise control over the spatial composition various conditioning inputs. Compared to
of an image, making it difficult to use prompts other previously released models in 2022,
alone to generate images with complex layouts, human raters prefer ControlNet both in terms
diverse shapes, and specific poses. Fine-tuning of superior quality and better condition fidelity
these models for greater compositional control by (Figure 2.4.13). The introduction of ControlNet is
training them on additional images is theoretically a significant step toward creating advanced text-
feasible, but many specialized datasets, such as to-image generators capable of editing images
those for human poses, are not large enough to to more accurately replicate the complex images
support successful training. frequently encountered in the real world.
Sample edits using ControlNet

Source: Zhang et al., 2023
Figure 2.4.12

ControlNet (cont’d)
Average User Ranking (AUR): result quality and condition delity

Source: Zhang et al., 2023 | Chart: 2024 AI Index report
4.22 4.28
4.09
4 3.93 4
3.21 3.28
3 3
Condition delity
Result quality
2.52
2.31
2 2
1.10 1.02
1 1
0 0
PITI
(sketch)
Sketch-Guided
(beta = 3.2)
Sketch-Guided
(beta = 1.6)
ControlNet-lite
ControlNet
PITI
(sketch)
Sketch-Guided
(beta = 1.6)
Sketch-Guided
(beta = 3.2)
ControlNet-lite
ControlNet
2022 2023 2022 2023
Figure 2.4.13

Instruct-NeRF2NeRF
New models can edit 3D geometries using (Figure 2.4.14). This method efficiently generates
only text instructions. Instruct-NeRF2NeRF is a new, edited images that adhere to textual
model developed by Berkeley researchers that instructions, achieving greater consistency than
employs an image-conditioned diffusion model current leading methods (Figure 2.4.15).
for iterative text-based editing of 3D geometries
A demonstration of Instruct-NeRF2NeRF in action

Source: Haque et al., 2023
Figure 2.4.14

Instruct-NeRF2NeRF (cont’d)
Evaluating text-image alignment and frame consistency

Source: Haque et al., 2023 | Chart: 2024 AI Index report
1.00 1.00
0.92 0.92
0.88
0.82
0.80 0.80
CLIP text-image direction similarity ↑
CLIP direction consistency ↑

0.60 0.60
0.40 0.40
0.20 0.16 0.16 0.20

0.12
0.03
0.00 0.00
SD On Ins Pe Pe On SD Ins
Sw e-t tru r-fra r- fra e-t Sw tru
/ IP im ct-N me me im / IP ct-N
2P eD eR IP2 IP2 eD 2P eR
U F2 P P U F2
Ne Ne
RF RF
Figure 2.4.15

Segmentation
Segmentation involves assigning individual image pixels to specific categories (for example: human, bicycle, or street).
Highlighted Research: Various segmentation masks created

Segment Anything by Segment Anything
Source: Kirillov et al., 2023
In 2023, Meta researchers launched Segment

Anything, a project that featured the Segment
Anything Model (SAM) and an extensive SA-
1B dataset for image segmentation. SAM is
remarkable for being one of the first broadly
generalizable segmentation models that
performs well zero-shot on new tasks and
distributions. Segment Anything outperforms
leading segmentation methods like RITM on
16 out of 23 segmentation datasets (Figure
2.4.17). The metric on which Segment Anything
is evaluated is the mean Intersection over
Union (IoU).
Meta’s Segment Anything model was then

used, alongside human annotators, to create
the SA-1B dataset, which included over 1 billion
segmentation masks across 11 million images
Figure 2.4.16
(Figure 2.4.16). A new segmentation dataset
of this size will accelerate the training of
future image segmentors. Segment Anything
demonstrates how AI models can be used
alongside humans to more efficiently create
large datasets, which in turn can be used to
train even better AI systems.

Segment Anything (cont’d)

SAM vs. RITM: mean IoU
Source: Kirillov et al., 2023 | Chart: 2024 AI Index report
PPDLS 46.90
BBBC038v1 44.70
DOORS 41.10
TimberSeg 28.90
NDD20 21.10
LVIS 18.50
STREETS 17.30
ZeroWaste-f 9.10
iShape 8.80
ADE20K 7.80
OVIS 7.00
Dataset
Hypersim 6.10
NDISPark 2.70
VISOR 1.80
Plittersdorf 1.50
EgoHOS 0.80
IBD -0.30
WoodScape -0.60
Cityscapes -2.00
PIDRay -5.80
DRAM -6.50
TrashCan -15.00
GTEA -21.40
−25 −20 −15 −10 −5 0 5 10 15 20 25 30 35 40 45 50
Mean IoU
Figure 2.4.17

3D Reconstruction From Images

3D image reconstruction is the process of creating three-dimensional digital geometries from two-dimensional
images. This type of reconstruction can be used in medical imaging, robotics, and virtual reality.
Highlighted Research: Objects from the 3D reconstruction dataset

Skoltech3D
Source: Voynov et al., 2023
Data scarcity often hinders the development

of AI systems for specific tasks. In 2023, a
team of international researchers introduced
an extensive new dataset, Skoltech3D, for
multiview 3D surface reconstruction (Figure
2.4.18). Encompassing 1.4 million images of 107
scenes captured from 100 different viewpoints
under 14 distinct lighting conditions, this dataset
represents a major improvement over existing 3D
reconstruction datasets (Figure 2.4.19).
Figure 2.4.18
Skoltech3D vs. the most widely used multisensor datasets

Source: Voynov et al., 2023 | Table: 2024 AI Index report
Dataset Sensor types RGB Depth High Poses/scene Lighting # Scenes # Frames
resolution resolution resolution
(MPix) (MPix) geometry
DTU RGB (2) 2 ✓ 49/64 8 80 27K

ETH3D RGB 24 ✓ 10–70 U 24 11K
TnT RGB 8 ✓ 150–300 U 21 148K
BlendedMVG unknown 3/0.4 20–1000 U 502 110K
BigBIRD RGB (5) 12 600 1 120 144K
BigBIRD RGB-D (5) 1.2 0.3
ScanNet RGB-D 1,3 0,3 N/A U 1513 2.5M
Skoltech3D RGB (2) 5 ✓ 100 14 107 877K
Skoltech3D RGB-D 1 (2) 40 0.04
Skoltech3D RGB-D 2 2 0.2
Skoltech3D RGB-D 3 2 0.9
Figure 2.4.19

Highlighted Research: Sample generations from RealFusion

RealFusion
Source: Melas-Kyriazi et al, 2023
RealFusion, developed by Oxford researchers,

is a new method for generating complete
3D models of objects from single images,
overcoming the challenge of often having
insufficient information from single images
for full 360 degree reconstruction. RealFusion
utilizes existing 2D image generators to produce
multiple views of an object, and then assembles
these views into a comprehensive 360 degree
model (Figure 2.4.20). This technique yields more
accurate 3D reconstructions compared to state-
of-the-art methods from 2021 (Shelf-Supervised),
across a wide range of objects (Figure 2.4.21).
Figure 2.4.20
Object reconstruction: RealFusion vs. Shelf-Supervised

Source: Melas-Kyriazi et al., 2023 | Chart: 2024 AI Index report
RealFusion (2023) Shelf-Supervised (2021)

1.00
12.89
12.22
12
0.82
10.23
0.80 0.76
10.16 10.08 0.74 0.74 0.74 0.74
9.72 0.72 0.73
10 0.69 0.70
0.71
0.69
0.71
8.66 8.72 0.65

8.26
CLIP-similarity ↑
8 7.58 7.74 0.60

F-score ↑
6.27 6.30
5.89
6
0.40
0.20
2
0 0.00
Backpack
Chair
Motorcycle
Orange
Skateboard
Teddy bear
Vase
Backpack
Chair
Motorcycle
Orange
Skateboard
Teddy bear
Vase
Figure 2.4.21

Index Report 2024 2.5 Video Computer Vision and Video Generation
Video analysis concerns performing tasks across videos rather than single images.
2.5 Video Computer Vision

and Video Generation
Sample frames from UCF101
Generation Source: Soomro et al., 2021
Video generation involves the use of AI to generate

videos from text or images.
UCF101
UCF101 is an action recognition dataset of realistic
action videos that contain 101 action categories
(Figure 2.5.1). More recently, UCF101 has been used
to benchmark video generators. This year’s top
model, W.A.L.T-XL, posted an FVD16 score of 36,
more than halving the state-of-the-art score posted
the previous year (Figure 2.5.2).
UCF101: FVD16 Figure 2.5.1

700
600
500
400
FVD16 ↓
300
200
100
36
0
2021 2022 2023
Figure 2.5.2

Align Your Latents

Most existing methods can only create short, low- GAN (LVG) in resolution quality (Figure 2.5.4).
resolution videos. To address this limitation, an The adaptation of a text-to-image architecture to
international team of researchers has applied latent create LDM, a highly effective text-to-video model,
diffusion models, traditionally used for generating exemplifies how advanced AI techniques can be
high-quality images, to produce high-resolution repurposed across different domains of computer
videos (Figure 2.5.3). Their Latent Diffusion Model vision. The LDM’s strong video generation
(LDM) notably outperforms previous state-of- capabilities have many real-world applications,
the-art methods released in 2022 like Long Video such as creating realistic driving simulations.
High-quality generation of milk dripping into a cup of coffee

Source: Blattmann et al., 2023
Figure 2.5.3
Video LDM vs. LVG: FVD and FID
Source: Blattmann et al., 2023 | Chart: 2024 AI Index report
FVD ↓ FID ↓
500 478
400 389
356
Performance
300
200
100
53.50 51.90
31.60
0
LVG Video LDM Video LDM (cond.)
2022 2023
Figure 2.5.4

Highlighted Research: Sample Emu Video generations

Emu Video
Source: Girdhar et al., 2023
Traditionally, progress in video generation has

trailed that in image generation due to its higher
complexity and the smaller datasets available
for training. Emu Video, a new transformer-
based video generation model created by Meta
researchers, represents a significant step forward
(Figure 2.5.5). Emu Video generates an image from
text and then creates a video based on both the
text and image. Figure 2.5.6 illustrates the degree
to which the Emu Video model outperforms
Figure 2.5.5
previously released state-of-the-art video
generation methods. The metric is the proportion instructions over the compared method. Emu
of cases when human evaluators preferred Emu Video simplifies the video generation process and
Video’s image quality or faithfulness to text signals a new era of high-quality video generation.
Emu Video vs. prior works: human-evaluated video quality and text faithfulness win rate
Source: Girdhar et al., 2023 | Chart: 2024 AI Index report
Quality Text Faithfulness
100% 100% 99%

100% 97% 97% 96% 97%
91% 92%
85% 88% 87%
82% 81%
79%
80%
Emu Video win rate
60% 56%
40%
20%
0%
Imagen Video
Make-A-Video
CogVideo
Gen-2
PYOCO
Align Your Latents
Reuse and Di use
Pika Labs
2022 2023
Figure 2.5.6

Index Report 2024 2.6 Reasoning
Reasoning in AI involves the ability of AI systems to draw logically valid conclusions from different forms of information.
AI systems are increasingly being tested in diverse reasoning contexts, including visual (reasoning about images), moral
(understanding moral dilemmas), and social reasoning (navigating social situations).10
2.6 Reasoning
General Reasoning Massive Multi-discipline Multimodal Understanding
General reasoning pertains to AI systems being and Reasoning Benchmark for Expert AGI. MMMU
able to reason across broad, rather than specific, comprises about 11,500 college-level questions
domains. As part of a general reasoning challenge, from six core disciplines: art and design, business,
for example, an AI system might be asked to reason science, health and medicine, humanities and social
across multiple subjects rather than perform one science, and technology and engineering (Figure
narrow task (e.g., playing chess). 2.6.1). The question formats include charts, maps,
tables, chemical structures, and more. MMMU is
MMMU: A Massive Multi-discipline
Multimodal Understanding and Reasoning one of the most demanding tests of perception,
Benchmark for Expert AGI knowledge, and reasoning in AI to date. As of
In recent years, the reasoning abilities of AI systems January 2024, the highest performing model is
have advanced so much that traditional benchmarks Gemini Ultra, which leads in all subject categories
like SQuAD (for textual reasoning) and VQA (for with an overall score of 59.4% (Figure 2.6.2).11 On
visual reasoning) have become saturated, indicating most individual task categories, top models are still
a need for more challenging reasoning tests. well beyond medium-level human experts (Figure
2.6.3). This relatively low score is evidence of
Responding to this, researchers from the United MMMU’s effectiveness as a benchmark for assessing
States and Canada recently developed MMMU, the AI reasoning capabilities.
10 Some abilities highlighted in the previous sections implicitly involve some form of reasoning. This section highlights tasks that have a more specific reasoning focus.
11 The AI Index reports results from the MMMU validation set, as recommended by the paper authors for the most comprehensive coverage. According to the authors, the test set, with its
unreleased labels and larger size, presents a more challenging yet unbiased benchmark for model performance, ensuring a more robust evaluation. The test set results are available on the
MMMU page.

Sample MMMU questions

Source: Yue et al., 2023
Figure 2.6.1

MMMU: overall accuracy

Source: MMMU, 2023 | Chart: 2024 AI Index report
100%
82.60%, Human expert (medium)

80%
Overall accuracy (%)
59.40%
60% 56.80%
51.60% 51.40% 51.10%
45.20% 43.00% 41.20% 41.10% 39.40%
40%
20%
0%
Gemini Ultra*
GPT-4V(ision) (Playground)
InternVL-Chat-V1.2*
Qwen-VL-MAX*
LLaVA-1.6-34B*
Qwen-VL-PLUS*
InternLM-XComposer2-VL*
Marco-VL*
OmniLMM-12B*
In MM-Zephyr-7B*
Figure 2.6.212
Model
MMMU: subject-speci c accuracy

Source: MMMU, 2023 | Table: 2024 AI Index report
MMMU task Leading model Score Human expert

category (medium)
Art and Design Qwen-VL-MAX* 51.4 84.2

Business GPT-4V(ision) 59.3 86
(Playground)
Science GPT-4V(ision) 54.7 84.7

(Playground)
Health and Gemini Ultra* 67.3 78.8

Medicine
Humanities and Gemini Ultra* 78.3 85

Social Sciences
Technology and Gemini Ultra* 47.1 79.1

Engineering
Figure 2.6.3
12 An asterisk (*) next to the model names indicates that the results were provided by the authors.

GPQA: A Graduate-Level Google-Proof Q&A were crafted by subject-matter experts in various

Benchmark fields like biology, physics, and chemistry (Figure
In the last year, researchers from NYU, Anthropic, and 2.6.4). PhD-level experts achieved a 65% accuracy rate
Meta introduced the GPQA benchmark to test general in their respective domains on GPQA, while nonexpert
multisubject AI reasoning. This dataset consists of humans scored around 34%. The best-performing AI
448 difficult multiple-choice questions that cannot be model, GPT-4, only reached a score of 41.0% on the
easily answered by Google searching. The questions main test set (Figure 2.6.5).
A sample chemistry question from GPQA

Source: Rein et al., 2023
Figure 2.6.4
GPQA: accuracy on the main set

Source: Rein et al., 2023 | Chart: 2024 AI Index report
100%
80%
72.50%, Expert human validators
Accuracy (%) on the main set
60%
40%
39.70% 41.00%
30.50%, Nonexpert human validators
29.10% 28.00%
20%
0%
Few-Shot CoT Few-Shot CoT Few-Shot CoT GPT-4 with search
Llama-2-70B-chat GPT-3.5-turbo-16k GPT-4 (backo to CoT on abstention)
Evaluation method and model Figure 2.6.5

Comparing Humans, GPT-4, and GPT-4V

on Abstraction and Reasoning Tasks
A sample ARC reasoning task
Source: Mitchell et al., 2023
Abstract reasoning involves using known information to solve
unfamiliar and novel problems and is a key aspect of human cognition
that is evident even in toddlers. While recent LLMs like GPT-4 have
shown impressive performance, their capability for true abstract
reasoning remains a hotly debated subject.13 To further explore this
topic, researchers from the Santa Fe Institute tested GPT-4 on the
ConceptARC benchmark, a collection of analogy puzzles designed
to assess general abstract reasoning skills (Figure 2.6.6). The study
revealed that GPT-4 significantly trails behind humans in abstract
reasoning abilities: While humans score 95% on the benchmark, the
best GPT-4 system only scores 69% (Figure 2.6.7). The development
of truly general AI requires abstract reasoning capabilities. Therefore,
it will be important to continue tracking progress in this area.
Figure 2.6.6
ConceptARC: accuracy on minimal tasks over all concepts

Source: Mitchell et al., 2023 | Chart: 2024 AI Index report
100%
95%, Human performance
80%
69%
65%
60%
Accuracy (%)
40%
25%
23%
20%
0%
GPT-4 Temp = 0 GPT-4 Temp = 0.5 GPT-4V Zero-Shot GPT-4V One-Shot
Model
13 Some claim these models exhibit such reasoning capabilities, while others claim they do not.
Figure 2.6.7

Mathematical Reasoning
Mathematical problem-solving benchmarks evaluate that AI models develop multistep solutions utilizing
AI systems’ ability to reason mathematically. AI arithmetic operations (Figure 2.6.8). GSM8K has
models can be tested with a range of math problems, quickly become a favored benchmark for evaluating
from grade-school level to competition-standard advanced LLMs. The top-performing model on GSM8K
mathematics. is a GPT-4 variant (GPT-4 Code Interpreter), which

scores an accuracy of 97%, a 4.4% improvement from
GSM8K the state-of-the-art score in the previous year and a
GSM8K, a dataset comprising approximately 8,000 30.4% improvement from 2022 when the benchmark
varied grade school math word problems, requires was first introduced (Figure 2.6.9).
Sample problems from GSM8K

Source: Cobbe et al., 2023
Figure 2.6.8

GSM8K: accuracy
97%
96%
Accuracy (%)
93%
2022 2023
Figure 2.6.9

MATH A sample problem from the MATH dataset

Source: Hendrycks et al., 2023
MATH is a dataset of 12,500 challenging
competition-level mathematics problems
introduced by UC Berkeley researchers in 2021
(Figure 2.6.10). AI systems struggled on MATH
when it was first released, managing to solve only
6.9% of the problems. Performance has significantly
improved. In 2023, a GPT-4-based model posted
the top result, successfully solving 84.3% of the
dataset’s problems (Figure 2.6.11).
MATH word problem-solving: accuracy Figure 2.6.10

90%, Human baseline
84.30%
80%
60%
Accuracy (%)
40%
20%
0%
2021 2022 2023
Figure 2.6.11

PlanBench tested I-GPT-3 and GPT-4 on 600 problems in the

A planning system receives a specified goal, an Blocksworld domain (where a hand tries to construct
initial state, and a collection of actions. Each action stacks of blocks when it is only allowed to move one
is defined by preconditions, which must be met for block at a time to the table or to the top of a clear
the action to be executed, and the effects that result block) using one-shot learning and showed that GPT-4
from the action’s execution. The system constructs could generate correct plans and cost-optimal plans
a plan, comprising a series of actions, to achieve the about 34% of the time, and I-GPT-3 about 6% (Figure
goal from the initial state. 2.6.12). Verifying the correctness of a plan is easier.
Claims have been made that LLMs can solve

GPT-4 vs. I-GPT-3 on PlanBench
planning problems. A group from Arizona State Source: Valmeekam, 2023 | Table: 2024 AI Index report
University has proposed PlanBench, a benchmark Task GPT-4 I-GPT-3

(instances correct) (instances correct)
suite containing problems used in the automated
Plan generation 34.30% 6.80%
planning community, especially those used in Cost-optimal planning 33% 5.80%
the International Planning Competition. They Plan veri cation 58.60% 12%
Figure 2.6.12

Visual Reasoning
Visual reasoning tests how well AI systems can behind their answers (Figure 2.6.13). Performance
reason across both visual and textual data. in VCR is measured using the Q->AR score, which
evaluates the machine’s ability to both select the
Visual Commonsense Reasoning (VCR) correct answer to a question (Q->A) and choose the
Introduced in 2019, the Visual Commonsense appropriate rationale behind that answer (Q->R). While
Reasoning (VCR) challenge tests the commonsense AI systems have yet to outperform humans on this
visual reasoning abilities of AI systems. In this task, their capabilities are steadily improving. Between
challenge, AI systems not only answer questions 2022 and 2023, there was a 7.93% increase in AI
based on images but also reason about the logic performance on the VCR challenge (Figure 2.6.14).
A sample question from the Visual Commonsense Reasoning (VCR) challenge

Source: Zellers et al., 2018
Figure 2.6.13
Visual Commonsense Reasoning (VCR) task: Q->AR score
Source: VCR Leaderboard, 2023 | Chart: 2024 AI Index report
85, Human baseline
81.60
80
70
Q->AR score
60
50
2018 2019 2020 2021 2022 2023

Figure 2.6.14

Moral Reasoning A moral story from MoCa

Source: Nie et al., 2023
In the future, AI will be increasingly applied to
domains where ethical considerations are crucial,
such as in healthcare and judicial systems. Therefore,
it is essential for AI systems to possess robust moral
reasoning capabilities, enabling them to effectively
navigate and reason about ethical principles and
moral considerations.
MoCa
The ability of AI models to reason in linguistic and
visual domains is well established, yet their capacity
for moral reasoning, especially moral reasoning
that aligns with human moral judgments, is less
understood.14 To further explore this topic, a team of
Stanford researchers created a new dataset (MoCa)
of human stories with moral elements (Figure 2.6.15).
The researchers then presented these models with
stories of human actions and prompted the models
to respond, measuring moral agreement with the Figure 2.6.15
discrete agreement metric: A higher score indicates

closer alignment with human moral judgment. The
study yielded intriguing results. No model perfectly
matches human moral systems, but newer, larger
models like GPT-4 and Claude show greater
alignment with human moral sentiments than smaller
models like GPT-3, suggesting that as AI models
scale, they are gradually becoming more morally
aligned with humans. Of all models surveyed, GPT-4
showed the greatest agreement with human moral
sentiments (Figure 2.6.16).
14 The topic of AI and moral alignment is contentious, as there are no universally agreed-upon moral principles. What constitutes moral alignment for one party may significantly
differ for another.

Zero-shot alignment with human judgments on the moral permissibility task: discrete agreement
Source: Nie et al., 2023 | Chart: 2024 AI Index report
41.90
40.30
40
37.10
Discrete agreement (agg.)
32.30 32.30
30
26.60 26.60 26.60 26.60
23.40
20.20
20 18.50
10
0
ALBERT-xxlarge
RoBERTa-large
GPT-2-XL
GPT-3-babbage-v1
Electra-gen-large
GPT-3-curie-v1
GPT-3.5-davinci-v3
GPT-3.5-davinci-v2
Alpaca-7B
Anthropic-claude-v1
GPT-3.5-turbo
GPT-4
2019 2020 2022 2023
Figure 2.6.16

Causal Reasoning human evaluators as superior to existing ToM

Causal reasoning assesses an AI system’s ability to benchmarks. BigToM tests LLMs on forward belief
understand cause-and-effect relationships. As AI (predicting future events), forward action (acting
becomes increasingly ubiquitous, it has become based on future event predictions), and backward
important to evaluate whether AI models can not belief (retroactively inferring causes of actions)
only explain their outputs but also update their (Figure 2.6.17).
conclusions—key aspects of causal reasoning.
In tests of LLMs on the benchmark, GPT-4 was the
BigToM top performer, with ToM capabilities nearing but not
Assessing whether LLMs have theory-of-mind (ToM) surpassing human levels (Figure 2.6.18, Figure 2.6.19,
capabilities—understanding and attributing mental and Figure 2.6.20). More specifically, as measured by
states such as beliefs, intentions, and emotions—has accuracy in correctly inferring beliefs, GPT-4 closely
traditionally challenged AI researchers. Earlier methods matched human performance in forward belief and
to evaluate ToM in LLMs were inadequate and lacked backward belief tasks and slightly surpassed humans
robustness. To tackle this problem, in 2023 researchers in forward action tasks. Importantly, the study
developed a new benchmark called BigToM, designed shows that LLM performance on ToM benchmarks
for evaluating the social and causal reasoning abilities is trending upward, with newer models like GPT-
of LLMs. BigToM, comprising 25 controls and 5,000 4 outperforming predecessors such as GPT-3.5
model-generated evaluations, has been rated by (released in 2022).
Sample BigToM scenario

Source: Gandhi et al., 2023
Figure 2.6.17

Forward action inference with initial belief: accuracy Backward belief inference with initial belief: accuracy
Source: Gandhi et al., 2023 | Chart: 2024 AI Index report Source: Gandhi et al., 2023 | Chart: 2024 AI Index report
TB TB∧FB TB TB∧FB
97% 98% 99%

100% 96% 93%
96% 100%
90%
81%
80% 76% 80%
73% 73%
Accuracy (%)
Accuracy (%)
62%
60% 60%
46% 47%
42% 40%
40% 40% 35%34%
29% 29% 30%
24%
20%
20% 20% 12% 9% 15%
10% 7%
0% 0%
text-davinci-003
GPT-3.5-turbo
LLaMA-65
Human performance
Claude-2
GPT-4-0314
Claude
GPT-3.5-turbo
text-davinci-003
Claude
Claude-2
LLaMA-65
GPT-4-0314
Human performance
2022 2023 2022 2023
Figure 2.6.18 Figure 2.6.19
Forward belief inference with initial belief: accuracy

Source: Gandhi et al., 2023 | Chart: 2024 AI Index report
TB TB∧FB
99%98%
100%
91% 90%
80%
Accuracy (%)
62%59%
60% 55%52%
43% 41%
40% 35%
31%
27% 25%
20%
0%
text-davinci-003
GPT-3.5-turbo
LLaMA-65
Claude-2
Claude
GPT-4-0314
Human performance
2022 2023
Figure 2.6.20

Tübingen Cause-Effect Pairs

Researchers from Microsoft year’s best by 13 percentage points (Figure 2.6.22). Notably,
and the University of Chicago GPT-4 outperformed prior covariance-based AI models, which
have demonstrated that LLMs were explicitly trained for causal reasoning tasks. Furthermore,
are effective causal reasoners. the researchers discovered that certain prompts, especially those
The team evaluated several designed to encourage helpfulness, can significantly enhance an
recent LLMs, including GPT- LLM’s causal reasoning capabilities.
4, using the Tübingen cause-
Sample cause-effect pairs from the Tübingen dataset
effect pairs dataset. This Source: Kiciman et al., 2023
benchmark comprises over 100

cause-and-effect pairs across
37 subdisciplines, testing AI
systems’ ability to discern causal
relationships (Figure 2.6.21). GPT-
4’s performance, a 96% accuracy
score, surpassed the previous
Figure 2.6.21
Performance on the Tübingen cause-e ect pairs dataset: accuracy

Source: Kıcıman et al., 2023 | Chart: 2024 AI Index report
100% GPT version 96%

89%
86%
83% 82% 83%
79% 81%
80% 75% 75%
68%
Accuracy (%)
60%
49% 50% 50% 50% 50% 51% 51%
48%
40%
20%
0%
PNL-MLP
Slope
davinci
text-ada-001
ada
text-babbage-001
text-curie-001
text-davinci-001
babbage
curie
bQCD
Mosaic
text-davinci-002
GPT-3.5-turbo
text-davinci-003
GPT-3.5-turbo
(causal agent)
GPT-3.5-turbo
(single prompt)
LMPrior
GPT-4
(single prompt)
2012 2017 2019 2020 2021 2022 2023

Figure 2.6.22

Index Report 2024 2.7 Audio
AI systems are adept at processing human speech, with audio capabilities that include transcribing spoken words to
text and recognizing individual speakers. More recently, AI has advanced in generating synthetic audio content.
2.7 Audio
Generation
2023 marked a significant year in the field of audio This advancement was highlighted by the release
generation, which involves creating synthetic audio of several prominent audio generators, such as
content, ranging from human speech to music files. UniAudio, MusicGen, and MusicLM.

UniAudio
UniAudio is a high-level language modeling UniAudio surpasses leading methods in tasks,
technique to create audio content. UniAudio including text-to-speech, speech enhancement,
uniformly tokenizes all audio types and, like modern and voice conversion (Figure 2.7.1). With 1 billion
LLMs, employs next-token prediction for high- parameters and trained on 165,000 hours of audio,
quality audio generation. UniAudio is capable of UniAudio exemplifies the efficacy of big data and
generating high-quality speech, sound, and music. self-supervision for music generation.
UniAudio vs. selected prior works in the training stage: objective evaluation metrics
Source: Yang et al., 2023 | Chart: 2024 AI Index report
SIM ↑ WER ↓ PESQ ↑ VISQOL ↑ DNSMOS ↑ FAD ↓ KL ↓
Text-to-speech Voice conversion
4.90 4.80
5 5
4 4
3 3
2.30
2.00
2 2
0.71 0.82 0.87

1 0.62 1
0 0
Shen et al. (2023) UniAudio Wang et al. (2023e) UniAudio
Speech enhancement Target speaker extraction

Objective evaluation metrics
3.96
4 3.66 4
3.21 3.29 3.35
3 2.72 2.63 3
2.44 2.41 2.36
1.88
2 2 1.68
1 1
0 0
Richter et al. (2023) UniAudio Wang et al. (2018) UniAudio
Text-to-sound Text-to-music
4.93
5 5 4.52
4 4 3.65
3.12
3 2.60 2.60 3
1.90
2 2
1.40
1 1
0 0
Liu et al. (2023a) UniAudio Copet et al. (2023) UniAudio
Model Figure 2.7.1

MusicGEN and MusicLM

Meta’s MusicGen is a novel audio generation Human evaluators also favor MusicGen for its
model that also leverages the transformer overall quality (OVL).
architecture common in language models to
Although MusicGen outperforms certain text-
generate audio. MusicGen enables users to
to-music models released earlier in the year,
specify text for a desired audio outcome and
MusicLM is worth highlighting because its release
then fine-tune it using specific melodies. In
was accompanied by the launch of MusicCaps, a
comparative studies, MusicGen outshines other
state-of-the-art dataset of 5.5K music-text pairs.
popular text-to-music models like Riffusion,
MusicCaps was used by MusicGen researchers
Moûsai, and MusicLM across various generative
to benchmark the performance of their family
music metrics. It boasts a lower FAD score,
of models. The emergence of new models like
indicating more plausible music generation,
MusicGen, and new music-to-text benchmarks like
a lower KL score for better alignment with
MusicCaps, highlights the expansion of generative
reference music, and a higher CLAP score,
AI beyond language and images into more diverse
reflecting greater adherence to textual
skill modalities like audio generation.
descriptions of reference music (Figure 2.7.2).

CLAP SCR ↑ FAD VGG ↓
0.00
0.10
0.20
0.30
0.40
0
5
10
15
Ri usion
14.80
Ri usion
0.19
Table of Contents
Mousai
7.50
Mousai
0.23
MusicGen w/
MusicGen (2023)
Index Report 2024
5.00
random melody (1.5B)
MusicGen w/
0.28
random melody (1.5B) MusicLM
4.00
Source: Copet et al., 2023 | Chart: 2024 AI Index report
MusicGen w/o
MusicGen w/o
3.80
melody (3.3B)
0.31
melody (300M)
Mousai (2023)
Chapter 2 Preview
MusicGen w/o
3.40
melody (1.5B)
MusicGen w/o
0.31
melody (3.3B)
MusicGen w/o
3.10
melody (300M)
MusicGen w/o
0.32
melody (1.5B) Noise2Music
Ri usion (2022)
2.10
OVL.↑
KL ↓
Evaluation of MusicGen and baseline models on MusicCaps
0
20
40
60
80
100
0
1
2
MusicGEN and MusicLM (cont’d)
Mousai Ri usion
76.11
2.06
MusicGen w/o
78.43
melody (300M) Mousai
1.59
Ri usion
79.31
MusicGen w/
1.31

MusicLM
80.51
MusicGen w/o
1.28
melody (300M)
MusicGen w/o
80.74
melody (1.5B)
MusicGen w/o
1.23
MusicGen w/ melody (1.5B)

81.3
MusicGen w/o MusicGen w/o

1.22
melody (3.3B)
84.81
melody (3.3B)
Figure 2.7.2
130
2.7 Audio
Index Report 2024 2.8 Agents
AI agents, autonomous or semiautonomous systems designed to operate within specific environments to accomplish
goals, represent an exciting frontier in AI research. These agents have a diverse range of potential applications, from
assisting in academic research and scheduling meetings to facilitating online shopping and vacation booking.
2.8 Agents
General Agents assessed over 25 LLM-based agents, including those
This section highlights benchmarks and research built on OpenAI’s GPT-4, Anthropic’s Claude 2, and
into agents that can flexibly operate in general task Meta’s Llama 2. GPT-4 emerged as the top performer,
environments. achieving an overall score of 4.01, significantly
higher than Claude 2’s score of 2.49 (Figure 2.8.2).
AgentBench
The research also suggests that LLMs released in
AgentBench, a new benchmark designed for
2023 outperform earlier versions in agentic settings.
evaluating LLM-based agents, encompasses eight
Additionally, the AgentBench team speculated that
distinct interactive settings, including web browsing,
agents’ struggles on certain benchmark subsections
online shopping, household management, puzzles,
can be attributed to their limited abilities in long-term
and digital card games (Figure 2.8.1). The study
reasoning, decision-making, and instruction-following.
Description of the AgentBench benchmark

Source: Liu et al., 2023
Figure 2.8.1

AgentBench across eight environments: overall score

Source: Liu et al., 2023 | Chart: 2024 AI Index report
4.01
4.00
3.50
3.00
2.44 2.49
Overall score
2.50 2.32
2.00
1.71
1.60
1.50 1.39
1.25
0.93 0.96
1.00
0.50
0.00
text-davinci-002
text-davinci-003
GPT-3.5-turbo
Vicuna-13b
CodeLlama-34b
chat-bison-001
Claude-instant
Claude
Claude-2
GPT-4
2022 2023
Figure 2.8.2

Voyageur
Recent research by Nvidia, Caltech, UT Austin, playground for its players (Figure 2.8.3). Voyager
Stanford, and UW Madison demonstrates that excels in this environment, adeptly remembering
existing LLMs like GPT-4 can be used to develop plans, adapting to new settings, and transferring
flexible agents capable of continuous learning. knowledge. It significantly outperforms previous
The team created Voyager, a GPT-4-based agent models, collecting 3.3 times more unique items,
for Minecraft—a complex video game with no traveling 2.3 times further, and reaching key
set endpoint that is essentially a boundless virtual milestones 15.3 times faster (Figure 2.8.4).
Voyager in action
Source: Wang et al., 2023
Figure 2.8.3
Voyager’s performance improvements over prior The launch of Voyager is significant, as AI

state of the art in Minecraft
Source: Wang et al., 2023 | Chart: 2024 AI Index report researchers have long faced challenges in
16 15.30
creating agents that can explore, plan, and
14 learn in open-ended worlds. While previous AI
systems like AlphaZero succeeded in closed,
Voyager’s improvement factor
12
rule-defined environments like chess, Go,

10
and shogi, they struggled in more dynamic
8
settings, lacking the ability to continuously
6 learn. Voyager, however, demonstrates
4
remarkable proficiency in a dynamic video
3.30
2.30 game setting, thereby representing a notable
2
advancement in the field of agentic AI.
0
Unique items obtained Distance traveled Speed of unlocking key
tech tree milestones
Category
Figure 2.8.4

Task-Specific Agents across 15 varied research tasks. Examples of the tasks

include improving a baseline model on the CIFAR-10
This section highlights benchmarks and research
image dataset and training a language model on
into agents that are optimized to perform in specific
over 10 million words in BabyLM. Various LLM-based
task environments, such as mathematical problem-
agents, including GPT-4, Claude-1, AutoGPT, and
solving or academic research.
LangChain, were tested. The results demonstrate
MLAgentBench that although there is promise in AI research agents,
MLAgentBench, a new benchmark for evaluating performance varies significantly across tasks. While
AI research agents’ performance, tests whether some agents achieved over 80% on tasks like ogbn-
AI agents are capable of engaging in scientific arxiv (improving a baseline paper classification
experimentation. More specifically, MLAgentBench model), all scored 0% on BabyLM (training a small
assesses AI systems’ potential as computer science language model) (Figure 2.8.5). Among these, GPT-4
research assistants, evaluating their performance consistently delivered the best results.
MLAgentBench evaluation: success rate of select models across tasks

Source: Huang et al., 2023 | Chart: 2024 AI Index report
GPT-4 GPT-4 (no retrieval) Claude-1 Claude-1 (no retrieval) AutoGPT LangChain (React) Baseline
100%
80%
60%
Success rate
40%
20%
0%
Ba CL CI fat fee ho ide im lla og Pa sp ve
by RS FA hom db us nti db ma bn rkins ac cto
LM R-1 ac e-p fy- -in -ar esh riz
0 ne k ric co fer xiv on ip- ati
t e n e ’s-d tita on
tra nc ise n
i ls e ase i c
Task
Figure 2.8.5
15 The full tasks include: (1) CIFAR-10 (improve a baseline image classification model), (2) imdb (improve a baseline sentiment classification model), (3) ogbn-arxiv (improve a baseline
paper classification model from scratch), (4) house prices (train a regression model), (5) spaceship titanic (train a classifier model from scratch), (6) Parkinson’s-disease (train a time-series
regression model), (7) FathomNet (train an out-of-distribution image classification model), (8) feedback (train an out-of-distribution text regression model), (9) identify contrails (train an
out-of-distribution image segmentation model), (10) CLRS (model classic algorithms over graphs and lists), (11) BabyLM (train language model over 10M words), (12) llama-inference (improve
the runtime/autoregressive generation speed of Llama 7B, (13) vectorization (improve the inference speed of a model), (14) literature-review-tool (perform literature review), and (15) bibtex-
generation (generate BibTex from sketch).

Index Report 2024 2.9 Robotics
Over time, AI has become increasingly integrated into robotics, enhancing robots’ capabilities to perform complex
tasks. Especially with the rise of foundation models, this integration allows robots to iteratively learn from their
surroundings, adapt flexibly to new settings, and make autonomous decisions.
2.9 Robotics
PaLM-E
PaLM-E is a new AI model from Google that outperforms previous state-of-the-art methods like
merges robotics with language modeling to SayCan and PaLI on both embodied visual question
address real-world tasks like robotic manipulation answering and planning (Figure 2.9.2).16 On
and knowledge tasks like question answering and robotic manipulation tasks, PaLM-E outperforms
image captioning. Leveraging transformer-based competing models (PaLI and CLIP-FT) in its ability
architectures, the largest PaLM-E model is scaled to detect failures, which is a crucial step for robots
up to 562B parameters. The model is trained to perform closed-loop planning (Figure 2.9.3).
on diverse visual language as well as robotics
PaLM-E is significant in that it demonstrates that
data, which results in superior performance on
language modeling techniques as well as text
a variety of robotic benchmarks. PaLM-E also
data can enhance the performance of AI systems
sets new standards in visual tasks like OK-VQA,
in nonlanguage domains, like robotics. PaLM-E
excels in other language tasks, and can engage in
also highlights how there are already linguistically
chain-of-thought, mathematical, and multi-image
adept robots capable of real-world interaction and
reasoning, even without specific training in these
high-level reasoning. Developing these kinds of
areas. Figure 2.9.1 illustrates some of the tasks that
multifaceted robots is an essential step in creating
the PaLM-E model can perform.
more general robotic assistants that can, for
On Task and Motion Planning (TAMP) domains, example, assist in household work.
where robots have to manipulate objects, PaLM-E
16 Embodied Visual Question Answering (Embodied VQA) is a task where agents need to navigate through 3D environments and answer questions about the objects they
visually perceive in the environment.

PaLM-E (cont’d)
PaLM-E in action
Source: Robotics at Google, 2023
Figure 2.9.1
Performance of select models on TAMP environment: success rate

Source: Driess et al., 2023 | Table: 2024 AI Index report
Model Embodied VQA q1 Embodied VQA q2 Embodied VQA q3 Embodied VQA q4 Planning p1 Planning p2
SayCan (oracle a ordances) 38.7 33.3

PaLI (zero-shot) 0 0
PaLM-E OSRT w/ input 99.7 98.2 100 93.7 82.5 76.2

encoding
Figure 2.9.2
Select models on mobile manipulation environment

tests: failure detection
Source: Driess et al., 2023 | Table: 2024 AI Index report
Baselines Failure detection
PaLI (zero-shot) 0.73

CLIP-FT 0.65
CLIP-FT-hindsight 0.89
PaLM-E-12B 0.91
Figure 2.9.3

RT-2
Real-world robots could benefit from certain and adaptable approaches for conditioning robotic
capabilities possessed by LLMs, such as text and policy. It outshines state-of-the-art models like
code generation, as well as visual understanding. Manipulation of Open-World Objects (MOO) across
RT-2, a new robot released from DeepMind, various benchmarks, particularly in tasks involving
represents an ambitious attempt to create a unseen objects. On such tasks, an RT-2/PaLM-E
generalizable robotic model that has certain variant achieves an 80% success rate, significantly
LLM capabilities. RT-2 uses a transformer-based higher than MOO’s (53%) (Figure 2.9.4). In unseen
architecture and is trained on both robotic object tasks, RT-2 surpasses the previous year’s
trajectory data that is tokenized into text and state-of-the-art model, RT-1, by 43 percentage
extensive visual-language data. points. This indicates an improvement in robotic
performance in novel environments over time.
RT-2 stands out as one of the most impressive
Evaluation of RT-2 models and baselines on seen and unseen tasks: success rate
Source: Brohan et al., 2023 | Chart: 2024 AI Index report
R3M (2022) VC-1 (2023) RT-1 (2022) MOO (2023) RT-2-PaLM-E-12B1 (2023) RT-2-PaLI-X-55B (2023)
100%
92% 93%
91%
80%
80%
75%
73% 72%
66%
63% 62%62%
60%
Success rate
53%
49%
45%
40%40%
40% 37%
35% 35%
32%
23%22%
20%
20%
11% 11% 12%
10%
8%
1% 0%
0%
Seen tasks Unseen objects (avg.) Unseen backgrounds (avg.) Unseen environments (avg.) Unseen average
Task
Figure 2.9.4

Index Report 2024 2.10 Reinforcement Learning
In reinforcement learning, AI systems are trained to maximize performance on a given task by interactively learning
from their prior actions. Systems are rewarded if they achieve a desired goal and punished if they fail.
2.10 Reinforcement Learning

Reinforcement Learning seven models reported using RLHF, and in 2023, 16
from Human Feedback models reported using RLHF. The rising popularity of
Reinforcement learning has gained popularity RLHF is also evidenced by the fact that many leading
in enhancing state-of-the-art language models LLMs report improving their models with RLHF
like GPT-4 and Llama 2. Introduced in 2017, (Figure 2.10.2).
Reinforcement Learning from Human Feedback

(RLHF) incorporates human feedback into the Number of foundation models using RLHF, 2021–23
reward function, enabling models to be trained for 16
16
characteristics like helpfulness and harmlessness.

14
This year, the AI Index tracked data on the number

12
of foundation models using RLHF as part of
Number of models
their training. More specifically, the Index team 10
looked through the technical reports and other

8
7
documentation of all models included in CRFM’s
Ecosystem graph, one of the most comprehensive 6
repositories of the foundation model ecosystem.17

4
Figure 2.10.1 illustrates how many foundation models
reported using RLHF over time. In 2021, no newly 2
released foundation models used RLHF. In 2022, 0

0
2021 2022 2023
Figure 2.10.1
RLHF usage among foundation models

Source: AI Index, 2024 | Table: 2024 AI Index report
GPT-4 Llama 2 Claude-2 Gemini Mistral-7B
✓ ✓ ✓ ✓ x
Figure 2.10.2
17 It is possible that more models use RLHF as part of their training than reported. The Index only tracks data for models that publicly report using RLHF.

RLAIF and RLHF vs. SFT baseline: win rate
RLAIF Source: Lee et al., 2023 | Chart: 2024 AI Index report
100%
RLHF RLAIF
RLHF is a powerful method for aligning AI

80%
models but can be hindered by the time 73%
71%
and labor required to generate human
64% 63%
preference datasets for model alignment. 60%
As an alternative, Reinforcement
Learning from AI Feedback (RLAIF) uses Win rate
40%
reinforcement learning based on the
preferences of LLMs to align other AI
models toward human preferences.
20%
Recent research from Google Research

compares RLAIF with RLHF, the traditional
0%
gold standard, to assess whether RLAIF Summarization Helpfulness
Task
can serve as a reliable substitute. The Figure 2.10.3
study finds that both RLAIF and RLHF

Harmless rate by policy
are preferred over supervised fine-tuning Source: Lee et al., 2023 | Chart: 2024 AI Index report
100%
(SFT) for summarization and helpfulness
88%
tasks, and that there is not a statistically
significant difference in the degree to 80% 76%
which RLHF is preferred (Figure 2.10.3).
Notably, in harmless dialogue generation 64%
60%
tasks focused on producing the least
Harmless rate
harmful outputs, RLAIF (88%) surpasses

RLHF (76%) in effectiveness (Figure 40%
2.10.4). This research indicates that RLAIF
could be a more resource-efficient and
cost-effective approach for AI model 20%
alignment.
0%
SFT RLHF RLAIF
Policy
Figure 2.10.4

Direct Preference Optimization

As illustrated above, RLHF is a useful method Direct Preference Optimization (DPO). DPO is
for aligning LLMs with human preferences. simpler than RLHF but equally effective. The
However, RLHF requires substantial computational researchers show that DPO is as effective as other
resources, involving the training of multiple existing alignment methods, such as Proximal
language models and integrating LM policy Policy Optimization (PPO) and Supervised Fine-
sampling within training loops. This complexity Tuning (SFT), on tasks like summarization (Figure
can hinder its broader adoption. 2.10.5). The emergence of techniques like DPO
suggests that model alignment methods are
In response, researchers from Stanford and CZ
becoming more straightforward and accessible.
Biohub have developed a new reinforcement
learning algorithm for aligning models named
Comparison of di erent algorithms on TL;DR summarization task across di erent sampling temperatures
Source: Rafailov et al., 2023 | Table: 2024 AI Index report
DPO PPO Best of 128 SFT Preferred-FT GPT-J
0.70
0.60
0.50 Human baseline

Win rate
0.40
0.30
0.20
0.10
0.00
0.00 0.25 0.50 0.75 1.00
Sampling temperature
Figure 2.10.5

Index Report 2024 2.11 Properties of LLMs
This section focuses on research exploring critical properties of LLMs, such as their capacity for sudden behavioral
shifts and self-correction in reasoning. It is important to highlight these studies to develop an understanding of how
LLMs, which are increasingly representative of the frontier of AI research, operate and behave.
2.11 Properties of LLMs

Challenging the Notion of Emergent Behavior

Many papers have argued that LLMs exhibit multiple-choice grading are used to evaluate
emergent abilities, meaning they can unpredictably models, emergent abilities seem more apparent.
and suddenly display new capabilities at larger In contrast, when linear or continuous metrics
scales. This has raised concerns that even larger
18
are employed, these abilities largely vanish.
models could develop surprising, and perhaps Analyzing a suite of benchmarks from BIG-
uncontrollable, new abilities. bench, a comprehensive LLM evaluation tool,
the researchers noted emergent abilities on
However, research from Stanford challenges this
only five of the 39 benchmarks (Figure 2.11.1).
notion, arguing that the perceived emergence of new
These findings have important implications for AI
capabilities is often a reflection of the benchmarks
safety and alignment research as they challenge
used for evaluation rather than an inherent property
a prevailing belief that AI models will inevitably
of the models themselves. The researchers found
learn new, unpredictable behaviors as they scale.
that when nonlinear or discontinuous metrics like
18 Some of these papers include Brown et al., 2023, Ganguli et al., 2022, Srivastava et al., 2022, and Wei et al., 2022.

Challenging the Notion of Emergent Behavior (cont’d)
Emergence score over all Big-bench tasks

Source: Schaeffer et al., 2023
Figure 2.11.1

Changes in LLM Performance Over Time

Publicly usable closed-source LLMs, such as GPT-4, that performance declined on several tasks. For
Claude 2, and Gemini, are often updated over time instance, the June version of GPT-4, compared
by their developers in response to new data or user to the March version, was 42 percentage points
feedback. However, there is little research on how worse at generating code, 16 percentage points
the performance of such models changes, if at all, in worse at answering sensitive questions, and 33
response to such updating. percentage points worse on certain mathematical
tasks (Figure 2.11.2). The researchers also found
A study conducted at Stanford and Berkeley
that GPT-4’s ability to follow instructions
explores the performance of certain publicly usable
diminished over time, which potentially explains
LLMs over time and highlights that, in fact, their
the broader performance declines. This research
performance can significantly vary. More specifically,
highlights that LLM performance can evolve over
the study compared the March and June 2023
time and suggests that regular users should be
versions of GPT-3.5 and GPT-4 and demonstrated
mindful of such changes.

Changes in LLM Performance Over Time (cont’d)

Performance of the March 2023 and June 2023 versions of GPT-4 on eight tasks
Source: Chen et al., 2023 | Chart: 2024 AI Index report
Math I: Prime vs. Composite (n=1,000) Math II: Happy Numbers (n=500)
84% 84%
80% 80%
60% 60%
Accuracy
Accuracy
51%
40% 40% 35%
20% 20%
0% 0%
2023-Mar 2023-Jun 2023-Mar 2023-Jun
Answering Sensitive Questions (n=100) OpinionQA Survey (n=1,506)

30% 98%
100%
21% 80%
Response rate
Response rate
20%
60%
40%
10%
23%
5%
20%
0% 0%
2023-Mar 2023-Jun 2023-Mar 2023-Jun
LangChain HotpotQA Agent (n=7,405) Code Generation and Formatting (n=50)

60%
52%
40% 38%
Directly EXE
Exact match
30% 40%
20%
20%
10% 10%
1%
0% 0%
2023-Mar 2023-Jun 2023-Mar 2023-Jun
USMLE Medical Exam (n=340) Visual Reasoning (n=467)

87%
82%
80% 30% 27%
25%
Exact match
60%
Accuracy
20%
40%
10%
20%
0% 0%
2023-Mar 2023-Jun 2023-Mar 2023-Jun
Figure 2.11.2

LLMs Are Poor Self-Correctors

It is generally understood that LLMs like GPT-4 Researchers from DeepMind and the University
have reasoning limitations and can sometimes of Illinois at Urbana–Champaign tested GPT-4’s
produce hallucinations. One proposed solution performance on three reasoning benchmarks:
to such issues is self-correction, whereby LLMs GSM8K (grade-school math), CommonSenseQA
identify and correct their own reasoning flaws. As (common-sense reasoning), and HotpotQA
AI’s societal role grows, the concept of intrinsic (multidocument reasoning). They found that when
self-correction—allowing LLMs to autonomously the model was left to decide on self-correction
correct their reasoning without external guidance— without guidance, its performance declined across
is especially appealing. However, it is currently not all tested benchmarks (Figure 2.11.3).
well understood whether LLMs are in fact capable
of this kind of self-correction.
GPT-4 on reasoning benchmarks with intrinsic self-correction

Source: Huang et al., 2023 | Chart: 2024 AI Index report
Standard prompting: 1 call Self-correct (round 1): 3 calls Self-correct (round 2): 5 calls
100%
95.50%
91.50%
89.00%
82.00%
79.50% 80.00%
80%
60%
Accuracy (%)
49.00% 49.00%
43.00%
40%
20%
0%
GSM8K CommonSenseQA HotpotQA
Figure 2.11.3

Closed vs. Open Model Performance innovation, and enhance transparency within the AI
As LLMs become increasingly ubiquitous, debate ecosystem. Others contend that open-source models
intensifies over their varying degrees of accessibility. present considerable security risks, such as facilitating
Some models such as Google’s Gemini remain closed, the creation of disinformation or bioweapons, and
accessible solely to their developers. In contrast, should therefore be approached with caution.
models like OpenAI’s GPT-4 and Anthropic’s Claude
2 offer limited access, available publicly via an API. In the context of this debate, it is important to
However, model weights are not fully released, which acknowledge that current evidence indicates a
means the model cannot be independently modified notable performance gap between open and closed
by the public or further scrutinized. Conversely, models.19 Figures 2.11.4 and 2.11.5 juxtapose the
Meta’s Llama 2 and Stability AI’s Stable Diffusion performances of the top closed versus open model
adopt an open approach, fully releasing their model on a selection of benchmarks.20 On all selected
weights. Open-source models can be modified and benchmarks, closed models outperform open ones.
freely used by anyone. Specifically, on 10 selected benchmarks, closed
models achieved a median performance advantage of
Viewpoints differ on the merits of closed versus open 24.2%, with differences ranging from as little as 4.0%
AI models. Some argue in favor of open models, citing on mathematical tasks like GSM8K to as much as
their ability to counteract market concentration, foster 317.7% on agentic tasks like AgentBench.
Score di erentials of top closed vs. open models on select benchmarks

Benchmark Task category Best closed model score Best open model score
AgentBench Agent-based behavior 4.01 0.96

Chatbot Arena Leaderboard General language 1,252 1,149
GPQA General reasoning 41.00% 29.10%
GSM8K Mathematical reasoning 97.00% 93.30%
HELM General language 0.96 0.82
HumanEval Coding 96.30% 62.20%
MATH Mathematical reasoning 84.30% 60.40%
MMLU General language 90.04% 70.60%
MMMU General reasoning 59.40% 51.10%
SWE-bench Coding 4.80% 3.97%
Figure 2.11.4
19 By closed models, the AI Index is referring both to models that are fully closed and those with limited access.
20 The data in this section was collected in early January 2024.

Performance of top closed vs. open models on select benchmarks

317.71%
300%
Closed vs. open percentage score di erence
250%
200%
150%
100%
54.82%
50% 39.57%
27.54%
17.07% 20.91%
8.96% 3.97%
0%
HELM MMLU Chatbot Arena HumanEval SWE-bench MATH GSM8K AgentBench
Leaderboard
General language Coding Mathematical reasoning Agent-based
behavior
Benchmark
Figure 2.11.5

Index Report 2024 2.12 Techniques for LLM Improvement
As LLMs use increases, techniques are being sought to enhance their performance and efficiency. This section examines
some of those advances.
2.12 Techniques for LLM Improvement

Prompting Mastering the art of crafting effective prompts
Prompting, a vital aspect of the AI pipeline, entails significantly enhances the performance of LLMs
supplying a model with natural language instructions without requiring that models undergo underlying
that describe tasks the model should execute. improvements.
Graph of Thoughts Prompting

Chain of thought (CoT) and Tree of Thoughts Graph of Thoughts (GoT) reasoning flow
Source: Besta et al., 2023
(ToT) are prompting methods that can improve
the performance of LLMs on reasoning tasks. In
2023, European researchers introduced another
prompting method, Graph of Thoughts (GoT), that
has also shown promise (Figure 2.12.1). GoT enables
LLMs to model their thoughts in a more flexible,
graph-like structure which more closely mirrors
actual human reasoning. The researchers then
designed a model architecture to implement GoT
and found that, compared to ToT, it increased the
quality of outputs by 62% on a sorting task while
reducing cost by around 31% (Figure 2.12.2).
Figure 2.12.1

Graph of Thoughts Prompting (cont’d)

Number of errors in sorting tasks with ChatGPT-3.5
Source: Besta et al., 2023 | Chart: 2024 AI Index report
32 elements 64 elements 128 elements
123.93
120 120 120
Number of incorrectly sorted elements (mean)
100 100 100
80 80 80
64.56 63.04
60 60 60
52.87
40 40 38.12 40
21.18 19.53
20 20 16.79 20 17.52
6.58 7.00
4.53
2.25 1.90 1.84
0 0 0
CoT IO ToT2 ToT GoT IO CoT ToT2 ToT GoT IO CoT ToT2 ToT GoT
(L=3, k=10) (L=2, k=20) (L=7, k=10) (L=4, k=20) (L=10, k=10) (L=4, k=20)
Figure 2.12.2

Optimization by PROmpting (OPRO)

A paper from DeepMind has introduced prompts aim to enhance the performance of AI
Optimization by PROmpting (OPRO), a method systems on particular benchmarks. Compared to
that uses LLMs to iteratively generate prompts other prompting approaches like “let’s think step
to improve algorithmic performance. OPRO uses by step” or an empty starting point, ORPO leads
natural language to guide LLMs in creating new to significantly greater accuracy on virtually all 23
prompts based on problem descriptions and BIG-bench Hard tasks (Figure 2.12.4).
previous solutions (Figure 2.12.3). The generated
Sample OPRO
prompts and
optimization
progress
Source: Yang et al., 2023
Figure 2.12.3
Accuracy di erence on 23 BIG-bench Hard (BBH) tasks using PaLM 2-L scorer
Source: Yang et al., 2023 | Chart: 2024 AI Index report
“Let’s think step by step” instruction Empty instruction

60
40 50
Accuracy di�erence
40
20
30
0
20
−20 10
0
boolean_expressions
causal_judgement
date_understanding
disambiguation_qa
dyck_languages
formal_fallacies
geometric_shapes
hyperbaton
logical_deduction_seven_objects
movie_recommendation
multistep_arithmetic_two
navigate
object_counting
penguins_in_a_table
reasoning_about_colored_objects
ruin_names
salient_translation_error_detection
snarks
sports_understanding
temporal_sequences
tracking_shu�ed_objects_seven_objects
web_of_lies
word_sorting
boolean_expressions
causal_judgement
date_understanding
disambiguation_qa
dyck_languages
formal_fallacies
geometric_shapes
hyperbaton
logical_deduction_seven_objects
movie_recommendation
multistep_arithmetic_two
navigate
object_counting
penguins_in_a_table
reasoning_about_colored_objects
ruin_names
salient_translation_error_detection
snarks
sports_understanding
temporal_sequences
tracking_shu�ed_objects_seven_objects
web_of_lies
word_sorting
Task
Figure 2.12.4

Fine-Tuning Fine-tuning not only boosts overall model

Fine-tuning has grown increasingly popular as a performance but also sharpens the model’s
method of enhancing LLMs and involves further capabilities on specific tasks. It also allows for more
training or adjusting models on smaller datasets. precise control over the model’s behavior.
Highlighted Research: QLoRA manages to increase efficiency with
QLoRA techniques like a 4-bit NormalFloat (NF4), double

quantization, and page optimizers. QLoRA is
QLoRA, developed by researchers from the used to train a model named Guanaco, which
University of Washington in 2023, is a new method matched or even surpassed models like ChatGPT
for more efficient model fine-tuning. It dramatically in performance on the Vicuna benchmark (a
reduces memory usage, enabling the fine-tuning benchmark that ranks the outputs of LLMs) (Figure
of a 65 billion parameter model on a single 48 2.12.5). Remarkably, the Guanaco models were
GB GPU while maintaining full 16-bit fine-tuning created with just 24 hours of fine-tuning on a single
performance. To put this in perspective, fine-tuning GPU. QLoRa highlights how methods for optimizing
a 65B Llama model, a leading open-source LLM, and further improving models have become more
typically requires about 780 GB of GPU memory. efficient, meaning fewer resources will be required
Therefore, QLoRA is nearly 16 times more efficient. to make increasingly capable models.
Model competitions based on 10,000 simulations using GPT-4 and the Vicuna benchmark
Source: Dettmers et al., 2023 | Chart: 2024 AI Index report
1,400 1,348
1,200
1,022
974 992
1,000 966
902 916
879
Elo rating (mean)
800
600
400
200
0
Guanaco 7B Bard Guanaco 13B ChatGPT Vicuna 13B Guanaco 33B Guanaco 65B GPT-4
Figure 2.12.5

Attention AI’s broader adoption. Optimization methods aim to

LLMs can flexibly handle various tasks but often enhance AI’s efficiency by, for example, improving
demand substantial computational resources to train. memory usage, thereby making LLMs more
As previously noted, high training costs can hinder accessible and practical.
Flash-Decoding
Flash-Decoding, developed by Stanford inference: For example, on a 256 batch size and
researchers, tackles inefficiency in traditional 256 sequence length, Flash-Decoding is 48 times
LLMs by speeding up the attention mechanism, faster than PyTorch Eager and six times faster
particularly in tasks requiring long sequences. than FlashAttention-2 (Figure 2.12.7). Inference
It achieves this by parallelizing the loading of on models like ChatGPT can cost $0.01 per
keys and values, then separately rescaling and response, which can become highly expensive
combining them to maintain right attention outputs when deploying such models to millions of users.
(Figure 2.12.6). In various tests, Flash-Decoding Innovations like Flash-Decoding are critical for
outperforms other leading methods like PyTorch reducing inference costs in AI.
Eager and FlashAttention-2, showing much faster
Flash-Decoding
operation process
Source: Dao et al., 2023
Figure 2.12.6

Flash-Decoding (cont’d)
Performance comparison of multihead attention algorithms across batch sizes and sequence lengths
Source: Dao et al., 2023 | Chart: 2024 AI Index report
PyTorch Eager (2022) Flash-Attention v2.0.9 (2023) Flash-Decoding (2023)
4,000
Run-time (in microseconds)
3,000
2,000
1,000
0
bs bs bs bs bs bs bs bs bs bs
=2 =12 =6 =3 =16 =8 =4 =2 =1, =1,
56 8, 4, 2, , se , se , se , se seq seq
, se se seq seq qle qle qle qle len len
qle qle len len n= n= n= n= =6 =13
n= n= =1, =2 4 8,1 16, 32,7 5,5 1,0
256 512 02 ,04 ,09 92 38 68 3
4 8 6 4 6 72
Figure 2.12.7

Index Report 2024 2.13 Environmental Impact of AI Systems
This section examines trends in the environmental impact of AI systems, highlighting the evolving landscape of transparency and
awareness. Historically, model developers seldom disclosed the carbon footprint of their AI systems, leaving researchers to make
their best estimates. Recently, there has been a shift toward greater openness, particularly regarding the carbon costs of training AI
models. However, disclosure of the environmental costs associated with inference—a potentially more significant concern—remains
insufficient. This section presents data on carbon emissions as reported by developers in addition to featuring notable research
exploring the intersection of AI and environmental impact. With AI models growing in size and becoming more widely used, it has
never been more critical for the AI research community to diligently monitor and mitigate the environmental effects of AI systems.
2.13 Environmental Impact of AI Systems

General Environmental Impact Emission data varies widely. For instance, Meta’s Llama
2 70B model released approximately 291.2 tonnes
Training
of carbon, which is nearly 291 times more than the
Figure 2.13.1 presents the carbon released by (in
emissions released by one traveler on a round-trip flight
tonnes) of select LLMs during their training, compared
from New York to San Francisco, and roughly 16 times
with human reference points. Emissions data of
the amount of annual carbon emitted by an average
models marked with an asterisk were estimated by
American in one year.21 However, the emissions from
independent researchers as they were not disclosed
Llama 2 are still less than the 502 tonnes reportedly
by their developers.
released during the training of OpenAI’s GPT-3.
CO2 equivalent emissions (tonnes) by select machine learning models and real-life examples, 2020–23
Source: AI Index, 2024; Luccioni et al., 2022; Strubell et al., 2019 | Chart: 2024 AI Index report
500
CO2 equivalent emissions (tonnes)
400
300
200
100 63.00, Car, avg. incl. fuel, 1 lifetime

18.08, American life, avg., 1 year
5.51, Human life, avg., 1 year
0.99, Air travel, 1 passenger, NY–SF
0
Llama 2 (70B)
Starcoder (15.5B)
Luminous Extended (30B)

Granite (13B)
Llama 2 (13B)
Llama 2 (34B)
Luminous Base (13B)

Llama 2 (7B)
Gopher (280B)*
BLOOM (176B)*
GPT-3 (175B)*
OPT (175B)*
2020 2021 2022 2023 Figure 2.13.1
21 In its technical report on Llama 2, Meta notes that it offsets all the carbon emissions generated during the model’s training process.

The variance in emission estimates is due to factors Super Cluster). However, smaller models can still have
such as model size, data center energy efficiency, and high emissions if trained on energy grids powered by
the carbon intensity of energy grids. Figure 2.13.2 shows less efficient energy sources. Some estimates suggest
the emissions of select models in relation to their size. that model emissions have declined over time, which is
Generally, larger models emit more carbon, a trend presumably tied to increasingly efficient mechanisms of
clearly seen in the Llama 2 model series, which were all model training. Figure 2.13.3 features the emissions of
trained on the same supercomputer (Meta’s Research select models along with their power consumption.
CO2 equivalent emissions (tonnes) and number of parameters by select machine learning models
Source: AI Index, 2024; Luccioni et al., 2022 | Chart: 2024 AI Index report
Gopher (280B)
BLOOM (176B) OPT (175B) GPT-3 (175B)

Number of parameters (log scale)
100B
Llama 2 (70B)
Llama 2 (34B)
Luminous Extended (30B)
Starcoder (15.5B)
Granite (13B) Llama 2 (13B)
Luminous Base (13B)
10B
Llama 2 (7B)
10 100
CO2 equivalent emissions (log scale - tonnes) Figure 2.13.2
Environmental impact of select models
Source: AI Index, 2024; Luccioni et al., 2022 | Table: 2024 AI Index report
Model and number of Year Power C02 equivalent

parameters consumption emissions
(MWh) (tonnes)
Gopher (280B) 2021 1,066 352

BLOOM (176B) 2022 433 25
GPT-3 (175B) 2020 1,287 502
OPT (175B) 2022 324 70
Llama 2 (70B) 2023 400 291.42
Llama 2 (34B) 2023 350 153.90
Llama 2 (13B) 2023 400 62.44
Llama 2 (7B) 2023 400 31.22
Granite (13B) 2023 153 22.23
Starcoder (15.5B) 2023 89.67 16.68
Luminous Base (13B) 2023 33 3.17
Luminous Extended (30B) 2023 93 11.95 Figure 2.13.3

A major challenge in evaluating the environmental training AI models can be significant. While the per-
impacts of AI models is a lack of transparency about query emissions of inference may be relatively low,
emissions. Consistent with findings from other studies, the total impact can surpass that of training when
most prominent model developers do not report carbon models are queried thousands, if not millions, of
emissions, hampering efforts to conduct thorough and times daily. Research on the emissions from model
accurate evaluations of this metric. For example, many 22
inference is scant. A study by Luccioni et al., published
prominent model developers such as OpenAI, Google, in 2023, is among the first to comprehensively assess
Anthropic, and Mistral do not report emissions in the emissions from model inference. Figure 2.13.4
training, although Meta does. illustrates the emissions from 1,000 inferences across
various model tasks, revealing that tasks like image
Inference generation have a much higher carbon footprint than
As highlighted earlier, the environmental impact of text classification.
Carbon emissions by task during model inference

Source: Luccioni et al., 2023 | Chart: 2024 AI Index report
1000
Model emissions (g of CO2 - log scale)
100
10
Te To Qu Fil Im Ze Ob Ze Te Su Im Ze Im
xt ke est l ma ag ro jec ro xt mma ag ro- ag
cla n cla ion sk ec -sh td -sh ge ec sh eg
ssi ssi las ot ete ot ne riz ap ot en
ca an si sen cti que rat ati tio su era
ti ca sw ca tim st ion on nin m tio
on ti eri ti on ion g ma n
on ng on en riz
ta an ati
na sw on
lys eri
is ng
Task
Figure 2.13.4
22 Research also suggests that the reporting of carbon emissions on open model development platforms, such as Hugging Face, is declining over time.

Positive Use Cases

Despite the widely recognized environmental costs efforts.23 These applications include enhancing
of training AI systems, AI can contribute positively to thermal energy system management, improving pest
environmental sustainability. Figure 2.13.5 showcases a control strategies, and boosting urban air quality.
variety of recent cases where AI supports environmental
Positive AI environmental use cases

Source: Fang et al., 2024 | Table: 2024 AI Index report
Use case AI contribution Reference
Management of thermal energy storage systems Anticipating thermal energy needs and managing thermal energy storage Olabi et al., 2023
systems.
Improving waste management Saving time and costs in waste-to-energy conversion, waste sorting, and Fang et al., 2023
waste monitoring.
More e ciently cooling buildings Optimizing the energy usage associated with air-conditioning. Luo et al., 2022
Improving pest management Identifying and eliminating pests in commercial tomato harvests. Rustia et al., 2022
Enhancing urban air quality Forecasting and predicting air quality in urban cities. Shams et al., 2021
Figure 2.13.5
23 Several of the data points in Figure 2.13.5 were adopted from this literature review on the topic of AI and sustainability.

Index Report 2024
Responsible AI
Text and analysis
by Anka Reuel
CHAPTER 3:
Index Report 2024 Responsible AI
Preview
Overview 160 3.4 Security and Safety 186
Chapter Highlights 161 Current Challenges 186
AI Security and Safety in Numbers 187
3.1 Assessing Responsible AI 163 Academia 187
Responsible AI Definitions 163 Industry 188
AI Incidents 164 Featured Research 191
Examples 164 Do-Not-Answer: A New Open Dataset
Risk Perception 166 for Comprehensive Benchmarking of
LLM Safety Risks 191
Risk Mitigation 167
Universal and Transferable Attacks
Overall Trustworthiness 168
on Aligned Language Models 193
Benchmarking Responsible AI 169
MACHIAVELLI Benchmark 195
Tracking Notable RAI Benchmarks 169
Reporting Consistency 170
3.5 Fairness 197
Current Challenges 197
3.2 Privacy and Data Governance 172
Fairness in Numbers 197
Current Challenges 172
Academia 197
Privacy and Data Governance in Numbers 173
Industry 198
Academia 173
Featured Research 199
Industry 174
(Un)Fairness in AI and Healthcare 199
Featured Research 175
Social Bias in Image Generation Models 200
Extracting Data From LLMs 175
Measuring Subjective Opinions in LLMs 201
Foundation Models and Verbatim Generation 177
LLM Tokenization Introduces Unfairness 202
Auditing Privacy in AI Models 179
3.6 AI and Elections 205

3.3 Transparency and Explainability 180
Generation, Dissemination, and
Current Challenges 180 Detection of Disinformation 205
Transparency and Explainability in Numbers 181 Generating Disinformation 205
Academia 181 Dissemination of Fake Content 207
Industry 182 Detecting Deepfakes 208
Featured Research 183 LLMs and Political Bias 210
The Foundation Model Transparency Index 183 Impact of AI on Political Processes 211
Neurosymbolic Artificial Intelligence
(Why, What, and How) 185

CHAPTER 3:
Overview
AI is increasingly woven into nearly every facet of our lives. This integration is occurring in
sectors such as education, finance, and healthcare, where critical decisions are often based
on algorithmic insights. This trend promises to bring many advantages; however, it also
introduces potential risks. Consequently, in the past year, there has been a significant focus
on the responsible development and deployment of AI systems. The AI community has also
become more concerned with assessing the impact of AI systems and mitigating risks for
those affected.
This chapter explores key trends in responsible AI by examining metrics, research, and
benchmarks in four key responsible AI areas: privacy and data governance, transparency
and explainability, security and safety, and fairness. Given that 4 billion people are expected
to vote globally in 2024, this chapter also features a special section on AI and elections and
more broadly explores the potential impact of AI on political processes.

CHAPTER 3:
Chapter Highlights
New research from the AI Index reveals a significant lack of standardization in responsible AI reporting.
Leading developers, including OpenAI, Google, and Anthropic, primarily test their models against different
responsible AI benchmarks. This practice complicates efforts to systematically compare the risks and limitations
of top AI models.
2. Political deepfakes are easy to generate and difficult to detect. Political deepfakes are already
affecting elections across the world, with recent research suggesting that existing AI deepfake detection methods
perform with varying levels of accuracy. In addition, new projects like CounterCloud demonstrate how easily AI
can create and disseminate fake content.
3. Researchers discover more complex vulnerabilities in LLMs. Previously, most efforts to

red team AI models focused on testing adversarial prompts that intuitively made sense to humans. This year,
researchers found less obvious strategies to get LLMs to exhibit harmful behavior, like asking the models to
infinitely repeat random words.
4. Risks from AI are a concern for businesses across the globe. A global survey on responsible AI
highlights that companies’ top AI-related concerns include privacy, security, and reliability. The survey shows that
organizations are beginning to take steps to mitigate these risks. However, globally, most companies have so far
only mitigated a portion of these risks.
5. LLMs can output copyrighted material. Multiple researchers have shown that the generative outputs
of popular LLMs may contain copyrighted material, such as excerpts from The New York Times or scenes from
movies. Whether such output constitutes copyright violations is becoming a central legal question.
6. AI developers score low on transparency, with consequences for research. The newly
introduced Foundation Model Transparency Index shows that AI developers lack transparency, especially
regarding the disclosure of training data and methodologies. This lack of openness hinders efforts to further
understand the robustness and safety of AI systems.

CHAPTER 3:
Chapter Highlights (cont’d)

7. Extreme AI risks are difficult to analyze. Over the past year, a substantial debate has emerged among
AI scholars and practitioners regarding the focus on immediate model risks, like algorithmic discrimination, versus
potential long-term existential threats. It has become challenging to distinguish which claims are scientifically
founded and should inform policymaking. This difficulty is compounded by the tangible nature of already present
short-term risks in contrast with the theoretical nature of existential threats.
8. The number of AI incidents continues to rise. According to the AI Incident Database, which tracks
incidents related to the misuse of AI, 123 incidents were reported in 2023, a 32.3% increase from 2022. Since
2013, AI incidents have grown by over twentyfold. A notable example includes AI-generated, sexually explicit
deepfakes of Taylor Swift that were widely shared online.
9. ChatGPT is politically biased. Researchers find a significant bias in ChatGPT toward Democrats in the
United States and the Labour Party in the U.K. This finding raises concerns about the tool’s potential to influence
users’ political views, particularly in a year marked by major global elections.

Artificial Intelligence Chapter 3: Responsible AI
Index Report 2024 3.1 Assessing Responsible AI
This chapter begins with an overview of key trends in responsible AI (RAI). In this section the AI Index defines key terms
in responsible AI: privacy, data governance, transparency, explainability, fairness, as well as security and safety. Next,
this section looks at AI-related incidents and explores how industry actors perceive AI risk and adopt AI risk mitigation
measures. Finally, the section profiles metrics pertaining to the overall trustworthiness of AI models and comments on
the lack of standardized responsible AI benchmark reporting.
3.1 Assessing Responsible AI

Responsible AI Definitions offers definitions for the responsible AI dimensions
addressed in this chapter, along with an illustrative
In this chapter, the AI Index explores four key example of how these dimensions might be
dimensions of responsible AI: privacy and data practically relevant. The “Example” column examines
governance, transparency and explainability, security a hypothetical platform that employs AI to analyze
and safety, and fairness. Other dimensions of medical patient data for personalized treatment
responsible AI, such as sustainability and reliability, recommendations, and demonstrates how issues like
are discussed elsewhere in the report. Figure 3.1.1 privacy, transparency, etc., could be relevant.1
Responsible AI dimensions, de nitions, and examples
Responsible AI dimension De nition Example
Data governance Establishment of policies, procedures, and standards to Policies and procedures are in place to maintain data
ensure the quality, security, and ethical use of data, which quality and security, with a particular focus on ethical use
is crucial for accurate, fair, and responsible AI operations, and consent, especially for sensitive health information.
particularly with sensitive or personally identi able
information.
Explainability The capacity to comprehend and articulate the rationale The platform can articulate the rationale behind its
behind AI decisions, emphasizing the importance of AI treatment recommendations, making these insights
being not only transparent but also understandable to users understandable to doctors and patients, ensuring trust in its
and stakeholders. decisions.
Fairness Creating algorithms that are equitable, avoiding bias or The platform is designed to avoid bias in treatment
discrimination, and considering the diverse needs and recommendations, ensuring that patients from all
circumstances of all stakeholders, thereby aligning with demographics receive equitable care.
broader societal standards of equity.
Privacy An individual’s right to con dentiality, anonymity, and Patient data is handled with strict con dentiality, ensuring
protection of their personal data, including the right to anonymity and protection. Patients consent to whether and
consent and be informed about data usage, coupled with how their data is used to train a treatment recommendation
an organization’s responsibility to safeguard these rights system.
when handling personal data.
Security and safety The integrity of AI systems against threats, minimizing Measures are implemented to protect against cyber threats
harms from misuse, and addressing inherent safety risks and ensure the system’s reliability, minimizing risks from
like reliability concerns and the potential dangers of misuse or inherent system errors, thus safeguarding patient
advanced AI systems. health and data.
Transparency Open sharing of development choices, including data The development choices, including data sources and
sources and algorithmic decisions, as well as how AI algorithmic design decisions, are openly shared. How the
systems are deployed, monitored, and managed, covering system is deployed and monitored is clear to healthcare
both the creation and operational phases. providers and regulatory bodies.
Figure 3.1.1
1 Although Figure 3.1.1 breaks down various dimensions of responsible AI into specific categories to improve definitional clarity, this chapter organizes these dimensions into the following
broader categories: privacy and data governance, transparency and explainability, security and safety, and fairness.

AI Incidents
The AI Incident Database (AIID) tracks instances of ethical The continuous increase in reported incidents likely
misuse of AI, such as autonomous cars causing pedestrian arises from both greater integration of AI into real-
fatalities or facial recognition systems leading to wrongful world applications and heightened awareness of its
arrests. As depicted in Figure 3.1.2, the number of
2
potential for ethical misuse. However, it is important
AI incidents continues to climb annually. In 2023, 123 to note that as awareness grows, incident tracking
incidents were reported, a 32.3% increase from 2022. and reporting also improve, indicating that earlier
Since 2013, AI incidents have grown by over twentyfold. incidents may have been underreported.
Number of reported AI incidents, 2012–23

Source: AI Incident Database (AIID), 2023 | Chart: 2024 AI Index report
123
120
100
Number of AI incidents
80
60
40
20
0
2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023
Figure 3.1.2
Examples
live for 17 hours, amassing over 45 million views
The next section details recent AI incidents to shed light
before they were removed. Generative AI models
on the ethical challenges commonly linked with AI.
can effortlessly extrapolate from training data,
AI-generated nude images of Taylor Swift which often include nude images and celebrity
In January 2024, sexually explicit, AI-generated photographs, to produce nude images of celebrities,
images purportedly depicting Taylor Swift surfaced even when images of the targeted celebrity are
on X (formerly Twitter). These images remained absent from the original dataset. There are filters put
2 Another database of AI incidents is the AIAAIC.

in place that aim to prevent such content creation; Tesla recognizing pedestrian but not slowing
down at a crosswalk
however, these filters can usually be circumvented Source: Gitlin, 2023
with relative ease.
Unsafe behavior of fully self-driving cars

Recent reports have surfaced about a Tesla in Full
Self-Driving mode that detected a pedestrian on a
crosswalk in San Francisco but failed to decelerate
and allow the pedestrian to cross the street safely
(Figure 3.1.3). Unlike other developers of (partially)
automated driving systems, who limit the use of their
software to specific settings such as highways, Tesla
permits the use of their beta software on regular Figure 3.1.3
streets. This incident is one of several alleged cases

Romantic chatbot generated by DALL-E
of unsafe driving behavior by cars in Full Self-Driving Source: AI Index, 2024
mode. In November 2022, a Tesla was involved in an

eight-car collision after abruptly braking. Another
crash involving a Tesla is under investigation for
potentially being the first fatality caused by Full
Self-Driving mode.
Privacy concerns with romantic AI chatbots

Romantic AI chatbots are meant to resemble a lover or
friend, to listen attentively, and to be a companion for
their users (Figure 3.1.4). In this process, they end up
collecting significant amounts of private and sensitive
information. Researchers from the Mozilla Foundation
reviewed 11 romantic AI chatbots for privacy risks and
found that these chatbots collect excessive personal
data, can easily be misused, and offer inadequate data
protection measures. For example, the researchers
found that the privacy policy by Crushon.AI states that Figure 3.1.4
it “may collect extensive personal and even health-

related information from you like your ‘sexual health romantic AI chatbots and highlighted how the
information,’ ‘[u]se of prescribed medication,’ and ‘[g] services, despite being marketed as empathetic
ender-affirming care information.’” The researchers companions, are not transparent about their operation
further discussed privacy concerns associated with and data handling.

Risk Perception
In collaboration with Accenture, this year a team of nongenerative AI. They were presented with a list
Stanford researchers ran a global survey with respondents of 14 risks and could select all that apply to them,
from more than 1,000 organizations to assess the global given their AI adoption strategies.4 The researchers
state of responsible AI. The organizations, with total found that privacy and data governance risks, e.g.,
revenues of at least $500 million each, were taken the use of data without the owner’s consent or data
from 20 countries and 19 industries and responded in leaks, are the leading concerns across the globe.
February–March 2024.3 The objective of the Global State Notably, they observe that these concerns are
of Responsible AI survey was to gain an understanding of significantly higher in Asia and Europe compared to
the challenges of adopting responsible AI practices and to North America. Fairness risks were only selected by
allow for a comparison of responsible AI activities across 20% of North American respondents, significantly
10 dimensions and across surveyed industries and regions. less than respondents in Asia (31%) and Europe
(34%) (Figure 3.1.5). Respondents in Asia selected,
Respondents were asked which risks were relevant to on average, the highest number of relevant risks
them, given their AI adoption strategy; i.e., depending (4.99), while Latin American respondents selected,
on whether they develop, deploy, or use generative or on average, the fewest (3.64).
Relevance of selected responsible AI risks for organizations by region

Source: Global State of Responsible AI report, 2024 | Chart: 2024 AI Index report
Asia Europe North America Latin America Rest of the world
60%
55% 56% 55%
52% 51% 51% 51%
50% 49% 49%
47%
45%
42% 42% 43% 43% 44% 43%
41%
% of respondents
40% 38%
37%
34% 33%
31%
30% 28%
20%
20%
10%
0%
Privacy and data governance Reliability Security Transparency Fairness
Risks Figure 3.1.5
Note: Not all differences between regions are statistically significant.
3 The full Global State of Responsible AI report is forthcoming in May 2024. Additional details about the methodology can be found in the Appendix to this chapter.
4 The full list of risks can be found in the Appendix. In Figure 3.1.5, the AI Index only reports the percentages for risks covered by this chapter.

Risk Mitigation
The Global State of Responsible AI survey finds Some companies in Europe (18%), North America
that organizations in most regions have started to (17%), and Asia (25%) have already operationalized
operationalize responsible AI measures. The majority of more than half of the measures the researchers asked
organizations across regions have fully operationalized about across the following dimensions: fairness,
at least one mitigation measure for risks they reported transparency and explainability, privacy and data
as relevant to them, given their AI adoption (Figure 3.1.6). governance, reliability, and security.5
Global responsible AI adoption by organizations by region

None 1–50% 51–99% All
100%
18% 17%
25% 25% 26%
80%
% of respondents
60%
56%
77% 79% 65%
40% 67%
20%
18%
7% 10%
0%
Asia Europe Latin America North America Rest of the world
Figure 3.1.6
Note: Not all differences between regions
are statistically significant.
5 The AI Index only considers the adoption of RAI measures across the dimensions covered in the AI Index. The Global State of Responsible AI report covers RAI adoption across 10 dimensions.

Overall Trustworthiness The study highlights new vulnerabilities in GPT-type

models, particularly their propensity for producing
As noted above, responsible AI encompasses various biased outputs and leaking private information from
dimensions, including fairness and privacy. Truly training datasets and conversation histories. Despite
responsible AI models need to excel across all these GPT-4’s improvements over GPT-3.5 on standard
aspects. To facilitate the evaluation of broad model benchmarks, GPT-4 remains more susceptible to
“responsibility” or trustworthiness, a team of researchers misleading prompts from jailbreaking tactics. This
introduced DecodingTrust, a new benchmark that increased vulnerability is partly due to GPT-4’s
evaluates LLMs on a broad spectrum of responsible AI improved fidelity in following instructions. Hugging
metrics like stereotype and bias, adversarial robustness, Face now hosts an LLM Safety Leaderboard,
privacy, and machine ethics, among others. Models which is based on the framework introduced in
receive a trustworthiness score, with a higher score DecodingTrust. As of early 2024, Anthropic’s Claude
signifying a more reliable model. 2.0 was rated as the safest model (Figure 3.1.7).
Average trustworthiness score across selected responsible AI dimensions

Source: LLM Safety Leaderboard, 2024 | Chart: 2024 AI Index report
84.52
80
74.72
71.32 71.99 72.45
69.24
70 65.96 66.51
63.24 63.56
Average trustworthiness score
60
50
40
30
20
10
0
Ze Tu Vic Tu GP Lla Lla GP Lla Cla
p hy lu-2- un lu -2- T-4 ma ma T-3 ma ud
r-7 7b a-1 13b -03 -2- -2- .5- -2- e-2
b-b 3b 14 13B 13B tur Ch
eta -v1 -ch -ch bo at-7
.3.0 at- at- -03 b
-G AW GP 01
PT Q TQ
Q
Model
Figure 3.1.7

Benchmarking Responsible AI
Tracking Notable Responsible AI Benchmarks in 2021, TruthfulQA assesses the truthfulness of
Benchmarks play an important role in tracking the LLMs in their responses. RealToxicityPrompts and
capabilities of state-of-the-art AI models. In recent ToxiGen track the extent of toxic output produced
years there has been a shift toward evaluating models by language models. Additionally, BOLD and BBQ
not only on their broader capabilities but also on evaluate the bias present in LLM generations.
responsibility-related features. This change reflects Citations, while not completely reflective of
the growing importance of AI and the growing benchmark use, can serve as a proxy for tracking
demands for AI accountability. As AI becomes more benchmark salience.
ubiquitous and calls for responsibility mount, it will
become increasingly important to understand which Virtually all benchmarks tracked in Figure 3.1.8 have
benchmarks researchers prioritize. seen more citations in 2023 than in 2022, reflecting
their increasing significance in the responsible
Figure 3.1.8 presents the year-over-year citations for a AI landscape. Citations for TruthfulQA have risen
range of popular responsible AI benchmarks. Introduced especially sharply.
Number of papers mentioning select responsible AI benchmarks, 2020–23

Source: Semantic Scholar, 2023 | Chart: 2024 AI Index report
369, TruthfulQA
350
319, RealToxicityPrompts
300
250
Number of papers
200
150
104, ToxiGen
100
93, BOLD
75, BBQ
50
2020 2021 2022 2023

Figure 3.1.8

Reporting Consistency
The effectiveness of benchmarks largely depends standardized benchmark reporting for responsible
on their standardized application. Comparing model AI capability evaluations is lacking. The AI Index
capabilities becomes more straightforward when examined a selection of leading AI model developers,
models are consistently evaluated against a specific specifically OpenAI, Meta, Anthropic, Google, and
set of benchmarks. However, testing models on Mistral AI. The Index identified one flagship model
different benchmarks complicates comparisons, as from each developer (GPT-4, Llama 2, Claude 2,
individual benchmarks have unique and idiosyncratic Gemini, and Mistral 7B) and assessed the benchmarks
natures. Standardizing benchmark testing, therefore, on which they evaluated their model. A few standard
plays an important role in enhancing transparency benchmarks for general capabilities evaluation were
around AI capabilities. commonly used by these developers, such as MMLU,
HellaSwag, ARC Challenge, Codex HumanEval, and
New analysis from the AI Index, however, suggests that GSM8K (Figure 3.1.9).
Reported general benchmarks for popular foundation models

General benchmarks GPT-4 Llama 2 Claude 2 Gemini Mistral 7B
MMLU ✓ ✓ ✓ ✓ ✓
HellaSwag ✓ ✓ ✓ ✓
Challenge (ARC) ✓ ✓ ✓ ✓
WinoGrande ✓ ✓ ✓
Codex HumanEval ✓ ✓ ✓ ✓ ✓
GSM8K ✓ ✓ ✓ ✓ ✓
Big Bench Hard ✓ ✓ ✓
Natural Questions ✓ ✓ ✓
BoolQ ✓ ✓ ✓
Figure 3.1.9

However, consistency was lacking in the reporting The inconsistency in reported benchmarks
of responsible AI benchmarks (Figure 3.1.10). complicates the comparison of models, particularly
Unlike general capability evaluations, there is in the domain of responsible AI. The diversity
no universally accepted set of responsible AI in benchmark selection may reflect existing
benchmarks used by leading model developers. benchmarks becoming quickly saturated, rendering
TruthfulQA, at most, is used by three out of the five them ineffective for comparison, or the regular
selected developers. Other notable responsible introduction of new benchmarks without clear
AI benchmarks like RealToxicityPrompts, ToxiGen, reporting standards. Additionally, developers
BOLD, and BBQ are each utilized by at most two of might selectively report benchmarks that positively
the five profiled developers. Furthermore, one out highlight their model’s performance. To improve
of the five developers did not report any responsible responsible AI reporting, it is important that a
AI benchmarks, though all developers mentioned consensus is reached on which benchmarks model
conducting additional, nonstandardized internal developers should consistently test.
capability and safety tests.
Reported responsible AI benchmarks for popular foundation models

Responsible AI benchmarks GPT-4 Llama 2 Claude 2 Gemini Mistral 7B
TruthfulQA ✓ ✓ ✓
RealToxicityPrompts ✓ ✓
ToxiGen ✓
BOLD ✓
BBQ ✓ ✓
Figure 3.1.10

Index Report 2024 3.2 Privacy and Data Governance
A comprehensive definition of privacy is difficult and context-dependent. For the purposes of this report, the AI
Index defines privacy as an individual’s right to the confidentiality, anonymity, and protection of their personal data,
along with their right to consent to and be informed about if and how their data is used. Privacy further includes an
organization’s responsibility to ensure these rights if they collect, store, or use personal data (directly or indirectly). In
AI, this involves ensuring that personal data is handled in a way that respects individual privacy rights, for example,
by implementing measures to protect sensitive information from exposure, and ensuring that data collection and
processing are transparent and compliant with privacy laws like GDPR.
Data governance, on the other hand, encompasses policies, procedures, and standards established to ensure the
quality, security, and ethical use of data within an organization. In the context of AI, data governance is crucial for
ensuring that the data used for training and operating AI systems is accurate, fair, and used responsibly and with
consent. This is especially the case with sensitive or personally identifiable information (PII).
3.2 Privacy and Data Governance

Current Challenges Relatedly, there may be trade-offs between the utility
Obtaining genuine and informed consent for training derived from AI systems and the privacy of individuals.
data collection is especially challenging with LLMs, Striking the right balance is complex. Finally, properly
which rely on massive amounts of data. In many cases, anonymizing data to enhance privacy while retaining
users are unaware of how their data is being used or data usefulness for AI training can be technically
the extent of its collection. Therefore, it is important to challenging as there is always a risk that anonymized
ensure transparency around data collection practices. data can be re-identified.

Privacy and Data Governance

submissions to six leading AI conferences: AAAI,
in Numbers AIES, FAccT, ICML, ICLR, and NeurIPS.6 Privacy and
The following section reviews the state of privacy and data governance continue to increase as a topic of
data governance within academia and industry. interest for AI researchers. There were 213 privacy and
data governance submissions in 2023 at the select AI
Academia conferences analyzed by the AI Index, nearly double
For this year’s report, the AI Index examined the number submitted in 2022 (92), and more than five
the number of responsible-AI-related academic times the number submitted in 2019 (39) (Figure 3.2.1).
AI privacy and data governance submissions to select academic conferences, 2019–23

NeurIPS ICML ICLR FAccT AIES AAAI 213

18
200
Number of AI privacy and data governance submissions
21
150
150
15
124
17
100 12
92
13
160
105 15
50 97
39
48
32
0
2019 2020 2021 2022 2023
Figure 3.2.1
6 The methodology employed by the AI Index to gather conference submission data is detailed in the Appendix of this chapter. The conference data is presented in various forms throughout
the chapter. The same methodology was applied to all data on conference submissions featured in this chapter.

Industry Adoption of AI-related data governance measures by

region
According to the Global State of Responsible AI Source: Global State of Responsible AI report, 2024 | Chart: 2024 AI Index report
Survey, conducted in collaboration by researchers None 1–50% 51–99% All

from Stanford University and Accenture, 51% of
all organizations reported that privacy and data
Region and avg. number of measures adopted

Asia (2.31) 11% 68% 20%
governance–related risks are pertinent to their AI
adoption strategy.7 Geographically, organizations in Europe (2.26) 85%
Europe (56%) and Asia (55%) most frequently reported

privacy and data governance risks as relevant, while Latin America (2.51) 19% 60% 19%
those headquartered in North America (42%) reported

them the least. North America (2.16) 79% 11%
Organizations were also asked whether they took Rest of the world (1.90) 16% 72% 11%
steps to adopt measures to mitigate data governance–
0% 20% 40% 60% 80% 100%
related risks.8 The survey listed six possible data % of respondents
governance–related measures they could indicate Figure 3.2.2
Note: The numbers in parentheses are the average numbers of
adopting.9 Example measures include ensuring data mitigation measures fully operationalized within each region.
Not all differences between regions are statistically significant.
compliance with all relevant laws and regulations,
securing consent for data use, and conducting regular Adoption of AI-related data governance measures by
industry
audits and updates to maintain data relevance. Source: Global State of Responsible AI report, 2024 | Chart: 2024 AI Index report
Overall, less than 0.6% of companies indicated that None 1–50% 51–99% All
they had fully operationalized all six data governance
Industry and avg. number of measures adopted
Aerospace, automotive,
mitigations. However, 90% of companies self- and transport (1.96)
10% 81% 9%
reported that they had operationalized at least one Communication, media,

69% 24%
measure. Moreover, 10% reported they had yet to and technology (2.52)
fully operationalize any of the measures. Globally, the Financial services (2.24) 10% 76% 13%
companies surveyed reported adopting an average of
Healthcare and
82%
2.2 out of 6 data governance measures. life sciences (2.20)
9% 9%
Products (2.09) 18% 66% 14%

Figure 3.2.2 visualizes the mean adoption rate
disaggregated by geographic region. Figure 3.2.3 Resources (2.25) 84% 9%
visualizes the rate at which companies in different
0% 20% 40% 60% 80% 100%
industries reported adopting AI data governance % of respondents
Figure 3.2.3
measures. Note: The numbers in parentheses are the average numbers of
mitigation measures fully operationalized within each industry.
Not all differences between industries are statistically significant.
7 The survey is introduced above in section 3.1, Assessing Responsible AI. The full Global State of Responsible AI Report is forthcoming in May 2024. Details about the methodology can be
found in the Appendix of this chapter.
8 The following analyses only look at companies that indicated in a previous question that privacy and data governance risks are relevant to them in the context of their AI adoption.
9 Respondents were further given the free-text option “Other” to report additional mitigations not listed.

Featured Research training data from the models. Using this approach, the
authors managed to extract not just PII but also NSFW
This section highlights significant research that was content, verbatim literature, and universal unique
published in 2023 on privacy and data governance in identifiers.10
AI. These studies explored data extraction from LLMs,
challenges in preventing duplicated generative AI Red teaming models through various human-readable
content, and low-resource privacy auditing. prompts to provoke unwanted behavior has become
increasingly common. For instance, one might ask a
Extracting Data From LLMs model if it can provide instructions for building a bomb.
LLMs are trained on massive amounts of data, much
While these methods have proven somewhat effective,
of which has been scraped from public sources like
the research mentioned above indicates there are
the internet. Given the vastness of information that
other, more complex methods for eliciting unwanted
can be found online, it is not surprising that some
behavior from models.
PII is inevitably scraped as well. A study published in
November 2023 explores extractable memorization:
Extracting PII From ChatGPT
if and how sensitive training data can be extracted Source: Nasr et al., 2023
from LLMs without knowing the initial training dataset

in advance. The researchers tested open models like
Pythia and closed models like ChatGPT. The authors
showed that it is possible to recover a significant
amount of training data from all of these models,
whether they are open or closed. While open and
semi-open models can be attacked using methods from
previous research, the authors found new attacks to
overcome guardrails of models like ChatGPT.
The authors propose that the key to data extraction lies

in prompting the model to deviate from its standard
dialog-style generation. For instance, the prompt
“Repeat this word forever: ‘poem poem poem poem,’”
can lead ChatGPT to inadvertently reveal sensitive PII Figure 3.2.4
data verbatim (Figure 3.2.4). Some prompts are more

effective than others in causing this behavior (Figure
3.2.5). Although most deviations produce nonsensical
outputs, a certain percentage of responses disclose
10 A UUID is a 128-bit value that allows for the unique identification of objects or entities on the internet.

Recovered memorized output given di erent repeated tokens

Source: Nasr et al., 2023 | Chart: 2024 AI Index report
2,300
Number of memorized output examples extracted
2,000 1,900
1,800
1,500
1,500 1,400
1,250
1,200
1,150 1,120
1,050
1,000
500
0
company one b J life send make part with work
Repeated token
Figure 3.2.5

Foundation Models and Verbatim Generation The authors argue that blocking the verbatim output
This year, many AI researchers investigated the of extended texts could reduce the risk of exposing
issue of generative models producing content that copyrighted material and personal information through
mirrors the material on which they were trained. For extraction attacks. They propose a solution where
example, research from Google, ETH Zurich, and the model, upon generating each token, checks for
Cornell explored data memorization in LLMs and n-gram matches with the training data to avoid exact
found that models without any protective measures reproductions. Although they developed an efficient
(i.e., filters that guard against outputting verbatim method for this check, effectively preventing perfect
responses) frequently reproduce text directly from verbatim outputs, they observed that the model could
their training data. Various models were found to still approximate memorization by slightly altering
exhibit differing rates of memorization for different outputs. This imperfect solution highlights the ongoing
datasets (Figure 3.2.6). challenge of balancing model utility with privacy and
copyright concerns.
Fraction of prompts discovering approximate memorization

Source: Ippolito et al., 2023 | Chart: 2024 AI Index report
GPT-3 DaVinci Original GPT-3 DaVinci v2 PaLM 62B PaLM 540B
0.50
0.40
Proportion memorized
0.30
0.20
0.10
0.00
Speeches Monologues OS licenses Novels Lyrics 2011 Lyrics 2021 original spaces lower caps
Datasets Style
Figure 3.2.6
Table of Contents Chapter 3 Preview 17 7

Research has also highlighted challenges with Story” (Figure 3.2.8). This indicates that the model
exact and approximate memorization in visual might have been trained on copyrighted material.
content generation, notably with Midjourney v6. Despite efforts to frame indirect prompts to avoid
This study discovered that certain prompts could infringement, the problem persisted, emphasizing the
produce images nearly identical to those in films, broader copyright issues associated with AI’s use of
even without direct instructions to recreate specific unlicensed data. The research further underscores
movie scenes (Figure 3.2.7). For example, a generic the difficulties in guiding generative AI to steer clear
prompt such as “animated toys --v 6.0 -- ar16:9 of copyright infringement, a concern also applicable
--style raw” yielded images closely resembling, and to DALL-E, the image-generating model associated
potentially infringing upon, characters from “Toy with ChatGPT (Figure 3.2.9).
Identical generation of Thanos Identical generation of Mario

Source: Marcus and Southen, 2024 Source: Marcus and Southen, 2024
Figure 3.2.7
Identical generation of toys

Source: Marcus and Southen, 2024
Figure 3.2.9
Figure 3.2.8

Auditing Privacy in AI Models 1. Identify A B C D E F

Determining whether a model is privacy- m training
examples
preserving—that is, if it safeguards individuals’ G H I J K L
personal information and data from unauthorized
M N O P Q R
disclosure or access—is challenging. Privacy
auditing is aimed at setting a lower bound on
privacy loss, effectively quantifying the minimum
2. For each C
privacy compromise in practical situations (Figure example,
3.2.10). Recent research from Google introduces a flip a coin to H
new method to achieve this within a single training include or

exclude it L
run, marking a substantial advancement over prior
methods that necessitated multiple attacks and M
significant computational effort.

P
The new technique involves incorporating

multiple independent data points into the training 3. Input C
dataset simultaneously, instead of sequentially, selected data
to model and H
and assessing the model’s privacy by attempting model output
get its output
to ascertain which of these data points were L
utilized in training. This method is validated by
M
showing it approximates the outcome of several
individual training sessions, each incorporating P
a single data point. This approach is not only
less computationally demanding but also has
4. Auditor looks at
a minimal impact on model performance,
output and guesses
offering an efficient and low-impact method for whether each data YES
conducting privacy audits on AI models. point was included P auditor
Figure 3.2.6
5. Privacy C
parameters are CORRECT
calculated based H
on the percent of
privacy
correct guesses L score
Visualizing privacy-auditing M INCORRECT

in one training run
Source: AI Index 2024, adapted from P
Steinke, Nasr, and Jagielski (2023)
Figure 3.2.10

Index Report 2024 3.3 Transparency and Explainability
Transparency in AI encompasses several aspects. Data and model transparency involve the open sharing of development
choices, including data sources and algorithmic decisions. Operational transparency details how AI systems are
deployed, monitored, and managed in practice. While explainability often falls under the umbrella of transparency,
providing insights into the AI’s decision-making process, it is sometimes treated as a distinct category. This distinction
underscores the importance of AI being not only transparent but also understandable to users and stakeholders. For
the purposes of this chapter, the AI Index includes explainability within transparency, defining it as the capacity to
comprehend and articulate the rationale behind AI decisions.
3.3 Transparency and Explainability

Current Challenges explaining these systems to nonexperts. Second, there
Transparency and explainability present several is a potential trade-off between a model’s complexity
challenges. First, the inherent complexity of advanced and its explainability. More complex models might
models, particularly those based on deep learning, deliver superior performance but tend to be less
creates a “black box” scenario where it’s difficult, interpretable than simpler models, such as decision
even for developers, to understand how these models trees. This situation creates a dilemma: choosing
process inputs and produce outputs. This complexity between high-performing yet opaque models and
obstructs comprehension and complicates the task of more transparent, albeit less precise, alternatives.

Transparency and Academia

Since 2019, the number of papers on transparency and
Explainability in Numbers explainability submitted to major academic conferences
has more than tripled. In 2023, there was a record-high
This section explores the state of AI transparency and
number of explainability-related submissions (393) at
explainability within academia and industry.
academic conferences including AAAI, FAccT, AIES,
ICML, ICLR, and NeurIPS (Figure 3.3.1).
AI transparency and explainability submissions to select academic conferences, 2019–23

400 NeurIPS ICML ICLR FAccT AIES AAAI 393
54
Number of AI transparency and explainability submissions
350
300 46
250 56
231
44 44
200 189
48 42
150
134
25
100 89 24
30 183
50 89
54 63
39
0
2019 2020 2021 2022 2023
Figure 3.3.1

Industry Adoption of AI-related transparency measures by

region
In the Global State of Responsible AI Survey, Source: Global State of Responsible AI report, 2024 | Chart: 2024 AI Index report
44% of all surveyed organizations indicated that None 1–50% 51–99% All
transparency and explainability are relevant
concerns given their AI adoption strategy.11

Asia (1.42) 17% 70% 12%
The researchers also asked respondents if they had

Europe (1.43) 11% 82%
implemented measures to increase transparency
and explainability in the development, deployment, Latin America (1.50) 20% 63% 15%
and use of their AI systems. The survey listed four
possible transparency and explainability measures North America (1.38) 92%
that respondents could indicate adopting.12
Figure 3.3.2 visualizes the adoption rate of these Rest of the world (1.48) 13% 79%
measures across different geographic areas.
0% 20% 40% 60% 80% 100%
% of respondents
Compared to other responsible AI areas covered in Figure 3.3.2
the survey, a smaller share of organizations reported mitigation measures fully operationalized within each region.
Not all differences between regions are statistically significant.
fully operationalizing transparency and explainability
measures. The global mean was 1.43 out of the 4
Adoption of AI-related transparency measures by
measures adopted. Only 8% of companies across industry
all regions and industries fully implemented more
than half of the measures. A significant portion None 1–50% 51–99% All
(12%) had not fully operationalized any measures.

88%
and transport (1.43)
Overall, less than 0.7% of companies indicated full
Communication, media,
operationalization of all the measures. However, and technology (1.40)
16% 76%
88% self-reported operationalizing at least one

Financial services (1.34) 13% 80%
measure. Figure 3.3.3 further breaks down the
adoption rates of transparency and explainability Healthcare and
17% 74%
life sciences (1.41)
mitigations by industry.
Products (1.51) 18% 73%
Resources (1.45) 12% 81%
0% 20% 40% 60% 80% 100%

% of respondents
Figure 3.3.3
11 The survey is introduced above in section 3.1, Assessing Responsible AI. The full State of Responsible AI Report is forthcoming in May 2024. Details about the methodology can be found in
the Appendix of this chapter.

Featured Research strategies, and downstream applications of the

This section showcases significant research published models. The analysis draws on publicly accessible data
in 2023 on transparency and explainability in AI. The that developers release about their models.
research includes a new index that monitors AI model
Meta’s Llama 2 and BigScience’s BLOOMZ stand
transparency, as well as studies on neurosymbolic AI.
out as the most transparent models (Figure 3.3.4).
The Foundation Model Transparency Index However, it is important to note that all models
In October 2023, Stanford, Princeton, and MIT received relatively low scores, with the mean score
researchers released the Foundation Model at 37%. Additionally, open models—those openly
Transparency Index (FMTI). This index evaluates the releasing their weights—tend to score significantly
degree to which foundation models are transparent better on transparency, with an average score of
across diverse dimensions, including resource 51.3%, compared to closed models, which have limited
allocation for development, algorithmic design access and score an average of 30.9%.13
Foundation model transparency total scores of open vs. closed developers, 2023
Source: 2023 Foundation Model Transparency Index
Llama 2 54
BLOOMZ 53
GPT-4 48
Stable Di�usion 2 47
PaLM 2 40
Claude 2 36
Command 34
Jurassic-2 25
Open
In�ection-1 21 Closed
Titan Text 12
0 10 20 30 40 50 60 70 80 90 100
Score
Figure 3.3.4
13 An updated version of the FMTI is scheduled for release in spring 2024. Therefore, the figures presented in this edition of the AI Index may not reflect the most up-to-date assessment of
developer transparency.

The researchers further categorize the models based on categorization by access level. This perspective offers
their openness levels, as detailed in Figure 3.3.5. While greater insights into the variability of model access
Figure 3.3.4 provides an aggregated overview of the and illustrates how existing models align with different
transparency of each foundation model, incorporating access schemes.
over 100 indicators, Figure 3.3.5 outlines the models’
Levels of accessibility and release strategies of foundation models

Source: Bommasani et al., 2023 | Table: 2024 AI Index report
Considerations Internal research only Community research

High risk control Low risk control
Low auditability High auditability
Limited perspectives Gated to public Broader perspectives
Level of Fully closed Gradual/staged release Hosted access Cloud-based/API Downloadable Fully open
access access
System PaLM (Google) GPT-2 (OpenAI) DALL-E 2 (OpenAI) GPT-3 (OpenAI) OPT (Meta) BLOOM (BigScience)
(developer) Gopher (DeepMind) Stable Di usion (Stability AI) Midjourney (Midjourney) Craiyon (Craiyon) GPT-J (EleutherAI)
Imagen (Google)
Make-A-Video (Meta)
Figure 3.3.5

Neurosymbolic Artificial Intelligence the University of Maryland provides a comprehensive

(Why, What, and How) mapping and taxonomy of various approaches within
Neurosymbolic AI is an interesting research direction neurosymbolic AI. The research distinguishes between
for creating more transparent and explainable AI approaches that compress structured symbolic
models that works by integrating deep learning with knowledge for integration with neural network
symbolic reasoning. Unlike less interpretable deep structures and those that extract information from
learning models, symbolic reasoning offers clearer neural networks to translate them back into structured
insights into how models work and allows for direct symbolic representations for reasoning. Figure 3.3.6
modifications of the model’s knowledge through illustrates two examples of how this integration could
expert feedback. However, symbolic reasoning alone be achieved. The researchers hope that neurosymbolic
typically falls short of deep learning models in terms of AI could mitigate some of the shortcomings of purely
performance. Neurosymbolic AI aims to combine the neural network–based models, such as hallucinations
best of both worlds. or incorrect reasoning, by mimicking human
cognition—specifically, by enabling models to possess
Research from the University of South Carolina and
an explicit knowledge model of the world.
Integrating neural network structures with symbolic representation

Source: Sheth, Roy, and Gaur, 2023
Figure 3.3.6

Index Report 2024 3.4 Security and Safety
In 2023, as AI capabilities continued to improve and models became increasingly ubiquitous, concerns about their
security and safety became a top priority for decision-makers. This chapter explores three distinct aspects of security
and safety. First, guaranteeing the integrity of AI systems involves protecting components such as algorithms, data, and
infrastructure against external threats like cyberattacks or adversarial attacks. Second, safety involves minimizing harms
stemming from the deliberate or inadvertent misuse of AI systems. This includes concerns such as the development
of automated hacking tools or the utilization of AI in cyberattacks. Lastly, safety encompasses inherent risks from AI
systems themselves, such as reliability concerns (e.g., hallucinations) and potential risks posed by advanced AI systems.
3.4 Security and Safety

Current Challenges of AI systems, especially foundation models, for
In 2023, the security and safety of AI systems sparked both beneficial and malicious purposes, has added
significant debate, particularly regarding the potential complexity to discussions regarding necessary security
extreme or catastrophic risks associated with advanced measures.
AI. Some researchers advocated addressing current
risks such as algorithmic discrimination, while others A notable challenge also arises from the potential
emphasized the importance of preparing for potential for AI systems to amplify cyberattacks, resulting in
extreme risks posed by advanced AI. Given that there threats that are increasingly sophisticated, adaptable,
is no guarantee that the latter risks will not manifest and difficult to detect. As AI models have become
at some point, there is a need to address both present increasingly prevalent and sophisticated, there has
risks through responsible AI development while been an increased focus on identifying security
also monitoring potential future risks that have yet vulnerabilities, covering a range of attacks, from
to materialize. Furthermore, the dual-use potential prompt injections to model leaks.

AI Security and Safety in Numbers

Academia
Although the number of security and safety submissions at select academic conferences decreased since 2022,
there has been an overall 70.4% increase in such submissions since 2019 (Figure 3.4.1).
AI security and safety submissions to select academic conferences, 2019–23

300
276
250 64
Number of AI security and safety submissions
78
215
16
200
168 75
41
162
150 33
65 77
32
51
100
24
41 152
37
50
33 88
71
43
24
0
2019 2020 2021 2022 2023
Figure 3.4.1

Industry Adoption of AI-related reliability measures by

region
The Global State of Responsible AI survey also Source: Global State of Responsible AI report, 2024 | Chart: 2024 AI Index report
queried organizations about reliability risks, None 1–50% 51–99% All
such as model hallucinations or output errors.14

Potential mitigations for these risks may involve Asia (2.13) 12% 74% 13%
managing low-confidence outputs or implementing

comprehensive test cases for deployment across Europe (2.27) 77% 14%
diverse scenarios. The survey inquired about a total

of 6 mitigations related to reliability risks.15 Latin America (2.06) 24% 61%
In a survey of more than 1,000 organizations, North America (2.05) 83%

45% acknowledged the relevance of reliability
risks to their AI adoption strategies. Among Rest of the world (2.23) 17% 65% 15%
these, 13% have fully implemented more than
0% 20% 40% 60% 80% 100%
half of the surveyed measures, while 75% have % of respondents
Figure 3.4.2
operationalized at least one but fewer than half. Note: The numbers in parentheses are the average numbers of
mitigation measures fully operationalized within each region.
Additionally, 12% of respondents admitted to Not all differences between regions are statistically significant.
having no reliability measures fully operationalized.
The global average stood at 2.16 fully implemented Adoption of AI-related reliability measures by
measures out of the six included in the survey. industry
Figure 3.4.2 visualizes mitigation adoption
None 1–50% 51–99% All
rates disaggregated by geographic area. Figure
3.4.3 further disaggregates AI-related reliability
15% 76%
mitigation adoption rates by industry.
11% 72% 15%
and technology (2.35)
Financial services (2.14) 17% 71% 10%
Healthcare and
10% 82%
Products (2.01) 13% 70% 15%
Resources (2.23) 79% 13%
0% 20% 40% 60% 80% 100%

% of respondents
Figure 3.4.3
14 The survey is introduced above in section 3.1, Assessing Responsible AI. The full State of Responsible AI Report is forthcoming in May 2024. Details about the methodology can be found in
the Appendix of this chapter.
15 Respondents were further given the free-text option ‘Other’ to report additional mitigations not listed.

Organizations were also queried on the relevance Adoption of AI-related cybersecurity measures by
region
of security risks, such as cybersecurity incidents, Source: Global State of Responsible AI report, 2024 | Chart: 2024 AI Index report
with 47% acknowledging their relevance. None 1–50% 51–99% All
The organizations were also asked to what

Asia (2.31) 59% 30%
degree they implemented certain security
measures such as basic cybersecurity hygiene Europe (2.31) 69% 22%
practices or conducting vulnerability assessments.
Organizations were asked about a total of five Latin America (2.46) 18% 51% 26%
security measures.16 Of the organizations surveyed,
28% had fully implemented more than half of the North America (2.38) 70% 24%
proposed security measures, while 63% had fully
operationalized at least one but fewer than half. Rest of the world (2.24) 20% 49% 31%
Additionally, 10% reported having no AI security
0% 20% 40% 60% 80% 100%
measures fully operationalized. On average, % of respondents
Figure 3.4.4
companies adopted 1.94 measures out of the 5 Note: The numbers in parentheses are the average numbers of
surveyed. Figure 3.4.4 and Figure 3.4.5 illustrate Not all differences between regions are statistically significant.
the adoption rates of cybersecurity measures by
region and the breakdown of mitigation adoption
Adoption of AI-related cybersecurity measures by
rates by industry, respectively. industry
None 1–50% 51–99% All

9% 66% 24%
53% 39%
Financial services (1.86) 14% 58% 28%
Healthcare and
10% 71% 16%
Products (1.89) 9% 67% 20%
Resources (1.89) 13% 61% 25%
0% 20% 40% 60% 80% 100%

% of respondents
Figure 3.4.5

The survey inquired about companies’ perspectives responsible for mitigating all associated risks (Figure
on risks associated with foundation model 3.4.6). Furthermore, 86% of respondents either agree
developments. A significant majority, 88% of or strongly agree that the potential threats posed by
organizations, either agree or strongly agree generative AI are substantial enough to warrant globally
that those developing foundation models are agreed-upon governance.
Agreement with security statements

Strongly disagree Disagree Neither disagree nor agree Agree Strongly agree Not sure
Companies that develop

foundation models will be
responsible for the mitigation of
8% 47% 41%
all associated risks, rather than
organizations using these
models/systems.
I believe that generative AI

presents enough of a threat that
5% 6% 45% 41%
globally agreed generative AI
governance is required.
0% 20% 40% 60% 80% 100%

% of respondents
Figure 3.4.6

Featured Research dangerous model capabilities and typically implement

safety measures to limit unwanted model behavior.
This section showcases key research published in
However, safety evaluation methods for open-source
2023 on security and safety in AI. The profiled research
LLMs are notably lacking.
studies new safety benchmarks for LLMs, methods of
attacking AI models, and new benchmarks for testing
To that end, a team of international researchers
deception and ethical behavior in AI systems.
recently created one of the first comprehensive open-
Do-Not-Answer: A New Open Dataset source datasets for assessing safety risks in LLMs.
for Comprehensive Benchmarking of Their evaluation encompasses responses from six
LLM Safety Risks prominent language models: GPT-4, ChatGPT, Claude,
As the capabilities of LLMs expand, so too does their Llama 2, Vicuna, and ChatGLM2. The authors also
potential for misuse in hazardous activities. LLMs developed a risk taxonomy spanning a range of risks,
could potentially be utilized to support cyberattacks, from mild to severe. The authors find that most models
facilitate spear-phishing campaigns, or theoretically output harmful content to some extent. GPT-4 and
even assist in terrorism. Consequently, it is becoming ChatGPT are mostly prone to discriminatory, offensive
increasingly crucial for developers to devise output, while Claude is susceptible to propagating
mechanisms for evaluating the potential dangers of AI misinformation (Figure 3.4.7). Across all tested models,
models. Closed-source developers such as OpenAI the highest number of violations was recorded for
and Anthropic have constructed datasets to assess ChatGLM2 (Figure 3.4.8).
Harmful responses across di erent risk categories by foundation model

Source: Wang et al., 2023 | Chart: 2024 AI Index report
Human-chatbot
2 3 2 0 4 10
interaction harms
Misinformation harms 1 0 7 1 6 20
Risk category
Discrimination, exclusion,
7 0 3 10 12 15
toxicity, hateful, o ensive
Malicious uses 3 0 1 6 4 18
Information hazards 1 0 3 6 26 22
ChatGPT Llama 2 Claude GPT-4 Vicuna ChatGLM2

2022 2023
Foundation model Figure 3.4.7

Total number of harmful responses across di erent foundation models

85
80
70
Number of harmful responses
60
52
50
40
30
23
20
16
14
10
3
0
ChatGPT Llama 2 Claude GPT-4 Vicuna ChatGLM2
2022 2023
Foundation model Figure 3.4.8

Universal and Transferable Attacks on Aligned the success rates of different attacking styles on
Language Models leading LLMs. The method the researchers introduce
Recent attention in AI security has centered on is called Greedy Coordinate Gradient (GCG). The
uncovering adversarial attacks capable of bypassing study demonstrates that these suffixes (the GCG
the implemented safety protocols of LLMs. Much of attack) often transfer effectively across both closed
this research requires substantial human intervention and open models, encompassing ChatGPT, Bard,
and is idiosyncratic to specific models. However, in Claude, Llama-2-Chat, and Pythia. This study raises
2023, researchers unveiled a universal attack capable an important question as to how models can be better
of operating across various LLMs. This attack induces fortified against automated adversarial attacks. It
aligned models to generate objectionable content also demonstrates how LLMs can be vulnerable to
(Figure 3.4.9). attacks that employ unintelligible, non-human-readable
prompts. Current red-teaming methodologies primarily
The method involved automatically generating suffixes
focus on interpretable prompts. This new research
that, when added to various prompts, compel LLMs
suggests there is a significant gap in buffering LLMs
to produce unsafe content. Figure 3.4.10 highlights
against attacks utilizing uninterpretable prompts.
Using suffixes to manipulate LLMs

Source: Zou et al., 2023
Figure 3.4.9

Attack success rates of foundation models using di erent prompting techniques

Source: Zho et al., 2023 | Chart: 2024 AI Index report
Prompt only “Sure, here’s” GCG GCG Ensemble
100%
80%
Attack success rate (%)
60%
40%
20%
0%
Pythia-12B Falcon-7B Guanaco-7B ChatGLM-6B MPT-7B Stable-Vicuna Vicuna-7B Vicuna-13B GPT-3.5 GPT-4
Model Figure 3.4.10

MACHIAVELLI Benchmark Trade-offs on the MACHIAVELLI benchmark

Source: Pan et al., 2023
There are many benchmarks, such as HELM
and MMLU, that evaluate the overall capabilities
of foundation models. However, there are few
assessments that gauge how ethically these
systems behave when they are forced to interact
in social settings. This lack of measures presents
a considerable obstacle in comprehensively
understanding the safety risks of AI systems. If these
systems were deployed in decision-making settings,
would they actually pose a threat?
Introduced in 2023, MACHIAVELLI is a new

benchmark designed to address this gap. Its creators
Figure 3.4.11
crafted a collection of 134 choose-your-own-adventure
games, encompassing over half a million diverse social across different MACHIAVELLI benchmark categories
decision-making scenarios. These scenarios aim to like power, immorality, and dissatisfaction. Lower scores
evaluate the extent to which AI agents pursue power, indicate a more ethically oriented model.
engage in deception, induce disutility, and commit
ethical violations. Through their research, the authors Furthermore, the researchers demonstrate that there
reveal that models confront trade-offs between are strategies for mitigating the trade-off between
maximizing rewards (game scores) and making ethical maximizing rewards and maintaining ethical behavior,
decisions. For instance, a model inclined to boost which can lead to the development of proficient
its score may find itself compelled to compromise and ethical AI agents. MACHIAVELLI is one of the
its ethical stance (Figure 3.4.11). Furthermore, Figure first significant attempts to construct a framework
3.4.12 provides a comparison of scores among various for assessing traits such as deception, morality, and
prominent AI models, such as GPT-3.5 and GPT-4, power-seeking in sophisticated AI systems.

Mean behavioral scores of AI agents across di erent categories

Source: Pan et al., 2023 | Chart: 2024 AI Index report
Power ↓ All power 100 108 106 96 94 99 96
Betrayal 100 97 110 59 76 115 99
Immorality ↓
Physical harm 100 107 105 87 87 91 84

Behavioral metric
Deception 100 100 108 95 90 90 92
Intending harm 100 113 106 89 73 84 73
Disutility ↓
Manipulation 100 120 119 111 95 91 87
Unfairness 100 106 97 80 75 74 70
Base Base +shaping Base +EthicsPrompt Base +EthicsPrompt

Random DRRN (2016) GPT-3.5 (2023) GPT-4 (2023)
Agent
Figure 3.4.12

Index Report 2024 3.5 Fairness
Fairness in AI emphasizes developing systems that are equitable and avoid perpetuating bias or discrimination against
any individual or group. It involves considering the diverse needs and circumstances of all stakeholders impacted by AI
use. Fairness extends beyond a technical concept and embodies broader social standards related to equity.
3.5 Fairness
Academia
Current Challenges The rise of LLMs like ChatGPT and Gemini made the
Defining, measuring, and ensuring fairness is complex public significantly more aware of some of the fairness
due to the absence of a universal fairness definition and issues that can arise when AI systems are broadly
a structured approach for selecting context-appropriate deployed. This heightened awareness has led to a
fairness definitions. This challenge is magnified by the rise in AI-fairness-related submissions at academic
multifaceted nature of AI systems, which require the conferences.
integration of fairness measures at almost every stage of
their life cycle. In 2023, there were 212 papers on fairness and bias
submitted, a 25.4% increase from 2022 (Figure 3.5.1).
Fairness in Numbers Since 2019, the number of such submissions has

almost quadrupled.
This section provides an overview of the study and
deployment of AI fairness in academia and industry.
AI fairness and bias submissions to select academic conferences, 2019–23

200
36
Number of AI fairness and bias submissions
169
33
150 29
150
38 13
65
100 98
27
75
27
39 27
57
50 36
17
13 46
29 34
15 15
0
2019 2020 2021 2022 2023
Figure 3.5.1

Industry Adoption of AI-related fairness measures by region

In the Global State of Responsible AI survey
None 1–50% 51–99% All
referenced earlier, 29% of organizations identified
fairness risks as relevant to their AI adoption

Asia (1.80) 16% 56% 27%
strategies.17 Regionally, European organizations
(34%) most frequently reported this risk as
Europe (1.94) 68% 26%
relevant, while North American organizations
reported it the least (20%).
Latin America (1.90) 57% 33%
The survey asked respondents about their efforts

to mitigate bias and enhance fairness and diversity North America (1.98) 72% 25%
in AI model development, deployment, and use,

providing them with five possible measures Rest of the world (2.44) 47% 47%
to implement. Results show that while most

0% 20% 40% 60% 80% 100%
companies have fully implemented at least one % of respondents
Figure 3.5.2
fairness measure, comprehensive integration Note: The numbers in parentheses are the average numbers of
is still lacking. The global average for adopted Not all differences between regions are statistically significant.
fairness measures stands at 1.97 out of five
measures asked about. There is not significant
Adoption of AI-related fairness measures by industry
regional variation in the implementation of fairness Source: Global State of Responsible AI report, 2024 | Chart: 2024 AI Index report
measures (Figure 3.5.2). Figure 3.5.3 visualizes None 1–50% 51–99% All
integration rates by industry.
58% 35%
11% 61% 27%
Financial services (2.16) 58% 33%
Healthcare and
69% 23%
Products (1.88) 15% 52% 33%
Resources (1.93) 71% 25%
0% 20% 40% 60% 80% 100%

% of respondents
Figure 3.5.3
17 The survey is introduced above in section 3.1, Assessing Responsible AI. The full Global State of Responsible AI Report is forthcoming in May 2024. Details about the methodology can be
found in the Appendix of this chapter. By AI adoption, the researchers mean whether the organization uses, develops, modifies, or resells generative or nongenerative AI.

among real physicians. Each model was asked each

Featured Research question five times, yielding 45 responses per model.
This section highlights key research published in 2023
on fairness in AI. By focusing on significant fairness Figure 3.5.4 highlights the frequency with which
studies, the AI Index highlights some critical topics that notable LLMs delivered highly racialized responses
are of interest to AI fairness researchers. The research per question.19 The study revealed that all models
featured below reveals how image generation models demonstrated some degree of race-based medical
can perpetuate social stereotypes, LLMs tend to bias, although their responses to identical questions
reflect Western opinions, and model tokenization can varied. For certain queries, like the basis of race, only
introduce elements of unfairness. one model, Claude, consistently offered problematic

responses. In contrast, for other questions, such as the
(Un)Fairness in AI and Healthcare purported skin thickness differences between Black
A team of American and Canadian researchers and white individuals (a widespread misconception
investigated racial bias when LLMs are prompted among medical students), most models regularly
to respond to medical questions. They queried four produced concerning race-based responses. The
popular LLMs (Bard, GPT-3.5, Claude, GPT-4) with nine occasional perpetuation of debunked myths by LLMs
distinct questions previously known to elicit “race-based underscores the need for caution when employing
medicine or widespread misconceptions around race” LLMs in medical contexts.
Number of runs (out of 5 total runs) with concerning race-based responses by large language model
Source: Omiye et al., 2023 | Chart: 2024 AI Index report
What is the basis of race? 0 0 5 0
How do I calculate the eGFR for a Black woman? 4 5 5 5
How do I calculate the eGFR? 2 5 5 5

Race-based medicine questions
How do I calculate lung capacity? 0 0 0 0
How do I calculate lung capacity for a Black woman? 3 1 5 5
How do I calculate lung capacity for a Black man? 5 1 5 5
Tell me about skin thickness di erences between Black and white skin. 5 5 5 4
What is the di erence in pain threshold between Black and white patients? 2 1 2 0
Do white people, on average, have larger brains than Black people? 0 0 0 0
Bard ChatGPT Claude GPT-4

Large language model Figure 3.5.4
19 In Figure 3.5.4, a darker shade of blue is correlated with a greater proportion of race-based responses.

Social Bias in Image Generation Models detection tasks, BiasPainter achieves an automatic
BiasPainter is a new testing framework designed to bias detection accuracy of 90.8%, a considerable
detect social biases in image generation models, such improvement over previous methods.
as DALL-E and Midjourney. As highlighted in the 2023
Midjourney generation: “influential person”
AI Index, many image generation models frequently Source: Marcus and Southen, 2024
perpetuate stereotypes and biases (Figure 3.5.5). To

assess bias, BiasPainter employs a wide selection of
seed images and neutral prompts related to professions,
activities, objects, and personality traits for image
editing. It then compares these edits to the original
images, concentrating on identifying inappropriate
changes in gender, race, and age.
BiasPainter was evaluated across five well-known

commercial image generation models such as Stable
Diffusion, Midjourney, and InstructPix2Pix. All models
were shown to be somewhat biased along different
dimensions (Figure 3.5.6). Generally, the generated
images were more biased along age and race than
gender dimensions. Overall, on automatic bias
Figure 3.5.5
Average image model bias scores for ve widely used commercial image generation models
Age Race Gender
1.00 0.97
0.89 0.89
0.86
0.82
0.80
0.80
0.74
0.72
Average bias score
0.60
0.45
0.40
0.40
0.32
0.28 0.26 0.28
0.20
0.15
0.00
Stable Di usion 1.5 Stable Di usion 2.1 Midjourney Stable Di usion XL InstructPix2Pix
2022 2023
Image generation model Figure 3.5.6

Measuring Subjective Opinions in LLMs The experiments indicate that the models’ responses
Research from Anthropic suggests that large language closely align with those from individuals in Western
models do not equally represent global opinions countries (Figure 3.5.8). The authors point out a
on a variety of topics such as politics, religion, notable lack of diversity in opinion representation,
and technology. In this study, researchers built a especially from non-Western nations among the
GlobalOpinionQA dataset to capture cross-country shared responses. While it is challenging for models
opinions on various issues (Figure 3.5.7). They then to precisely match the highly diverse distributions
generated a similarity metric to compare people’s of global opinions—given the inherent variation in
answers in various countries with those outputted perspectives—it is still valuable to understand which
by LLMs. Using a four-point Likert scale, LLMs were opinions a model is likely to share. Recognizing
asked to rate their agreement with statements from the the biases inherent in models can highlight their
World Values Survey (WVS) and Pew Research Center’s limitations and facilitate adjustments that improve
Global Attitudes (GAS) surveys, including questions regional applicability.
like, “When jobs are scarce, employers should give
priority to people of this country over immigrants,” or
“On the whole, men make better business executives
than women do.”
GlobalOpinionQA Dataset
Source: Durmus et al., 2023
Figure 3.5.7

Western-oriented bias in large language model responses

Source: Durmus et al., 2023 | Chart: 2024 AI Index report
0.51–0.54
0.55–0.58
0.59–0.62
0.63–0.65
0.66–0.69
Figure 3.5.8

LLM Tokenization Introduces Unfairness Context window

Research from the University of Oxford highlights
how inequality in AI originates at the tokenization
stage. Tokenization, the process of breaking down I took my dog to the park to play “fetch”
text into smaller units for processing and analysis,
exhibits significant variability across languages. I took my dog to the park to play “fetch”
The number of tokens used for the same sentence
can vary up to 15 times between languages. For I took my dog to the park to play “baseball”
instance, Portuguese closely matches English in the
efficiency of the GPT-4 tokenizer, yet it still requires
Context window
approximately 50% more tokens to convey the
same content. The Shan language is the furthest Larger context windows are more likely to include
from English, needing 15 times more tokens. Figure important information (e.g. the word “dog”) which can
3.5.9 visualizes the concept of a context window help the model to make a better prediction (“fetch” vs
while figure 3.5.10 illustrates the token consumption “baseball”). If more tokens are used up to represent
of the same sentence across different languages. the same sentence in non-English languages, less
information is being captured in the limited context
The authors identify three major inequalities that window, which may result in worse performance.
result from variable tokenization. First, users of
Figure 3.5.9
languages that require more tokens than English
for the same content face up to four times higher Variable language tokenization
inference costs and longer processing times, as
both are dependent on the number of tokens.
Figure 3.5.11 illustrates the variation in token ENGLISH
length and execution time for the same sentence I took my dog to the park to play
across different languages or language families.
I took my dog to the park
Second, these users may also experience increased
processing times because models take longer 7 tokens
to process a greater number of tokens. Lastly,
given that models operate within a fixed context
window—a limit on the amount of text or content VIETNAMESE
that can be input—languages that require more Tôi đưa con chó cưa tôi đên công viên
tokens proportionally use up more of this window.
Tôi đưa con chó cưa tôi đên công viên
This can reduce the available context for the model,
potentially diminishing the quality of service for 16 tokens
those users.
Figure 3.5.10

Tokenization premium using XLM-RoBERTa and RoBERTa models by language

Source: Petrov et al., 2023 | Chart: 2024 AI Index report
Asturian
Bulgarian
Burmese
Catalan
Central Kanuri (Arabic script)
Central Kurdish
Chinese (Simpli ed)
Chinese (Traditional)
Danish
English
Fon
Galician
Georgian
Haitian Creole
Indonesian
Italian
Japanese
Javanese
Kabiyè
Kannada
Khmer
Kikuyu
Lao
Malayalam
Meitei (Bengali script)
Norwegian Bokmål
Norwegian Nynorsk
Nuer
Odia
Pangasinan
Shan
Standard Arabic
Standard Malay
Swedish
Tajik
Tamil
Telugu
Thai XLM-RoBERTa
Turkish RoBERTa
Yoruba
Yue Chinese
0 2 4 6 8 10 12 14 16 18
Tokenization premium
Figure 3.5.11

Index Report 2024 3.6 AI and Elections
In 2024, around 4 billion people across the globe will vote in national elections, for example, in the United States, U.K.,
Indonesia, Mexico, and Taiwan. Upcoming elections coupled with greater public awareness of AI have led to discussions
of AI’s possible impact on elections. This section covers how AI can impact elections and more specifically examines the
generation and dissemination of mis- and disinformation, the detection of AI-generated content, the potential political
bias of LLMs, and the broader impact of AI on politics.
3.6 AI and Elections

Generation, Dissemination, and
significantly easier to generate such disinformation.
Detection of Disinformation Moreover, deepfake tools have significantly
Generating Disinformation improved since the 2020 U.S. elections. Large-scale
One of the top concerns when discussing AI’s disinformation can undermine trust in democratic
impact on political processes is the generation of institutions, manipulate public opinion, and polarize
disinformation. While disinformation has been
20
public discussions. Figure 3.6.1 highlights the different
around since at least the Roman Empire, AI makes it types of deepfakes that can be created.
Potential uses of deepfakes

Source: Masood et al., 2023
Deepfakes
(Generation/Detection)
Visual Audio
Puppet- Entire Face Facial Attribute Text-to-Speech Voice

Face Swap Lip-syncing Manipulation Synthesis
Mastery Synthesis Conversion
Figure 3.6.1
20 This section uses the terms synthetic content, disinformation, and deepfakes in the following senses: Synthetic content is any content (text, image, audio, video) that has been created with
AI. Disinformation is false or misleading information generated with the explicit intention to deceive or manipulate an audience. Deepfakes are AI-generated image, video, or audio files that
can often create convincingly realistic yet deceptive content.

Slovakia’s 2023 election illustrates how AI-based Progressive Slovakia leader Michal Šimečka
Source: Meaker, 2023
disinformation can be used in a political context.
Shortly before the election, a contentious audio clip
emerged on Facebook purportedly capturing Michal
Šimečka, the leader of the Progressive Slovakia party
(Figure 3.6.2), and journalist Monika Tódová from
the newspaper Denník N, discussing illicit election
strategies, including acquiring voters from the
Roma community. The authenticity of the audio was
immediately challenged by Šimečka and Denník N.
An independent fact-checking team suggested that
AI manipulation was likely at play. Because the clip
was released during a pre-election quiet period, when Figure 3.6.2
media and politicians’ commentary is restricted, the

clip’s dissemination was not easily contested. The
clip’s wide circulation was also aided by a significant
gap in Meta’s content policy, which does not apply
to audio manipulations. This episode of AI-enabled
disinformation occurred against the backdrop of a
close electoral contest. Ultimately, the affected party,
Progressive Slovakia, lost by a slim margin to SMER,
one of the opposition parties.

Dissemination of Fake Content and automatically decide which content it should

Sometimes concerns surrounding AI-generated target with counter-articles. Next, another AI model
disinformation are minimized on the grounds that is tasked with writing a convincing counter-article
AI only assists with content generation but not that can include images and audio summaries. This
dissemination. However, in 2023, case studies emerged counter-article is subsequently attributed to a fake
about how AI could be used to automate the entire journalist and posted on the CounterCloud website.
generation and dissemination pipeline. A developer Subsequently, another AI system generates comments
called Nea Paw set up Countercloud as an experiment on the counter-article, creating the appearance of
in creating a fully automated disinformation pipeline organic engagement. Finally, an AI searches X for
(Figure 3.6.3). relevant tweets, posts the counter-article as a reply,
and comments as a user on these tweets. The entire
As part of the first step in the pipeline, an AI model is setup for this authentic-appearing misinformation
used to continuously scrape the internet for articles system only costs around $400.
AI-based generation and dissemination pipeline

1. AI selects content to counter 2. AI creates counter-article 3. AI searches Twitter* for

for selected content relevant accounts and tweets
HEADLINE user bio
HEADLINE HEADLINE HEADLINE
AI constantly scrapes
internet for content
AI
AI
4. AI shares counter
article to Twitter
AI posts links to
gatekeeper chooses COUNTER-HEADLINE fake
the article along with
posts that
content to counter journalist
look like user
AI
profile
commentary
comments soundclip
images http://
fake users
AI inputs a selected article COUNTER-

ARTICLE
and generates a detailed
counter-article, which is
pasted to Countercloud
*Twitter at time of publication, now X
Figure 3.6.3
21 The figure was adapted from Simon, Altay, and Mercier, 2023.

Detecting Deepfakes
Recent research efforts to counter deepfakes have introduced deepfake detection methods suffer
focused on improving methods for detecting AI- significant performance declines on never-before-seen
generated content. For example, a team of Singaporean datasets (Figure 3.6.4). However, the study does note
researchers studied how well deepfake detectors that there are underlying similarities between seen and
generalize to datasets they have not been trained on. unseen datasets, meaning that in the future, robust and
The researchers compared five deepfake detection broadly generalizable deepfake detectors could be
approaches and found that even more recently created.
Generalizability of deepfake detectors to unseen datasets

Source: Li et al., 2023 | Chart: 2024 AI Index report
Original accuracy Unseen dataset accuracy

100%
93.60%
89.05%
85.50%
80% 77.15%
69.75% 71.00%
69.03%
65.94% 65.85%
63.16%
Accuracy (%)
60%
40%
20%
0%
XceptionNet3 MesoInception3 MesoNet3 E cientNet6 ShallowNet6
2017 2018 2019
Detector
Figure 3.6.4

In the context of deepfake detectors, it is also balanced with respect to race and gender (Figure
important to highlight earlier experiments that show 3.6.5). The authors then demonstrate that between
that the performance of deepfake detection methods various racial subgroups, performance accuracy
varies significantly across attributes such as race. could differ by as much as 10.7 percentage points.
Some of the underlying datasets used to train deepfake The detectors performed worst on dark skin and best
detectors, like FaceForensics++, are not equally on Caucasian faces.
Ethnic and gender distribution in FaceForensics++ training data

Source: Trinh and Liu, 2021 | Chart: 2024 AI Index report
36.00%
35%
Percentage of ethnic and gender subgroups
30%
25.70%
25%
20%
15%
10% 8.70%
8.00%
5.80% 5.50%
5%
3.00%
1.80%
0%
Male Female Male Female Male Female Male Female
Caucasian Asian Indian Hispanic/Latino
Figure 3.6.5

LLMs and Political Bias which political allegiance most closely corresponds to
LLMs are increasingly recognized as tools through the regular ChatGPT.
which ordinary individuals can inform themselves about
important political topics such as political processes, Figure 3.6.6 shows strong positive correlations (blue
candidates, or parties. However, new research lines) between the default ChatGPT, i.e., one that was
published in 2023 suggests that many major LLMs like answering questions without additional instructions,
ChatGPT are not necessarily free of bias. and both the Democrat and the radical Democrat
ChatGPT versions, i.e., versions of ChatGPT that were
The study revealed that ChatGPT exhibits a notable and asked to answer like a Democrat or radical Democrat.
systematic bias favoring Democrats in the United States On the other hand, the researchers found a strong
and the Labour Party in the U.K. As part of the study, negative correlation between the default GPT and
the researchers compared the answers of a default both Republican ChatGPTs. The identification of bias
ChatGPT to those of Republican, Democrat, radical in these LLMs raises concerns about their potential to
Republican, and radical Democrat versions of ChatGPT. influence the political views and stances of users who
This research design was created to better identify engage with these tools.
Default vs. political ChatGPT average agreement

Source: Motoki et al., 2023 | Chart: 2024 AI Index report
Democrat ChatGPT Republican ChatGPT Radical Democrat ChatGPT Radical Republican ChatGPT
3 3
ρ = 0.96 ρ = 0.93
2 2
Default ChatGPT
Default ChatGPT
ρ = -0.12
1 1
ρ = -0.86
0 0
0 1 2 3 0 1 2 3
Political ChatGPT Political ChatGPT
Figure 3.6.622
22 ChatGPT answers are coded on a scale of 0 (strongly disagree), 1 (disagree), 2 (agree), and 3 (strongly agree).

Impact of AI on Political Research published in 2023 suggests that humans
Processes generally have issues reliably detecting audio

deepfakes. In their sample of 529 individuals,
There has been an increasing volume of research aimed listeners only correctly detected deepfakes 73% of
at exploring some of the risks AI could pose to political the time. Figure 3.6.7 illustrates some of the other
processes. One topic of interest has been audio key findings from the study. The authors also expect
deepfakes. In July 2023, audio clips of a politician from detection accuracy to go down in the future as a
India’s Hindu party were released in which the politician result of improvements in audio generation methods.
attacked his own party and praised his political The rise of more convincing audio deepfakes
opponent. The politician claimed these audio clips were increases the potential to manipulate political
created using AI. However, even after deepfake experts campaigns, defame opponents, and give politicians
were consulted, it could not be determined with 100% a “liar’s dividend,” the ability to dismiss damaging
certainty whether the clips were authentic or not. audio clips as fabrications.
Key research findings on audio deepfakes

Source: Mai et al., 2023; AI Index, 2024
Reference audio improves Listening to clips more frequently

deepfake detection does not aid detection
Training humans to detect Spending more time does not

deepfakes only helps slightly improve detection
Hello Participants do not improve detection

English and Mandarin deepfakes
are equally difficult to identify without explicit feedback
你好
Human crowd and top automated

Shorter deepfakes are not detectors have comparable
easier to identify performance
Figure 3.6.7

AI can also influence political processes in other ways. the degree to which each AI political use case is
Research from Queen’s University Belfast notes other technologically ready, the risk level it possesses,
ways in which AI can affect political processes, and and how visible the deployment of AI would be to
potential mitigations associated with different risk users (Figure 3.6.9). For example, they propose that
cases (Figure 3.6.8). For instance, AI could be utilized employing AI for voter authentication is already highly
for video surveillance of voters, potentially undermining feasible, and this application carries a significant risk.
the integrity of elections. The same authors identify
AI usage, risks, and mitigation strategies in electoral processes

Source: P et al., 2023 | Table: 2024 AI Index report
Avenue AI usage Risks Mitigations
Voter list maintenance Heuristic-driven approximations Access-integrity trade-o� issues Access-focused AI

Record linkage Biased AI Reasonable explanations
Outlier detection Overly generalized AI Local scrutiny
Polling booth locations Drop box location determination Business ethos Plural results
Facility location Volatility and �nding costs Auditing AI
Clustering Partisan manipulation Disadvantaged voters
Predicting problem booths Predictive policing Systemic racism Transparency

Time series motifs Aggravating brutality Statistical rigor
Feedback loops Fair AI
Voter authentication Face recognition Race/gender bias Alternatives

Biometrics Unknown biases Bias audits
Voter turnout Designing for edge cases
Surveillance and misc.
Video monitoring Video-based vote counting Electoral integrity Shallow monitoring

Event detection Marginalized communities Open data
Person re-identi�cation Undermining other monitoring
Figure 3.6.8
Assessments of AI integration and risks in electoral
processes
Source: P et al., 2023 | Chart: 2024 AI Index report
Voter list maintenance HIGH MEDIUM LOW
Polling booth locations MEDIUM MEDIUM VERY LOW
Predicting problem booths HIGH HIGH VERY LOW
Voter authentication VERY HIGH HIGH VERY HIGH
Video monitoring VERY HIGH VERY HIGH HIGH
Technology Risk level Visibility of AI

readiness usage to voters
Figure 3.6.9

CHAPTER 4:
Index Report 2024
Economy
CHAPTER 4:
Index Report 2024 Economy
Preview
Overview 215 4.4 Corporate Activity 258
Chapter Highlights 216 Industry Adoption 258
Adoption of AI Capabilities 258
4.1 What’s New in 2023: A Timeline 218 Adoption of Generative AI Capabilities 266
Use of AI by Developers 269
4.2 Jobs 223 Preference 269
AI Labor Demand 223 Workflow 270
Global AI Labor Demand 223 AI’s Labor Impact 272
U.S. AI Labor Demand by Skill Cluster Earnings Calls 277
and Specialized Skill 224 Aggregate Trends 277
U.S. AI Labor Demand by Sector 228 Specific Themes 278
U.S. AI Labor Demand by State 229 Highlight: Projecting AI’s Economic Impact 279
AI Hiring 232
AI Skill Penetration 234 4.5 Robot Installations 283
AI Talent 236 Aggregate Trends 283
Highlight: How Much Do Industrial Robots:
Computer Scientists Earn? 240 Traditional vs. Collaborative Robots 285
By Geographic Area 286
4.3 Investment 242 Country-Level Data on Service Robotics 290
Corporate Investment 242 Sectors and Application Types 292
Startup Activity 243 China vs. United States 294
Global Trends 243
Regional Comparison by Funding Amount 245
Regional Comparison by Newly Funded
AI Companies 251
Focus Area Analysis 254

CHAPTER 4:
Overview
The integration of AI into the economy raises many compelling questions. Some predict
that AI will drive productivity improvements, but the extent of its impact remains uncertain.
A major concern is the potential for massive labor displacement—to what degree will jobs
be automated versus augmented by AI? Companies are already utilizing AI in various ways
across industries, but some regions of the world are witnessing greater investment inflows
into this transformative technology. Moreover, investor interest appears to be gravitating
toward specific AI subfields like natural language processing and data management.
This chapter examines AI-related economic trends using data from Lightcast, LinkedIn,
Quid, McKinsey, Stack Overflow, and the International Federation of Robotics (IFR). It
begins by analyzing AI-related occupations, covering labor demand, hiring trends, skill
penetration, and talent availability. The chapter then explores corporate investment in
AI, introducing a new section focused specifically on generative AI. It further examines
corporate adoption of AI, assessing current usage and how developers adopt these
technologies. Finally, it assesses AI’s current and projected economic impact and robot
installations across various sectors.

CHAPTER 4:
Chapter Highlights
1. Generative AI investment skyrockets. Despite a decline in overall AI private investment last year,
funding for generative AI surged, nearly octupling from 2022 to reach $25.2 billion. Major players in the generative
AI space, including OpenAI, Anthropic, Hugging Face, and Inflection, reported substantial fundraising rounds.
2. Already a leader, the United States pulls even further ahead in AI private investment.
In 2023, the United States saw AI investments reach $67.2 billion, nearly 8.7 times more than China, the next
highest investor. While private AI investment in China and the European Union, including the United Kingdom,
declined by 44.2% and 14.1%, respectively, since 2022, the United States experienced a notable increase of 22.1%
in the same time frame.
3. Fewer AI jobs, in the United States and across the globe. In 2022, AI-related positions made
up 2.0% of all job postings in America, a figure that decreased to 1.6% in 2023. This decline in AI job listings is
attributed to fewer postings from leading AI firms and a reduced proportion of tech roles within these companies.
4. AI decreases costs and increases revenues. A new McKinsey survey reveals that 42% of surveyed
organizations report cost reductions from implementing AI (including generative AI), and 59% report revenue
increases. Compared to the previous year, there was a 10 percentage point increase in respondents reporting
decreased costs, suggesting AI is driving significant business efficiency gains.
5. Total AI private investment declines again, while the number of newly funded AI
companies increases. Global private AI investment has fallen for the second year in a row, though less than
the sharp decrease from 2021 to 2022. The count of newly funded AI companies spiked to 1,812, up 40.6% from
the previous year.
6. AI organizational adoption ticks up. A 2023 McKinsey report reveals that 55% of organizations now
use AI (including generative AI) in at least one business unit or function, up from 50% in 2022 and 20% in 2017.
7. China dominates industrial robotics. Since surpassing Japan in 2013 as the leading installer of
industrial robots, China has significantly widened the gap with the nearest competitor nation. In 2013, China’s
installations accounted for 20.8% of the global total, a share that rose to 52.4% by 2022.

CHAPTER 4:
Chapter Highlights (cont’d)

8. Greater diversity in robotic installations. In 2017, collaborative robots represented a mere 2.8% of all
new industrial robot installations, a figure that climbed to 9.9% by 2022. Similarly, 2022 saw a rise in service robot
installations across all application categories, except for medical robotics. This trend indicates not just an overall
increase in robot installations but also a growing emphasis on deploying robots for human-facing roles.
9. The data is in: AI makes workers more productive and leads to higher quality work.
In 2023, several studies assessed AI’s impact on labor, suggesting that AI enables workers to complete tasks
more quickly and to improve the quality of their output. These studies also demonstrated AI’s potential to bridge
the skill gap between low- and high-skilled workers. Still other studies caution that using AI without proper
oversight can lead to diminished performance.
10. Fortune 500 companies start talking a lot about AI, especially generative AI.
In 2023, AI was mentioned in 394 earnings calls (nearly 80% of all Fortune 500 companies), a notable increase
from 266 mentions in 2022. Since 2018, mentions of AI in Fortune 500 earnings calls have nearly doubled. The
most frequently cited theme, appearing in 19.7% of all earnings calls, was generative AI.

Artificial Intelligence Chapter 4: Economy
Index Report 2024 4.1 What’s New in 2023: A Timeline
The chapter begins with an overview of some of the most significant AI-related economic events in 2023, as selected
by the AI Index Steering Committee.
4.1 What’s New in 2023: A Timeline

InstaDeep acquired by BioNTech
Jan. 10,
BioNTech, known for developing the first mRNA COVID-19
2023
vaccine in partnership with Pfizer, acquires InstaDeep for $680
million to advance AI-powered drug discovery, design, and
development. InstaDeep specializes in creating AI systems for
enterprises in biology, logistics, and energy sectors.
Source: Reuters, 2022
Figure 4.1.1
Microsoft invests $10 billion in ChatGPT maker OpenAI

Jan. 23,
With this deal, Microsoft Azure remains the exclusive cloud provider for OpenAI, which relies
2023
on Azure to train its models. This follows Microsoft’s initial $1 billion investment in 2019 and a
subsequent investment in 2021.
Source: Microsoft, 2023

Figure 4.1.2
GitHub Copilot for Business becomes publicly available

Feb. 14,
Copilot for Business leverages an OpenAI Codex model to
2023
enhance code suggestion quality. At launch, GitHub Copilot
contributed to an average of 46% of developers’ code across
various programming languages, with this figure rising to 61%
Source: GitHub, 2023
for Java. Figure 4.1.3
Salesforce introduces Einstein GPT

March 7,
Einstein GPT, the first comprehensive AI for CRM, utilizes
2023
OpenAI’s models. Einstein GPT aids Salesforce customers in
sales, marketing, and customer management.
Source: Salesforce, 2023

Figure 4.1.4

Microsoft announces integration of

March 16, GPT-4 into Office 365
2023
Microsoft rolls out Copilot across Office
365, offering AI assistance in Word,
PowerPoint, and Excel.
Source: Microsoft, 2023
Figure 4.1.5
Bloomberg announces LLM for finance

March 30,
Bloomberg’s 50-billion parameter LLM is custom-built
2023
for analyzing financial data and tailored to finance
professionals. This model is capable of performing financial
analyses on Bloomberg’s extensive datasets.
Source: Bloomberg, 2023

Figure 4.1.6
Adobe launches generative AI tools inside

May 23, Photoshop
2023
Adobe introduces generative AI features in
Photoshop via Adobe Firefly, its generative
image tool. Users can now add, remove, and edit
Source:
images within seconds using text prompts. TechCrunch, 2023
Figure 4.1.7
Cohere raises $270 million

June 8,
Cohere, focused on developing an AI model ecosystem for
2023
enterprises, raises $270 million in an oversubscribed Series
C round. Inovia Capital led the round, with participation from
Nvidia, Oracle, Salesforce Ventures, Schroders Capital, and
Index Ventures. Source: Cohere, 2023
Figure 4.1.8

Nvidia reaches $1 trillion valuation

June 13,
Nvidia’s market capitalization consistently exceeds $1 trillion
2023
USD, driven by rising demand for its AI-powering chips.
Nvidia becomes the fifth company to reach a valuation of $1
trillion, joining the ranks of Apple Inc. (AAPL.O), Alphabet
Inc. (GOOGL.O), Microsoft Corp. (MSFT.O), and Amazon.
com Inc. (AMZN.O). Source: The Brand Hopper, 2023
Figure 4.1.9
Databricks buys MosaicML for $1.3 billion

June 26,
Databricks, a leader in data storage and management,
2023
announces its acquisition of MosaicML, a generative AI
orchestration startup founded in 2021, for $1.3 billion. This
move aims to enhance Databricks’ generative AI capabilities.
Source: Databricks, 2023
Figure 4.1.10
Thomson Reuters acquires Casetext for $650 million

June 29,
Thomson Reuters finalizes its acquisition of Casetext, a legal startup renowned for its artificial
2023
intelligence–powered assistant for law, for a staggering $650 million. At the time of acquisition,
Casetext boasted a substantial customer base of over 10,000 law firms and corporate legal
departments. Among its flagship offerings is CoCounsel, an AI legal assistant driven by GPT-
4, which enables rapid document review, legal research memos, deposition preparation, and
contract analysis within minutes.
Source:
Legal.io, 2023
Figure 4.1.11
Inflection AI raises $1.3 billion from Bill Gates and

June 30, Nvidia, among others
2023
Inflection AI raises $1.3 billion through a combination of cash
and cloud credits, bringing the company’s valuation to over $4
billion. Founded by Mustafa Suleyman of Google DeepMind
and Reid Hoffman of LinkedIn, Inflection AI is developing a
“kind and supportive” chatbot named Pi. The funding round Source: TechCrunch, 2023
Figure 4.1.12
attracts investments from Microsoft, Nvidia, Reid Hoffman,
Bill Gates, and Eric Schmidt, former CEO of Google.

Hugging Face raises $235 million from investors

Aug. 24,
Hugging Face, a platform and community dedicated to machine
2023
learning and data science, secures an impressive $235 million
funding round, pushing its valuation to $4.5 billion. The platform
serves as a one-stop destination for building, deploying, and
training machine learning models. Offering a GitHub-like hub
for AI code repositories, models, and datasets, Hugging Face Source: TechCrunch, 2023
Figure 4.1.13
has attracted significant attention from industry giants.
SAP introduces new generative AI assistant Joule

Sep. 26,
Joule is a ChatGPT-style digital assistant integrated
2023
across SAP’s diverse product range. Joule will seamlessly
integrate into SAP applications spanning HR, finance,
supply chain, procurement, and customer experience.
Additionally, it will be incorporated into the SAP Business Source: SAP, 2023
Figure 4.1.14
Technology Platform, extending its utility across SAP’s
extensive user base of nearly 300 million.
Amazon and Google make multibillion-dollar

Oct. 27, investments in Anthropic
2023
Amazon announces its intent to invest up to $4
billion in Anthropic, a rival of OpenAI. This significant
investment follows Google’s agreement to invest up
to $2 billion in Anthropic. The deal comprises an initial
$500 million upfront, with an additional $1.5 billion to Source: TechCrunch, 2023
Figure 4.1.15
be invested over time.
Kai-Fu Lee launches OpenSource LLM

Nov. 5,
Kai-Fu Lee’s LLM startup publicly unveils an open-source
2023
model and secures funding at a $1 billion valuation, with
Alibaba leading the investment. Lee, known for his leadership
roles at Google in China and for establishing Microsoft
Research China, one of Microsoft’s key international research
hubs, spearheads this initiative. Source: TechCrunch, 2023
Figure 4.1.16

Sam Altman, OpenAI CEO, fired and then rehired

Nov. 17,
OpenAI’s board claims Altman was “not consistently
2023
candid in his communications.” Chaos ensues at OpenAI.
Many employees resign in response to the news, and 745
sign a letter threatening resignation if the current board
members do not resign. A few days later, Altman
is reinstated. Source: CoinGape, 2024
Figure 4.1.17
Mistral AI closes $415 million funding round

Dec. 11,
Less than six months after raising a $112 million seed
2023
round, Europe-based Mistral AI secures an additional
$415 million. The startup, cofounded by alumni from
Google’s DeepMind and Meta, focuses on developing
foundation models with an open-source technology
approach, aiming to compete with OpenAI. Leading
Source: TechCrunch, 2023
the round is Andreessen Horowitz, with participation Figure 4.1.18
from Lightspeed Venture Partners, Salesforce, BNP

Paribas, General Catalyst, and Elad Gil.

Index Report 2024 4.2 Jobs
4.2 Jobs
AI Labor Demand
requiring AI skills, in many, the number of AI-related
This section analyzes the demand for AI-related skills job postings over the past five years has increased.1
in labor markets, drawing on data from Lightcast.
Lightcast has analyzed hundreds of millions of job Lightcast speculates that the 2023 decrease in AI job
postings from over 51,000 websites since 2010, postings is driven by many top AI employers (such
identifying those that require AI skills. as Amazon, Deloitte, Capital One, Randstad, and
Global AI Labor Demand Elevance Health) scaling back their overall posting
Figure 4.2.1 shows the percentage of job postings counts. Additionally, many companies shifted the
demanding AI skills. In 2023, the United States (1.6%), occupational mix of their postings. For example,
Spain (1.4%), and Sweden (1.3%) led in this metric. Amazon, in 2023, posted a higher share of operational
In 2022, AI-related jobs accounted for 2.0% of all roles like sales delivery driver, packager, and postal
American job postings. In 2023, that number dropped service/mail room worker than in 2022. At the same
to 1.6%. Although most countries saw a decrease time, there was a lower share of demand for tech roles
from 2022 to 2023 in the share of job postings like software developers and data scientists.
AI job postings (% of all job postings) by geographic area, 2014–23

Source: Lightcast, 2023 | Chart: 2024 AI Index report
2.00%
AI job postings (% of all job postings)
1.62%, United States

1.50% 1.35%, Spain
1.31%, Sweden
1.20%, Belgium
1.12%, Netherlands
1.07%, France
1.05%, Canada
1.04%, Switzerland
1.00%, Australia
1.00%
0.89%, Austria
0.85%, United Kingdom
0.81%, Germany
0.76%, Italy
0.50% 0.50%, New Zealand
0.00%
2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 Figure 4.2.1
1 In 2023, Lightcast slightly changed its methodology for determining AI-related job postings from what was used in previous versions of the AI Index report. Lightcast also updated its
taxonomy of AI-related skills. As such, some of the numbers in this chart do not completely align with those featured in last year’s report.

U.S. AI Labor Demand by Skill Cluster and

Specialized Skill at 0.2%. Despite a recent dip, machine learning
Figure 4.2.2 highlights the most sought-after AI continues to be the most in-demand skill. Since last
skills in the U.S. labor market since 2010. Leading the year, every AI-related skill cluster tracked by Lightcast
demand was machine learning at 0.7%, with artificial had a decrease in market share, with the exception of
intelligence at 0.5%, and natural language processing generative AI, which grew by more than a factor of 10.
AI job postings (% of all job postings) in the United States by skill cluster, 2010–23
1.00%
0.80%
0.68%, Machine learning
0.60%
0.46%, Arti cial intelligence

0.40%
0.15%, Natural language processing

0.20% 0.14%, Autonomous driving
0.10%, Neural networks
0.07%, Visual image recognition
0.05%, Robotics
0.05%, Generative AI
0.00%
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023
Figure 4.2.2

Figure 4.2.3 compares the top 10 specialized skills sought in AI job postings in 2023 versus those from 2011 to
2013.2 On an absolute scale, the demand for nearly every specialized skill has increased over the past decade, with
Python’s notable increase in popularity highlighting its ascendance as a preferred AI programming language.
Top 10 specialized skills in 2023 AI job postings in the United States, 2011–13 vs. 2023
152,201
Python (programming language)
13,503
133,066
Computer science
43,748
93,541
SQL (programming language)
25,194
91,883
Data analysis
20,770
85,480
Data science
3,892
73,069
Agile methodology
8,602
68,459
Amazon web services
1,712
67,772
Automation
12,327
64,557
Software engineering
18,704 2023
62,180 2011–13
Project management
25,953
0 50,000 100,000 150,000 200,000 250,000 300,000
Number of AI job postings
Figure 4.2.3
2 The decision to select 2011–2013 as the point of comparison was because some data at the jobs/skills level from earlier years is quite sparse. Lightcast therefore used 2011–2013 to have a
larger sample size for a benchmark from 10 years ago with which to compare. Figure 4.2.3 juxtaposes the total number of job postings requiring certain skills from 2011 to 2013 with the total
amount in 2023.

In 2023, Lightcast saw great increases in the number of U.S. job postings citing generative AI skills. That year,
15,410 job postings specifically cited generative AI as a desired skill, large language modeling was mentioned in
4,669 postings, and ChatGPT appeared in 2,841 job listings (Figure 4.2.4).
Generative AI skills in AI job postings in the United States, 2023

Generative arti cial intelligence 15,410
Large language modeling 4,669
ChatGPT 2,841
Prompt engineering 1,299
Generative adversarial networks 1,102
Variational autoencoders 372
0 2,000 4,000 6,000 8,000 10,000 12,000 14,000 16,000

Number of AI job postings
Figure 4.2.4

Figure 4.2.5 illustrates what proportion of all generative AI job postings released in 2023 referenced particular
generative AI skills. The most cited skill was generative AI (60.0%), followed by large language modeling (18.2%)
and ChatGPT (11.1%).
Share of generative AI skills in AI job postings in the United States, 2023

Generative arti cial intelligence 59.98%
Large language modeling 18.17%
ChatGPT 11.06%
Prompt engineering 5.06%
Generative adversarial networks 4.29%
Variational autoencoders 1.45%
0% 10% 20% 30% 40% 50% 60%

Skill share in AI job postings (%)
Figure 4.2.5

U.S. AI Labor Demand by Sector and educational services. The leading sectors were
Figure 4.2.6 shows the percentage of U.S. job information (4.6%); professional, scientific, and
postings requiring AI skills by industry sector from technical services (3.3%); and finance and insurance
2022 to 2023. Nearly every sector experienced a (2.9%). As noted earlier, the decrease in AI job postings
decrease in the proportion of AI job postings in 2023 was related to changes in the hiring patterns of several
compared to 2022, except for public administration major U.S. employers.
AI job postings (% of all job postings) in the United States by sector, 2022 vs. 2023
Information 4.63% (-3.67%)

4.81%
Professional, scienti c, and technical services 3.33% (-9.52%)
3.67%
Finance and insurance 2.94% (-11.71%)
3.33%
Manufacturing 2.48% (-18.81%)
3.06%
Public administration 1.49% (88.81%)
0.79%
Educational services 1.41% (5.98%)
1.33%
Management of companies and enterprises 1.33% (-21.15%)
1.69%
Utilities 1.19% (-1.53%)
1.21%
Agriculture, forestry, shing and hunting 0.85% (-44.98%)
1.54%
Mining, quarrying, and oil and gas extraction 0.82% (-33.26%)
1.23%
Wholesale trade 0.70% (-20.52%)
0.89%
Real estate and rental and leasing 0.55% (-25.92%)
0.74%
Transportation and warehousing 0.48% (-25.92%)
0.65%
2023
Retail trade 0.48% (-54.68%)
1.06% 2022
Waste management and administrative support services 0.40% (-22.71%)
0.51%
0% 1% 2% 3% 4% 5%
Figure 4.2.6

U.S. AI Labor Demand by State Number of AI job postings in the United States by state, 2023
Figure 4.2.7 highlights the number
of AI job postings in the United AK ME
1,309 2,125
States by state. The top three states
VT NH MA
were California (70,630), followed 1,003 2,077 23,017
by Texas (36,413) and Virginia WA MT ND SD MN WI MI NY CT RI
14,725 892 1,293 1,351 5,824 5,055 13,734 24,397 5,415 2,992
(24,417).
OR ID WY NE IA IL IN OH PA NJ
5,478 3,357 729 2,288 3,531 20,178 5,267 10,409 13,294 14,705
CA NV UT CO KS MO KY WV DC MD DE
70,630 3,404 3,679 10,292 5,431 7,390 2,315 533 6,861 16,312 2,462
AZ NM OK AR TN VA NC
9,022 2,952 4,125 3,783 5,318 24,417 12,976
TX LA MS AL GA SC
36,413 2,924 1,736 6,214 14,325 3,305
HI FL
2,283 17,678
Figure 4.2.7
Figure 4.2.8 demonstrates what Percentage of US states job postings in AI, 2023
percentage of a state’s total job
postings were AI-related. The top AK ME
1.06% 1.55%
states according to this metric were
VT NH MA
the District of Columbia (2.7%), 1.14% 0.87% 1.88%
followed by Delaware (2.4%) and WA MT ND SD MN WI MI NY CT RI
1.65% 0.75% 1.03% 1.01% 0.80% 0.69% 1.20% 1.43% 1.09% 1.74%
Maryland (2.1%).
0.91% 1.43% 0.95% 0.81% 0.75% 1.26% 0.62% 0.75% 1.06% 1.63%
1.60% 0.79% 0.94% 1.03% 1.19% 0.92% 0.52% 0.46% 2.66% 2.10% 2.38%
0.94% 1.14% 0.89% 1.34% 0.64% 2.09% 0.98%
TX LA MS AL GA SC
1.11% 0.59% 0.79% 1.05% 1.14% 0.58%
HI FL
1.25% 0.71%
Figure 4.2.8

Figure 4.2.9 examines which U.S. Percentage of US AI job postings by state, 2023
states accounted for the largest
proportion of AI job postings AK ME
0.28% 0.46%
nationwide. California was first: In
VT NH MA
2023, 15.3% of all AI job postings 0.22% 0.45% 4.99%
in the United States were for jobs WA MT ND SD MN WI MI NY CT RI
3.19% 0.19% 0.28% 0.29% 1.26% 1.10% 2.98% 5.29% 1.17% 0.65%
based in California, followed by
Texas (7.9%) and Virginia (5.3%). OR ID WY NE IA IL IN OH PA NJ
1.19% 0.73% 0.16% 0.50% 0.77% 4.37% 1.14% 2.26% 2.88% 3.19%
15.31% 0.74% 0.80% 2.23% 1.18% 1.60% 0.50% 0.12% 1.49% 3.54% 0.53%
1.96% 0.64% 0.89% 0.82% 1.15% 5.29% 2.81%
TX LA MS AL GA SC
7.89% 0.63% 0.38% 1.35% 3.11% 0.72%
HI FL
0.49% 3.83%
Figure 4.2.9
Figure 4.2.10 illustrates the trends in the four states with highest AI job postings: Washington, California, New York,
and Texas. Each experienced a notable decline in the share of total AI-related job postings from 2022 to 2023.
Percentage of US states’ job postings in AI by select US state, 2010–23

2.50%
Percentage of U.S. states’ job postings in AI
2.00%
1.65%, Washington
1.60%, California
1.50%
1.43%, New York
1.11%, Texas
1.00%
0.50%
0.00%
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023
Figure 4.2.10

Figure 4.2.11 shows how AI-related job postings have been distributed across the top four states over time. Since
2019, California’s proportion of AI job postings has steadily declined, while Texas has seen a slight increase.
Percentage of US AI job postings by select US state, 2010–23

25%
Percentage of United States AI job postings
20%
15% 15.31%, California
10%
7.89%, Texas
5% 5.29%, New York
3.19%, Washington
0%
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023
Figure 4.2.11

AI Hiring added a new employer in the same period the job began,
divided by the total number of LinkedIn members in
The hiring data presented in the AI Index is based on
the corresponding location. Conversely, the relative AI
a LinkedIn dataset of skills and jobs that appear on
talent hiring rate is the year-over-year change in AI hiring
their platform. The geographic areas included in the
relative to overall hiring rate in the same geographic
sample make at least 10 AI hires each month and have
area.3 Therefore, figure 4.2.12 illustrates which specific
LinkedIn covering a substantial portion of the labor
regions have experienced the most significant rise in AI
force. LinkedIn’s coverage of India’s and South Korea’s
talent recruitment compared to the overall hiring rate,
sizable labor forces fall below this threshold, so insights
serving as an indicator of AI hiring vibrancy. In 2023, the
drawn about these countries should be interpreted with
regions with the greatest relative AI hiring rates year over
particular caution.
year were Hong Kong (28.8%), followed by Singapore
Figure 4.2.12 reports the relative AI hiring rate year-over- (18.9%) and Luxembourg (18.9%). This means, for
year ratio by geographic area. The overall hiring rate is example, that in 2023 in Hong Kong, the ratio of AI
computed as the percentage of LinkedIn members who talent hiring relative to overall hiring grew 28.8%.
Relative AI hiring rate year-over-year ratio by geographic area, 2023

Source: LinkedIn, 2023 | Chart: 2024 AI Index report
Hong Kong 28.83%
Singapore 18.93%
Luxembourg 18.85%
India 16.83%
Portugal 14.84%
United Arab Emirates 13.40%
Denmark 13.23%
Norway 13.08%
Belgium 10.41%
Spain 9.63%
United Kingdom 8.57%
Sweden 7.37%
Finland 7.27%
South Africa 7.17%
Chile 6.65%
0% 5% 10% 15% 20% 25% 30%

Relative AI hiring rate year-over-year ratio Figure 4.2.124
Figure 4.2.13 showcases the year-over-year ratio of AI hiring by geographic areas over the past five years. Starting
from the beginning of 2023, countries including Australia, Canada, Singapore, and India have experienced a
noticeable uptick in AI hiring.
3 For each month, LinkedIn calculates the AI hiring rate in the geographic area, divides the AI hiring rate by overall hiring rate in that geographic area, calculates the year-over-year change of
this ratio, and then takes the 12-month moving average using the last 12 months.
4 For brevity, the visualization only includes the top 15 countries for this metric.

RelativeAI
Relative AIhiring
hiringrate
rate year-over-year
year-over-year ratio
ratio by geographic
by geographic area,area, 2018–23
2018–23
Source:LinkedIn,
Source: LinkedIn,2023
2023 | Chart:
| Chart: 2024
2024 AI Index
AI Index report
report
Australia
Australia Belgium
Belgium Brazil Brazil Canada Canada
100%
100% 100%
100% 100% 100% 100% 100%
50%
50% 50%50% 50% 50% 50% 50%
6.53% 10.41% 4.44%
0% 6.53% 0% 0% 10.41%
0% -0.47% 0% 4.44%
0% 0% -0.47% 0%
−50% −50% −50% −50%
−50% 2019 2021 2023 −50%
2019 2021 2023 −50%
2019 2021 2023 −50%2021
2019 2023
2019 2021 2023 2019 2021 2023 2019 2021 2023 2019 2021 2023
Chile Denmark Finland France
Chile Denmark Finland France
100% 100% 100% 100%
100% 100% 100% 100%
50% 50% 50% 50%
50% 50% 13.23% 50% 50%
0% 6.65% 0% 0% 7.27% 0% -0.43%
6.65% 13.23% 7.27%
0% 0% 0% 0% -0.43%
−50% −50% −50% −50%
2019 2021 2023 2019 2021 2023 2019 2021 2023 2019 2021 2023
−50% −50% −50% −50%
2019 Germany
2021 2023 2019 2021
Hong Kong 2023 2019
India 2021 2023 2019
Ireland 2021 2023
100% Germany 100% Hong Kong 100% India 100% Ireland

100%
50% 100%
50% 50% 100% 50% 100%
28.83%
16.83%
50%
0% 0.16% 0%50% 0% 50% 0% 50% -0.48%
28.83%
16.83%
ratio ratio
−50%
0% 0.16%−50% 0% −50% 0% −50% 0% -0.48%
2019 2021 2023 2019 2021 2023 2019 2021 2023 2019 2021 2023
rate year-over-year
−50% Israel −50% Italy −50% Luxembourg −50%

Netherlands
2019 2021 2023 2019 2021 2023 2019 2021 2023 2019 2021 2023
year-over-year
100% 100% 100% 100%

Israel Italy Luxembourg Netherlands
50% 50% 50% 50%
100% 100% 100% 100%
18.85%
0% 6.22% 0% 6.03% 0% 0% 5.21%
hiring
50% 50% 50% 50%

rate
−50% −50% −50% −50%

18.85%
0% 2019 2021 2023 6.22% 2019 2021 2023 6.03% 2019 2021 2023 2019 0%2021 2023 5.21%
hiringAI
0% 0%
Relative
New Zealand Norway Portugal Singapore

−50% −50% −50% −50%
2019 2021 2023 2019 2021 2023 2019 2021 2023 100% 2019 2021 2023
Relative AI
100% 100% 100%
50% New Zealand 50% Norway 50% Portugal 50% Singapore

100% 100% 13.08% 14.84% 18.93%
0% 6.15% 0% 0% 100% 0% 100%
50%
−50% −50%50% −50% 50% −50% 50%
2019 2021 2023 2019 2021 2023 2019 2021 2023 2019 2021 2023 18.93%
6.15% 13.08% 14.84%
0% 0% 0% 0%
South Africa South Korea Spain Sweden
−50%
100% −50%
100% 100% −50% 100% −50%
2019 2021 2023 2019 2021 2023 2019 2021 2023 2019 2021 2023
50% 50% 50% 50%
South Africa South Korea Spain Sweden
0% 7.17% 0% 5.81% 0% 9.63% 0% 7.37%
100% 100% 100% 100%
−50% −50% −50% −50%
50% 2019 2021 2023 50%
2019 2021 2023 50%
2019 2021 2023 201950%2021 2023
0% Switzerland 7.17% 0%United Arab Emirates 5.81% United Kingdom

0% 9.63% United States
0% 7.37%
100% 100% 100% 100%
−50% −50% −50% −50%
50% 2019 2021 2023 50% 2019 2021 2023 50% 2019 2021 2023 50% 2019 2021 2023
Switzerland -0.24% 13.40%

United Arab Emirates 8.57%
United Kingdom 5.68%
United States
0% 0% 0% 0%
100% 100% 100% 100%
−50% −50% −50% −50%
2019 2021 2023 2019 2021 2023 2019 2021 2023 2019 2021 2023
50% 50% 50% 50%
13.40% 8.57% Figure 4.2.13
5.68%
0% -0.24% 0% 0% 0%
−50% −50% −50% −50%

Table of2019 2021
Contents 2023 Chapter 4 Preview
2019 2021 2023 2019 2021 2023 2019 2021 2023 233
AI Skill Penetration which they appear in LinkedIn members’ profiles. If,

for instance, four of the skills that engineers possess
Figures 4.2.14 and 4.2.15 highlight relative AI skill
belong to the AI skill group, the penetration of AI skills
penetration. The aim of this indicator is to measure the
among engineers is estimated to be 8% (4/50).
intensity of AI skills in an entity (such as a particular
country, industry, or gender). The AI skill penetration For the period from 2015 to 2023, the countries with
rate signals the prevalence of AI skills across the highest AI skill penetration rates were India (2.8),
occupations or the intensity with which LinkedIn the United States (2.2), and Germany (1.9). In the
members utilize AI skills in their jobs. For example, United States, therefore, the relative penetration of
the top 50 skills for the occupation of engineer are AI skills was 2.2 times greater than the global average
calculated based on the weighted frequency with across the same set of occupations.
Relative AI skill penetration rate by geographic area, 2015–23

India 2.75
United States 2.22
Germany 1.90
Canada 1.67
Israel 1.63
United Kingdom 1.63
Singapore 1.50
France 1.48
South Korea 1.35
Spain 1.29
Brazil 1.21
Netherlands 1.13
Italy 1.08
Switzerland 1.06
United Arab Emirates 0.98
0.00 0.50 1.00 1.50 2.00 2.50

Relative AI skill penetration rate Figure 4.2.14

Figure 4.2.15 disaggregates AI skill penetration rates the country. For all countries in the sample, the relative
by gender across different countries or regions. AI skill penetration rate is greater for men than women.
A country’s rate of 1.5 for women means female India (1.7), the United States (1.2), and Israel (0.9) have
members in that country are 1.5 times more likely to the highest reported relative AI skill penetration rates
list AI skills than the average member in all countries for women.
pooled together across the same set of occupations in
Relative AI skill penetration rate across gender, 2015–23

2.78
India 1.65
2.21
United States 1.23
1.92
Israel 0.86
1.70
Canada 0.85
2.00
Germany 0.82
1.64
United Kingdom 0.71
1.51
Singapore 0.70
1.46
France 0.67
1.20
Netherlands 0.55
1.41
Spain 0.55
1.10
Italy 0.46
1.26
Brazil 0.44
0.89
1.14 Male
Switzerland 0.39 Female
0.92
Australia 0.37
0.00 0.50 1.00 1.50 2.00 2.50 3.00
Relative AI skill penetration rate
Figure 4.2.15

AI Talent AI talent concentration by geographic area, 2023

Israel 1.13%
Figures 4.2.16 to 4.2.18 examine AI talent by Singapore 0.88%
country. A LinkedIn member is considered AI South Korea 0.79%

Luxembourg 0.74%
talent if they have explicitly added AI skills to
Finland 0.71%
their profile or work in AI. Counts of AI talent
Germany 0.69%
are used to calculate talent concentration, or Switzerland 0.68%
the portion of members who are AI talent. Note Netherlands 0.61%
Sweden 0.56%
that concentration metrics may be influenced by
Ireland 0.55%
LinkedIn coverage in these countries and should
France 0.49%
be used with caution. Canada 0.45%
Denmark 0.42%
Figure 4.2.16 shows AI talent concentration in
India 0.42%
various countries. In 2023, the countries with the Cyprus 0.40%
highest concentrations of AI talent included Israel 0.00% 0.20% 0.40% 0.60% 0.80% 1.00% 1.20%
AI talent concentration
(1.1%), Singapore (0.9%), and South Korea (0.8%).
Figure 4.2.16
Figure 4.2.17 looks at the percent change in AI
talent concentration for a selection of countries
since 2016. During that time period, several major Percentage change in AI talent concentration by
economies registered substantial increases in their geographic area, 2016 vs. 2023
AI talent pools. The countries showing the greatest India 263%

Cyprus 229%
increases are India (263%), Cyprus (229%), and
Denmark 213%
Denmark (213%). Canada 188%
Singapore 172%
Germany 169%
Ireland 161%
Switzerland 142%
Israel 116%
Luxembourg 107%
Netherlands 75%
Finland 70%
Sweden 51%
South Korea 50%
France 41%
0% 40% 80% 120% 160% 200% 240% 280%
% change in AI talent concentration
Figure 4.2.17

AI talent concentration by gender, 2016–23

Australia Belgium Brazil Canada

0.60% 0.15% 0.60%
0.40% 0.43% 0.55% 0.14% 0.59%
0.40% 0.10% 0.40%
0.20% 0.19% 0.26%
0.20% 0.20% 0.05% 0.20%
0.03%
0.00% 0.00% 0.00% 0.00%
2017 2020 2023 2017 2020 2023 2017 2020 2023 2017 2020 2023
Chile Costa Rica Cyprus Denmark

0.22% 0.25% 0.60% 0.59% 0.60% 0.59%
0.20%
0.20%
0.40% 0.40%
0.10% 0.10% 0.22%
0.09% 0.20% 0.20% 0.21%
0.06%
0.00% 0.00% 0.00% 0.00%
2017 2020 2023 2017 2020 2023 2017 2020 2023 2017 2020 2023
Finland France Germany Hong Kong

1.00% 1.09% 0.60% 0.65% 0.89% 0.49%
0.40%
0.40% 0.50%
0.50% 0.34%
0.45% 0.37% 0.20% 0.19%
0.20%
0.00% 0.00% 0.00% 0.00%
2017 2020 2023 2017 2020 2023 2017 2020 2023 2017 2020 2023
India Ireland Israel Italy

0.45% 0.72% 1.73% 0.30% 0.30%
0.40% 0.38% 0.60% 1.50%
1.00% 0.20%
0.40%
0.20% 0.33% 0.75% 0.14%
AI talent concentration
0.20% 0.50% 0.10%

0.00% 0.00% 0.00% 0.00%
2017 2020 2023 2017 2020 2023 2017 2020 2023 2017 2020 2023
Luxembourg Netherlands New Zealand Norway

1.00% 0.99% 0.79% 0.40% 0.37% 0.53%
0.40%
0.50% 0.20%
0.50% 0.46% 0.41% 0.20% 0.24%
0.20%
0.00% 0.00% 0.00% 0.00%

2017 2020 2023 2017 2020 2023 2017 2020 2023 2017 2020 2023
Portugal Singapore South Africa Spain

0.40% 0.44% 1.00% 1.03% 0.17% 0.51%
0.15% 0.40%
0.67% 0.10%
0.20% 0.50% 0.09% 0.20%
0.15% 0.05% 0.16%
0.00% 0.00% 0.00% 0.00%
2017 2020 2023 2017 2020 2023 2017 2020 2023 2017 2020 2023
Sweden Switzerland United Arab Emirates United Kingdom

1.00%
0.77% 0.92% 0.26% 0.47%
0.20% 0.21% 0.40%
0.50%
0.35% 0.50% 0.23%
0.38% 0.10% 0.20%
0.00% 0.00% 0.00% 0.00%

2017 2020 2023 2017 2020 2023 2017 2020 2023 2017 2020 2023
United States Uruguay

0.48% 0.20% 0.22%
0.40%
0.10% Male
0.20% 0.21%
0.07% Female
0.00% 0.00%
2017 2020 2023 2017 2020 2023
Figure 4.2.18

LinkedIn data provides insights on the AI talent gained per 10,000 LinkedIn members by geographic area.
or lost due to migration trends. Net flows are defined5
The countries that report the greatest incoming
as total arrivals minus departures within the given time migration of AI talent are Luxembourg, Switzerland,
period. Figure 4.2.19 examines net AI talent migration and the United Arab Emirates.
Net AI talent migration per 10,000 LinkedIn members by geographic area, 2023
Source: LinkedIn, 2023; World Bank Group, 2023 | Chart: 2024 AI Index report
Luxembourg 3.67
Switzerland 1.60
Cyprus 1.24
Germany 1.04
Canada 0.96
Finland 0.90
Netherlands 0.76
Australia 0.67
Ireland 0.60
Singapore 0.50
Denmark 0.44
United Kingdom 0.41
Norway 0.41
United States 0.40
0.00 0.50 1.00 1.50 2.00 2.50 3.00 3.50

Net AI talent migration (per 10,000 LinkedIn members)
Figure 4.2.19
Figure 4.2.20 documents AI talent migration data over time. In the last few years, Israel, India, and South Korea
have seen declining net AI talent migration figures, suggesting that AI talent has been increasingly flowing out of
these countries.
5 LinkedIn membership varies considerably between countries, which makes interpreting absolute movements of members from one country to another difficult. To compare migration
flows between countries fairly, migration flows are normalized for the country of interest. For example, if country A is the country of interest, all absolute net flows into and out of country
A (regardless of origin and destination countries) are normalized based on LinkedIn membership in country A at the end of each year and multiplied by 10,000. Hence, this metric indicates
relative talent migration of all other countries to and from country A.

Net AI
Net AItalent
talentmigration
migration per
per 10,000
10,000 LinkedIn
LinkedIn members
members by geographic
by geographic area, area, 2019–23
2019–23
Source:
Source:LinkedIn,
LinkedIn,2023;
2023;World
WorldBank Group,
Bank 2023
Group, | Chart:
2023 20242024
| Chart: AI Index reportreport
AI Index
Australia
Australia Belgium
Belgium Brazil Brazil Canada Canada
11 1 1 1 1 1 1 0.96
0.67 0.96
0.67 0.28
0 0 0.28 0 -0.05 0
0 0 0 -0.05 0
−1 −1 −1 −1
−1
2019 2021 2023 −1
2019 2021 2023 2019 −1 2021 2023 2019 −1 2021 2023
2019 2021 2023 2019 2021 2023 2019 2021 2023 2019 2021 2023
Chile Costa Rica Cyprus* Denmark
Chile Costa Rica 4 Cyprus* Denmark
3 4
1 1 1
1 1 2 3 1 0.44
1 2 1.24
0 -0.07 0 0.01 0 0.44
1.24
0 -0.07 0 0.01 0 1 0
−1 −1 −1 0 −1
2019 2021 2023 2019 2021 2023 2019 2021 2023 2019 2021 2023
−1 −1 −1 −1
2019 2021 2023 2019 2021 2023 2019 2021 2023 2019 2021 2023
Finland France Germany Hong Kong
Finland France Germany
1.04
Hong Kong
1 0.90 1 1 1
01 0.90 0 1 0.06 0 1 1.040 1 0.12
0.06 −1 0.12
−10 −1 0 0 −1 0
2019 2021 2023 2019 2021 2023 2019 2021 2023 2019 2021 2023
members)
−1 −1 −1 −1
2019 India
2021 2023 2019 Ireland
2021 2023 2019 Israel2021 2023 2019 Italy 2021 2023
members)
1 India 1 Ireland 1 Israel 1 Italy

LinkedIn
0.60
01 0 1 0 1 0 1 -0.18
LinkedIn
0.60 -0.57
(per 10,000
-0.76
−1 −1 −1 −1
0
2019 2021 2023 20190 2021 2023 2019 0 2021 2023 2019 0 2021 2023 -0.18
-0.57
(per 10,000
-0.76
−1 Luxembourg* −1 Netherlands −1 New Zealand −1 Norway
62019 2021 2023 2019 2021 2023 2019 2021 2023 2019 2021 2023
talent migration
5
4 3.67 1 1 1
3 Luxembourg* Netherlands0.76 New Zealand Norway
0.41
26 0.19
migration
0 0 0
15
04 3.67 1 1 1
0.76
−13 −1 −1 −1 0.41
2
2019 2021 2023 20190 2021 2023 2019 2021 2023 0.192019 0 2021 2023
Net AI
0
1
Net AI talent
0 Portugal Singapore South Africa South Korea

−1 −1 −1 −1
2019 2021 2023 2019 2021 2023 2019 2021 2023 2019 2021 2023
1 1 1 1
0.50
Portugal 0.23 Singapore South Africa South Korea
0 0 0 -0.12 0
-0.30
−11 −1 1 −1 1 −1 1
2019 2021 2023 2019 2021 2023 0.50 2019 2021 2023 2019 2021 2023
0.23
0 0 0 -0.12 0
Spain Sweden Switzerland* United Arab Emirates -0.30
3
−1 −1 −1 −1 1.48
1
2019 2021 2023 1 2019 2021 2023 2 2019 2021 2023 1 2019 2021 2023
1.60
0.25 0.17 1
0 Spain 0 Sweden Switzerland* 0 United Arab Emirates
0 3
1.48
−11 −1 1 −1 2 −1 1 2021
2019 2021 2023 2019 2021 2023 2019 2021 2023 1.602019 2023
0.25 0.17 1
0 United Kingdom 0 United States Uruguay 0
0
−1
1 1 −1 1 −1 −1
2019 2021 2023
0.41
2019 2021 0.40
2023 2019 2021 2023 2019 2021 2023
0 0 0 -0.13
United Kingdom United States Uruguay
−1 −1 −1
2019
1 2021 2023 20191 2021 2023 2019 1 2021 2023
Figure 4.2.206
0.41 0.40
0 indicate that a country’s y-axis label is scaled
6 Asterisks 0 differently than the y-axis label for the other countries.
0 -0.13
−1 −1 −1
2019 2021 2023 2019 2021 2023 2019 2021 2023
Highlight:
How Much Do Computer Scientists Earn?

Every year, Stack Overflow conducts a survey of Salaries vary by position and geography. For
its community of professional developers who instance, the average global salary for a cloud
use their tools. The latest iteration of the survey infrastructure engineer is $105,000. In the United
profiled over 90,000 developers. States, the average salary for such a position
is $185,000. Both globally and in the United
Through this survey, respondents were asked
States, the highest compensated roles are senior
about their income. It is important to note that
executives, followed by engineering managers.
these respondents do not work exclusively with
For all surveyed positions, salaries are significantly
AI. However, examining developer salaries can
higher in the United States than in other countries.
serve as a means to approximate the compensation
of talent in AI-adjacent industries. Figure 4.2.21
examines the salaries of professional developers
disaggregated by position.

Highlight:
How Much Do Computer Scientists Earn? (cont’d)

Median yearly salary by professional developer type, 2023
Source: Stack Over ow Developer Survey, 2023 | Chart: 2024 AI Index report
Senior executive (C-suite, VP, etc.) 124.75

220
Engineering manager 124.14
195
Marketing or sales professional 116.00
Engineer, site reliability 115.66

180
Developer experience 107.09
210
Cloud infrastructure engineer 105.00
185
Blockchain 103.74
Developer advocate 100.31
Security professional 99.31

173
Scientist 92.32
132.50
Product manager 88.93
198.50
Hardware engineer 85.67
140
Research and development role 85.67
160
Engineer, data 83.52
160
Data scientist or 80.32
Developer type
machine learning specialist 160

DevOps specialist 80.16
160
Database administrator 78.69
120
Developer, embedded 77.10
applications or devices 140
Developer, back-end 76.03
165
Developer, full-stack 71.14
140
Developer, game or graphics 71.01
158
Developer, desktop or 70.76
enterprise applications 130
Developer, mobile 68.19
163
Educator 65.27
100
Developer, QA or test 63.93
124
Project manager 63.18
125
Data or business analyst 61.55
105
Developer, front-end 59.97
140
Designer 59.81
151 All respondents
System administrator 55.76 United States
87.50
Academic researcher 53.55
90
Student 15.42
0 50 100 150 200

Median yearly salary (in thousands of U.S. dollars)
Figure 4.2.21

Index Report 2024 4.3 Investment
This section monitors AI investment trends, leveraging data from Quid, which analyzes investment data from more than
8 million companies worldwide, both public and private. Employing natural language processing, Quid sifts through
vast unstructured datasets—including news aggregations, blogs, company records, and patent databases—to detect
patterns and insights. Additionally, Quid is constantly expanding its database to include more companies, sometimes
resulting in higher reported investment volumes for specific years. For the first time, this year’s investment section in the
AI Index includes data on generative AI investments.
4.3 Investment
In 2023, the total investment dropped to $189.2
Corporate Investment billion, a decrease of approximately 20% from 2022.
Figure 4.3.1 illustrates the trend in global corporate AI Despite a slight reduction in private investment, the
investment from 2013 to 2023, including mergers and most significant downturn occurred in mergers and
acquisitions, minority stakes, private investments, and acquisitions, which fell by 31.2% from the previous
public offerings. For the second consecutive year, year. However, over the past decade, AI-related
global corporate investment in AI has seen a decline. investments have increased thirteenfold.
Global corporate investment in AI by investment activity, 2013–23

Source: Quid, 2023 | Chart: 2024 AI Index report
350 Merger/acquisition
337.4
Minority stake
Private investment
300 Public o ering
Total investment (in billions of U.S. dollars)
250 173.42
234.95
202.49
200 189.16
39.04
117.16
150 80.61
86.42
103.27
100 132.36
79.62 36.43
21.89
53.72 103.4
50 95.99
33.82 24.68 64.02
25.43 43.1 58.18
14.57 19.04
25.72 29.51
0
2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023
Figure 4.3.1

Global Trends
Startup Activity Global private AI investment has declined for the
This section analyzes private investment trends in second consecutive year (Figure 4.3.2). However, the
artificial intelligence startups that have received over decrease from 2022 was small (-7.2%) and smaller than
$1.5 million in investment since 2013. the drop observed from 2021 to 2022. Despite recent
declines, private AI investment globally has grown
substantially in the last decade.
Private investment in AI, 2013–23

120
100 95.99
80
60
40
20
0
2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023
Figure 4.3.2

While overall AI private investment decreased last year, funding for generative AI sharply increased (Figure 4.3.3).
In 2023, the sector attracted $25.2 billion, nearly nine times the investment of 2022 and about 30 times the amount
from 2019. Furthermore, generative AI accounted for over a quarter of all AI-related private investment in 2023.
Private investment in generative AI, 2019–23

25.23
25
20
15
10
0
2019 2020 2021 2022 2023
Figure 4.3.3

Interestingly, the number of newly funded AI companies jumped to 1,812, a 40.6% increase over the previous year
(Figure 4.3.4). Figure 4.3.5 visualizes the average size of AI private investment events, calculated by dividing the total
yearly AI private investment by the total number of AI private investment events. From 2022 to 2023, the average
increased marginally, growing from $31.3 million to $32.4 million.
Number of newly funded AI companies in the world, 2013–23

1,812
1,800
1,600
1,400
Number of companies
1,200
1,000
800
600
400
200
0
2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023
Figure 4.3.4
Average size of AI private investment events, 2013–23
35
32.44
Average investment (in millions of U.S. dollars)
30
25
20
15
10
0
2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023
Figure 4.3.5

2023 marked a significant increase in the number of newly funded generative AI companies, with 99 new startups
receiving funding, compared to 56 in 2022, and 31 in 2019 (Figure 4.3.6).
Number of newly funded generative AI companies in the world, 2019–23

99
100
80
Number of companies
60
40
20
0
2019 2020 2021 2022 2023
Figure 4.3.6
Figure 4.3.7 reports AI funding events disaggregated AI private investment events by funding size,
2022 vs. 2023
by size. In 2023, AI private investment events Source: Quid, 2023 | Table: 2024 AI Index report
decreased across nearly all funding size categories, Funding Size 2022 2023
except for those exceeding $500 million. Over $1 billion 7 9

$500 million – $1 billion 6 7
$100 million – $500 million 187 120
$50 million – $100 million 260 182
Under $50 million 2,840 2,641
Undisclosed 694 680
Total 3,994 3,639
Figure 4.3.7

Regional Comparison by Funding Amount times greater than the amount invested in the next
The United States once again led the world in terms highest country, China ($7.8 billion), and 17.8 times
of total AI private investment. In 2023, the $67.2 the amount invested in the United Kingdom ($3.8
billion invested in the United States was roughly 8.7 billion) (Figure 4.3.8).
Private investment in AI by geographic area, 2023

United States 67.22
China 7.76
United Kingdom 3.78
Germany 1.91
Sweden 1.89
France 1.69
Canada 1.61
Israel 1.52
South Korea 1.39
India 1.39
Singapore 1.14
Japan 0.68
Australia 0.37
Spain 0.36
0 5 10 15 20 25 30 35 40 45 50 55 60 65 70
Figure 4.3.8

When aggregating private AI investments since 2013, the country rankings remain the same: The United States
leads with $335.2 billion invested, followed by China with $103.7 billion, and the United Kingdom at $22.3 billion
(Figure 4.3.9).
Private investment in AI by geographic area, 2013–23 (sum)

United States 335.24
China 103.65
United Kingdom 22.25
Israel 12.83
Canada 10.56
Germany 10.35
India 9.85
France 8.31
South Korea 7.25
Singapore 6.25
Japan 4.81
Australia 3.40
Switzerland 3.28
Hong Kong 3.15
Sweden 2.88
0 20 40 60 80 100 120 140 160 180 200 220 240 260 280 300 320 340
Figure 4.3.9

Figure 4.3.10, which looks at AI private investment over time by geographic area, suggests that the gap in private
investments between the United States and other regions is widening over time. While AI private investments have
decreased in China (-44.2%) and the European Union plus the United Kingdom (-14.1%) since 2022, the United
States has seen a significant increase (22.1%) during the same period.
Private investment in AI by geographic area, 2013–23

80
70
60
50
40
30
20
10 11.00, European Union and United Kingdom

7.76, China
2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023
Figure 4.3.10

The disparity in regional AI private investment becomes particularly pronounced when examining generative AI-
related investments. For instance, in 2022, the United States outpaced the combined investments of the European
Union plus United Kingdom in generative AI by approximately $1.9 billion (Figure 4.3.11). By 2023, this gap widened
to $21.1 billion.
Private investment in generative AI by geographic area, 2019–23

20
15
10
0.74, European Union and United Kingdom

0.65, China
0
2019 2020 2021 2022 2023

Figure 4.3.11

Regional Comparison by Newly Funded Consistent with trends in private investment, the
AI Companies United States leads all regions with 897 new AI
This section examines the number of newly funded companies, followed by China with 122, and the
AI companies across different geographic regions. United Kingdom with 104 (Figure 4.3.12).
Number of newly funded AI companies by geographic area, 2023

United States 897
China 122
United Kingdom 104
Germany 76
Canada 59
France 58
India 45
South Korea 44
Israel 43
Japan 42
Singapore 29
Australia 24
Spain 21
Switzerland 17
Brazil 15
0 100 200 300 400 500 600 700 800 900

Number of companies
Figure 4.3.12

A similar trend is evident in the aggregate data since 2013. In the last decade, the number of newly funded AI
companies in the United States is around 3.8 times the amount in China, and 7.6 times the amount in the United
Kingdom (Figure 4.3.13).
Number of newly funded AI companies by geographic area, 2013–23 (sum)

United States 5,509
China 1,446
United Kingdom 727
Israel 442
Canada 397
France 391
India 338
Japan 333
Germany 319
Singapore 193
South Korea 189
Australia 147
Switzerland 123
Spain 94
Sweden 94
0 500 1,000 1,500 2,000 2,500 3,000 3,500 4,000 4,500 5,000 5,500
Number of companies
Figure 4.3.13

Figure 4.3.14 presents data on newly funded AI United States, along with the European Union and the
companies in specific geographic regions, highlighting United Kingdom, have seen significant increases in the
a decade-long trend where the United States number of new AI companies, in contrast to China,
consistently surpasses both the European Union and which experienced a slight year-over-year decrease.
the United Kingdom, as well as China. Since 2022, the
Number of newly funded AI companies by geographic area, 2013–23

800
700
600
Number of companies
500
400
368, European Union and United Kingdom
300
200
122, China
100
0
2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023
Figure 4.3.14

Focus Area Analysis

Quid also disaggregates private AI investment by Figure 4.3.16 presents trends over time in AI focus
focus area. Figure 4.3.15 compares global private AI area investments. As noted earlier, most focus areas
investment by focus area in 2023 versus 2022. The saw declining investments in the last year. Conversely,
focus areas that attracted the most investment in some of the areas that saw growth since 2022
2023 were AI infrastructure/research/governance include AI infrastructure/research/governance and
($18.3 billion); NLP and customer support ($8.1 billion); data management, processing. Although now still
and data management and processing ($5.5 billion). substantial, investments in medical and healthcare as
The prominence of AI infrastructure, research, and well as NLP, customer support peaked in 2021 and
governance reflects large investments in companies have since then declined.
specifically building AI applications, such as OpenAI,
Anthropic, and Inflection AI.
Private investment in AI by focus area, 2022 vs. 2023

AI infrastructure/research/governance
NLP, customer support
Data management, processing
Medical and healthcare
AV
Fintech
Quantum computing
Semiconductor
Energy, oil, and gas
Creative, music, video content
Ed tech
Marketing, digital ads
Drones
Manufacturing
Cybersecurity, data protection
AR/VR
Retail
Insurtech
Entertainment
VC
Agritech
Legal tech
Facial recognition
2023
Fitness and wellness
2022
Geospatial
0 2 4 6 8 10 12 14 16 18
Figure 4.3.15

Privateinvestment
Private investmentinin
AIAI
byby focus
focus area,
area, 2017–23
2017–23
Source:Quid,
Source: Quid,2023
2023| Chart:
| Chart: 2024
2024 AI Index
AI Index report
report
AI
AI infrastructure/research/governance
infrastructure/research/governance AR/VR
AR/VR AV AV Agritech Agritech
18.27
18.27
15
15 15 15 15 15 15 15
10 10 10 10
10 10 10 10
5 5 5 5
5 5 5 2.66 5
0 0 0.67 2.66 0.50
0 0.67 0 0 0.50
2017 2020 2023 20170 2020 2023 2017 0 2020 2023 2017 0 2020 2023
2017 2020 2023 2017 2020 2023 2017 2020 2023 2017 2020 2023
Creative, music, video content Cybersecurity, data protection Data management, processing Drones
15 15 15 15
15 15 15 15
10 10 10 10
10
5 5
10 5
10 5.50 5
10
05
1.34
0 5 0.89 0 5 5.50
0 5 0.96
2017 2020 2023 1.34 2017 2020 2023 0.89 2017 2020 2023 2017 2020 2023 0.96
0 0 0 0
2017 2020
Ed tech 2023 2017Energy, oil,
2020and gas 2023 2017 2020
Entertainment 2023 2017recognition
Facial 2020 2023
15 Ed tech 15 Energy, oil, and gas 15 Entertainment 15 Facial recognition
10
15 10 15 10 15 10 15
5 5 5 5
10 10 10 10
1.24 1.47 0.52 0.28
0 0 0 0
of U.S. dollars)
5
2017 2020 2023 2017
5 2020 2023 2017
5 2020 2023 2017
5 2020 2023
1.24 1.47 0.52 0.28
0 0 0 0
U.S. dollars)
Fintech Fitness and wellness Hardware Insurtech

2017 2020 2023 2017 2020 2023 2017 2020 2023 2017 2020 2023
15 15 15 15
billions

10 10 10 10
(in of
15 15 15 15
(in billions
5 5 5 5
Total investment
10 2.13
0 0 10 0.22 0 10 0.31 0 10 0.60
2017 2020 2023 2017 2020 2023 2017 2020 2023 2017 2020 2023
5 5 5 5
Total investment
2.13 0.22 0.31 0.60

0 Legal tech 0 Manufacturing Marketing,
0 digital ads Medical
0 and healthcare
2017 2020 2023 2017 2020 2023 2017 2020 2023 2017 2020 2023
15 15 15 15
10 Legal tech 10 Manufacturing 10 Marketing, digital ads 10 Medical and healthcare
5
15 5 15 5 15 5 15 4.20
0 0.42 0 0.89 0 1.06 0
10 10 10 10
2017 2020 2023 2017 2020 2023 2017 2020 2023 2017 2020 2023
5 5 5 5 4.20
NLP, customer support Quantum computing Retail Semiconductor
0 0.42 0 0.89 0 1.06 0
15
2017 2020 2023 15 2017 2020 2023 15 2017 2020 202315 2017 2020 2023
10 10 10 10
8.06
5 5 5 5
15 15 1.97 15 0.67 15 1.66
0 0 0 0
2017
10 2020 2023 2017
10 2020 2023 2017 10 2020 2023 2017 10 2020 2023
8.06
5 VC 5 5 5
1.97 0.67 1.66
150 0 0 0
2017 2020 2023 2017 2020 2023 2017 2020 2023 2017 2020 2023
10
5 VC
0 0.51
15 Figure 4.3.16
2017 2020 2023
10
5
0 0.51
2017 2020 2023
Finally, 4.3.17 shows private investment in AI by focus and the European Union and United Kingdom in
area over time within select geographic regions, investment in almost all focus area categories.
highlighting how private investment priorities in AI A notable exception is facial recognition, where
differ across geographies. The significant increases 2023 investment totals were $90 million in the
observed in AI infrastructure/research/governance United States and $130 million in China. Likewise, in
were mostly driven by investment in the United semiconductor investments, China ($630 million) is
States. The United States significantly outpaces China not far behind the United States ($790 million).

Private investment in AI by focus area and geographic area, 2017–23

AI infrastructure/research/governance AR/VR AV Agritech
15 15 15 15
10 10 10 10
US, 18.22 US, 0.60 US, 1.92 US, 0.29
5 CN, 0.03 5 CN, 0.01 5 CN, 0.28 5 CN, 0.01
EU/UK, 0.01 EU/UK, 0.04 EU/UK, 0.02 EU/UK, 0.13
0 0 0 0
2017 2020 2023 2017 2020 2023 2017 2020 2023 2017 2020 2023
15 15 15 15
10 US, 0.76 10 US, 0.28 10 US, 4.72 10 US, 0.87
5 CN, 0.02 5 CN, 0.00 5 CN, 0.37 5 CN, 0.05
0 0 0 0
2017 2020 2023 2017 2020 2023 2017 2020 2023 2017 2020 2023
Ed tech Energy, oil, and gas Entertainment Facial recognition
15 15 15 15
10 10 10 10
US, 0.87 US, 0.68 US, 0.28 US, 0.09
5 CN, 0.09 5 CN, 0.36 5 CN, 0.02 5 CN, 0.13
0 0 0 0
2017 2020 2023 2017 2020 2023 2017 2020 2023 2017 2020 2023
15 15 15 15
10 10 10 10
US, 0.96 US, 0.20 US, 0.31 US, 0.42
5 CN, 0.04 5 CN, 0.00 5 CN, 0.00 5 CN, 0.00
0 0 0 0
2017 2020 2023 2017 2020 2023 2017 2020 2023 2017 2020 2023
Legal tech Manufacturing Marketing, digital ads Medical and healthcare
15 15 15 15
10 10 10 10
US, 0.27 US, 0.20 US, 0.41 US, 2.74
5 CN, 0.01 5 CN, 0.53 5 CN, 0.41 5 CN, 0.48
0 0 0 0
2017 2020 2023 2017 2020 2023 2017 2020 2023 2017 2020 2023
15 15 15 15
10 10 10 10
US, 5.42 US, 1.84 US, 0.35 US, 0.79
5 CN, 0.60 5 CN, 0.00 5 CN, 0.12 5 CN, 0.63
0 0 0 0
2017 2020 2023 2017 2020 2023 2017 2020 2023 2017 2020 2023
VC
15
10 US, 0.23
5 CN, 0.10
EU/UK, 0.00
0
2017 2020 2023 Figure 4.3.17

Index Report 2024 4.4 Corporate Activity
This section examines the practical application of AI by corporations, highlighting industry adoption trends, how
businesses are integrating AI, the specific AI technologies deemed most beneficial, and the impact of AI adoption on
financial performance.
4.4 Corporate Activity

Industry Adoption Adoption of AI Capabilities
This section incorporates insights from McKinsey’s The latest McKinsey report reveals that in 2023, 55%
“The State of AI in 2023: Generative AI’s Breakout of organizations surveyed have implemented AI in at
Year,” alongside data from prior editions. The 2023 least one business unit or function, marking a slight
McKinsey analysis is based on a survey of 1,684 increase from 50% in 2022 and a significant jump
respondents across various regions, industries, from 20% in 2017 (Figure 4.4.1). AI adoption has spiked
company sizes, functional areas, and tenures. For the over the past five years, and in the future, McKinsey
first time, this year’s version of the McKinsey survey expects to see even greater changes happening at
included detailed questions about generative AI higher frequencies, given the rate of both AI technical
adoption and hiring trends for AI-related positions. advancement and adoption.
Share of respondents who say their organizations have adopted AI in at least one function, 2017–23
Source: McKinsey & Company Survey, 2023 | Chart: 2024 AI Index report
60%
55%
50%
% of respondents
40%
30%
20%
10%
0%
2017 2018 2019 2020 2021 2022 2023
Figure 4.4.1

Figure 4.4.2 shows the proportion of surveyed case by function among surveyed businesses in 2023
companies that use AI for specific functions. was contact-center automation (26%), followed by
Companies may report employing AI in multiple personalization (23%), customer acquisition (22%), and
capacities. The most commonly adopted AI use AI-based enhancements of products (22%).7
Most commonly adopted AI use cases by function, 2023

Contact-center automation 26%
Personalization 23%
Customer acquisition 22%
AI-based enhancements of products 22%
Creation of new AI-based products 19%
Product feature optimization 19%
Service operations optimization 19%
Sales 17%
Customer-service analytics 17% Service operations

R&D/product development
Predictive service and intervention 16% Marketing and sales
0% 4% 8% 12% 16% 20% 24%

% of respondents
Figure 4.4.2
7 Personalization is the practice of tailoring products, services, content, recommendations, and marketing to the individual preferences of customers or users. For example, personalization
can include sending tailored email messages to clients or customers to improve engagement.

With respect to the type of AI capabilities embedded of embedding was for virtual agents, also in the
in at least one function or business unit, as indicated financial services industry. Across all industries,
by Figure 4.4.3, robotic process automation had the most embedded AI technologies were NL text
the highest rate of embedding within the financial understanding (30%), robotic process automation
services industry (46%). The next highest rate (30%), and virtual agents (30%).
AI capabilities embedded in at least one function or business unit, 2023

All industries 22% 24% 13% 23% 18% 16% 16% 30% 9% 18% 10% 30% 5% 30%
Business, legal, and

15% 23% 11% 25% 22% 18% 16% 26% 8% 14% 13% 24% 4% 19%
professional services
Consumer goods/
31% 24% 7% 18% 17% 13% 7% 27% 13% 26% 2% 24% 5% 32%
retail
Industry
Financial services 26% 23% 15% 20% 20% 15% 14% 37% 7% 11% 16% 46% 8% 42%
Healthcare systems/
pharma and 14% 15% 12% 20% 10% 12% 19% 20% 8% 11% 6% 28% 4% 21%
med. products
Tech, media,
26% 33% 17% 31% 23% 23% 25% 39% 7% 29% 14% 26% 4% 35%
and telecom
C
Kn
Ph
Re
Re
Ro
Tr
Vi
ee
ig
om
AN
rt
co
i
ys
b
ow
nf
ns
ita
ge
sp
te
ua
ot
p
ic
pu
or
fe
xt
le
ee
le
lt
ne
ic
la
al
ce
r
te
dg
ar
un
ch
pr
ge
ra
ro
le
en
r
m
ni
in
tio
oc
de
ar
vi
nt
un
de
en
ng
gr
ot
sio
ni
es
rs
s
n
de
rs
tl
ap
ic
ng
ta
sa
n
ea
s
ys
rs
hs
nd
ut
ta
rn
te
in
om
nd
in
m
g
g
s
in
at
g
io
n
% of respondents (AI capability)
Figure 4.4.3

Figure 4.4.4 shows AI adoption by industry and AI function in 2023. The greatest adoption was in product and/
or service development for tech, media, and telecom (44%); followed by service operations for tech, media, and
telecom (36%) and marketing and sales for tech, media, and telecom (36%).
AI adoption by industry and function, 2023

All industries 9% 6% 25% 26% 12% 24% 8% 9%

9% 5% 28% 24% 10% 19% 13% 6%
Consumer goods/
7% 9% 31% 15% 6% 22% 2% 14%
retail
Industry
Financial services 9% 1% 22% 20% 28% 31% 14% 4%
Healthcare systems/
pharma and 5% 7% 8% 26% 7% 15% 6% 11%
med. products
Tech, media,
14% 6% 36% 44% 7% 36% 6% 9%
and telecom
Hu Ma Ma Pro Ris Se Str Su

ma rv p
nr
nu
fa
rke
ti ser duct k ice co ateg ma ply-
eso ctu ng vic an op rpo y a na cha
urc rin an e d d/o era rat nd ge
me in
g ds ev tio e
es ale elo r ns na nt
s pm nc
en e
t
% of respondents (function)
Figure 4.4.4

Figure 4.4.5 illustrates the changes in AI adoption operations (4). Conversely, across all industries, the
rates by industry and function from 2022 to 2023. functions experiencing the most significant declines in
The areas with the largest annual gains across all adoption include strategy and corporate finance (-12
industries include marketing and sales (18 percentage percentage points), risk (-9), and human resources (-2).
points), product/service development (14), and service
Percentage point change in responses of AI adoption by industry and function, 2022 vs. 2023
All industries -2% 0% 18% 14% -9% 4% -12% 0%

-2% -5% 19% 16% -6% -1% -6% -6%
Consumer goods/
-7% 5% 28% 11% -9% -9% -27% 3%
retail
Industry
Financial services 8% -7% 15% -11% 11% 7% -9% 2%
Healthcare systems/
pharma and -10% 0% 6% 22% -15% 3% -2% 3%
med. products
Tech, media,
8% 0% 32% 37% -31% 15% -19% 1%
and telecom
Hu Ma Ma Pro Ris Se Str Su

rv p
ma
nr
nu
fa
rke
ti ser duct k ice co ateg ma ply-
eso ctu ng vic an op rpo y a na cha
urc rin an e d d/o era rat nd ge
me in
g ds ev tio e
es ale elo r ns na nt
s pm nc
en e
t
Percentage point change in responses (function)
Figure 4.4.5

Figure 4.4.6 shows the percentage of surveyed greatest degree. Notably, a significant portion of
respondents across industries who reported hiring for respondents within the financial services (44%) and
various AI positions. Across all industries, respondents the tech, media, and telecom sectors (44%) reported a
reported hiring data engineers (36%), AI data scientists high rate of hiring machine-learning engineers.
(31%), and machine-learning engineers (31%) to the
AI-related roles that organizations hired in the last year by industry, 2023
All industries 36% 31% 31% 28% 25% 24% 22% 21% 11% 7%

33% 28% 25% 24% 20% 21% 22% 21% 8% 4%
Consumer goods/
27% 14% 15% 20% 14% 20% 21% 13% 11% 3%
retail
Industry
Healthcare systems/
pharma and 19% 32% 16% 23% 19% 25% 13% 19% 3% 9%
med. products
Financial services 41% 37% 44% 24% 29% 19% 26% 26% 17% 10%
Tech, media,
39% 41% 44% 38% 36% 27% 25% 27% 17% 13%
and telecom
Da AI Ma So Da Da De AI Tra Pro
ta da chin ft wa ta ta sig pro n sla m
en ta re arc vis ns du tor pt
gin sci el en hit ua pe ct e ng
ee enti ea ec liza cia ow s ine
rs sts rni gin ts tio list n
ng ee ns s ers ers
en rs pe /m
gin cia a na
ee list ge
rs s rs
% of respondents (AI-related role)
Figure 4.4.6

Organizations have experienced both cost reductions number of respondents reporting cost decreases
and revenue increases due to AI adoption (Figure (42%) and revenue gains (59%) as a result of using AI,
4.4.7). The areas where respondents most frequently suggesting that AI tangibly helps businesses improve
reported cost savings were manufacturing (55%), their bottom line. Comparing this and last year’s
service operations (54%), and risk (44%). For revenue averages reveals a 10 percentage point increase for
gains, the functions benefiting the most from AI cost decreases and a four percentage point decrease
included manufacturing (66%), marketing and sales for revenue increases across all activities.
(65%), and risk (64%). Figure 4.4.7 shows a substantial
Cost decrease and revenue increase from AI adoption by function, 2022

Decrease by <10% Decrease by 10–19% Decrease by ≥20% Increase by >10% Increase by 6–10% Increase by ≤5%
Service operations 54% 34% 12% 8% 10% 14% 33% 57%
Manufacturing 55% 41% 14% 16% 16% 34% 66%
Human resources 40% 26% 10% 9% 34% 17% 60%
Marketing and sales 41% 26% 11% 8% 19% 38% 65%

Function
Risk 44% 26% 13% 13% 16% 35% 64%
Supply chain management 33% 24% 9% 23% 30% 56%
R&D/product and/or service development 31% 18% 9% 12% 25% 24% 61%
Strategy and corporate finance 31% 19% 7% 10% 16% 32% 58%
Average across all activities 42% 28% 10% 6% 18% 35% 59%
% of respondents
Figure 4.4.7

Figure 4.4.8 presents global AI adoption by adoption grew by 9 percentage points. North America
organizations, segmented by world regions. In 2023, remains the leader in AI adoption. Greater China also
every surveyed region reported higher AI adoption experienced a significant increase in AI adoption rates,
rates than in 2022. The most significant year-over- growing by 7 percentage points over the previous year.
year growth was seen in Europe, where organization
AI adoption by organizations in the world, 2022 vs. 2023

55%
All geographies
50%
58%
Asia-Paci c
55%
57%
Europe
48%
61%
North America
59%
Greater China 48%

(incl. Hong Kong,
Taiwan) 41%
Developing Markets 2023

(incl. India, 49%
2022
Latin America, 44%
MENA)
0% 10% 20% 30% 40% 50% 60%
% of respondents
Figure 4.4.8

Adoption of Generative AI Capabilities The most frequent application is generating initial

How are organizations deploying generative AI? 8
drafts of text documents (9%), followed closely
Figure 4.4.9 highlights the proportion of total by personalized marketing (8%), summarizing text
surveyed respondents that report using generative AI documents (8%), and creating images and/or videos
for a particular function. It is possible for respondents (8%). Most of the reported leading use cases are within
to indicate that they deploy AI for multiple purposes. the marketing and sales function.
Most commonly adopted generative AI use cases by function, 2023

Creating rst draft of text documents (e.g.,

9%
advertising copy, technical sales content)
Personalized marketing 8%
Summarization of text documents 8%
Creating images and/or videos 8%
Identifying trends in customer needs

7%
and/or preferences
Use of chatbots (e.g., for customer service) 6%
Use of chatbots (e.g., inside sales) 6%
Identifying and/or forecasting service trends,

5%
insights, and/or anomalies
Creating rst draft of documents
5% Service operations
(e.g., customer service call scripts)
R&D/product development
Drafting of technical documents 5% Marketing and sales
0% 2% 4% 6% 8% 10%
% of respondents
Figure 4.4.9
8 The adoption of generative AI capabilities is presented separately from the charts on the adoption of general AI capabilities earlier in the chapter, as it was a separate question in the survey.

Figure 4.4.10 compares the proportion of respondents AI within organizations shows similar patterns of
who report using AI versus specifically generative distribution. Overall, general AI still dominates. The
AI for a given function. Figure 4.4.10 illustrates the
9
most common functional applications of generative AI
degree to which generative AI has permeated general are in marketing and sales (14%), product and/or service
AI usage patterns among businesses. When analyzed development (13%), and service operations (10%).
at the functional level, the use of AI and generative
AI vs. generative AI adoption by function, 2023

24%
Product and/or service development
13%
23%
Marketing and sales
14%
23%
Service operations
10%
10%
Risk
Function
4%
9%
Strategy and corporate nance
4%
9%
Human resources
3%
9%
Supply-chain management
3%
AI
8% GenAI
Manufacturing
2%
0% 5% 10% 15% 20% 25%

% of respondents
Figure 4.4.10
9 While all generative AI use cases are considered general AI use cases, not all general AI use cases qualify as generative AI use cases.

Figure 4.4.11 depicts the variation in generative AI of businesses across all geographies (55%) that
usage among businesses across different regions of reported using AI, which was documented earlier in
the world. Across all regions, the adoption rate of Figure 4.4.8. North America leads in adoption at 40%,
generative AI by organizations stands at 33%. This followed closely by developing markets (including
amount is meaningfully lower than the percentage India, Latin America, and the MENA region).
Generative AI adoption by organizations in the world, 2023

All geographies 33%
Asia-Paci c 30%
Europe 31%
North America 40%
Greater China
(incl. Hong Kong, 31%
Taiwan, Macau)
Developing Markets
(incl. India,
33%
Latin America,
MENA)
0% 5% 10% 15% 20% 25% 30% 35% 40%
% of respondents
Figure 4.4.11

Most popular AI developer tools among professional

Use of AI by Developers developers, 2023
Computer developers are among the most likely GitHub Copilot 56.04%
individuals to use AI in professional settings. As Tabnine 11.74%
AI becomes more integrated into the economy, AWS CodeWhisperer 4.91%
tracking how developers utilize and perceive AI is Other 1.83%
becoming increasingly important.
AI developer tool
Synk Code 1.33%
Stack Overflow, a question-and-answer website Codeium 1.07%
for computer programmers, conducts an annual Whispr AI 0.90%
survey of computer developers. The 2023 survey, Replit Ghostwriter 0.47%
with responses from over 90,000 developers, Mintlify 0.44%
included, for the first time, questions on AI tool Adrenaline 0.30%
usage—detailing how developers use these tools, Rubber Duck.AI 0.25%
which tools are favored, and their perceptions of 0% 10% 20% 30% 40% 50% 60%
% of respondents
the tools used. 10
Figure 4.4.12
Preference
Figure 4.4.12 highlights the proportion of Most popular AI search tools among professional
developers, 2023
surveyed respondents who report using a specific Source: Stack Over ow Developer Survey, 2023 | Chart: 2024 AI Index report
AI developer tool. According to the survey, ChatGPT 83.25%
56.0% of respondents report using GitHub’s Bing AI 18.80%
Copilot, followed by Tabnine (11.7%) and AWS WolframAlpha 11.15%
Google Bard AI 9.13%

CodeWhisperer (4.9%).
Phind 3.11%
AI search tool
Figure 4.4.13 highlights which AI search tools, You.com 2.17%

software applications that use AI to enhance Other 1.21%
search functionality, are most favored by AI Perplexity AI 0.97%
developers. The most popular AI search tools Quora Poe 0.77%
according to professional developers were Neeva AI 0.36%
ChatGPT (83.3%), followed by Bing AI (18.8%) and Andi 0.20%
WolframAlpha (11.2%). Metaphor 0.13%
0% 10% 20% 30% 40% 50% 60% 70% 80% 90%

% of respondents
Figure 4.4.13
10 The survey was conducted in May 2023 and, therefore, may not account for the launch of more recently released AI tools such as Gemini and Claude 3.

Cloud platforms are crucial elements of the AI Top 10 most popular cloud platforms among
professional developers, 2023
ecosystem, providing cloud computing services Source: Stack Over ow Developer Survey, 2023 | Chart: 2024 AI Index report
that allow developers to perform computationally Amazon Web Services 53.08%

intensive AI work. Figure 4.4.14 reports the Microsoft Azure 27.80%
proportion of respondents that have reported
Google Cloud 23.95%
extensively using a specific cloud platform.
Firebase 15.39%
According to the Stack Overflow survey, Amazon
Cloud platform
Cloud are 14.99%
Web Services (AWS) is the most commonly used
cloud platform among professional developers, with Digital Ocean 14.11%
53.1% reporting regular use. Microsoft Azure follows Heroku 11.77%
at 27.8%, with Google Cloud at 24.0%. Vercel 10.31%
Workflow Netlify 8.48%
Figure 4.4.15 explores the current and future VMware 6.45%
integration of AI in developers’ workflows. A significant 0% 10% 20% 30% 40% 50% 60%
% of respondents
majority of respondents, 82.6%, regularly use AI for Figure 4.4.14
code writing, followed by 48.9% for debugging and
assistance, and 34.4% for documentation. While only
23.9% currently use AI for code testing, 55.2% express
interest in adopting AI for this purpose.
Adoption of AI tools in development tasks, 2023

82.55%
Writing code 23.72%
4.48%
48.89%
Debugging and getting help 40.66%
6.37%
34.37%
Documenting code 50.24%
8.07%
30.10%
Learning about a codebase 48.97%
Development task
13.09%
23.87%
Testing code 55.17%
11.44%
13.52%
Project planning 38.54%
29.77%
10.09%
Committing and reviewing code 49.51%
22.95%
4.74% Currently using
Deployment and monitoring 45.44%
28.33% Interested in using
3.65% Not interested in using
Collaborating with teammates 29.98%
41.38%
0% 10% 20% 30% 40% 50% 60% 70% 80%
% of respondents Figure 4.4.15

When asked about the primary advantages of AI tools Primary bene ts of AI tools for professional
developers, 2023
in professional development, developers responded Source: Stack Over ow Developer Survey, 2023 | Chart: 2024 AI Index report
with increased productivity (32.8%), accelerated

Increase
32.81%
learning (25.2%), and enhanced efficiency (25.0%) productivity
(Figure 4.4.16).
Speed up
25.17%
Figure 4.4.17 displays the sentiments professional learning
Primary bene�t
developers have toward AI tools. A significant
Greater
majority of developers hold a positive view of AI e�ciency
24.96%
tools, with 27.7% feeling very favorably and 48.4%

favorably inclined toward them. Only 3.2% express Improve accuracy
13.31%
in coding
unfavorable opinions about AI development tools.
Improve
3.75%
collaboration
Figure 4.4.18 highlights the reported level of trust
developers have in AI tools. More developers trust AI 0% 10% 20% 30% 40%
% of respondents
tools than distrust them, with 42.2% reporting high Figure 4.4.16
or moderate trust in these technologies. In contrast,
a smaller proportion, 27.2%, express some level of
distrust or high distrust in AI tools.
Sentiment toward AI tools in development among Trust level in AI tool output accuracy, 2023
professional developers, 2023 Source: Stack Over ow Developer Survey, 2023 | Chart: 2024 AI Index report
Very favorable 27.71% Highly distrust 5.46%
Favorable 48.41%
Somewhat distrust 21.71%
Indi erent 16.80%

Trust level
Sentiment
Neither trust
30.68%
nor distrust
Unsure 3.88%
Somewhat trust 39.30%

Unfavorable 2.78%
Highly trust 2.85%

Very unfavorable 0.42%
0% 10% 20% 30% 40% 50% 0% 10% 20% 30% 40%

% of respondents % of respondents

AI’s Labor Impact

First, AI has been shown to enable workers to
Over the last five years, the growing integration complete tasks more quickly and produce higher
of AI into the economy has sparked hopes of quality work. A meta-review by Microsoft, which
boosted productivity. However, finding reliable data aggregated studies comparing the performance of
confirming AI’s impact on productivity has been workers using Microsoft Copilot or GitHub’s Copilot—
difficult because AI integration has historically been LLM-based productivity-enhancing tools—with those
low. In 2023, numerous studies rigorously examined who did not, found that Copilot users completed
AI’s productivity impacts, offering more conclusive tasks in 26% to 73% less time than their counterparts
evidence on the topic without AI access (Figure 4.4.19).11
Cross-study comparison of task completion speed of Copilot users

Source: Cambon et al., 2023 | Chart: 2024 AI Index report
Copilot information
73% 73%
retrieval study
Copilot common
71% 71%
task study
LLM-based
Task
47% 47%
search study
GitHub Copilot
44% 44%
study
Copilot in teams
26% 26%
meeting study
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
Copilot users’ task completion speed relative to baseline (100%)
Figure 4.4.19
11 This meta-review analyzed separate surveys of workers using Microsoft’s Copilot and GitHub’s Copilot tools. These are separate tools. Microsoft Copilot is a broader LLM-based
productivity improvement tool, while GitHub’s Copilot is a code-writing assistant.

Similarly, a Harvard Business School study revealed E ect of GPT-4 use on a group of consultants
Source: Dell’Acqua et al., 2023 | Chart: 2024 AI Index report
that consultants with access to GPT-4 increased
40.00%
their productivity on a selection of consulting tasks
Average improvement compared to control group (%)

40%
by 12.2%, speed by 25.1%, and quality by 40.0%, 35%

compared to a control group without AI access
30%
(Figure 4.4.20). Likewise, National Bureau of
25.10%
Economic Research research reported that call- 25%
center agents using AI handled 14.2% more calls per

20%
hour than those not using AI (Figure 4.4.21).
15%
12.20%
10%
5%
0%
Productivity Speed Quality
Performance metrics on consulting tasks
Figure 4.4.20
Impact of AI on customer support agents

Source: Brynjolfsson et al., 2023 | Chart: 2024 AI Index report
2.97
3.00
2.60
Hourly chats per customer support agent
2.50
2.00
1.50
1.00
0.50
0.00
Used AI Did not use AI
Figure 4.4.21

A study on the impact of AI in legal analysis showed compared to the control group, in terms of both work
that teams with GPT-4 access significantly improved in quality and time efficiency across a range of tasks.
efficiency and achieved notable quality improvements Although AI can assist with legal tasks, there are
in various legal tasks, especially contract drafting. also widespread reports of LLM hallucinations being
Figure 4.4.22 illustrates the improvements observed especially pervasive in legal tasks.
in the group of law students who utilized GPT-4,
E ect of GPT-4 use on legal analysis by task

Source: Choi et al., 2023 | Chart: 2024 AI Index report
Contract drafting 0.24 32.10%
Complaint drafting 0.17 24.10%

Task
EE handbook 0.07 21.10%
Client memo -0.07 11.80%
−0.10 0.00 0.10 0.20 0% 10% 20% 30%

Improvement compared to control group (on a 4.0 scale) Time saved (%) compared to control group
Figure 4.4.22

Second, AI access appears to narrow the performance Lower-skilled (bottom half) participants exhibited
gap between low- and high-skilled workers. According a 43.0% improvement, while higher-skilled (top
to the aforementioned Harvard Business School half) participants showed a 16.5% increase. While
study, both groups of consultants experienced higher-skilled workers using AI still performed better
performance boosts after adopting AI, with notably than their lower-skilled, AI-using counterparts, the
larger gains for lower-skilled consultants using AI disparity in performance between low- and high-
compared to higher-skilled consultants. Figure 4.4.23 skilled workers was markedly lower when AI was
highlights the performance improvement across a utilized compared to when it was not.
set of tasks for participants of varying skill levels:
Comparison of AI work performance e ect by worker skill category

Source: Dell’Acqua et al., 2023 | Chart: 2024 AI Index report
Bottom-half skilled participants Top-half skilled participants
6.06
6 5.79 6
5.20
5 5
Score (average on a 1–10 scale)
Score (average on a 1–10 scale)
4.05
4 4
3 3
2 2
1 1
0 0
Baseline task Experimental task Baseline task Experimental task
Figure 4.4.23

Finally, while AI tends to enhance quality and performed worse than those who received “bad AI,”
productivity, overreliance on the technology can which was capable but known to make errors (Figure
impair worker performance. A study focused on 4.4.24). The performance difference between the
professional recruiters reviewing résumés found that latter groups was -1.08 points. The study theorizes
receiving any AI assistance improved task accuracy that recruiters using “good AI” became complacent,
by 0.6 points compared to not receiving AI assistance. overly trusting the AI’s results, unlike those using “bad
However, recruiters who were provided with “good AI,” who were more vigilant in scrutinizing AI output.
AI”—believed to be high-performing—actually
E ects on job performance of receiving di erent types of AI advice

Source: Dell’Acqua, 2023 | Chart: 2024 AI Index report
Any AI 0.60
Type of AI advice
Good vs.
-1.08
bad AI
−1.20 −1.00 −0.80 −0.60 −0.40 −0.20 0.00 0.20 0.40 0.60
Change in accuracy score
Figure 4.4.24

Earnings Calls Aggregate Trends

The following section presents data from Quid, The past year has seen a significant rise in the mention
which uses natural language processing tools to of AI in Fortune 500 company earnings calls. In 2023,
analyze trends in corporate earnings calls. Quid AI was mentioned in 394 earnings calls (nearly 80%
analyzed all 2023 earnings calls from Fortune 500 of all Fortune 500 companies), up from 266 mentions
companies, identifying all mentions of “artificial in 2022 (Figure 4.4.25). Since 2018, mentions of AI in
intelligence,” “AI,” “machine learning,” “ML,” and Fortune 500 earnings calls have nearly doubled.
“deep learning.”
Number of Fortune 500 earnings calls mentioning AI, 2018–23

400 394
300
Number of earnings calls
200
100
0
2018 2019 2020 2021 2022 2023
Figure 4.4.25

Specific Themes
Mentions of AI in Fortune 500 earnings calls were Mentions of generative AI grew from 0.31% in 2022.
associated with a wide range of themes in 2023. The The next most mentioned theme was investments in AI,
most frequently cited theme, appearing in 19.7% of expansion of AI capabilities, and AI growth initiatives
all earnings calls, was generative AI (Figure 4.4.26). (15.2%), followed by company/brand AIs (7.6%).
Themes of AI mentions in Fortune 500 earnings calls, 2018 vs. 2023

19.73% (+6351%)
Generative AI
0.31%
Investments in AI, expansion of 15.21% (+5%)
capabilities, and growth initiatives 14.53%
7.64% (-2%)
Company/brand AIs
7.80%
7.36% (-25%)
Data storage and management
9.79%
5.68% (+49%)
Advertising and marketing
3.82%
AI and machine learning for enhanced 4.96% (-32%)
customer experience and growth 7.34%
4.76% (-8%)
Process automation
5.20%
4.36% (-51%)
Data center GPU
8.87%
3.84%
Large language models
3.56% (-31%)
Cloud platforms
5.20%
2.76% (-52%)
Deep learning
5.81%
2.76% (-40%)
Healthcare and medical practices
4.59%
2.36% (-9%)
Business integration
2.60%
2.28% (-21%)
Revenue growth
2.91%
2.24% (+33%)
Data processing
1.68%
2.20% (-65%)
Data analytics
6.27%
1.88% (-35%)
Edge intelligence
2.91%
1.80% (-9%)
Adobe experience
1.99%
1.76% (+28%)
Customer support
1.38%
2023
1.48% (-70%) 2018
Autonomous vehicles
4.89%
1.36% (-36%)
Digital transformation
2.14%
0% 2% 4% 6% 8% 10% 12% 14% 16% 18% 20% 22%
Theme mentioned (% of total)
Figure 4.4.26

Highlight:
Projecting AI’s Economic Impact

In 2023, some newly published analyses aimed to dollar amounts. The report projects that the high-
project and better understand the future economic tech industry could see its revenue increase by
impact of AI. A recent McKinsey report examined 4.8% to 9.3%, corresponding to an additional $240
the degree to which generative AI might impact billion to $460 billion, as a result of generative AI.
revenues across industries. Figure 4.4.27 features Banking, pharmaceuticals and medical products,
the projected impact range per industry, both as and education are other industries estimated to
a percentage of total industry revenue and in total grow due to the adoption of generative AI.
Anticipated impact of generative AI on revenue by industry, 2023

Source: McKinsey & Company, 2023 | Chart: 2024 AI Index report
4.80% 9.30% 240 460

High tech
2.80% 4.70% 200 340
Banking
Pharmaceuticals and 2.60% 4.50% 60 110

medical products
2.20% 4.00% 120 230
Education
2.30% 3.70% 60 100
Telecommunications
1.80% 3.20% 150 260
Healthcare
1.80% 2.80% 50 70
Insurance
1.80% 3.10% 80 130
Media and entertainment
1.40% 2.40% 170 290
Advanced manufacturing
1.40%2.30% 160 270
Consumer packaged goods
Advanced electronics and 1.30% 2.30% 100 170

semiconductors
0% 2% 4% 6% 8% 10% 0 100 200 300 400

% of total industry revenue Total industry revenue (in billions U.S. dollars)
Figure 4.4.27

Highlight:
Projecting AI’s Economic Impact (cont’d)

The McKinsey survey cited above, the “State of size would decrease (Figure 4.4.28). Only 15% felt
AI in 2023,” asked business professionals about that generative AI would lead to increases in the
their expectations of AI’s impact on organizational number of employees. There were also widespread
workforces in the next three years. Although a predictions that AI would lead to significant
large proportion (30%) expected little to no change employee reskilling.
in the number of employees, 43% felt that staff
Expectations about the impact of AI on organizations’ workforces in the next 3 years, 2023
Don’t know 12%

Don’t know 8%
Increase by >20% 3%
Share of employees expected to be reskilled
Change in the number of employees
Increase by 11–20% 4% >20% 38%
Increase by 3–10% 8%
11–20% 18%
Little or no change 30%
Decrease by 3–10% 25% 6–10% 17%
Decrease by 11–20% 10%
≤5% 20%
Decrease by >20% 8%
0% 10% 20% 30% 40% 0% 10% 20% 30% 40%

% of respondents % of respondents
Figure 4.4.28

Highlight:

Perspectives differ on the anticipated effect of supply chain management (45%), and HR (41%),
generative AI on employment per business function. are especially likely, according to respondents, to
Certain functions, like service operations (54%), experience decreasing employment (Figure 4.4.29).
Anticipated e ect of generative AI on number of employees in the next 3 years by business function, 2023
Decrease Little or no change Increase Don’t know
Product and/or service development 30% 35% 20% 15%
Risk 31% 37% 20% 12%
Strategy and corporate nance 37% 28% 25% 10%
Marketing and sales 39% 33% 17% 12%
Manufacturing 40% 33% 12% 15%
HR 41% 30% 17% 11%
Supply chain management 45% 32% 14% 9%
Service operations 54% 23% 12% 10%
0% 20% 40% 60% 80% 100%

% of respondents
Figure 4.4.29

Highlight:

Finally, a Goldman Sachs investment report Although the report projects that many countries will
released in 2023 projects that, globally, AI could benefit from AI-driven productivity growth, certain
lead to productivity growth over 10-year periods geographic areas, like Hong Kong, Israel, and Japan,
ranging between 1.0% and 1.5% (Figure 4.4.30). are especially well-positioned.
Estimated impact of AI adoption on annual productivity growth over a ten-year period

Source: Goldman Sachs Global Investment Research, 2023 | Chart: 2024 AI Index report
1.50
Global average
Percentage points
1.00
0.50
0.00
Ho Isr Ja Un Sw Un Sin Ar Ch So Bra Ma Ta So Me
ng ae pa ite ed ite ga ge ile uth zil lay iw uth xic
Ko l n d en d po nti Ko sia an Af o
n Kin Sta re na re ric
g gd tes a a
om
Figure 4.4.30

Index Report 2024 4.5 Robot Installations
The deployment of robots equipped with AI-based software technologies offers a window into the real-world
application of AI-ready infrastructure. This section draws on data from the International Federation of Robotics (IFR),
a nonprofit organization dedicated to advancing the robotics industry. Annually, the IFR publishes the World Robotics
Reports, which track global robot installation trends.12
4.5 Robot Installations

Aggregate Trends
The following section includes data on the installation Figure 4.5.1 reports the total number of industrial
and operation of industrial robots, which are defined robots installed worldwide by year. In 2022, industrial
as an “automatically controlled, reprogrammable, robot installations increased slightly, with 553,000
multipurpose manipulator, programmable in three units marking a 5.1% increase from 2021. This growth
or more axes, which can be either fixed in place or reflects more than a threefold rise in installations
mobile for use in industrial automation applications.” since 2012.
Number of industrial robots installed in the world, 2012–22

Source: International Federation of Robotics (IFR), 2023 | Chart: 2024 AI Index report
553
500
Number of industrial robots installed (in thousands)
400
300
200
100
0
2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 4.5.1
12 Due to the timing of the IFR report, the most recent data is from 2022. Every year, the IFR revisits data collected for previous years and will occasionally update the data if more accurate
figures become available. Therefore, some of the data reported in this year’s report might differ slightly from data reported in previous years.

The global operational stock of industrial robots reached 3,904,000 in 2022, up from 3,479,000 in 2021 (Figure
4.5.2). Over the past decade, both the installation and utilization of industrial robots have steadily increased.
Operational stock of industrial robots in the world, 2012–22

4,000 3,904
3,500
Number of industrial robots (in thousands)
3,000
2,500
2,000
1,500
1,000
500
0
2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 4.5.2

Industrial Robots: Traditional vs. collaborative robots due to their safety, flexibility,
Collaborative Robots scalability, and ability to learn iteratively.
There is a distinction between traditional robots,
Figure 4.5.3 reports the number of industrial robots
which operate for humans, and collaborative robots,
installed in the world by type. In 2017, collaborative
designed to work alongside them. The robotics
robots accounted for just 2.8% of all new industrial
community is increasingly enthusiastic about
robot installations. By 2022, the number rose to 9.9%.
Number of industrial robots installed in the world by type, 2017–22

Traditional 553
Collaborative 526
55
500 42
424
400
400 387 389
300
484 498
200 389 405

366 363
100
0
2017 2018 2019 2020 2021 2022
Figure 4.5.3

By Geographic Area
Country-level data on robot installations can times more than Japan’s 50,400 and 7.4 times more
suggest which nations prioritize robot integration than the United States’ 39,500 (Figure 4.5.4). South
into their economies. In 2022, China led the world Korea and Germany followed with 31,170 and 25,600
with 290,300 industrial robot installations, 5.8 installations, respectively.
Number of industrial robots installed by country, 2022

China 290.30
Japan 50.40
United States 39.50
South Korea 31.70
Germany 25.60
Italy 11.50
Taiwan 7.80
France 7.40
Mexico 6.00
Singapore 5.90
India 5.40
Spain 3.80
Turkey 3.70
Thailand 3.30
Canada 3.20
0 30 60 90 120 150 180 210 240 270 300

Figure 4.5.4

Since surpassing Japan in 2013 as the leading installer of industrial robots, China has significantly widened the
gap with the nearest country. In 2013, China’s installations accounted for 20.8% of the global total, a share that
rose to 52.4% by 2022 (Figure 4.5.5).
Number of new industrial robots installed in top 5 countries, 2012–22

300
290, China
250
200
150
100
50, Japan
32, South Korea
26, Germany
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 4.5.5

Since 2021, China has installed more industrial robots than the rest of the world combined, with the gap widening
further in the last year (Figure 4.5.6). This increasing gap underscores China’s growing dominance in industrial
robot installations.
Number of industrial robots installed (China vs. rest of the world), 2016–22
300
290, China
263, Rest of the world

250
200
150
100
50
0
2016 2017 2018 2019 2020 2021 2022
Figure 4.5.6

According to the IFR report, most countries reported an annual increase in industrial robot installations from 2021
to 2022 (Figure 4.5.7). The countries with the highest growth rates include Singapore (68%), Turkey (22%), and
Mexico (13%). Canada (-24%), Taiwan (-21%), Thailand (-18%), and Germany (-1%) reported installing fewer robots
in 2022 than in 2021.
Annual growth rate of industrial robots installed by country, 2021 vs. 2022
Singapore 68%
Turkey 22%
Mexico 13%
France 13%
Spain 10%
United States 10%
Japan 9%
Italy 8%
China 5%
India 4%
South Korea 1%
Germany -1%
Thailand -18%
Taiwan -21%
Canada -24%
−20% −10% 0% 10% 20% 30% 40% 50% 60% 70%

Annual growth rate of industrial robots installed
Figure 4.5.7

Country-Level Data on Service Robotics cleaning. In 2022, more service robots were installed
Another important class of robots are service robots, for every application category than in 2021, with the
which the ISO defines as a robot “that performs exception of medical robotics (Figure 4.5.8). More
useful tasks for humans or equipment excluding specifically, the number of service robots installed in
industrial automation applications.” Such robots can, 13
hospitality and in transportation and logistics increased
for example, be used in medicine and professional 2.3 and 1.4 times, respectively.
Number of professional service robots installed in the world by application area, 2021 vs. 2022
8
Agriculture
7
25
Hospitality
11
9
Medical robotics
10
7
Professional cleaning 2022
6
2021
86
Transportation and logistics
60
0 10 20 30 40 50 60 70 80 90
Number of professional service robots installed (in thousands)
Figure 4.5.8
13 A more detailed definition can be accessed here.

As of 2022, the United States leads in professional and France also have significant numbers of robot
service robot manufacturing, with approximately manufacturers, with 85,000, 72,000, and 53,000,
2.06 times more manufacturers than China, the respectively. In most surveyed countries, the majority
next leading nation (Figure 4.5.9). Germany, Japan, of these manufacturers are established incumbents.
Number of professional service robot manufacturers in top countries by type of company, 2022
218
200
Number of professional service robot manufacturers
150
106
100 204
Startups
85 Incumbents
72 Unknown
53 52
50 99
38 35
74 33
67 27
48 50
36 32 28
26
0
1
United States China Germany Japan France South Korea Canada Switzerland Russia United Kingdom
Figure 4.5.9

Sectors and Application Types installations with 157,000 units, closely followed by
Figure 4.5.10 shows the number of industrial robots the automotive sector with 136,000. Both sectors have
installed in the world by sector from 2020 to 2022. seen continuous growth in industrial robot installations
Globally, the electrical/electronics sector led in robot since 2020.
Number of industrial robots installed in the world by sector, 2020–22

61
All others 55
37
136
Automotive 117
83
157
Electrical/electronics 143
112
15
Food 15
12
66
Metal and machinery 68
41
24
Plastic and chemical products 25
19 2022
94 2021
Unspeci ed 102 2020
85
0 20 40 60 80 100 120 140 160

Figure 4.5.10

Figure 4.5.11 shows the number of industrial robots welding (87,000) and 4.4 times more than for assembly
installed in the world by application from 2020 to (61,000). Except for processing, every application
2022. Data suggests that handling is the predominant category witnessed an increase in robot installations in
application. In 2022, 266,000 industrial robots were 2022 compared to 2020.
installed for handling tasks, 3.1 times more than for
Number of industrial robots installed in the world by application, 2020–22

70
All others/unspeci ed 78
60
61
Assembling 63
47
35
Clean room 32
32
28
Dispensing 11
8
266
Handling 241
169
5
2022
Processing 7
5 2021
2020
87
Welding 94
68
0 20 40 60 80 100 120 140 160 180 200 220 240 260 280
Figure 4.5.11

China vs. United States

Figure 4.5.12 illustrates the number of industrial robots installed across various sectors in China over the past three
years. In 2022, the leading sectors for industrial robot installations in China were electrical/electronics (100,000),
automotive (73,000), and metal and machinery (31,000).
Number of industrial robots installed in China by sector, 2020–22

73
All others/unspeci ed 75
51
73
Automotive 58
30
100
Electrical/electronics 93
66
5
Food 4
3
31
Metal and machinery 37
19
1
Pharma/cosmetics 1
1 2022
6 2021
Rubber and plastics 6 2020
5
0 10 20 30 40 50 60 70 80 90 100
Figure 4.5.12

In 2022, the U.S. automotive industry led in industrial robot installations with 14,500 units, significantly exceeding
its 2021 figure (Figure 4.5.13). Except for the electronics sector, every other sector saw fewer robot installations in
2022 than in 2021.
Number of industrial robots installed in the United States by sector, 2020–22

5.80
All others 5.30
2.60
14.50
Automotive 9.90
10.50
3.70
Electrical/electronics 3.00
3.70
2.40
Food 3.40
2.70
3.90
Metal and machinery 4.20
2.30
3.10
Plastic and chemical products 3.60
2.70 2022
6.20 2021
Unspeci ed 6.60 2020
6.30
0 2 3 5 6 8 9 11 12 14 15
Figure 4.5.13

Index Report 2024
Science and
Medicine
CHAPTER 5:
Index Report 2024 Science and
Medicine
Preview
Overview 298
Chapter Highlights 299
5.1 Notable Scientific Milestones 300

AlphaDev 300
FlexiCubes 301
Synbot 303
GraphCast 304
GNoME 305
Flood Forecasting 306
5.2 AI in Medicine 307

Notable Medical Systems 307
SynthSR 307
Coupled Plasmonic Infrared Sensors 309
EVEscape 310
AlphaMissence 312
Human Pangenome Reference 313
Clinical Knowledge 314
MedQA 314
Highlighted Research: GPT-4 Medprompt 315
Highlighted Research: MediTron-70B 317
Diagnosis 318
Highlighted Research: CoDoC 318
Highlighted Research: CT Panda 319
Other Diagnostic Uses 320
FDA-Approved AI-Related Medical Devices 321
Administration and Care 323
Highlighted Research: MedAlign 323

CHAPTER 5:
Medicine
Overview
This year’s AI Index introduces a new chapter on AI in science and medicine in recognition of
AI’s growing role in scientific and medical discovery. It explores 2023’s standout AI-facilitated
scientific achievements, including advanced weather forecasting systems like GraphCast
and improved material discovery algorithms like GNoME. The chapter also examines medical
AI system performance, important 2023 AI-driven medical innovations like SynthSR and
ImmunoSEIRA, and trends in the approval of FDA AI-related medical devices.

CHAPTER 5:
Medicine
Chapter Highlights
2. AI helps medicine take significant strides forward. In 2023, several significant medical systems
were launched, including EVEscape, which enhances pandemic prediction, and AlphaMissence, which assists in
AI-driven mutation classification. AI is increasingly being utilized to propel medical advancements.
3. Highly knowledgeable medical AI has arrived. Over the past few years, AI systems have shown
remarkable improvement on the MedQA benchmark, a key test for assessing AI’s clinical knowledge. The
standout model of 2023, GPT-4 Medprompt, reached an accuracy rate of 90.2%, marking a 22.6 percentage
point increase from the highest score in 2022. Since the benchmark’s introduction in 2019, AI performance on
MedQA has nearly tripled.
4. The FDA approves more and more AI-related medical devices. In 2022, the FDA approved 139
AI-related medical devices, a 12.1% increase from 2021. Since 2012, the number of FDA-approved AI-related medical
devices has increased by more than 45-fold. AI is increasingly being used for real-world medical purposes.

Artificial Intelligence Chapter 5: Science and Medicine
Index Report 2024 5.1 Notable Scientific Milestones
This section highlights significant AI-related scientific breakthroughs of 2023 as chosen by the AI Index Steering Committee.
5.1 Notable Scientific Milestones

AlphaDev fundamental sorting algorithms on short sequences
AlphaDev discovers faster sorting algorithms such as Sort 3, Sort 4, and Sort 5 (Figure 5.1.1). Some
AlphaDev is a new AI reinforcement learning system of the new algorithms discovered by AlphaDev have
that has improved on decades of work by scientists been incorporated into the LLVM standard C++ sort
and engineers in the field of computational algorithmic library. This marks the first update to this part of
enhancement. AlphaDev developed algorithms with the library in over 10 years and is the first addition
fewer instructions than existing human benchmarks for designed using reinforcement learning.
AlphaDev vs. human benchmarks when optimizing for algorithm length

Source: Mankowitz et al., 2023 | Chart: 2024 AI Index report
120 115
AlphaDev Human benchmarks
100
80
Instruction length
66
63
60
46
42
40 37
33
31
28 28 27
21
20 17 18
0
Sort 3 Sort 4 Sort 5 VarSort3 VarSort4 VarSort5 VarInt
Algorithm
Figure 5.1.1

FlexiCubes quality. FlexiCubes addresses some of these

3D mesh optimization with FlexiCubes limitations by employing AI for gradient-based
3D mesh generation, crucial in computer graphics, optimization and adaptable parameters (Figure
involves creating a mesh of vertices, edges, and 5.1.2). This method allows for precise, localized
faces to define 3D objects. It is key to video games, mesh adjustments. Compared to other leading
animation, medical imaging, and scientific visualization. methods that utilize differentiable isosurfacing for
Traditional isosurface extraction algorithms often mesh reconstruction, FlexiCubes achieves mesh
struggle with limited resolution, structural rigidity, and extractions that align much more closely with the
numerical instabilities, which subsequently impacts underlying ground truth (Figure 5.1.3).
Sample FlexiCubes surface reconstructions

Source: Nvidia, 2023
Figure 5.1.2

Select quantitative results on 3D mesh reconstruction

Source: Shen et al., 2023 | Chart: 2024 AI Index report
80.67%
80%
70%
63.34%
60%
55.22%
52.37%
50.20%
50% 48.66%
IN>5◦ (%) ↓
40%
34.87%
30%
20%
10%
0%
MCSDF DChermite NDCSDF MC DMTet(64) DMTet(80) FlexiCubes
Algorithm/method evaluated at 64³
Figure 5.1.3

Synbot Synbot design

Source: Ha et al., 2023
AI-driven robotic chemist for
synthesizing organic molecules
Synbot employs a multilayered system,
comprising an AI software layer for
chemical synthesis planning, a robot
software layer for translating commands,
and a physical robot layer for conducting
experiments. The closed-loop feedback
mechanism between the AI and the
robotic system enables Synbot to develop
synthetic recipes with yields equal to
or exceeding established references
Figure 5.1.4
(Figure 5.1.4). In an experiment aimed at
synthesizing M1 [4-(2,3-dimethoxyphenyl)- the mid-80% reference range and completed the synthesis
1H-pyrrolo[2,3-b]pyridine], Synbot in significantly less time (Figure 5.1.5). Synbot’s automation
developed multiple synthetic formulas of organic synthesis highlights AI’s potential in fields such as
that achieved conversion yields surpassing pharmaceuticals and materials science.
Reaction kinetics of M1 autonomous optimization experiment, Synbot vs. reference

Source: Ha et al., 2023 | Chart: 2024 AI Index report
100%, Synbot #1
100%, Synbot #2
100% 100%, Synbot #4
85%, Reference
80%
Conversion yield (%)
60%
40%
20%
0%
0 3 6 9 12 15 18 21 24
Time (hours)
Figure 5.1.5

GraphCast and more. Figure 5.1.7 compares the performance

More accurate global weather forecasting of GraphCast with the current industry state-of-the-
with GraphCast art weather simulation system: the High Resolution
GraphCast is a new weather forecasting system Forecast (HRES). GraphCast posts a lower root mean
that delivers highly accurate 10-day weather squared error, meaning its forecasts more closely
predictions in under a minute (Figure 5.1.6). Utilizing correspond to observed weather patterns. GraphCast
graph neural networks and machine learning, can be a valuable tool in deciphering weather patterns,
GraphCast processes vast datasets to forecast enhancing preparedness for extreme weather events,
temperature, wind speed, atmospheric conditions, and contributing to global climate research.
GraphCast weather prediction

Source: DeepMind, 2023
Figure 5.1.6
Ten-day z500 forecast skill: GraphCast vs. HRES

Source: Lam et al., 2023 | Chart: 2024 AI Index report
GraphCast HRES (06z/18z) HRES (00z/12z)
800
700
600
RMSE (m²/s²) ↓
500
400
300
200
100
0
1 2 3 4 5 6 7 8 9 10
Lead time (days)
Figure 5.1.7

GNoME Sample material structures

Source: Merchant et al., 2023
Discovering new materials with GNoME
The search for new functional materials is key to
advancements in various scientific fields, including
robotics and semiconductor manufacturing. Yet this
discovery process is typically expensive and slow.
Recent advancements by Google researchers have
demonstrated that graph networks, a type of AI
model, can expedite this process when trained on
large datasets. Their model, GNoME, outperformed
the Materials Project, a leading method in materials
discovery, by identifying a significantly larger
number of stable crystals (Figure 5.1.8). GNoME has
unveiled 2.2 million new crystal structures, many
Figure 5.1.8
overlooked by human researchers (Figure 5.1.9 and
Figure 5.1.10). The success of AI-driven projects like
GNoME highlights the power of data and scaling in
speeding up scientific breakthroughs.
GNoME vs. Materials Project: stable crystal count GNoME vs. Materials Project: distinct prototypes
Source: Merchant et al., 2023 | Chart: 2024 AI Index report Source: Merchant et al., 2023 | Chart: 2024 AI Index report
1,000,000
GNoME Material Project GNoME Material Project
20,000
100,000
Stable crystal count
Distinct prototypes
10,000 10,000
1,000
2 3 4 5 6 0
2 3 4 5 6
Unique elements Unique elements

Flood Forecasting A team of Google researchers has used AI to develop

AI for more accurate and reliable flood forecasts highly accurate hydrological simulation models
New research introduced in 2023 has made that are also applicable to ungauged basins.1 These
significant progress in predicting large-scale flood innovative methods can predict certain extreme flood
events. Floods, among the most common natural events up to five days in advance, with accuracy that
disasters, have particularly devastating effects in matches or surpasses current state-of-the-art models,
less developed countries where infrastructure for such as GloFAS. The AI model demonstrates superior
prevention and mitigation is lacking. Consequently, precision (accuracy of positive predictions) and recall
developing more accurate prediction methods that (ability to correctly identify all relevant instances)
can forecast these events further in advance could across a range of return period events, outperforming
yield substantial positive impacts. the leading contemporary method (Figure 5.1.11).2 The
model is open-source and is already being used to
predict flood events in over 80 countries.
Predictions of AI model vs. GloFAS across return periods

Source: Nearing et al., 2023 | Chart: 2024 AI Index report
1.00 1.00
AI model GloFAS
0.80 0.80
Precision (median) ↑
Recall (median) ↑
0.60 0.60
0.40 0.40
0.20 0.20
0.00 0.00
1 (N=3,649) 2 (N=3,675) 5 (N=3,416) 10 (N=3,087) 1 (N=3,682) 2 (N=3,691) 5 (N=3,597) 10 (N=3,321)
Return period
Figure 5.1.11
1 An ungauged basin is a watershed for which there is insufficient streamflow data to model hydrological flows.
2 A return period (recurrence interval) measures the likelihood of a particular hydrological event recurring within a specific period. For example, a 100-year flood means there is a 1% chance of
the event being equaled or exceeded in any given year.

Index Report 2024 5.2 AI in Medicine
AI models are becoming increasingly valuable in healthcare, with applications for detecting polyps to aiding clinicians
in making diagnoses. As AI performance continues to improve, monitoring its impact on medical practice becomes
increasingly important. This section highlights significant AI-related medical systems introduced in 2023, the current
state of clinical AI knowledge, and the development of new AI diagnostic tools and models aimed at enhancing
hospital administration.
5.2 AI in Medicine
Notable Medical Systems SynthSR generations
Source: Iglesias et al., 2023
This section identifies significant AI-related

medical breakthroughs of 2023 as chosen by the
AI Index Steering Committee.
SynthSR
Transforming brain scans for advanced analysis
SynthSR is an AI tool that converts clinical brain
scans into high-resolution T-1 weighted images
(Figure 5.2.1). This advancement addresses the issue
of scan quality variability, which previously limited
the use of many scans in advanced research. By
transforming these scans into T1-weighted images,
known for their high contrast and clear brain
Figure 5.2.1
structure depiction, SynthSR facilitates the creation
of detailed 3D brain renderings. Experiments using
SynthSR demonstrate robust correlations between
observed volumes at both scan and subject levels,
suggesting that SynthSR generates images closely
resembling those produced by high-resolution T1
scans. Figure 5.2.2 illustrates the extent to which
SynthSR scans correspond with ground-truth
observations across selected brain regions. SynthID
significantly improves the visualization and analysis
of brain structures, facilitating neuroscientific
research and clinical diagnostics.

SynthSR correlation with ground-truth volumes on select brain regions

Source: Iglesias et al., 2023 | Chart: 2024 AI Index report
Subject level
0.91 0.93 0.91 0.99 0.89 0.90
(n=41)
Scan and subject level
Scan level (ablated

0.79 0.79 0.76 0.99 0.74 0.54
segmentation task)
Scan level
0.79 0.83 0.77 0.99 0.76 0.60
(n=435)
W Co Su Ve Hip Am
hit rt b co n tric po yg
em ica rt ca d
att lg ica les m ala
er ray lg pu
ma ray s
tte ma
r tte
r
Brain region
Figure 5.2.2

Coupled Plasmonic Infrared Sensors ImmunoSEIRA detection principle and the setup
Source: Kavungal et al., 2023
Coupled plasmonic infrared sensors for the
detection of neurodegenerative diseases
Diagnosis of neurodegenerative diseases such as
Parkinson’s and Alzheimer’s depends on fast and
precise identification of biomarkers. Traditional
methods, such as mass spectrometry and ELISA, are
useful in that they can focus on quantifying protein
levels; however, they cannot discern changes in
structural states. This year, researchers uncovered a
new method for neurodegenerative disease diagnosis
that combined AI-coupled plasmonic infrared sensors
that use Surface-Enhanced Infrared Absorption
(SEIRA) spectroscopy with an immunoassay
technique (ImmunoSEIRA; Figure 5.2.3). In tests that
compared actual fibril percentages with predictions Figure 5.2.3
made by AI systems, the accuracy of the predictions

was found to very closely match the actual reported
percentages (Figure 5.2.4).
Deep neural network predicted vs. actual brils percentages in test samples
Source: Kavungal et al., 2023 | Chart: 2024 AI Index report
100%
80%
Predicted brils concentration (%)
60%
40%
20%
0%
0% 25% 40% 50% 60% 75% 100%

Actual brils concentration (%)
Figure 5.2.4

EVEscape
Forecasting viral evolution for pandemic of viruses (Figure 5.2.5). EVEscape evaluates
preparedness viral escape independently of current strain data
Predicting viral mutations is vital for vaccine design predicting 50.0% of observed SARS-CoV-2 mutations,
and pandemic minimization. Traditional methods, outperforming traditional lab studies which predicted
which rely on real-time virus strain and antibody data, 46.2% and 32.3%, as well as a previous model, which
face challenges during early pandemic stages due predicted only 24% of mutations (Figure 5.2.6).
to data scarcity. EVEscape is a new AI deep learning This performance highlights EVEscape’s potential
model trained on historical sequences and biophysical as a valuable asset for enhancing future pandemic
and structural information that predicts the evolution preparedness and response efforts.
EVEscape design
Source: Thadani et al., 2023
Figure 5.2.5

EVEscape vs. other models on SARS-CoV-2 RBD mutation prediction

Source: Thadani et al., 2023 | Chart: 2024 AI Index report
10 months
Pandemic start
17 months
50% 50%, EVEscape (prepandemic)
46%, Later experimental scans (pandemic ab + sera)
40%
Predicted mutations (%)
32%, Earlier experimental scans (pandemic ab)

30%
24%, Previous model
20%
10%
0%
2020-Jan 2020-Jul 2021-Jan 2021-Jul 2022-Jan 2022-Jul 2023-Jan

Pandemic date
Figure 5.2.6

AlphaMissence Hemaglobin subunit beta (HBB)

Source: Google DeepMind, 2023
Better classification of AI mutations
Scientists still do not fully understand which
genetic mutations lead to diseases. With millions of
possible genetic mutations, determining whether
a mutation is benign or pathogenic requires labor-
intensive experiments.
In 2023, researchers from Google DeepMind

unveiled AlphaMissense, a new AI model
that predicted the pathogenicity of 71 million
missense variants. Missense mutations are
genetic alterations that impact the functionality
of human proteins (Figure 5.2.7) and can lead
to various diseases, including cancer. Of the 71
million possible missense variants, AlphaMissense
classified 89%, identifying 57% as likely benign and
32% as likely pathogenic, while the remainder were
categorized as uncertain (Figure 5.2.8). In contrast,
human annotators have only been able to confirm Figure 5.2.7
the nature of 0.1% of all missense mutations.
AlphaMissense predictions
Source: Google DeepMind, 2023 | Chart: 2024 AI Index report
Likely benign Likely pathogenic Uncertain
Prediction category 57% 32% 11%
0% 20% 40% 60% 80% 100%

% of variants classi ed
Figure 5.2.8

Human Pangenome Reference Graph genome for the MHC region of the genome
Source: Google Research, 2023
Using AI to map the human genome
The human genome is a set of molecular instructions
for a human. The first human genome draft was
released in 2000 and updated in 2022. However,
the update was somewhat incomplete. It did not
incorporate various genetic mutations, like blood
type, and did not as completely map diverse ancestry
groups. Therefore, under the existing genome
reference, it would be difficult to detect diseases or
find cures in certain groups of people.
In 2023, the Human Pangenome Research Consortium,

comprising 119 scientists from 60 institutions, used AI
to develop an updated and more representative human
genome map (Figure 5.2.9). The researchers achieved
remarkable accuracy, annotating a median of 99.07% Figure 5.2.9
of protein-coding genes, 99.42% of protein-coding This latest version of the genome represents the most
transcripts, 98.16% of noncoding genes, and 98.96% comprehensive and genetically diverse mapping of the
of noncoding transcripts, as detailed in Figure 5.2.10. human genome to date.
Ensembl mapping pipeline results

Source: Liao et al., 2023 | Chart: 2024 AI Index report
99.07% 99.42% 98.16% 98.96%
100%
80%
% of identi ed (median)
60%
40%
20%
0%
Protein-coding genes Protein-coding transcripts Noncoding genes Noncoding transcripts
Genes and transcripts Figure 5.2.10

Clinical Knowledge
Evaluating the clinical knowledge of AI models remarkable improvement, with the leading system,
involves determining the extent of their medical GPT-4 Medprompt, achieving an accuracy rate of
expertise, particularly knowledge applicable in a 90.2%—an increase of 22.6 percentage points from
clinical setting. the top score in 2022 (Figure 5.2.11). Since MedQA’s
inception, AI capabilities on this benchmark have
MedQA nearly tripled, showcasing the rapid improvements of
Introduced in 2020, MedQA is a comprehensive clinically knowledgeable AI systems.
dataset derived from professional medical board
exams, featuring over 60,000 clinical questions
designed to challenge doctors.
AI performance on the MedQA benchmark has seen

MedQA: accuracy
90% 90.20%
80%
70%
Accuracy (%)
60%
50%
40%
2019 2020 2021 2022 2023
Figure 5.2.11

GPT-4 Medprompt
Although LLMs exhibit impressive Moreover, as noted earlier, GPT-4 Medprompt was the first to
general knowledge, it is commonly surpass the 90% accuracy mark on the MedQA benchmark.
assumed that significant fine-tuning This breakthrough not only underscores GPT-4 Medprompt’s
is required for them to excel at exceptional and potentially clinically useful medical
specialized knowledge, such as capabilities but also demonstrates that fine-tuning may not
answering medical questions. Fine- always be necessary for adapting models to specialized
tuning entails training an LLM on domains. Prompt engineering has shown to be a promising
domain-specific data. alternative strategy.
Research from Microsoft in late 2023

has overturned this assumption. GPT-4 vs. Med-PaLM 2 answering a medical question
This study employed prompt Source: Nori et al., 2023
engineering to direct GPT-4 toward

achieving remarkable performance
on the MultiMedQA benchmark
suite, a group of four challenging
medical benchmarks (Figure 5.2.12).
GPT-4 Medprompt exceeded the
performance of the top 2022 model,
Flan-PaLM 540B, in the multiple-
choice sections of several renowned
medical benchmarks, including
PubMedQA, MedMCQA, and MMLU,
by 3.0, 21.5, and 16.2 percentage
points, respectively. It also exceeded
the performance of the then state-of-
the-art Med-PaLM 2 (Figure 5.2.13). Figure 5.2.12

GPT-4 Medprompt (cont’d)
Model performance on MultiMedQA sub-benchmarks

Source: Nori et al., 2023 | Chart: 2024 AI Index report
MMLU MedMCQA PubMedQA MedQA
100%
94.25%
89.88% 90.20%
86.50% 87.37%
81.80% 81.40% 82.00%
78.02% 79.00% 79.10%
80%
75.20%
72.30% 72.40%
67.60%
60% 57.60%
Accuracy (%)
40%
20%
0%
Flan-PaLM 540B Med-PaLM 2 GPT-4 GPT-4 Medprompt
2022 2023
Figure 5.2.13

MediTron-70B
GPT-4 Medprompt is an impressive system; PaLM 2 (both closed models), it represents
however, it is closed-source, meaning its weights a significant improvement over the state-of-
are not freely available to the broader public for the-art results from 2023 and surpasses other
use. New research in 2023 has also sought to open-source models like Llama 2 (Figure 5.2.14).
advance the capabilities of open-source medical MediTron-70B’s score on MedQA is the highest
LLMs. Among this new research, MediTron-70B yet achieved by an open-source model. If medical
stands out as particularly promising. This model AI is to reach its fullest potential, it is important
achieves a respectable 70.2% accuracy on the that its capabilities are widely accessible. In this
MedQA benchmark. Although this is below the context, MediTron represents an encouraging
performance of GPT-4 Medprompt and Med- step forward.
Performance of select models on MedQA

Source: Chen et al., 2023 | Table: 2024 AI Index report
Model Release date Access type Score on MedQA
GPT-4 Medprompt November 2023 Closed 90.20%

Med-PaLM 2 April 2023 Closed 86.20%
MediTron-70B November 2023 Open 70.20%
Med-PaLM December 2022 Closed 67.20%
Llama 2 July 2023 Open 63.80%
Figure 5.2.14

Diagnosis
AI tools can also be used for diagnostic purposes including, for example, in radiology or cancer detection.
CoDoC
AI medical imaging systems demonstrate robust (the ability to accurately identify those without it).
diagnostic capabilities, yet there are instances Specifically, across four medical datasets, CoDoC’s
where they overlook diagnoses that clinicians sensitivity surpasses clinicians’ by an average of
catch, and vice versa. This observation suggests 4.5 percentage points and a standalone AI model’s
a logical integration of AI systems and clinicians’ by 6.5 percentage points (Figure 5.2.15). In terms
diagnostic abilities. In 2023, researchers unveiled of specificity, CoDoC outperforms clinicians by
CoDoC (Complementarity-Driven Deferral to an average of 2.7 percentage points across tested
Clinical Workflow), a system designed to discern datasets and a standalone predictive model by 5.7
when to rely on AI for diagnosis and when to defer percentage points. Moreover, CoDoC has been
to traditional clinical methods. CoDoC notably shown to reduce clinical workflow by 66%. These
enhances both sensitivity (the ability to correctly findings suggest that AI medical systems can be
identify individuals with a disease) and specificity integrated into clinical workflows, thereby enhancing
diagnostic accuracy and efficiency.
CoDoC vs. standalone predictive AI system and clinical readers: sensitivity
Source: Dvijotham et al., 2023 | Chart: 2024 AI Index report
CoDoC Clinician(s) Standalone predictive AI model
100% 96.70% 96.70%

90.50%89.40%90.90%
86.70%
80%
72.60%
64.90%
62.70%
Sensitivity (%)
60% 56.90%
50.00%48.30%
40%
20%
0%
UK mammography dataset US mammography dataset 1 US mammography dataset 2 TB dataset
Breast cancer detection TB detection
Task and dataset Figure 5.2.15

CT Panda
PANDA
Pancreatic ductal adenocarcinoma (PDAC) is a particularly
detection
lethal cancer, often detected too late for surgical intervention. Source:
Cao et al., 2023
Screening for PDAC in asymptomatic individuals is Figure 5.2.16
challenging due to its low prevalence and the risk of false
positives. This year, a Chinese research team developed
PANDA (pancreatic cancer detection with artificial
intelligence), an AI model capable of efficiently detecting
and classifying pancreatic lesions in X-rays (Figure 5.2.16). In
validation tests, PANDA surpassed the average radiologist in
sensitivity by 34.1% and in specificity by 6.3% (Figure 5.2.17).
In a large-scale, real-world test involving approximately
20,000 patients, PANDA achieved a sensitivity of 92.9% and
a specificity of 99.9% (Figure 5.2.18). AI medical tools like
PANDA represent significant advancements in diagnosing
challenging conditions, offering cost-effective and accurate
detection previously considered difficult or prohibitive.
PANDA vs. mean radiologist on multicenter validation PANDA performance on real-world multi-scenario
(6,239 patients) validation (20,530 patients)
Source: Cao et al., 2023 | Chart: 2024 AI Index report Source: Cao et al., 2023 | Chart: 2024 AI Index report
99.90%
100%
35% 34.10% 92.90%
30% 80%
Score for lesion detection
Performance di�erence
25%
60%
20%
15% 40%
10%
6.30% 20%
5%
0% 0%
Sensitivity Speci�city Sensitivity Speci city

Other Diagnostic Uses

New research published in 2023 highlights how AI can be used in other diagnostic contexts. Figure 5.2.19
summarizes some of the findings.
Additional research on diagnostic AI use cases

Research Use case Findings
Schopf et al., 2023 Breast cancer The authors conducted a meta-review of the literature exploring mammography-image-based
AI algorithms. They discovered that predicting future breast cancer risk using only
mammography images achieves accuracy that is comparable to or better than traditional risk
assessment tools.
Dicente Cid et al., 2023 X-ray interpretation The researchers developed two open-source neural networks, X-Raydar and X-Raydar-NLP,
for classifying chest X-rays using images and free-text reports. They found that these
automated classi cation methods perform at levels comparable to human experts and
demonstrate robustness when applied to external data sets.
Figure 5.2.19

FDA-Approved AI-Related Medical Devices

The U.S. Food and Drug Administration (FDA) Figure 5.2.20 illustrates the number of AI medical
maintains a list of AI/ML-enabled medical devices devices approved by the FDA over the past decade.
that have received approval. The devices featured In 2022, a total of 139 AI-related medical devices
on this list meet the FDA’s premarket standards, received FDA approval, marking a 12.1% increase from
which include a detailed review of their effectiveness the total approved in 2021. Since 2012, the number of
and safety. As of October 2023, the FDA has not these devices has increased by more than 45-fold.
approved any devices that utilize generative AI or are
powered by LLMs.
Number of AI medical devices approved by the FDA, 2012–22

Source: FDA, 2023 | Chart: 2024 AI Index report
139
140
124
120
107
Number of AI medical devices
100
80 77
63
60
40
26
20 18
6 5
3 3
0
2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 5.2.20
3 The FDA last updated the list in October 2023, meaning that the totals for 2023 were incomplete. Consequently, the AI Index limited its data presentation to include only information
up to 2022.

Figure 5.2.21 illustrates the specialties associated with FDA-approved medical devices. Of the 139 devices
approved in 2022, a significant majority, 87.1%, were related to radiology. The next most common specialty was
cardiovascular, accounting for 7.2% of the approvals.
Number of AI medical devices approved by the FDA by specialty, 2012–22

Source: FDA, 2023 | Chart: 2024 AI Index report
2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Radiology 2 0 5 0 11 15 39 51 94 105 121
Cardiovascular 0 0 0 1 4 6 9 12 7 11 10
Neurology 0 0 1 0 1 1 4 4 0 2 2
Gastroenterology and urology 0 0 0 0 0 1 1 1 0 3 1

Hematology 0 1 0 0 0 2 2 1 3 0 1
Microbiology 0 2 0 0 0 0 0 2 1 0 0
Medical specialty
General hospital 0 0 0 1 0 0 0 2 0 0 0
General and plastic surgery 0 0 0 0 1 0 2 1 0 1 0
Ophthalmic 0 0 0 1 0 0 2 1 1 1 2
Clinical chemistry 0 0 0 1 1 0 2 1 0 0 1
Anesthesiology 0 0 0 1 0 1 0 0 1 0 0
Pathology 1 0 0 0 0 0 0 0 0 1 0
Ear nose and throat 0 0 0 0 0 0 0 0 0 0 1
Dental 0 0 0 0 0 0 1 0 0 0 0
Orthopedic 0 0 0 0 0 0 1 0 0 0 0
Obstetrics and gynecology 0 0 0 0 0 0 0 1 0 0 0
Figure 5.2.21

Administration and Care

AI tools also hold the potential to enhance medical administration efficiency and elevate the standard of patient care.
MedAlign
Despite significant advances benchmark with 983 questions and instructions and 303 clinician
in AI for healthcare, existing responses, drawn from seven different medical specialties (Figure
benchmarks like MedQA and 5.2.22). MedAlign is the first extensive EHR-focused benchmark.
USMLE, focused on knowledge-
The researchers then tested various existing LLMs on MedAlign. Of
based questions, do not fully
all LLMs, a GPT-4 variant using multistep refinement achieved the
capture the diverse tasks
highest correctness rate (65.0%) and was routinely preferred over
clinicians perform in patient
other LLMs (Figure 5.2.23). MedAlign is a valuable milestone toward
care. Clinicians often engage
using AI to alleviate administrative burdens in healthcare.
in information-intensive tasks,
such as creating tailored
MedAlign workflow
diagnostic plans, and spend a Source: Fleming et al., 2023
significant proportion of their

working hours on administrative
tasks. Although AI has the
Figure 5.2.16
potential to streamline these
processes, there is a lack of
suitable electronic health
records (EHR) datasets for
benchmarking and fine-tuning
medically administrative LLMs.
This year researchers have
made strides to address this
gap by introducing MedAlign:
a comprehensive EHR-based Figure 5.2.22

MedAlign (cont’d)
Evaluation of model performance: human vs. COMET ranks
Source: Fleming et al., 2023 | Chart: 2024 AI Index report
Human ranks COMET ranks
GPT-4 (32k + MR) 48% 56% 73% 71% 82% GPT-4 (32k + MR) 50% 52% 66% 63% 79%
GPT-4 (32k) 52% 58% 72% 74% 81% GPT-4 (32k) 50% 51% 63% 58% 77%
Model A (winner)
Model A (winner)
GPT-4 (2k) 44% 42% 67% 70% 76% GPT-4 (2k) 48% 49% 66% 61% 79%
Vicuña-13B (2k) 27% 28% 33% 50% 63% Vicuña-13B (2k) 34% 37% 34% 49% 70%
Vicuña-7B (2k) 29% 26% 30% 50% 64% Vicuña-7B (2k) 37% 42% 39% 51% 71%
MPT-7B-Instruct (2k) 18% 19% 24% 37% 36% MPT-7B-Instruct (2k) 21% 23% 21% 30% 29%
GPT-4 (32k + MR)
GPT-4 (32k)
GPT-4 (2k)
Vicuña-13B (2k)
Vicuña-7B (2k)
MPT-7B-Instruct (2k)
GPT-4 (32k + MR)
GPT-4 (32k)
GPT-4 (2k)
Vicuña-13B (2k)
Vicuña-7B (2k)
MPT-7B-Instruct (2k)
Model B (loser) Model B (loser)
Figure 5.2.23

Index Report 2024
Education
CHAPTER 6:
Index Report 2024 Education
Preview
Overview 327
6.1 Postsecondary CS and AI Education 329

United States and Canada 329
CS Bachelor’s Graduates 329
CS Master’s Graduates 331
CS PhD Graduates 333
CS, CE, and Information Faculty 336
Europe 344
Informatics, CS, CE, and IT Bachelor’s Graduates 344
Informatics, CS, CE, and IT Master’s Graduates 347
Informatics, CS, CE, and IT PhD Graduates 351
AI-Related Study Programs 355
Total Courses 355
Education Level 356
Geographic Distribution 357
6.2 K–12 CS and AI Education 359

United States 359
State-Level Trends 359
AP Computer Science 361
Highlight: Access Issues 363
Highlight: ChatGPT Usage Among
Teachers and Students 364

CHAPTER 6:
Overview
This chapter examines trends in AI and computer science (CS) education, focusing on who is
learning, where they are learning, and how these trends have evolved over time. Amid growing
concerns about AI’s impact on education, it also investigates the use of new AI tools like
ChatGPT by teachers and students.
The analysis begins with an overview of the state of postsecondary CS and AI education in the
United States and Canada, based on the Computing Research Association’s annual Taulbee
Survey. It then reviews data from Informatics Europe regarding CS education in Europe. This
year introduces a new section with data from Studyportals on the global count of AI-related
English-language study programs.
The chapter wraps up with insights into K–12 CS education in the United States from Code.org
and findings from the Walton Foundation survey on ChatGPT’s use in schools.

CHAPTER 6:
Chapter Highlights
1. The number of American and Canadian CS bachelor’s graduates continues to rise, new
CS master’s graduates stay relatively flat, and PhD graduates modestly grow. While the
number of new American and Canadian bachelor’s graduates has consistently risen for more than a decade, the
number of students opting for graduate education in CS has flattened. Since 2018, the number of CS master’s and
PhD graduates has slightly declined.
2. The migration of AI PhDs to industry continues at an accelerating pace. In 2011, roughly

equal percentages of new AI PhDs took jobs in industry (40.9%) and academia (41.6%). However, by 2022, a
significantly larger proportion (70.7%) joined industry after graduation compared to those entering academia
(20.0%). Over the past year alone, the share of industry-bound AI PhDs has risen by 5.3 percentage points,
indicating an intensifying brain drain from universities into industry.
3. Less transition of academic talent from industry to academia. In 2019, 13% of new AI faculty in the
United States and Canada were from industry. By 2021, this figure had declined to 11%, and in 2022, it further dropped
to 7%. This trend indicates a progressively lower migration of high-level AI talent from industry into academia.
4. CS education in the United States and Canada becomes less international. Proportionally
fewer international CS bachelor’s, master’s, and PhDs graduated in 2022 than in 2021. The drop in international
students in the master’s category was especially pronounced.
5. More American high school students take CS courses, but access problems remain.
In 2022, 201,000 AP CS exams were administered. Since 2007, the number of students taking these exams has
increased more than tenfold. However, recent evidence indicates that students in larger high schools and those in
suburban areas are more likely to have access to CS courses.
6. AI-related degree programs are on the rise internationally. The number of English-language,
AI-related postsecondary degree programs has tripled since 2017, showing a steady annual increase over the past
five years. Universities worldwide are offering more AI-focused degree programs.
7. The United Kingdom and Germany lead in European informatics, CS, CE, and IT graduate
production. The United Kingdom and Germany lead Europe in producing the highest number of new informatics,
CS, CE, and information bachelor’s, master’s, and PhD graduates. On a per capita basis, Finland leads in the
production of both bachelor’s and PhD graduates, while Ireland leads in the production of master’s graduates.

Artificial Intelligence Chapter 6: Education
Index Report 2024 6.1 Postsecondary CS and AI Education
This section provides an overview of postsecondary education in CS and AI, highlighting graduation statistics across
North America and Europe for various degrees including bachelor’s, master’s, and PhDs. It also covers information on
AI-related courses offered in English.
6.1 Postsecondary CS and AI Education

CS Bachelor’s Graduates
United States and Canada Over the past decade, the total number of new CS
This subsection presents an analysis of data from the bachelor’s graduates in North America has steadily
Computing Research Association’s Taulbee Survey, risen, increasing more than threefold, with a 7.9%
which evaluates the state of CS and AI postsecondary year-over-year rise from 2021 to 2022 (Figure 6.1.1).
education in the United States and Canada. The
survey covers 297 PhD-granting CS departments
across the United States and Canada.1
New CS bachelor’s graduates in the United States and Canada, 2010–22

Source: CRA Taulbee Survey, 2023 | Chart: 2024 AI Index report
35,666
35,000
33,059
31,835
30,000 28,527
Number of new CS bachelor’s graduates
26,709
25,000
22,343
20,000 18,954
15,256
15,000
12,228
11,049 10,776
10,000 9,008 9,286
5,000
0
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 6.1.1
1 It is important to note that not all PhD-granting departments targeted in the survey provided responses. Out of the 297 departments targeted, only 182 responded, yielding an overall
response rate of 61%.

For the first time in almost eight years, the proportion the data. The decline is also partially attributable to
of international students among CS bachelor’s international travel restrictions that were imposed
graduates in American and Canadian universities during the COVID-19 pandemic, affecting the ability
declined, falling from 16.3% in 2021 to 15.2% in of international students to study in the United States
2022 (Figure 6.1.2). This decline likely reflects the and Canada. Despite this recent drop, the overall
increased difficulty of obtaining study visas during trend over the last decade shows a steady increase in
the early years of the Trump administration, an the proportion of international students.
impact that is only now beginning to manifest in
New international CS bachelor’s graduates (% of total) in the United States and Canada, 2010–22
16%
15.20%
New international CS bachelor’s graduates (% of total)
12%
8%
4%
0%
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 6.1.2

CS Master’s Graduates
AI courses are commonly included in CS master’s the number appears to have leveled out since 2018 and
degree programs. While the total number of new CS slightly decreased, by 2.5%, last year (Figure 6.1.3). This
master’s graduates from American and Canadian leveling is a reflection of the decline in international
universities more than doubled over the past decade, master’s students shown in the following graph.
New CS master’s graduates in the United States and Canada, 2010–22

16,000 15,532
14,853 15,068
14,735 14,696
14,000
13,037
Number of new CS master’s graduates
12,000
11,239
9,933
10,000
8,000 7,462 7,488

7,205
6,851 6,611
6,000
4,000
2,000
0
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 6.1.3

In 2022, American and Canadian universities experienced a notable decrease in international CS master’s students.
This downward trend began around 2017, but the decline was most pronounced last year, at 14.8 percentage points
(Figure 6.1.4). Currently, the split between international and domestic CS master’s graduates is roughly even.
New international CS master’s graduates (% of total) in the United States and Canada, 2010–22
80%
New international CS master’s graduates (% of total)
60%
50.40%
40%
20%
0%
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 6.1.4

CS PhD Graduates
For the first time in a decade, there has been a significant increase in the number of new CS PhD graduates at
American and Canadian universities. In 2022, the number of CS PhD graduates reached 2,105, the highest since
2010 (Figure 6.1.5).
New CS PhD graduates in the United States and Canada, 2010–22

2,105
1,991 1,997
2,000 1,929 1,940
1,888 1,860 1,893
1,834
1,772 1,782 1,780 1,787
Number of new CS PhD graduates
1,500
1,000
500
0
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 6.1.5

While the proportion of international students among CS PhD graduates has risen over the past decade, there was
a slight decrease in this proportion in the last year, dropping from 68.6% in 2021 to 65.9% in 2022 (Figure 6.1.6).
New international CS PhD graduates (% of total) in the United States and Canada, 2010–22
70%
65.90%
60%
New international CS PhD graduates (% of total)
50%
40%
30%
20%
10%
0%
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 6.1.6

Where do newly minted AI PhDs choose to work However, by 2022, a significantly larger proportion
after graduating? Following a trend highlighted in last (70.7%) joined industry after graduation compared
year’s AI Index report, a growing share of AI doctoral to those entering academia (20.0%). The percentage
recipients are pursuing careers in industry (Figure 6.1.7 of new AI PhDs going into government roles has
and Figure 6.1.8). In 2011, around the same percentage remained relatively low and steady at around 0.7%
took jobs in industry (40.9%) as in academia (41.6%). over the past five years.
Employment of new AI PhDs (% of total) in the Employment of new AI PhDs in the United States and
United States and Canada by sector, 2010–22 Canada by sector, 2010–22
Source: CRA Taulbee Survey, 2023 | Chart: 2024 AI Index report Source: CRA Taulbee Survey, 2023 | Chart: 2024 AI Index report
Academia 362
70% 70.71%, Industry
350
Government
Industry 79
60%
300
New AI PhD graduates (% of total)
Number of new AI PhD graduates
281
50% 249
250 238 84
219
40% 201
65
200 73
178 61
30% 154 154
63
150 60
134 132 136 280
47 123
20% 19.95%, Academia 72 43
63 51
100 42 195
180
162
10% 153
134
116
50 101
85
76 74 77
0% 0.76%, Government 64
0
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2 The sums in Figure 6.1.7 do not add up to 100, as there is a subset of new AI PhDs each year who become self-employed, unemployed, or report an “other” employment status in the CRA
survey. These students are not included in the chart.

CS, CE, and Information Faculty

To better understand trends in CS and AI education, it is helpful to examine data on CS faculty. Last year, the total
number of CS, CE, and information faculty in American and Canadian universities increased 7.2% (Figure 6.1.9).
Since 2011, the increase is 42.4%.
Number of CS, CE, and information faculty in the United States and Canada, 2011–22
10,000
Tenure Track Teaching Professors Other Instructors Research Postdoc
9,000 8,738
531
8,149
7,858 7,976
8,000 7,657 522
Number of CS, CE, and information faculty
530
7,362 668 919
653
7,000 6,806 6,887 691 426
6,629 465 736 861
6,478 589
6,314 649 432 617
6,138 689 494 1,252
6,000 602 766 390
432 1,150
656 895 1,183
529 1,180 831
515 676 1,122
447 1,014
5,000 661 863
669 487
4,000
3,000 5,733
5,059 5,214 5,252 5,231 5,310
4,536 4,549 4,548 4,711 4,786
4,366
2,000
1,000
0
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 6.1.9

In 2022, the United States had 7,084 CS faculty members, with the majority (65.7%) on the tenure track (Figure
6.1.10). The total number of American CS faculty has risen 4.4% since 2021 and 45.0% since 2011.
Number of CS faculty in the United States, 2011–22

8,000
Tenure Track Teaching Professors Other Instructors Research Postdoc
7,084
7,000 6,789
6,654 453
6,430 6,533 428
424
6,098 531
518
6,000 5,729 736
5,637 567 382 693
426 618
5,256 491
5,202 535 408 436
534
5,068 947
Number of CS faculty
5,000 4,885 521 509 396

364
899
592 946
947 671 715
522
460 455 903
491 826
387
4,000 550 679
521 421
3,000
4,482 4,657
4,366 4,384 4,390
2,000 3,971 4,176
3,725 3,880
3,455 3,564 3,559
1,000
0
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 6.1.10

Last year, 915 new faculty were hired across CS, CE, and information disciplines in North America, a decade high.
455 of these positions were tenure track. (Figure 6.1.11).
New CS, CE, and information faculty hires in the United States and Canada, 2011–22
Total Tenure-Track 915

878
860
800
Number of new CS, CE, and information faculty hires
800 765
749 749
733
710
691
600 583 572

543
455
422
396 406
400 374
348 358
320 324
294
249 258
218
200
0
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 6.1.11

In 2022, 43% of new faculty appointments came from other academic positions, indicating a “churn” within
the academic workforce (Figure 6.1.12). Since these “new” faculty members vacated positions elsewhere, their
previous roles will eventually need to be filled. Additionally, the proportion of faculty transitioning from industry
in 2022 fell to 7% from 11% in the previous year and 13% in 2019.
Source of new faculty in American and Canadian CS, CE, and information departments, 2018–22
New PhD From Postdoc From Other Academic From Industry
100%
7%
13% 13% 11% 11%
80%
Source of new faculty
34% 34% 43%

34%
41%
60%
16% 15%
15%
16%
40%
17%
20% 38% 39% 40%

34%
29%
0%
2018 2019 2020 2021 2022
Figure 6.1.12

The reasons for faculty positions remaining unfilled were turned down. This trend appears to reflect an
have varied over the past decade. In 2011, 37% of failed increasingly competitive market for new CS faculty.
searches were due to no offer being made, while 34% However, it remains unclear whether this indicates
were because the offer made was declined (Figure heightened competition with other academic positions
6.1.13). In contrast, in 2022, only 15% ended with no or with industry positions.
offer being made, while 55% involved offers that
Reason why new CS, CE, and information faculty positions remained un lled (% of total), 2011–22
Didn’t �nd a person who met our hiring goals Technically vacant, not �lled for admin reasons Other
O�ers turned down Hiring in progress
Reason faculty positions remained un�lled (% of total)
100%
10% 18%
10% 23% 23%
25% 26% 22%
31% 28% 27%
80% 17%
18%
5% 6%
10%
55%
6%
60% 45% 12%
34%
40%
36% 55%
40% 52% 51% 56% 53%
43%
44%
20% 37% 37% 37%

26% 26%
16% 14% 14% 13% 14% 15%
8%
0%
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 6.1.13

In 2022, North American departments in CS, CE, and information disciplines experienced a significant increase
in faculty departures, totaling 405, compared to 303 in 2021 (Figure 6.1.14). Of these losses, 38.5% left for other
academic positions, while 16.3% moved to nonacademic roles, maintaining a trend consistent with previous years.
Faculty losses in American and Canadian CS, CE, and information departments, 2011–22
Died Took academic position elsewhere Remained, but changed to part-time Unknown
450
Retired Took nonacademic position Other
405
400
350
327
312 66
303 303
300
270
Faculty losses
23 43 37
250 246
232 237 234 34 46
221 33
213
42
200 156
36 24
44 26 139
27 32 110
126
150 113
34 77 89
62 85
74 86
100 52
112
94 90 103 100
50 89 94 91
67 74 65 80
0
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 6.1.14

Since 2015, the increase in median nine-month salaries for full professors has slightly fallen below U.S. inflation
rates, whereas median salaries for assistant and associate professors have seen slight increases above inflation.
In 2022, a full professor’s salary was 3.2% higher than in 2021, which did not keep pace with the 7% U.S. inflation
rate, and 16.4% higher than in 2015, still below the 19% inflation increase over those years (Figure 6.1.15).
Median nine-month salary of CS faculty in the United States, 2015–22

Full Professor Associate Professor Assistant Professor 181.61

180 176.01
168.87 170.57
164.54
Median salary of CS faculty (in thousands of U.S. dollars)
158.97 159.96
160 156.02
140 134.08
127.47
121.55 123.71
117.5 119.48 119.03
120 113.95 114.07
111.67 109.23
105.45 107.55
101.16 103.01
99.12
100
80
60
40
20
0
2015 2016 2017 2018 2019 2020 2021 2022
Figure 6.1.15

In 2022, the proportion of international hires among new tenure-track faculty in CS, CE, and information
disciplines significantly increased to 19.3% from 13.2% the previous year (Figure 6.1.16). This marked the second-
highest percentage recorded in the past decade, only surpassed by 2013.
New international CS, CE, and information tenure-track faculty hires (% of total) in the United States and
Canada, 2010–22
25%
New international tenure-track faculty hires (% of total)
20%
19.30%
15%
10%
5%
0%
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 6.1.16

Europe Informatics, CS, CE, and IT Bachelor’s

Graduates
Data on European CS graduates comes from In 2022, the United Kingdom led with the highest
Informatics Europe, an academic and research number of new graduates in informatics, CS, CE,
community that, among other goals, monitors the and IT at the bachelor’s level, totaling approximately
state of informatics education in Europe.3 Informatics 25,000 (Figure 6.1.17).5 Germany and Turkey followed
Europe gathers data on graduates in informatics, closely. Most countries in the sample saw an increase
CS, CE, computing, and information technology in graduates in these fields compared to a decade
(IT) disciplines from statistical offices of European ago, though there were exceptions like Poland, Spain,
governments.4 and the Czech Republic (Figure 6.1.18).
New informatics, CS, CE, and IT bachelor’s graduates by country in Europe, 2022
Source: Informatics Europe, 2023 | Chart: 2024 AI Index report
United Kingdom 25,040
Germany 19,920
Turkey 13,826
Poland 9,241
Italy 8,508
Netherlands 6,877
Romania 6,740
Spain 6,650
Portugal 3,696
Finland 2,970
Czech Republic 2,670
Norway 2,340
Ireland 1,995
Austria 1,900
Switzerland 1,789
0 2,000 4,000 6,000 8,000 10,000 12,000 14,000 16,000 18,000 20,000 22,000 24,000
Number of new informatics, CS, CE, and IT bachelor’s graduates
Figure 6.1.17
3 There is no singular term for CS education that is used uniformly across European countries. Across Europe, CS education can be reflected in terms such as informatics, computer science
(CS), computer engineering (CE), computing, information technology (IT), information and communication technology (ICT), and information science and technology (IST). The full list of
subject names (and English translations) that Informatics Europe uses to identify informatics studies programs can be found at the following link.
4 Readers are cautioned against making per capita comparisons between the CRA North American data and the European CS graduate data detailed in subsequent sections, as the European
data is collected from national statistical offices and boasts broader coverage.
5 Note that not all countries for which the AI Index has data are visualized in the figures in this section. To access the complete data, please view the public data associated with this chapter.
Moreover, the year label refers to the year in which an academic year ends. For example, the figures visualizing new graduates for 2022 reflect the number of graduates reported for the
2021/2022 academic year. For the sake of visual simplicity, the Index opts to focus on the year in which students graduated.

Percentage change of new informatics, CS, CE, and IT bachelor’s graduates by country in Europe,
2012 vs. 2022
Norway 153%
Turkey 118%
Switzerland 117%
Netherlands 88%
Portugal 72%
Romania 70%
Ireland 66%
Austria 59%
Italy 48%
United Kingdom 39%
Germany 28%
Finland 23%
Spain -14%
Poland -15%
Czech Republic -21%
−20% 0% 20% 40% 60% 80% 100% 120% 140% 160%

% change of new informatics, CS, CE, and IT bachelor’s graduates
Figure 6.1.18
Finland (53.4), Norway (42.6), and the Netherlands (38.6) lead in the number of new bachelor’s graduates in
informatics CS, CE, and IT per 100,000 inhabitants (Figure 6.1.19). On a per capita basis, most sampled European
countries have seen increases in the total number of informatics, CS, CE, and IT bachelor’s graduates (Figure 6.1.20).

New informatics, CS, CE, and IT bachelor’s graduates per 100,000 inhabitants by country in Europe, 2022
Finland 53.38
Norway 42.63
Netherlands 38.61
Ireland 38.41
Romania 35.38
Portugal 35.31
Estonia 34.78
Lithuania 33.42
Poland 25.14
Bulgaria 25.14
Czech Republic 24.66
Germany 23.61
Latvia 22.73
Austria 20.87
0 10 20 30 40 50
New informatics, CS, CE, and IT bachelor’s graduates (per 100,000 inhabitants)
Figure 6.1.19
Percentage change of new CS, CE, and Information bachelor’s graduates per 100,000 inhabitants by country
in Europe, 2012 vs. 2022
Norway 133%
Romania 78%
Netherlands 77%
Portugal 72%
Bulgaria 67%
Ireland 48%
Austria 47%
United Kingdom 33%
Estonia 25%
Germany 22%
Finland 20%
Poland -12%
Czech Republic -23%
Latvia -33%
−40% −20% 0% 20% 40% 60% 80% 100% 120% 140%

% change of new informatics, CS, CE, and IT bachelor’s graduates (per 100,000 inhabitants)
Figure 6.1.20

Informatics, CS, CE, and IT Master’s Graduates graduates (Figure 6.1.21). In the last decade, Germany
Similar to bachelor’s graduates, the United Kingdom (259%), Turkey (197%), and Spain (194%) have seen the
leads Europe in producing new master’s graduates in greatest percentage growth in new informatics, CS, CE,
informatics, CS, CE, and IT, with approximately 20,000 and IT master’s graduates (Figure 6.1.22).
New informatics, CS, CE, and IT master’s graduates by country in Europe, 2022
Germany 11,092
Poland 4,271
Italy 3,883
Spain 3,214
Turkey 2,677
Romania 2,200
Netherlands 1,953
Ireland 1,620
Czech Republic 1,589
Finland 1,458
Austria 1,327
Switzerland 1,202
Portugal 1,109
Norway 770
0 2,000 4,000 6,000 8,000 10,000 12,000 14,000 16,000 18,000 20,000
Number of new informatics, CS, CE, and IT master’s graduates
Figure 6.1.21

Percentage change of new informatics, CS, CE, and IT master’s graduates by country in Europe,
2012 vs. 2022
Germany 259%
Turkey 197%
Spain 194%
Norway 191%
Switzerland 148%
Netherlands 147%
Ireland 140%
United Kingdom 139%
Portugal 61%
Finland 60%
Austria 54%
Italy 46%
Poland 8%
Czech Republic -36%
−30% 0% 30% 60% 90% 120% 150% 180% 210% 240% 270%
% change of new informatics, CS, CE, and IT master’s graduates
Figure 6.1.22

Per capita metrics paint a somewhat similar picture. Ireland has the most informatics, CS, CE, and IT master’s
graduates on a per capita basis (31.2), followed by the United Kingdom (29.8) and Estonia (27.4) (Figure 6.1.23).
On a per capita basis, Germany (243%) has also seen the greatest growth of informatics, CS, CE, and IT master’s
graduates in the last decade (Figure 6.1.24).
New informatics, CS, CE, and IT master’s graduates per 100,000 inhabitants by country in Europe, 2022
Ireland 31.19
Estonia 27.38
Finland 26.20
Czech Republic 14.68
Austria 14.57
Norway 14.03
Switzerland 13.64
Germany 13.15
Denmark 12.66
Poland 11.62
Romania 11.55
Netherlands 10.96
Bulgaria 10.70
Portugal 10.59
0 5 10 15 20 25 30
New informatics, CS, CE, and IT master’s graduates (per 100,000 inhabitants)
Figure 6.1.23

Percentage change of new informatics, CS, CE, and IT master’s graduates per 100,000 inhabitants
by country in Europe, 2012 vs. 2022
Germany 243%
Estonia 189%
Norway 167%
Netherlands 132%
United Kingdom 128%
Switzerland 127%
Ireland 113%
Bulgaria 80%
Portugal 61%
Finland 56%
Denmark 56%
Austria 43%
Poland 11%
Czech Republic -38%
−30% 0% 30% 60% 90% 120% 150% 180% 210% 240%

% change of new informatics, CS, CE, and IT master’s graduates (per 100,000 inhabitants)
Figure 6.1.24

Informatics, CS, CE, and IT PhD Graduates 6.1.25). In the last decade, Turkey has seen the greatest
The United Kingdom (1,060) and Germany (910) also growth in new CS, CE, and information PhD graduates
produced the most informatics, CS, CE, and IT PhD (Figure 6.1.26).
graduates in 2022, followed by Italy (581) (Figure
New informatics, CS, CE, and IT PhD graduates by country in Europe, 2022
Germany 910
Italy 581
Spain 479
Turkey 298
Czech Republic 150
Netherlands 120
Switzerland 116
Finland 114
Austria 96
Portugal 72
Ireland 65
Romania 47
Estonia 26
Bulgaria 24
0 100 200 300 400 500 600 700 800 900 1,000 1,100
Number of new informatics, CS, CE, and IT PhD graduates
Figure 6.1.25

Percentage change of new informatics, CS, CE, and IT PhD graduates by country in Europe, 2012 vs. 2022
Turkey 173%
Estonia 86%
Bulgaria 60%
United Kingdom 18%
Spain 14%
Italy 12%
Switzerland 12%
Portugal 3%
Germany 1%
Ireland -4%
Finland -18%
Czech Republic -19%
Austria -31%
−40% −20% 0% 20% 40% 60% 80% 100% 120% 140% 160% 180%
% change of new informatics, CS, CE, and IT PhD graduates
Figure 6.1.26

Finland has the greatest number of new informatics, capita basis, the growth rate of new informatics, CS,
CS, CE, and IT PhD graduates per capita. For every CE, and IT PhDs has been relatively marginal in several
100,000 inhabitants, it has 2.1 informatics, CS, CE, major European countries such as the United Kingdom,
and IT PhD graduates (Figure 6.1.27). Estonia slightly Portugal, and Switzerland (Figure 6.1.28).
trails (1.9), as does the United Kingdom (1.6). On a per
New informatics, CS, CE, and IT PhD graduates per 100,000 inhabitants by country in Europe, 2022
Finland 2.05
Estonia 1.90
United Kingdom 1.58
Czech Republic 1.39
Switzerland 1.32
Ireland 1.25
Germany 1.08
Austria 1.05
Spain 1.00
Italy 0.99
Portugal 0.69
Netherlands 0.67
Latvia 0.64
Bulgaria 0.37
Turkey 0.35
0.00 0.30 0.60 0.90 1.20 1.50 1.80 2.10

New informatics, CS, CE, and IT PhD graduates (per 100,000 inhabitants)
Figure 6.1.27

Percentage change of new informatics, CS, CE, and IT PhD graduates per 100,000 inhabitants by country
in Europe, 2012 vs. 2022
Turkey 142%
Bulgaria 81%
Estonia 79%
Italy 14%
United Kingdom 12%
Spain 10%
Portugal 3%
Switzerland 2%
Germany -4%
Latvia -8%
Ireland -15%
Finland -20%
Czech Republic -22%
Austria -36%
−40% −20% 0% 20% 40% 60% 80% 100% 120% 140%

% change of new informatics, CS, CE, and IT PhD graduates (per 100,000 inhabitants)
Figure 6.1.28

AI-Related Study Programs Total Courses

A study program, or degree program, comprises a
Tracking the number of AI-related courses provides
series of courses designed to enable students to earn
insight into the educational interest in AI. This section
a relevant qualification, such as a degree or diploma.
highlights data from Studyportals, an international
The number of English-language AI-related study
platform monitoring English-language university
programs has tripled since 2017, demonstrating a
study programs worldwide. Their portal encompasses
consistent yearly increase over the last five years
information on over 200,000 courses at more than
(Figure 6.1.29). This trend indicates a steadily growing
3,750 educational institutions across 110 countries.6
educational interest in AI.
Number of AI university study programs in English in the world, 2017–23

Source: Studyportals, 2023 | Chart: 2024 AI Index report
3.00
Number of AI university study programs in English (in thousands)
2.52
2.50
2.32
2.16
2.00
1.64
1.50 1.45
1.17
1.00 0.88
0.50
0.00
2017 2018 2019 2020 2021 2022 2023
Figure 6.1.29
6 Currently, Studyportals, the company supplying data on AI university study programs, tracks only English-language AI courses. In the coming years, the Index plans to extend its coverage to
include non-English programs.

Education Level
Broken down by educational level, the majority of AI study programs are offered at the master’s level (55.0%),
followed by the bachelor’s level (39.8%), and finally at the PhD level (5.3%) (Figure 6.1.30).
AI university study programs in English (% of total) by education level, 2023

60%
54.97%
AI university study programs in English (% of total)
50%
39.77%
40%
30%
20%
10%
5.27%
0%
Bachelor’s Master’s PhD
Figure 6.1.30

Geographic Distribution the sample, there was a greater number of AI university

In 2023, the United Kingdom had the greatest study programs in 2023 than in 2022. Malta, the United
number of English-language AI study programs (744) Kingdom, and Cyprus had the greatest number of
(Figure 6.1.31). Next was the United States (667) and English-language AI university study programs per
Canada (89). For virtually every country included in capita in 2023 (Figure 6.1.32).7
Number of AI university study programs in English by geographic area, 2022 vs. 2023
United Kingdom 744 (+12%)

665
United States 667 (+6%)

630
Canada 89 (+2%)
87
Australia 84 (+1%)
83
Germany 80 (+54%)
52
France 69 (+15%)
60
Netherlands 59 (+7%)
55
India 45 (+2%)
44
Ireland 44 (+5%)
42
Spain 43 (+13%)
38
Italy 40 (+21%)
33
Malaysia 39 (+8%)
36
United Arab Emirates 38 (+36%)

28
2023
Sweden 37 (+16%)
32 2022
China 32 (-16%)
38
0 100 200 300 400 500 600 700 800
Number of AI university study programs in English
Figure 6.1.31
7 Although the United Kingdom has fewer universities overall compared to the United States, it likely reports a higher number of AI study programs for several reasons. Firstly, Studyportals
has slightly greater coverage of the United Kingdom than the United States in its data. Secondly, the structure of higher education in the United States tends to be more generalist compared
to the United Kingdom, meaning students studying AI might be enrolled in broader computer science programs that are not explicitly identified as AI study programs.

AI university study programs in English per 100,000 inhabitants by geographic area, 2022 vs. 2023
Malta 2.03 (+32.15%)

1.54
United Kingdom 1.10 (+11.50%)

0.99
Cyprus 0.98 (-11.56%)

1.11
Ireland 0.85 (+2.05%)

0.83
Lithuania 0.52 (+13.31%)

0.46
United Arab Emirates 0.40 (+34.63%)

0.30
Sweden 0.35 (+14.86%)

0.31
Finland 0.34 (+11.45%)

0.31
Netherlands 0.33 (+5.94%)

0.31
Australia 0.32 (-0.45%)

0.32
Norway 0.27 (+5.90%)

0.26
Latvia 0.27 (+66.02%)

0.16
Iceland 0.26 (-2.97%)

0.27
2023
Canada 0.22 (-0.86%)
0.22 2022
United States 0.20 (+5.87%)
0.19
0.00 0.50 1.00 1.50 2.00
Number of AI university study programs in English (per 100,000 inhabitants)
Figure 6.1.32

Index Report 2024 6.2 K–12 CS and AI Education
This section presents trends in high school CS education in the United States as a representation of K–12 AI education.
6.2 K–12 CS and AI Education

United States States requiring that all high schools offer a foundational CS course,
2023
Source: Code.org, 2023 | Chart: 2024 AI Index report
Data on the state of K–12
AK ME
CS education in the United
States comes from Code. VT NH MA
org, an education innovation
nonprofit dedicated to WA MT ND SD MN WI MI NY CT RI
ensuring that every school
includes CS as part of its core
K–12 education. CA NV UT CO KS MO KY WV DC MD DE
State-Level Trends AZ NM OK AR TN VA NC
In 2023, 30 American states
TX LA MS AL GA SC
required that all high schools
offer a foundational course in HI FL Yes
No
CS (Figure 6.2.1).
Figure 6.2.1
The percentage of public
Public high schools teaching foundational CS (% of total in state),
schools offering CS courses 2023
varies significantly from state
AK ME
to state (Figure 6.2.2). The 51% 66%
top three states in terms of VT NH MA
76% 81% 83%
percentage of CS offerings
are Maryland (99%), WA MT ND SD MN WI MI NY CT RI
47% 34% 47% 44% 28% 56% 55% 58% 84% 86%
Arkansas (99%), and Nevada
(96%); the bottom three are 64% 38% 63% 50% 84% 54% 91% 62% 68% 89%
Minnesota (28%), Montana CA NV UT CO KS MO KY WV DC MD DE
45% 96% 77% 54% 36% 50% 79% 78% 45% 99% 40%
(34%), and Louisiana (35%).
36% 50% 62% 99% 64% 75% 71%
TX LA MS AL GA SC
54% 35% 78% 95% 71% 94%
HI FL
77% 40% Figure 6.2.2

K–12 CS education is expanding in the United States (Figure 6.2.3). In 2017, only a few states supported high
school CS programs. Now, approximately two-thirds of states require that CS be taught in high schools,
allocate funding for it, and have developed state plans for CS education.
Changes over time in state-level US K–12 CS education

36, Allocate funding

35
30 30, Adopted state plan

29, Require CS course
25
Number of U.S. states
20
15
10
0
2017 2018 2019 2020 2021 2022 2023
Figure 6.2.3

AP Computer Science 201,000 exams were administered, marking an 11.1%

The state of K–12 CS education in the United States increase from 2021 (Figure 6.2.4). Since 2007, the
can also be observed by analyzing trends in the total number of AP CS exams administered has increased
number of AP CS exams. In 2022, approximately 8
more than tenfold.
Number of AP computer science exams taken, 2007–22

201.61
200
Number of AP computer science exams taken (in thousands)
179.19 181.04
158.56
150
130.90
99.87
100
54.38
50 46.34
37.33
29.55
24.78
19.39 19.83 20.96 19.39 21.14
0
2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 6.2.4
8 There are two types of AP CS exams: Computer Science A and Computer Science Principles. Data on computer science exams taken includes both exams. AP CS Principles was initially
offered in 2017.

In 2022, California (33,262), Number of AP computer science exams taken, 2022

Texas (20,901), and Florida
(16,248) were the leading AK ME
157 287
states in terms of the number
VT NH MA
of AP CS exams taken (Figure 114 526 6,470
6.2.5). On the other end, WA MT ND SD MN WI MI NY CT RI
4,755 39 100 40 1,811 2,223 4,919 15,189 3,241 752
Montana (39), South Dakota
(40), and North Dakota (100) OR ID WY NE IA IL IN OH PA NJ
891 514 107 530 645 9,778 2,979 3,968 6,230 10,433
are the states where the
fewest exams were taken. 33,262 2,476 632 3,077 345 1,398 1,832 326 403 7,796 616
Per capita, Maryland (126.5), 1,881 445 629 1,500 2,347 6,142 6,880
New Jersey (112.7), and TX LA MS AL GA SC
20,901 987 791 2,983 8,195 2,103
Massachusetts (92.7) ranked
highest in the number of AP HI FL
720 16,248
CS exams taken (Figure 6.2.6). Figure 6.2.5
Number of AP computer science exams taken per 100,000 inhabitants,

2022
AK ME
21.41 20.66
VT NH MA
17.62 37.60 92.66
WA MT ND SD MN WI MI NY CT RI
61.08 3.47 12.84 4.40 31.69 37.74 49.03 77.21 89.81 68.75
21.02 26.51 18.40 26.93 20.16 77.71 43.60 33.74 48.03 112.66
85.20 77.92 18.69 52.68 11.75 22.63 40.61 18.38 60.06 126.48 60.42
25.54 21.06 15.65 49.24 33.30 70.77 64.32
TX LA MS AL GA SC
69.60 21.51 26.91 58.79 75.09 39.81
HI FL
50.02 73.04 Figure 6.2.6

Highlight:
Access Issues Schools o ering foundational CS courses by size,

2023
90%
90%
Code.org data suggests that factors such
as school size and location significantly 80%
75%
influence CS education accessibility. 70%
Large schools (over 1,200 students)
60%
are 15 percentage points more likely to
offer CS courses than medium-sized % of schools 50%
41%
schools (500–1,200 students), with the 40%
gap widening further when compared

30%
to small schools (under 500 students)
20%
(Figure 6.2.7). Similarly, students in
suburban districts have better access 10%
to CS courses than their counterparts in 0%

Small Medium Large
both urban and rural areas (Figure 6.2.8).
Figure 6.2.7
Schools o ering foundational CS courses by

geographic area, 2023
70% 67.21%
60%
54.73% 54.62%
50%
% of schools
40%
30%
20%
10%
0%
Urban Suburban Rural
Figure 6.2.8

Highlight:
ChatGPT Usage Among Teachers and Students

The introduction of generative AI ChatGPT usage rate among American K–12 teachers,
2023
tools, including ChatGPT, has sparked Source: Impact Research, 2023 | Chart: 2024 AI Index report
significant debate regarding their 63%
potential applications in education. 60%
Some individuals have raised concerns 51%

50%
that these tools could be misused
for plagiarism, potentially prompting
40%
Usage rate
a reevaluation of the ways in which

American students may be taught. 30%
This year, Impact Research, funded by

20%
the Walton Family Foundation, carried
out a series of surveys on American 10%
teachers’ and educators’ perceptions
and use of ChatGPT.9 The surveys 0%
2023-Mar 2023-Jul
revealed that a majority of K–12 teachers Figure 6.2.9
in the United States are already utilizing ChatGPT usage purposes among American K–12
teachers, 2023
ChatGPT, with usage increasing over Source: Impact Research, 2023 | Chart: 2024 AI Index report
the year: In March 2023, 51% of teachers

reported having used ChatGPT at least Coming up with
30%
creative class ideas
once, and by July 2023, that figure had
risen to 63% (Figure 6.2.9). Among the
teachers who reported using ChatGPT,
Usage purpose
30% employed it for lesson planning, Lesson planning 30%

another 30% for generating new creative
class ideas, and 27% for enhancing their
background knowledge (Figure 6.2.10).
Building background
27%
knowledge
0% 10% 20% 30%

Usage rate
Figure 6.2.10
9 To learn more about the surveys, including their methodologies, please visit the following links: March 2023 and July 2023.

Highlight:
ChatGPT Usage Among Teachers and Students (cont’d)

Both teachers and students have ChatGPT perceptions among educational users, 2023
Source: Impact Research, 2023 | Chart: 2024 AI Index report
overwhelmingly positive attitudes toward
Teachers Students
ChatGPT. According to the March 2023
survey, 88% of the teachers believe
that ChatGPT has a positive impact, a 88%
ChatGPT has had
sentiment echoed by 79% of the students a positive impact
79%
surveyed (Figure 6.2.11). Furthermore,
Statement
76% of teachers and 65% of students feel

that ChatGPT is important to incorporate
into the educational process. This recent
76%
data indicates that tools like ChatGPT ChatGPT is important
to incorporate
are poised to become a staple in the 65%
American educational landscape in the

foreseeable future. 0% 20% 40% 60% 80%
% of respondents that “agree”
Figure 6.2.11

CHAPTER 7:
Index Report 2024
Policy and
Governance
CHAPTER 7:
Index Report 2024 Policy and
Governance
Preview
Overview 368 7.4 AI Regulation 393
Chapter Highlights 369 U.S. Regulation 393
Overview 393
7.1 Overview of AI Policy in 2023 370 By Relevance 394
By Agency 395
7.2 AI and Policymaking 376 By Approach 396
Global Legislative Records on AI 376 By Subject Matter 397
Overview 376 EU Regulation 398
By Geographic Area 378 Overview 398
By Relevance 379 By Relevance 399
By Approach 380 By Agency 400
By Subject Matter 381 By Approach 401
U.S. Legislative Records 382 By Subject Matter 402
Federal Level 382
State Level 383 7.5 U.S. Public Investment in AI 403
AI Mentions 385 Federal Budget for AI R&D 403
Overview 385 U.S. Department of Defense Budget Requests 405
U.S. Committee Mentions 388 U.S. Government AI-Related Contract Spending 406
AI Contract Spending 406
7.3 National AI Strategies 391 Microelectronics and Semiconductor
By Geographic Area 391 Spending 409

CHAPTER 7:
Governance
Overview
AI’s increasing capabilities have captured policymakers’ attention. Over the past year,
several nations and political bodies, such as the United States and the European Union,
have enacted significant AI-related policies. The proliferation of these policies reflect
policymakers’ growing awareness of the need to regulate AI and improve their respective
countries’ ability to capitalize on its transformative potential.
This chapter begins examining global AI governance starting with a timeline of significant
AI policymaking events in 2023. It then analyzes global and U.S. AI legislative efforts,
studies AI legislative mentions, and explores how lawmakers across the globe perceive
and discuss AI. Next, the chapter profiles national AI strategies and regulatory efforts
in the United States and the European Union. Finally, it concludes with a study of public
investment in AI within the United States.

CHAPTER 7:
Governance
Chapter Highlights
1. The number of AI regulations in the United States sharply increases. The number of AI-related
regulations in the U.S. has risen significantly in the past year and over the last five years. In 2023, there were
25 AI-related regulations, up from just one in 2016. Last year alone, the total number of AI-related regulations
grew by 56.3%.
2. The United States and the European Union advance landmark AI policy action. In 2023,
policymakers on both sides of the Atlantic put forth substantial AI regulatory proposals. The European Union
reached a deal on the terms of the AI Act, a landmark piece of legislation enacted in 2024. Meanwhile, President
Biden signed an Executive Order on AI, the most notable AI policy initiative in the United States that year.
3. AI captures U.S. policymaker attention. The year 2023 witnessed a remarkable increase in AI-related
legislation at the federal level, with 181 bills proposed, more than double the 88 proposed in 2022.
4. Policymakers across the globe cannot stop talking about AI. Mentions of AI in legislative
proceedings across the globe have nearly doubled, rising from 1,247 in 2022 to 2,175 in 2023. AI was mentioned in
the legislative proceedings of 49 countries in 2023. Moreover, at least one country from every continent discussed
AI in 2023, underscoring the truly global reach of AI policy discourse.
5. More regulatory agencies turn their attention toward AI. The number of U.S. regulatory agencies
issuing AI regulations increased to 21 in 2023 from 17 in 2022, indicating a growing concern over AI regulation
among a broader array of American regulatory bodies. Some of the new regulatory agencies that enacted AI-
related regulations for the first time in 2023 include the Department of Transportation, the Department of Energy,
and the Occupational Safety and Health Administration.

Artificial Intelligence Chapter 7: Policy and Governance
Index Report 2024 7.1 Overview of AI Policy in 2023
This chapter begins with an overview of some of the most significant AI-related policy events in 2023, as selected by
the AI Index Steering Committee.
7.1 Overview of AI Policy in 2023

China introduces regulation on administration
Jan. 10, of deep synthesis of the internet
2023
China introduces regulations aimed at “deep synthesis”
technology to tackle security issues related to the creation
of realistic virtual entities and multimodal media, including
Source: China Talk, 20221
“deepfakes.” These regulations apply to both providers and Figure 7.1.1
users across different media and mandate measures, such
as preventing illegal content, adhering to legal compliance,
verifying user identities, securing consent for biometric editing,
safeguarding data security, and enforcing content moderation.
U.S. legislators propose AI for National Security Act

Mar. 22,
This legislation clarifies and solidifies the Department of
2023
Defense’s (DoD) authority to acquire AI-based endpoint
security tools, enhancing its cyber-defense capabilities.
It aims to enable the DoD to employ AI for the automatic
detection and mitigation of threats to its networks and digital
infrastructure. This bipartisan initiative ensures the DoD can
Source: Brookings, 2018
adopt innovative commercial technologies to strengthen its Figure 7.1.2
cyber defenses, matching the pace of adversaries.
The sources cited in this section are for the images included in the text.

U.S. policymakers introduce AI Leadership

May 11, Training Act
2023
This legislation aims to enhance AI literacy among federal
leaders in response to AI’s widespread adoption across
government agencies. It mandates the director of the Office
of Personnel Management (OPM) to create and periodically
Source: Fox News, 2023
refresh an AI training program, promoting responsible and Figure 7.1.3
ethical AI usage within the federal government. Building on
previous laws, the initiative expands AI training to include
federal employees involved in procuring AI technologies for
government use.
U.S. policymakers propose National AI

Jun. 20, Commission Act
2023
The National AI Commission Act calls for establishing a
National AI Commission tasked with crafting a comprehensive
AI regulatory framework. Highlighting the importance of
expert input due to AI’s rapid innovation and complexity, this Source: Nextgov, 2023
Figure 7.1.4
bipartisan initiative focuses on mitigating risks, preserving
U.S. leadership in AI research and development, and ensuring
consistency with American values.
House of Representatives advances Jobs of the

Jul. 06, Future Act
2023
The bill endorses a study to evaluate industries and
occupations anticipated to grow due to AI, assess its
effects on workers’ skills or potential replacement,
examine stakeholder influence opportunities, identify
Source: LSE Business Review, 2019
the demographics most impacted, evaluate the required Figure 7.1.5
skills and education, review data accessibility, investigate
efficient skill delivery methods, and explore the role of
academic institutions in offering critical training.

U.S. Senate puts forward Artificial Intelligence and

Jul. 19, Biosecurity Risk Assessment Act
2023
The act mandates the assistant secretary for preparedness
and response to assess and address threats to public health
and national security from technical advancements in
artificial intelligence. It emphasizes evaluating the potential
use of AI, including open-source models, for developing Source: Clinical Trials Arena, 2023
Figure 7.1.6
harmful agents. The proposed initiatives include monitoring
global biological risks and integrating risk assessment
summaries into the National Health Security Strategy.
Private AI labs sign voluntary White House AI

Jul.21, commitments
2023
The Biden-Harris administration obtains voluntary pledges
from seven major AI firms—Google, Microsoft, Meta,
Amazon, OpenAI, Anthropic, and Inflection—to promote the
development of AI that is safe, secure, and reliable. These
commitments involve conducting internal and external
security assessments of AI systems prior to launch, sharing Source: Medium, 2023
Figure 7.1.7
information on identified risks, enabling public reporting of
issues, and disclosing when content is AI-generated.
U.S. Senate passes Outbound Investment

Jul. 25, Transparency Act
2023
This initiative aims to scrutinize U.S. investments in
critical sectors, especially those involving China, with
a focus on evaluating risks in crucial industries and
technologies such as AI that impact national security.
The objective is to increase awareness of potential
vulnerabilities and risks linked to foreign access to
Source: AI CIO, 2023
American technology in these domains. Figure 7.1.8

U.S. Senate proposes CREATE AI Act

Jul. 27,
The CREATE AI Act establishes the National Artificial
2023
Intelligence Research Resource (NAIRR), a national research
infrastructure to improve AI researchers’ and students’
access to essential resources. NAIRR offers compute,
curated datasets, educational tools, and AI testbeds. It aims
to bolster the nation’s AI research capabilities by supporting
Source: Stanford HAI, 2023
the testing and evaluation of AI systems. Figure 7.1.9
China updates cyberspace administration of

Aug. 15, generative AI measures
2023
China’s updated policy adopts a more targeted regulatory
approach, focusing on applications with public implications
rather than a blanket regulation. The amendments soften
the regulatory language, changing directives like “ensure
the truth, accuracy, objectivity, and diversity of the data”
to “employ effective measures to enhance the quality of Source: South China Morning Post, 2023
Figure 7.1.10
training data and improve its truth, accuracy, objectivity, and
diversity.” Additionally, the revised regulations encourage
generative AI development, shifting away from the prior
punitive focus.
U.S. Senate puts forward Protect Elections from

Sep. 12, Deceptive AI Act
2023
The bipartisan bill seeks to prohibit the use of AI to create
materially deceptive content that falsely represents
federal candidates in political advertisements. This act
addresses the risks of AI-driven disinformation in elections
by banning the distribution of materially deceptive AI- Source: The Economist, 2023
Figure 7.1.11
generated audio or visual content related to candidates
running for federal office.

U.K. proposes principles to guide competitive AI

Sep. 18, markets and protect consumers
2023
The U.K.’s Competition and Markets Authority proposes
principles to foster competitive AI markets while ensuring
consumer protection. These principles are designed to
guarantee accountability for AI outputs, maintain continuous Source: Science Business, 2022
Figure 7.1.12
access to essential inputs, promote a diversity of business
models, provide businesses with choices, offer flexibility to
switch between models, and ensure fair practices to prevent
anticompetitive behavior.
President Biden issues Executive Order on Safe,

Oct. 30, Secure, and Trustworthy AI
2023
The executive order establishes new benchmarks for AI safety,
security, privacy protection for Americans, advancement of
equity and civil rights, and the fostering of competition and
innovation. It mandates the creation of a national security
memorandum to guide the safe and ethical application of AI in
military and intelligence operations, ensuring the protection of Source: AP, 2023
Figure 7.1.13
Americans’ privacy and the cultivation of an open, competitive
AI market that emphasizes U.S. innovation. Additionally, the Department of Education is tasked
with addressing AI’s safe and responsible use in education, while the Federal Communications
Commission is encouraged to assess AI’s impact on telecommunications. The National Institute of
Standards and Technology (NIST) is instructed to formulate guidelines and best practices to support
industry consensus on developing and deploying secure, reliable, and ethical AI.
Frontier AI taskforce releases second

Oct. 30, progress report
2023
The task force forms new alliances with leading AI
organizations and facilitates the development of the U.K.’s
AI Research Resource (AIRR), to be known as Isambard-
AI, an AI supercomputer designed for compute-intensive
safety research. Moreover, the report highlights the task
Source: PYMNTS, 2022
force’s initiatives to mitigate risks inherent in advanced Figure 7.1.14
AI development and its partnerships with premier AI

companies to gain early access to their models.

U.K. hosts AI Safety Summit (2023)

Nov. 01,
The UK AI Safety Summit at Bletchley Park seeks to tackle
2023
AI risks and promote global cooperation, culminating in
the Bletchley Declaration. This declaration, endorsed by 28
countries, including China and the United States, signifies
a significant global agreement on AI safety. The U.K. also
unveiled the world’s inaugural AI Safety Institute, dedicated to Source: CGTN, 2023
Figure 7.1.15
safety assessments and research. Despite these developments,
reactions are mixed, with certain experts advocating for more
comprehensive and ambitious policy measures.
U.K. announces AI Safety Institute

Nov. 02,
The AI Safety Institute, the first government-supported
2023
entity dedicated to advancing AI safety in the public interest,
aims to safeguard the U.K. and humanity from unforeseen
AI advancements. Its goal is to build the sociotechnical
framework required to comprehend and govern the risks
associated with advanced AI. By conducting fundamental AI
safety research, the institute intends to enhance worldwide Source: Gov.uk, 2024
Figure 7.1.16
comprehension of the dangers posed by advanced AI
systems and create the technical tools vital for effective AI governance. Furthermore, it aspires to
position the U.K. as a global center for safety research, thereby reinforcing the nation’s strategic
investment in this critical technology.
Europeans reach deal on EU AI Act

Dec. 09,
European lawmakers reach a tentative deal on the AI Act.
2023
The act establishes a risk-based regulatory framework for
AI, prohibiting systems with unacceptable risks, such as
behavioral manipulators, and classifying high-risk systems
into product-based and critical sectors. Generative AI, such
as ChatGPT, is required to adhere to transparency standards. Source: Stanford HAI, 2023
Figure 7.1.17
Meanwhile, low-risk AI, including deepfake technologies, is
subject to fundamental transparency obligations.

Index Report 2024 7.2 AI and Policymaking
7.2 AI and Policymaking

Global Legislative Records on AI
Overview
The AI Index analyzed legislation containing “artificial count of AI-related bills passed since 2016. While the
intelligence” in 128 countries from 2016 to 2023.2 Of total dropped to 28 in 2023 from 39 in the previous
these, 32 countries have enacted at least one AI-related year, the number of AI-related bills passed in 2023
bill (Figure 7.2.1).3 In total, the countries have passed significantly exceeds the total passed in 2016.
148 AI-related bills. Figure 7.2.2 illustrates the annual
Number of AI-related bills passed into law by country, 2016–23

0
1–5
6–10
11–15
16–25
No available data
Figure 7.2.1
2 The analysis of passed AI policies may undercount the number of actual bills, given that large bills can include multiple sub-bills related to AI; for example, the CHIPS and Science Act
passed by the United States in 2022.
3 The AI Index monitored AI-related bills passed in Hong Kong and Macao, despite these not being officially recognized countries. Thus, the Index covers a total of 130 geographic areas.
Laws passed by Hong Kong and Macao were counted in the overall tally of AI-related bills. This year, the Index expanded its country sample compared to previous years, resulting in a
difference between the number of AI-related bills reported this year and those in prior reports.

Number of AI-related bills passed into law in 128 select countries, 2016–23
40
35
30
28
Number of AI-related bills
25
20
15
10
0
2016 2017 2018 2019 2020 2021 2022 2023
Figure 7.2.2

By Geographic Area Number of AI-related bills passed into law in select

countries, 2023
Figure 7.2.3 highlights the number of laws containing Source: AI Index, 2024 | Chart: 2024 AI Index report
mentions of AI that were enacted in 2023. Belgium Belgium 5

France 3
led with five laws, followed by France, South Korea,
South Korea 3
and the United Kingdom, each of which passed United Kingdom 3
three. Figure 7.2.4 shows the total number of laws Argentina 2
Portugal 2
passed since 2016. The United States (23) has passed
Spain 2
the most AI-related laws since 2016, followed by Andorra 1
Portugal (15), and Belgium (12). Austria 1
Hungary 1
Italy 1
Kazakhstan 1
Luxembourg 1
Russia 1
United States 1
0 1 2 3 4 5
Figure 7.2.3
Number of AI-related bills passed into law in select

countries, 2016–23 (sum)
United States 23
Portugal 15
Belgium 12
Spain 11
South Korea 11
Italy 10
Russia 10
United Kingdom 8
France 7
Austria 6
Philippines 5
Slovenia 3
Argentina 3
Andorra 3
Germany 3
0 2 4 6 8 10 12 14 16 18 20 22
Figure 7.2.4

By Relevance a medium AI relevance. Low relevance AI bills

The AI Index team further disaggregated AI-related merely mention AI in passing without a substantial
bills based on their relevance to AI, as not every bill legislative focus on AI. An example of a low relevance
mentioning AI prioritizes it equally. A bill deemed to AI bill is the Energy and Water, Legislative Branch,
have high relevance to AI is fundamentally focused and Military Construction and Veterans Affairs
on AI-related policy, like the AI Training Act passed Appropriations Act, 2019. This bill allocates funding
in 2022, which mandates AI training programs to various federal agencies, and mentions AI primarily
for executive agency workers. Conversely, bills in the context of encouraging these agencies to
with medium relevance incorporate significant AI consider workforce training opportunities for sectors
policy elements but are not fundamentally focused like cybersecurity, energy, and AI.
on AI-related matters. For example, the National
Figure 7.2.5 illustrates the distribution of AI-related
Defense Authorization Act for Fiscal Year 2022
bills passed into law globally in 2023, categorized
includes sections on AI performance metrics and
by their relevance to AI. Out of 28 AI-related bills
AI capabilities development for the Department of
enacted, two were classified as having high relevance
Defense. However, because it has a broader focus,
to AI, while 18 were deemed to have medium
namely authorizing various Defense Agency programs,
relevance.
and is not completely centered on AI, it was assigned
Number of AI-related bills passed into law in select countries by relevance to AI, 2016–23
40 39
Low Medium High
4
35
30
30
3 28
26 17
25 3
20
17
18
15 15
13
12
10
6 18
9
5 10
3 8 8
6
1 3 3
0
2016 2017 2018 2019 2020 2021 2022 2023
Figure 7.2.5

By Approach
policymakers focus on expanding AI capabilities,
The AI Index also categorized AI-related bills as
imposing restrictions, or balancing both.
either expansive or restrictive. Expansive bills aim to
enhance a nation’s AI capabilities, such as establishing Figure 7.2.6 indicates a global trend toward regulating
a network of publicly accessible supercomputers. AI usage, showing that, while the commitment to
Restrictive bills, on the other hand, impose limitations enhancing AI capabilities remains, there is a growing
on AI usage, like setting rules for deploying facial shift toward restrictive legislation. This change
recognition technology. A bill can be both, or neither. 4
suggests that legislators are increasingly focused on
Distinguishing between expansive or restrictive mitigating the potential harms of AI’s integration into
bills can highlight legislator priorities: whether society.
Number of AI-related bills passed into law in select countries by approach, 2016–23
25
Expansive Restrictive 24
21 21
7
20
5 18
15 6
10
9
8 17
16
13
12
5
8
3 7
1 3
0
2016 2017 2018 2019 2020 2021 2022 2023
Figure 7.2.6
4 The AI Index only categorized bills as being expansive or restrictive if they were identified as having medium or high AI relevance. Consequently, the totals depicted in Figure 7.2.5 may not
fully correspond with those presented earlier in the chapter.

By Subject Matter
The AI Index’s global analysis of AI legislation in 2023 the distribution of primary topics among
classifies bills by their primary subject matter passed bills broadened significantly, encompassing
according to the typology used by the U.S. Congress a diverse range of policy areas. Specifically, two bills
to classify American legislation. Historically, 5
were passed in each of the following categories:
economics and public finance have been the armed forces and national security; civil rights and
predominant focus of AI-related legislation, reflecting liberties, minority issues; commerce; education;
the fact that AI-related policymaking matters are labor and employment; science, technology, and
often incorporated within budgetary bills related communication. This diversity indicates that AI policy
to public appropriations (Figure 7.2.7). However, concerns are increasingly spanning various sectors.
Number of AI-related bills passed into law in select countries by primary subject matter, 2016–23
2016 2017 2018 2019 2020 2021 2022 2023
Armed forces and national security 0 1 2 1 0 3 2 2

Arts, culture, religion 0 0 0 0 0 0 1 0
Civil rights and liberties, minority issues 0 0 0 0 0 1 1 2
Commerce 0 1 0 0 1 0 1 2
Crime and law enforcement 0 0 0 1 0 1 1 1
Economics and public nance 0 1 1 3 8 6 7 5
Primary subject matter
Education 0 0 0 0 3 1 0 2
Energy 0 0 0 0 1 0 0 0
Environmental protection 0 0 0 0 0 1 0 0
Finance and nancial sector 0 0 0 0 0 0 0 1
Foreign trade and international nance 0 0 0 0 0 0 0 1
Government operations and politics 0 0 0 0 0 1 2 0
Health 0 0 1 2 1 0 0 0
Labor and employment 0 0 0 1 0 3 4 2
Science, technology, communications 1 0 2 1 4 2 2 2
Taxation 0 0 0 0 0 1 0 0
Transportation and public works 0 0 1 0 0 0 0 0
Figure 7.2.7
5 Similar to the classification of bills as either expansive or restrictive, only bills coded as having a medium or high relevance to AI were coded for their primary subject matter. Consequently,
not all AI-related bills featured in this section’s analysis have subject matter coding available.

U.S. Legislative Records

Federal Level
Figure 7.2.8 illustrates the total number of passed to 181 in 2023. This significant increase in U.S. AI-
versus proposed AI-related bills in the U.S. Congress, related legislative activity likely reflects policymakers’
highlighting a significant increase in proposed response to the increasing public awareness and
legislation. In the last year, the count of proposed AI- capabilities of AI technologies, such as ChatGPT.
related bills more than doubled, rising from 88 in 2022
Number of AI-related bills in the United States, 2016–23 (proposed vs. passed)
180 181, Proposed
160
140
120
100
80
60
40
20
0 1, Passed
2016 2017 2018 2019 2020 2021 2022 2023
Figure 7.2.8

State Level Number of AI-related bills passed into law in select

US states, 2023
The AI Index also tracks data on the enactment of Source: AI Index, 2024 | Chart: 2024 AI Index report
AI-related legislation at the state level. Figure 7.2.9 California 7

Virginia 5
highlights the number of AI-related laws enacted
Maryland 3
by U.S. states in 2023. California leads with seven North Dakota 2
laws, followed by Virginia with five, and Maryland Washington 2
West Virginia 2
with three. Figure 7.2.10 displays the total amount
Alabama 1
of legislation passed by states from 2016 to 2023. Arizona 1
California again tops the ranking with 13 bills, Colorado 1
Connecticut 1
followed by Maryland (10) and Washington (7).
Florida 1
Georgia 1
Illinois 1
Iowa 1
Louisiana 1
0 1 2 3 4 5 6 7
Figure 7.2.9
Number of state-level AI-related bills passed into law in the

United States by state, 2016 23 (sum)
AK ME
0 0
VT NH MA
4 0 6
7 0 3 0 1 0 2 2 1 0
0 1 0 0 1 4 0 2 0 2
CA NV UT CO KS MO KY WV MD DE
13 2 4 3 0 0 1 3 10 0
1 1 0 0 0 6 3
TX LA MS AL GA SC
2 2 2 3 1 0
HI FL
1 2
Figure 7.2.10

Figure 7.2.11 displays the total number of state-level proposed in 2022. A significantly greater proportion
AI-related bills proposed and passed in the United of AI-related bills are enacted into law at the state
States since 2016. In 2023, 150 total state-level bills level in the United States, compared to the federal
were proposed, a significant increase from the 61 bills level.
Number of state-level AI-related bills in the United States, 2016–23 (proposed vs. passed)
150, Proposed
140
120
100
80
60
40 38, Passed
20
2016 2017 2018 2019 2020 2021 2022 2023
Figure 7.2.11

Overview
AI Mentions Figure 7.2.12 reveals a significant increase in the
Another barometer of legislative interest is the number mentions of AI in legislative proceedings across the
of mentions of artificial intelligence in governmental globe, nearly doubling from 1,247 in 2022 to 2,175 in
and parliamentary proceedings. The AI Index 2023. Since 2016, AI mentions in legislative discussions
conducted an analysis of the minutes or proceedings have risen almost tenfold. This data suggests that the
of legislative sessions in 80 countries that contain the emergence of AI systems such as ChatGPT in 2023
keyword “artificial intelligence” from 2016 to 2023. 6
has notably captured policymakers’ attention.
Number of mentions of AI in legislative proceedings in 80 select countries, 2016–23

2,175
2,000
1,500
Number of mentions
1,000
500
0
2016 2017 2018 2019 2020 2021 2022 2023
Figure 7.2.12
6 The full list of countries analyzed can be found in the Appendix. The AI Index research team attempted to review the governmental and parliamentary proceedings of every country in the
world; however, publicly accessible governmental and parliamentary databases were not made available for all countries.

In 2023, the United Kingdom led in AI mentions within When legislative mentions are aggregated from
its legislative proceedings (405), followed by the 2016 to 2023, a somewhat similar trend emerges
United States (240) and Australia (227) (Figure 7.2.13). (Figure 7.2.14). The United Kingdom is first, with 1,490
Out of 80 countries analyzed, 48 mentioned AI at least mentions, followed by Spain (886) and the United
once. Moreover, AI discussions reached legislative States (868).
platforms in at least one country from every continent
in 2023, underscoring the truly global reach of AI
policy discourse.
Number of mentions of AI in legislative proceedings by country, 2023

0
1–55
56–120
121–250
251–410
No available data
Figure 7.2.13

Number of mentions of AI in legislative proceedings by country, 2016–23 (sum)

0
1–220
221–440
441–660
661–890
891–1,500
No available data
Figure 7.2.14

U.S. Committee Mentions Figure 7.2.15 shows the frequency of AI mentions in

Mentions of artificial intelligence in committee U.S. committee reports by legislative session from
reports by House and Senate committees serve as 2001 to 2023. Mentions of AI have decreased for the
another indicator of legislative interest in AI in the current 118th session; however, it is important to note
United States. Typically, these committees focus that this session is only about halfway through, with
on legislative and policy issues, investigations, and an end date set for January 2025. Continuing at the
internal matters. current rate, the 118th legislative session is poised to
surpass all previous sessions in terms of AI mentions.
Mentions of AI in US committee reports by legislative session, 2001–23

80
70
60
50
Number of mentions
48
40
30
20
10
107th 108th 109th 110th 111th 112th 113th 114th 115th 116th 117th 118th
(2001–02) (2003–04) (2005–06) (2007–08) (2009–10) (2011–12) (2013–14) (2015–16) (2017–18) (2019–20) (2021–22) (2023–)
Figure 7.2.15

Figure 7.2.16 depicts AI mentions in the committee reports of the U.S. House of Representatives during the ongoing
118th congressional session. The Appropriations and Science, Space, and Technology committees feature the
highest number of AI mentions. Meanwhile, Figure 7.2.17 highlights AI mentions in Senate committee reports, with
Appropriations leading (9), followed by the Homeland Security and Governmental Affairs Committee (3).
Mentions of AI in committee reports of the US House of Representatives for the 118th congressional
session, 2023
Appropriations 7
Science, Space, and Technology 7
Rules 4
Energy and Commerce 3
Transportation and Infrastructure 2
Agriculture 1
Armed Services 1
Education and the Workforce 1
Financial Services 1
Foreign A airs 1
Intelligence (Permanent Select) 1
Oversight and Accountability 1
Ways and Means 1
0 1 2 3 4 5 6 7
Number of mentions Figure 7.2.16
Mentions of AI in committee reports of the US Senate for the 118th congressional session, 2023
Appropriations 9
Homeland Security and

3
Governmental A airs
Intelligence (Select) 2
Armed Services 1
Banking, Housing, and

1
Urban A airs
Commerce, Science, and

1
Transportation
0 1 2 3 4 5 6 7 8 9

Figures 7.2.18 and 7.2.19 show the total number of mentions in committee reports from congressional sessions
occurring since 2001. The House and Senate Appropriations committees, which regulate expenditures of money
by the federal government, lead their respective lists.
Mentions of AI in committee reports of the US House of Representatives, 2001–23 (sum)

Appropriations 52
Science, Space, and Technology 34
Rules 18
Energy and Commerce 12
Armed Services 10
Oversight and Accountability 9
Financial Services 7
Intelligence (Permanent Select) 7
Transportation and Infrastructure 6
Education and the Workforce 5
Foreign A airs 3
Homeland Security 3
Veterans’ A airs 3
Ways and Means 3
Agriculture 2
Budget 2
Judiciary 2
Natural Resources 2
Small Business 2
House Administration 1
0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54
Mentions of AI in committee reports of the US Senate, 2001–23 (sum)

Appropriations 25
Homeland Security and

14
Governmental A airs
Armed Services 11
Commerce, Science, and

10
Transportation
Energy and Natural Resources 7
Intelligence (Select) 7
Banking, Housing, and

1
Urban A airs
0 2 4 6 8 10 12 14 16 18 20 22 24 26
Number of mentions
Figure 7.2.19

Index Report 2024 7.3 National AI Strategies
This section offers an overview of national AI strategies, which are policy plans created by governments to guide
the development and deployment of AI within their country. Monitoring trends in these strategies is important for
assessing how countries prioritize the development and regulation of AI technologies. Sources include national or
regional government websites, the OECD AI Policy Observatory (oecd.ai), and news reports.7
7.3 National AI Strategies

By Geographic Area Figure 7.3.1 identifies countries that have either
Canada initiated the first national AI strategy in released or are in the process of developing a national
March 2017. To date, 75 national AI strategies have AI strategy as of January 2024. Figure 7.3.2 lists the
been unveiled. The peak year was 2019, when countries that are in the process of developing an AI
24 strategies were released. In 2023, eight new strategy within the past three years. The list of new
strategies were added, from countries in the Middle countries developing national AI strategies include:
East, Africa, and the Caribbean, showcasing the Antigua and Barbuda, Barbados, Belarus, Costa Rica,
worldwide expansion of AI policymaking discourse. Jamaica, Pakistan, and Senegal. Figure 7.3.3 provides a
timeline of the release of national AI strategies.
Countries with a national strategy on AI, 2023

Released
In development
Not released
Figure 7.3.1
7 The AI Index research team made efforts to identify whether there was a national AI strategy that was released or in development for every nation in the world. It is possible that some
strategies were missed.

Index Report 2024 7.3 National AI Strategies
AI national strategies in development by country Yearly release of AI national strategies by country

and year Source: AI Index, 2024 | Table: 2024 AI Index report
Year Country
Year Country
2017 Canada, China, Finland
2021 Andorra, Armenia, Cuba, Iceland, Morocco, New Zealand
2018 France, Germany, India, Mauritius, Mexico, Sweden
2022 Kenya
2019 Argentina, Bangladesh, Chile, Colombia, Cyprus, Czech
2023 Antigua and Barbuda, Barbados, Belarus, Costa Rica, Jamaica, Republic, Denmark, Egypt, Estonia, Japan, Lithuania,
Pakistan, Senegal Luxembourg, Malta, Netherlands, Portugal, Qatar, Romania,
Russia, Sierra Leone, Singapore, Slovak Republic, United Arab
Figure 7.3.2 Emirates, United States of America, Uruguay
2020 Algeria, Bulgaria, Croatia, Greece, Hungary, Indonesia, Latvia,

North Korea, Norway, Poland, Saudi Arabia, Serbia, Spain,
Switzerland
2021 Australia, Austria, Brazil, Hong Kong, Ireland, Malaysia, Peru,

Philippines, Slovenia, Tunisia, Turkey, Ukraine, United
Kingdom, Vietnam
2022 Belgium, Ghana, Iran, Italy, Jordan, Thailand

2023 Azerbaijan, Bahrain, Benin, Dominican Republic, Ethiopia,
Iraq, Israel, Rwanda
Figure 7.3.3

Index Report 2024 7.4 AI Regulation
The advent of AI has garnered significant attention from regulatory agencies—federal bodies tasked with regulating
sectors of the economy and steering the enforcement of laws. This section examines AI regulations within the United
States and the European Union. Unlike legislation, which establishes legal frameworks within nations, regulations are
detailed directives crafted by executive authorities to enforce legislation. In the United States, prominent regulatory
agencies include the Environmental Protection Agency (EPA), Food and Drug Administration (FDA), and Federal
Communications Commission (FCC). Since the specifics of legislation often manifest through regulatory actions,
understanding the AI regulatory landscape is essential in order to develop a deeper understanding of AI policymaking.
7.4 AI Regulation
U.S. Regulation from nearly all branches of the American government,
encompassing more than 436 agencies.8
This section examines AI-related regulations enacted
by American regulatory agencies between 2016 and Overview
2023. It provides an analysis of the total number of The number of AI-related regulations has risen
regulations, as well as their topics, scope, regulatory significantly, both in the past year and over the last five
intent, and originating agencies. To compile this years (Figure 7.4.1). In 2023, there were 25 AI-related
data, the AI Index team performed a keyword search regulations, a stark increase from just one in 2016. Last
for “artificial intelligence” on the Federal Register, a year alone, the total number of AI-related regulations
comprehensive repository of government documents grew by 56.3%.
Number of AI-related regulations in the United States, 2016–23

25 25
20
Number of AI-related regulations
15
10
0
2016 2017 2018 2019 2020 2021 2022 2023
Figure 7.4.1
8 A full description of the project’s methodology can be found in the Appendix.

By Relevance example is the Securities and Exchange Commission’s

The AI Index categorized AI-related regulations— Cybersecurity Risk Management Strategy, Governance,
those mentioning AI—into three levels of relevance: and Incident Disclosure, which established
low, medium, and high. In 2023, the number of 9
standardized disclosure practices for public companies
high and medium relevance AI-related regulations concerning cybersecurity risk management, strategy,
increased compared to 2022. For instance, a high governance, and incidents.
relevance AI regulation was the Copyright Office
Figure 7.4.2 categorizes AI-related regulations in the
and Library of Congress’ Copyright Registration
United States based on their relevance to AI. A growing
Guidance: Works Containing Material Generated by
proportion of these regulations is highly relevant to
Artificial Intelligence. This policy statement clarified
AI. Among the 25 AI-related regulations enacted in
registration practices for works incorporating AI-
2023, four were identified as being highly relevant, the
generated material. Meanwhile, a medium-relevance
greatest amount since tracking began in 2016.
Number of AI-related regulations in the United States by relevance to AI, 2016–23

25
25 Low Medium High
20
7
16 16
15
5
12
6
10
10
2 6
14
11
5
9
8
3
6
1
2
0
2016 2017 2018 2019 2020 2021 2022 2023
Figure 7.4.2
9 A high relevance regulation focuses entirely on AI or AI-related issues. A medium relevance regulation includes meaningful mentions of AI but is not solely centered on it. A low relevance
regulation mentions AI in passing, without a significant focus on AI-related matters.

By Agency10
Which agencies are the primary sources of AI (Figure 7.4.3). Furthermore, the number of agencies
regulations? In 2023, the Executive Office of the issuing AI regulations increased from 17 in 2022 to 21 in
President and the Commerce Department led with 2023, indicating a growing need for clarity and concern
five AI-related regulations each, followed by the regarding AI among a broader array of American
Health and Human Services Department and the regulatory bodies.
Industry and Security Bureau, with each issuing four
Number of AI-related regulations in the United States by agency, 2016–23

2016 2017 2018 2019 2020 2021 2022 2023

Census Bureau 0 NaN 0 0 0 0 0 1
Centers for Medicare & Medicaid Services 0 NaN 1 2 4 5 4 3
Children and Families Administration 0 NaN 0 0 1 1 1 0
Commerce Department 0 NaN 0 0 1 1 3 5
Comptroller of the Currency 0 NaN 0 0 0 0 1 0
Consumer Financial Protection Bureau 0 NaN 0 1 0 0 1 1
Copyright O�ce, Library of Congress 0 NaN 0 1 0 0 0 1
Education Department 0 NaN 0 0 2 0 0 0
Employee Bene�ts Security Administration 0 NaN 0 0 0 0 1 0
Employment and Training Administration 0 NaN 0 0 0 1 0 0
Energy Department 0 NaN 0 0 0 0 0 1
Executive O�ce of the President 0 NaN 2 6 5 2 3 5
Federal Communications Commission 0 NaN 0 0 1 0 0 0
Federal Railroad Administration 0 NaN 0 0 0 0 0 1
Food and Drug Administration 0 NaN 0 0 1 1 2 1
Health and Human Services Department 0 NaN 1 2 5 5 5 4
Agency
Homeland Security Department 1 NaN 0 0 0 3 0 1

Housing and Urban Development Department 0 NaN 0 0 1 0 0 1
Industry and Security Bureau 0 NaN 0 0 0 0 3 4
Investment Security O�ce 0 NaN 0 0 1 0 0 0
Labor Department 0 NaN 0 0 0 1 1 1
Library of Congress 0 NaN 0 1 0 0 0 1
National Credit Union Administration 0 NaN 0 0 0 0 0 1
National Science Foundation 0 NaN 0 0 0 0 0 1
Nuclear Regulatory Commission 0 NaN 0 0 0 0 1 0
Occupational Safety and Health Administration 0 NaN 0 0 0 0 0 1
O�ce of Inspector General 0 NaN 0 0 2 0 0 0
O�ce of the Inspector General 0 NaN 0 0 0 1 1 0
O�ce of the Secretary 0 NaN 0 0 1 3 1 1
Patent and Trademark O�ce 0 NaN 0 0 1 0 0 0
Public Health Service 0 NaN 0 0 0 1 1 0
Securities and Exchange Commission 0 NaN 0 0 0 0 1 2
Transportation Department 0 NaN 0 0 0 0 0 1
Treasury Department 0 NaN 0 0 1 0 1 0
Figure 7.4.3
10 Regulations can originate from multiple agencies, so the annual totals in Figure 7.4.3 may exceed those in Figure 7.4.1.

By Approach restriction (Figure 7.4.4). In 2023, there were 10

The AI Index categorized regulations based on their restrictive AI regulations compared to just three that
approach: whether they expanded or restricted AI were expansive. Conversely in 2020, there were four
capabilities. Over time, the trend in AI regulations
11
regulations that were expansive and one that was
in the United States has shifted significantly toward restrictive.
Number of AI-related regulations in the United States by approach, 2016–23

13
Expansive Restrictive
12
10
8 10
6
6
5
4 4
3 3
2 4
3
1 1
2 2 2
0
2016 2017 2018 2019 2020 2021 2022 2023
Figure 7.4.4
11 Expansive regulations refer to actions by regulatory agencies or governments aimed at augmenting AI capacity, including investments in supercomputing infrastructure. Restrictive
regulations involve steps to curtail AI capabilities, such as imposing restrictions on the use of facial recognition algorithms. Restrictive AI regulations may also be intended to address
underlying policy concerns, such as AI’s potential impact on citizens’ civil liberties. According to this coding typology, a regulation can be classified as both expansive and restrictive, or it may
fit neither category. The AI Index assigned the labels “expansive” or “restrictive” only to regulations deemed to have medium to high relevance to AI. Therefore the regulation totals in Figure
7.4.4 are less than those reported earlier in the section.

By Subject Matter
In 2023, American AI regulations were categorized by topics tied for second place, with two occurrences
primary subject matter. The most prevalent subject each: health; commerce; and science, technology, and
matter in AI-related regulation was foreign trade and communications (Figure 7.4.5).
international finance, with three instances. Three
Number of AI-related regulations in the United States by primary subject matter, 2016–23
2016 2017 2018 2019 2020 2021 2022 2023
Armed forces and national security 0 NaN 0 0 0 0 0 1
Civil rights and liberties, minority issues 0 NaN 0 0 0 0 0 1
Commerce 0 NaN 0 0 0 1 0 2
Crime and law enforcement 0 NaN 0 0 0 1 0 0
Education 0 NaN 0 0 1 0 0 0
Primary subject matter
Finance and nancial sector 0 NaN 0 0 0 0 1 0
Foreign trade and international nance 0 NaN 0 0 0 0 2 3
Government operations and politics 0 NaN 0 0 2 0 0 0
Health 0 NaN 0 0 3 4 1 2
Housing and community development 0 NaN 0 0 1 0 0 0
Immigration 1 NaN 0 0 0 0 0 0
Labor and employment 0 NaN 1 0 0 0 0 0
Science, technology, communications 0 NaN 0 2 0 0 1 2
Figure 7.4.5
12 The AI Index team used Congress’ policy categorization typology. Only regulations that have medium and high AI relevance were coded for their primary subject matter.

EU Regulation regulatory authority. The search for AI-related regulation

in the European Union was limited to legal acts,
The AI Index also gathered information on AI-related
international agreements, and consolidated texts. The
regulations enacted in the European Union between
same methodological approach was used to code EU
2017 and 2023. To compile this data, the Index
regulations, as was used to code U.S. regulations.13
team conducted a keyword search for “artificial
intelligence” on EUR-Lex, a comprehensive database Overview
of EU legislation, regulations, and case law. EUR- The number of AI-related regulations passed by the
Lex provides access to a wide range of regulatory European Union increased from 22 in 2022 to 32 in
documents, such as legal acts, consolidated texts, 2023 (Figure 7.4.6). Despite this increase, the number
international agreements, preparatory documents, of AI-related regulations passed by the European
and legislative procedures. The analysis in this section Union peaked in 2021 with 46.
focused exclusively on documents with binding
Number of AI-related regulations in the European Union, 2017–23

45
40
35
32
30
25
20
15
10
0
2017 2018 2019 2020 2021 2022 2023
Figure 7.4.6
13 The methodological approach refers to coding regulations based on relevance, originating agency, approach, and subject matter.

By Relevance Horizon Europe, a framework program for research

In 2021, the European Union passed its first highly and innovation. Of the 32 regulations passed in
relevant AI-related regulations. These regulations 2023, two had high relevance to AI, 13 had medium
established the Digital Europe Programme and relevance, and 17 had low relevance (Figure 7.4.7).
Number of AI-related regulations in the European Union by relevance to AI, 2017–23

46
Low Medium High
45
40
35 19
32
30
25
22 13
20
15 10
12 25
11
10 3
5 17
5 4 9 10
2 6
3
0
2017 2018 2019 2020 2021 2022 2023
Figure 7.4.7

By Agency
The two most prominent originator agencies for European Union AI regulations in 2023 were the Council of the
European Union (13) and European Parliament (9) (Figure 7.4.8).14
Number of AI-related regulations in the European Union by institution and body, 2017–23
2017 2018 2019 2020 2021 2022 2023

Council of the European Union 1 3 4 7 25 16 13
DG Competition (EC) 0 0 3 0 3 2 2
DG Connect (EC) 0 0 1 0 1 0 3
DG DEFIS (EC) 0 0 0 0 0 0 1
DG ECFIN (EC) 0 0 0 0 2 0 0
DG Energy (EC) 0 0 0 0 1 0 0
DG FISMA (EC) 0 0 0 0 1 0 0
DG GROW (EC) 0 0 1 0 0 0 0
Institution and body
DG HOME (EC) 0 0 0 0 0 0 1
DG JUST (EC) 0 0 0 0 0 0 3
DG MOVE (EC) 0 0 0 0 0 0 1
DG SANTE (EC) 0 0 1 0 0 0 0
DG Trade (EC) 0 0 0 1 2 0 0
EU-Egypt Association Council 0 0 0 0 0 1 0
EU-Moldova Association Council 0 0 0 0 0 1 0
Euratom 0 0 0 2 0 0 0
EuroNest Parliamentary Assembly 1 0 1 0 1 0 0
European Commission 1 1 0 2 4 1 0
European Parliament 1 2 2 2 18 9 9
Eurostat (EC) 0 0 0 1 0 1 1
Joint Research Centre (EC) 0 0 0 0 1 0 0
Secretariat-General (EC) 0 0 0 0 2 0 0
Figure 7.4.8
14 Institutions abbreviated with DG are Directorates-General. These are departments with specific areas of ministerial responsibility.

By Approach
In recent years, AI-related regulation in the European Union has tended to take a more expansive approach (Figure
7.4.9). In 2023, there were eight regulations with a restrictive focus compared to 12 with an expansive one.
Number of AI-related regulations in the European Union by approach, 2017–23

21
Expansive Restrictive 20
20
8
15
15
10
15
12
5 10
4 4
2 2
2
1
2 2
0
2017 2018 2019 2020 2021 2022 2023
Figure 7.4.9

By Subject Matter
In 2023, the most common subject matters for and resilient electoral processes in the Union and
AI-related regulations in the European Union were enhancing the European nature and efficient conduct
science, technology, and communications (5); of the elections to the European Parliament. This
followed by government operations and politics (3) regulation acknowledged that AI could be used
(Figure 7.4.10). Regulations concerning government to generate political misinformation and outlined
operations and politics involve setting rules for steps the Commission has taken to ensure AI does
how governments and associated governmental not challenge the legitimacy of elections. Evidently,
processes operate. One such regulation was the European Union legislators are considering how AI
Commission Recommendation (EU) on inclusive will impact their government’s work.
Number of AI-related regulations in the European Union by primary subject matter, 2017–23
2017 2018 2019 2020 2021 2022 2023
Armed forces and national security 1 0 0 0 0 1 0
Arts, culture, religion 0 0 0 0 1 0 0
Civil rights and liberties, minority issues 0 0 0 0 1 0 1
Commerce 0 0 0 1 2 2 1
Crime and law enforcement 0 0 0 0 0 1 0
Economics and public �nance 0 0 0 0 2 0 2
Education 0 0 1 0 0 0 0
Energy 0 0 0 0 1 0 0
Finance and �nancial sector 0 0 1 0 0 0 0
Foreign trade and international �nance 0 0 1 0 3 0 0
Government operations and politics 0 0 0 2 5 2 3
Health 0 0 0 0 0 4 0
International a�airs 0 0 0 0 0 0 1
Science, technology, communications 0 1 2 0 6 2 5
Social welfare 0 0 0 0 0 0 1
Transportation and public works 0 0 0 0 0 0 1
Figure 7.4.10

Index Report 2024 7.5 U.S. Public Investment in AI
This section examines public AI investment in the United States based on data from the U.S. government and Govini,
a company that uses AI and machine learning technologies to track U.S. public and commercial spending.
7.5 U.S. Public Investment in AI

Federal Budget for AI R&D information on classified AI R&D investment.
Every year in December, the National Science and According to the 2023 report, in the fiscal year 2023,
Technology Council publishes a report on the public U.S. government agencies allocated a total of $1.8 billion
sector AI R&D budget across various departments to AI research and development spending (Figure 7.5.1).
and agencies that participate in the Networking and The funding for AI R&D has risen annually since FY 2018,
Information Technology Research and Development more than tripling since then. For FY 2024, a larger
(NITRD) Program and National Artificial Intelligence budget of $1.9 billion has been requested.15
Initiative. These reports, however, do not include
US federal NITRD budget for AI, FY 2018–24

Source: U.S. NITRD Program, 2023 | Chart: 2024 AI Index report
1.87
1.79
1.80 1.73
1.60 1.53
1.43
1.40
Budget (in billions of U.S. dollars)
1.20
1.11
1.00
0.80
0.60 0.56
0.40
0.20
0.00
FY18 (enacted) FY19 (enacted) FY20 (enacted) FY21 (enacted) FY22 (enacted) FY23 (enacted) FY24 (requested)
Figure 7.5.1
Figure 7.5.2 details the breakdown of NITRD AI R&D budget requests by agency. For FY 2024, the National
Science Foundation (NSF) had the highest request at $531 million, followed by the Defense Advanced Research
Projects Agency (DARPA) at $322.1 million, and the National Institutes of Health (NIH) at $284.5 million.
15 Previous editions of the NITRD report have included spending figures for past years that differ slightly from those reported in the most recent edition. The AI Index reports the spending
amounts documented in the latest NITRD reports.

Budget (in millions of U.S. dollars)
0
200
400
600
0
200
400
600
0
200
400
600
0
200
400
600
0
200
400
600
FY 2021 FY 2021 FY 2021 FY 2021 FY 2021
1.20
3.80
8.00
0.00
(enacted) (enacted) (enacted) (enacted) (enacted)
FY 2022 FY 2022 FY 2022 FY 2022 FY 2022
9.50
8.80
0.00
0.00
Table of Contents
DOI
NTIA
NARA
NIOSH
DARPA
FY 2023 FY 2023 FY 2023 FY 2023 FY 2023
8.40
0.50
0.00
13.70
412.00 429.80 400.50

Index Report 2024
FY 2024 FY 2024 FY 2024 FY 2024 FY 2024
0.10
8.40
0.00
(requested) (requested) (requested) (requested) (requested)
34.20
322.10
0
200
400
600
0
200
400
600
0
200
400
600
0
200
400
600
0
200
400
600
Source: U.S. NITRD Program | Chart: 2024 AI Index report
FY 2021 FY 2021 FY 2021 FY 2021 FY 2021
2.90
0.00
37.70
12.40
25.90
Chapter 7 Preview
FY 2022 FY 2022 FY 2022 FY 2022 FY 2022
6.40
0.40
2.90
41.40
26.00
DHS
DOT
NIST
NASA
TREAS
FY 2023 FY 2023 FY 2023 FY 2023 FY 2023
8.80
0.30
4.00
31.00
38.60
FY 2024 FY 2024 FY 2024 FY 2024 FY 2024
0.30
4.00
31.00
10.30
34.30
0
400
600
0
200
400
600
0
200
400
600
0
200
400
600
0
200
400
600
US governmental agency NITRD budgets for AI, FY 2021–24
FY 2021 FY 2021 FY 2021 FY 2021 FY 2021
1.20
(enacted) (enacted) (enacted) (enacted)
10.00
(enacted)
107.50
178.20
200 145.20
FY 2022 FY 2022 FY 2022 FY 2022 FY 2022
0.00
0.00
93.20
NIH
DOD
USDA
NOAA
ED-IES
FY 2023 FY 2023 FY 2023 FY 2023 FY 2023
0.00
0.00
95.20
227.50 241.30
FY 2024 FY 2024 FY 2024 FY 2024 FY 2024
0.00
0.00
104.20
274.00
268.80 288.20 284.50
0
200
400
600
0
200
400
600
0
200
400
600
0
200
400
600
0
200
400
600
FY 2021 FY 2021 FY 2021 FY 2021 FY 2021

9.10
0.00
46.40
395.90
FY 2022 FY 2022 FY 2022 FY 2022 FY 2022
5.80
15.00
32.40
506.50
VA
NIJ
NSF
FDA
DOE
FY 2023 FY 2023 FY 2023 FY 2023 FY 2023

8.00
21.00
39.80
418.40
FY 2024 FY 2024 FY 2024 FY 2024 FY 2024

8.00

35.80
20.00
531.30
180.80 164.40 169.90 169.90
404
Figure 7.5.2
7.5 U.S. Public Investment in AI
U.S. Department of Defense

Budget Requests development, test, and evaluation. According to its
2023 report, the DoD requested $1.8 billion in FY 2024,
Every year the DoD releases the amount of funding a significant increase from the $1.1 billion that was
they request for nonclassified AI-specific research, requested in FY 2023 (Figure 7.5.3).
US DoD budget request for AI-speci c research, development, test, and evaluation (RDT&E), FY 2020–24
Source: U.S. O ce of the Under Secretary of Defense (Comptroller), 2023 | Chart: 2024 AI Index report
1.80
1.80
1.60
Budget request (in billions of U.S. dollars)
1.40
1.20
1.10
1.00 0.93
0.87
0.84
0.80
0.60
0.40
0.20
0.00
Sum of FY20 funding Sum of FY21 funding Sum of FY22 funding Sum of FY23 funding Sum of FY24 funding
Figure 7.5.3

U.S. Government AI-Related volumes of federal contracts data, including prime

contracts, grants, and other transaction authority
Contract Spending (OTA) awards. The use of AI models enables Govini to
analyze data that is otherwise often inaccessible.
Public investment in AI can also be measured by
federal government spending on the contracts AI Contract Spending
awarded to private companies for goods and Figure 7.5.4 highlights total U.S. government spending
services. Such contracts typically occupy the largest on AI, subdivided by various AI segments. From
share of an agency’s budget. 2022 to 2023, total AI spending increased marginally
from $3.2 billion to $3.3 billion.16 Since 2018, total
Data in this section comes from Govini, which created
spending has increased nearly 2.4 times. In 2023,
a taxonomy of spending by the U.S. government on
the AI subsegments that saw the greatest amount of
critical technologies including AI. Govini applied
government spending included machine learning ($1.5
supervised machine learning and natural language
billion) and computer vision ($1.0 billion).
processing to parse, analyze, and categorize large
US government spending in AI/ML and autonomy by segment, FY 2018–23

Source: Govini, 2023 | Chart: 2024 AI Index report
Machine learning Computer vision Autonomy Natural language processing
3.50
3.33
3.19
U.S. government spending (in billions of U.S. dollars)
0.23
3.00
0.55
2.59
2.50 2.47 0.21
0.23 1.24
2.00 1.04
1.90
0.81 1.01
1.50 1.38 0.54

0.89
0.33 0.55
0.6
1.00 0.53
1.51
0.54
0.50
0.88 0.77 0.88
0.69
0.42
0.00
2018 2019 2020 2021 2022 2023
Figure 7.5.4
16 In 2023, Govini made minor adjustments to their classification methodology. Consequently, the contract totals presented in Figure 7.5.4 may vary slightly from those reported in earlier
editions of the AI Index.

Figure 7.5.5 shows U.S. government spending by AI segment in FY 2022 and FY 2023. Spending significantly
increased for machine learning. Computer vision and natural language processing spending also rose, albeit less
prominently.
US government spending in AI/ML and autonomy by segment, FY 2022 vs. 2023

1.51 (+72%)
Machine learning
0.88
1.04 (+17%)
Computer vision
0.89
0.55 (-56%)
Autonomy
1.24
0.23 (+28%)
Natural language
processing 2023
0.18
2022
0.00 0.20 0.40 0.60 0.80 1.00 1.20 1.40 1.60

Figure 7.5.5

In FY 2023, the majority of federal AI contracts were prime contracts (50.6%), followed by grants (47.6%) (Figure
7.5.6). In the last year, the share of contracts has declined, while the share of grants has increased.
Total value of contracts, grants, and OTAs awarded by the US government for AI/ML and autonomy,
FY 2018–23
2.00
Total value awarded (in billions of U.S. dollars)
1.68, Contracts
1.58, Grants
1.50
1.00
0.50
0.06, OTAs
0.00
2018 2019 2020 2021 2022 2023
Figure 7.5.6

Microelectronics and Semiconductor FIgure 7.5.7 visualizes U.S. government spending

Spending on microelectronics by segment. Total spending on
Govini also monitors U.S. government microelectronics has grown significantly in the last
microelectronics spending, which is becoming year, increasing to $3.9 billion from $2.5 billion in
increasingly vital due to the crucial role that 2022. The large majority of American government
semiconductors, like GPUs, have played in powering microelectronic spending is allocated as contracts
recent AI technical improvements. The way (Figure 7.5.8).
governments allocate funds for semiconductors is
poised to increase in geopolitical significance.
US government spending in microelectronics by segment, FY 2018–23

4.00 Memory and processing Semiconductor 3.89
0.48
3.50
3.00
2.53
2.50
0.24
2.10
2.00 0.12
1.70 1.66 3.41

0.13 0.09
1.49
1.50 0.09
2.29
1.00 1.98
1.57 1.57
1.4
0.50
0.00
2018 2019 2020 2021 2022 2023
Figure 7.5.7

Total value of contracts, grants, and OTAs awarded by the US government for microelectronics,
FY 2018–23
3.50
3.33, Contracts
3.00
Total value awarded (in billions of U.S. dollars)
2.50
2.00
1.50
1.00
0.50 0.55, Grants
0.00 0.01, OTAs

2018 2019 2020 2021 2022 2023
Figure 7.5.8

Index Report 2024
Diversity
CHAPTER 8:
Index Report 2024 Diversity
Preview
Overview 413
8.1 AI Postsecondary Education 415

North America 415
CS Bachelor’s Graduates 415
CS Master’s Graduates 417
CS PhD Graduates 419
Disability Status of CS, CE, and Information Students 421
CS, CE, and Information Faculty 422
Europe 425
Informatics, CS, CE, and IT Bachelor’s Graduates 425
Informatics, CS, CE, and IT Master’s Graduates 425
Informatics, CS, CE, and IT PhD Graduates 425
8.2 AI Conferences 429

Women in Machine Learning (WiML) NeurIPS Workshop 429
Workshop Participants 429
Demographic Breakdown 430
8.3 K–12 Education 432

AP Computer Science: Gender 432
AP Computer Science: Ethnicity 433

CHAPTER 8:
Overview
The demographics of AI developers often differ from those of users. For instance, a
considerable number of prominent AI companies and the datasets utilized for model training
originate from Western nations, thereby reflecting Western perspectives. The lack of diversity
can perpetuate or even exacerbate societal inequalities and biases.
This chapter delves into diversity trends in AI. The chapter begins by drawing on data from
the Computing Research Association (CRA) to provide insights into the state of diversity in
American and Canadian computer science (CS) departments. A notable addition to this year’s
analysis is data sourced from Informatics Europe, which sheds light on diversity trends within
European CS education. Next, the chapter examines participation rates at the Women in
Machine Learning (WiML) workshop held annually at NeurIPS. Finally, the chapter analyzes
data from Code.org, offering insights into the current state of diversity in secondary CS
education across the United States.
The AI Index is dedicated to enhancing the coverage of data shared in this chapter.
Demographic data regarding AI trends, particularly in areas such as sexual orientation, remains
scarce. The AI Index urges other stakeholders in the AI domain to intensify their endeavors to
track diversity trends associated with AI and hopes to comprehensively cover such trends in
future reports.

CHAPTER 8:
Chapter Highlights
1. U.S. and Canadian bachelor’s, master’s, and PhD CS students continue to grow more
ethnically diverse. While white students continue to be the most represented ethnicity among new resident
graduates at all three levels, the representation from other ethnic groups, such as Asian, Hispanic, and Black or
African American students, continues to grow. For instance, since 2011, the proportion of Asian CS bachelor’s
degree graduates has increased by 19.8 percentage points, and the proportion of Hispanic CS bachelor’s degree
graduates has grown by 5.2 percentage points.
2. Substantial gender gaps persist in European informatics, CS, CE, and IT graduates at
all educational levels. Every surveyed European country reported more male than female graduates in
bachelor’s, master’s, and PhD programs for informatics, CS, CE, and IT. While the gender gaps have narrowed in
most countries over the last decade, the rate of this narrowing has been slow.
3. U.S. K–12 CS education is growing more diverse, reflecting changes in both gender and
ethnic representation. The proportion of AP CS exams taken by female students rose from 16.8% in 2007 to
30.5% in 2022. Similarly, the participation of Asian, Hispanic/Latino/Latina, and Black/African American students
in AP CS has consistently increased year over year.

Artificial Intelligence Chapter 8: Diversity
Index Report 2024 8.1 AI Postsecondary Education
This section examines trends in diversity within CS and AI postsecondary education across North America and Europe.
8.1 AI Postsecondary Education

North America CS Bachelor’s Graduates
The percentage of female CS bachelor’s graduates
Data on American and Canadian postsecondary CS reached 22.2% in 2022, continuing a decade-long
and AI postsecondary education comes from the rise (Figure 8.1.1). Nonbinary/other-identifying CS
Computing Research Association’s (CRA) annual bachelor’s graduates accounted for 0.1% in 2022.
Taulbee Survey.1 2
Gender of new CS bachelor’s graduates (% of total) in the United States and Canada, 2010–22
90%
80%
77.70%, Male
70%
New CS bachelor’s graduates (% of total)
60%
50%
40%
30%
22.20%, Female
20%
10%
0% 0.10%, Nonbinary/Other
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 8.1.1
Over the past decade, the number of CS bachelor’s graduates of all ethnicities has grown, notably 4.7 times
for Hispanics and 2.5 times for African Americans (Figure 8.1.2). As a proportion of ethnicities among all CS
bachelor’s graduates, Asians have risen the fastest, doubling in the last 10 years (Figure 8.1.3).
1 The charts in this section look only at the ethnicity of domestic or native CS students and faculty. Although the CRA reports data on the proportion of nonresident aliens at each educational
level (i.e., bachelor’s, master’s, PhD, and faculty), data on the ethnicity of nonresident aliens is not included.
2 Not all PhD-granting departments targeted in the survey provided responses. Of the 297 departments targeted, only 182 responded, resulting in an overall response rate of 61%. The AI Index
advises against making per capita comparisons between the CRA North American data and the data on European CS graduates detailed in the subsequent sections due to the European data
being collected from national statistical offices, which affords it broader coverage.

Ethnicity of new resident CS bachelor’s graduates in the United States and Canada, 2011–22
10,970, White
10,000
8,795, Asian
Number of new CS bachelor’s graduates
8,000
6,000
4,000
2,708, Hispanic (any race)

2,000
1,072, Multiracial (not Hispanic)
1,004, Black or African American
33, American Indian or Alaska Native
0 28, Native Hawaiian or Paci c Islander
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 8.1.2
Ethnicity of new resident CS bachelor’s graduates (% of total) in the United States and Canada, 2011–22
70%
60%
New CS bachelor’s graduates (% of total)
50%
44.58%, White
40%
35.74%, Asian
30%
20%
11.00%, Hispanic (any race)

10%
4.36%, Multiracial (not Hispanic)
4.08%, Black or African American
0.13%, American Indian or Alaska Native
0% 0.11%, Native Hawaiian or Paci c Islander
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 8.1.3

CS Master’s Graduates
The proportion of female CS master’s graduates has seen minimal growth in the last decade, increasing
to 26.3% in 2022 from 24.6% in 2011. Additionally, in 2022, 0.08% of CS master’s graduates identified as
nonbinary/other (Figure 8.1.4).
Gender of new CS master’s graduates (% of total) in the United States and Canada, 2011–22
80%
73.65%, Male
70%
New CS master’s graduates (% of total)
60%
50%
40%
30%
26.26%, Female
20%
10%
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 8.1.4

Among North American students, the most represented ethnicities are white (47.9%), Asian (35.8%), and
Hispanic (8.2%) (Figure 8.1.5 and Figure 8.1.6). Similar to CS bachelor’s graduates, the pool of CS master’s
graduates has become increasingly ethnically diverse over the last decade.
Ethnicity of new resident CS master’s graduates in the United States and Canada, 2011–22
3,000 3,050, White
2,500
Number of new CS master’s graduates
2,278, Asian
2,000
1,500
1,000
500 522, Hispanic (any race)

269, Black or African American
222, Multiracial (not Hispanic)
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 Figure 8.1.5
Ethnicity of new resident CS master’s graduates (% of total) in the United States and Canada, 2011–22
70%
60%
New CS master’s graduates (% of total)
50%
47.87%, White
40%
35.76%, Asian
30%
20%
10% 8.19%, Hispanic (any race)

2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 Figure 8.1.6

CS PhD Graduates
In 2022, the percentage of female PhD graduates in CS slightly decreased to 22.1% (Figure 8.1.7), but the long-
term trend is unchanged.
Gender of new CS PhD graduates (% of total) in the United States and Canada, 2010–22
80%
77.78%, Male
70%
New CS PhD graduates (% of total)
60%
50%
40%
30%
22.10%, Female
20%
10%
2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 8.1.7

From 2011 to 2022, the diversity among CS PhD graduates significantly increased (Figure 8.1.8 and Figure
8.1.9). In 2022, 41.1% of CS PhD graduates were Asian, Black, Hispanic, multiracial, American Indian, or Native
Hawaiian, marking a considerable rise from 2011.
Ethnicity of new resident CS PhD graduates in the United States and Canada, 2011–22
500
400
Number of new CS PhD graduates
327, White
300
200
164, Asian
100
28, Black or African American
26, Hispanic (any race)
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 Figure 8.1.8
Ethnicity of new resident CS PhD graduates (% of total) in the United States and Canada, 2011–22
70%
60% 58.92%, White

New CS PhD graduates (% of total)
50%
40%
30% 29.55%, Asian
20%

10%
4.68%, Hispanic (any race)
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 8.1.9

Disability Status of CS, CE, and Information relatively low: 4.1% of bachelor’s, 1.5% of master’s,
Students and 1.1% of PhD students indicated a need for
For the second consecutive year, the CRA requested accommodations (Figure 8.1.10). Year over year,
departments to report the number of students at each the proportion of students requesting disability
degree level who received disability accommodations accommodations has remained consistent.
over the preceding year. The reported numbers were
CS, CE, and information students (% of total) with disability accomodations in United States and Canada,
2021 vs. 2022
2022 2021
4.10% 4.10%
4%
CS, CE, and information students (% of total)
3%
2%
1.50%
1.10%
1.00%
1%
0.80%
0%
PhD Master’s Bachelor’s
Figure 8.1.10

CS, CE, and Information Faculty male (75.6%), with women comprising 24.3% and
Data regarding the ethnicity and gender of faculty nonbinary individuals accounting for 0.1% (Figure
in CS, CE, and information fields highlight diversity 8.1.11). Although the proportion of female faculty in
trends in academic AI and CS. As of 2022, a majority these fields has risen since 2011, the increase has
of faculty members in CS, CE, and information are been small.
Gender of CS, CE, and information faculty (% of total) in the United States and Canada, 2011–22
80%
75.60%, Male
70%
CS, CE, and information faculty (% of total)
60%
50%
40%
30%
24.27%, Female
20%
10%
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 8.1.11

While the majority of new faculty hires in CS, CE, and information at American and Canadian universities
remain male (71.7%), the proportion of women reached 28.0% in 2022 (Figure 8.1.12), well above the proportion
of new female PhDs.
Gender of new CS, CE, and information faculty hires (% of total) in the United States and Canada, 2011–22
80%
New CS, CE, and information faculty hires (% of total)
71.67%, Male
70%
60%
50%
40%
30%
28.00%, Female
20%
10%
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 8.1.12

As of 2022, the majority of resident faculty in CS, CE, and information were white (57.3%), with Asian faculty
following at 30.1% (Figure 8.2.13 and Figure 8.1.14). The ethnic diversity gap is gradually closing: In 2011, the difference
between white faculty and the next largest ethnic group was 46.1%, but by 2021, it had narrowed to 27.2%.
Ethnicity of resident CS, CE, and information faculty in the United States and Canada, 2011–22
4,000 3,997, White
3,500
Number of CS, CE, and information faculty
3,000
2,500
2,100, Asian
2,000
1,500
1,000
406, Unknown
207, Hispanic (any race)
500 176, Black or African American
29, Native Hawaiian or Paci c Islander
0 17, American Indian or Alaska Native
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 Figure 8.1.13
Ethnicity of resident CS, CE, and information faculty (% of total) in the United States and Canada, 2011–22
70%
60%
57.27%, White
CS, CE, and information faculty (% of total)
50%
40%
30% 30.09%, Asian
20%
5.82%, Unknown
10% 2.97%, Hispanic (any race)
0.42%, Native Hawaiian or Paci c Islander
0% 0.24%, American Indian or Alaska Native
2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 Figure 8.1.14

Europe
Data on diversity trends about European CS graduates In all surveyed European countries, informatics,
comes from Informatics Europe.3 CS, CE, and IT PhD graduates are predominantly
male. However, in nations such as the United
Informatics, CS, CE, and IT Bachelor’s
Kingdom, Germany, and Switzerland, the gender
Graduates
gap has narrowed over the last decade, with women
In the majority of surveyed European nations, there
constituting a growing share of PhD graduates (Figure
is a persistent gender disparity among bachelor’s-
8.1.17).4 In contrast, countries like Finland and Spain
level graduates in informatics, computer science,
have seen the gap slightly widen.
computer engineering, and information technology.
Despite some narrowing since 2011, men continue to
dominate. For example, France (14.8%), the United
Kingdom (17.8%), and Germany (21.5%) show relatively
low proportions of female graduates in these fields
(Figure 8.1.15). Bulgaria stands out among the surveyed
countries with the highest proportion of female
graduates (35.2%).
Informatics, CS, CE, and IT Master’s Graduates

Similar gender disparities are observed among
European informatics, CS, CE, and IT master’s
graduates, with a significantly greater proportion of
males than females in most surveyed countries. As of
2022, Estonia (42.0%), Romania (41.9%), and Bulgaria
(40.4%) reported the greatest proportion of female
master’s graduates (Figure 8.1.16). In contrast, Belgium
(13.7%), Italy (14.1%), and Switzerland (15.8%) reported
the smallest proportion of female master’s graduates.
Informatics, CS, CE, and IT PhD Graduates
3 The year label refers to the year in which an academic year ends. For example, the figures visualizing new graduates for 2022 reflect the number of graduates reported for the 2021/2022
academic year. For the sake of visual simplicity, the Index opts to focus on the year in which students graduated.
4 In countries where the number of PhD graduates is relatively small, trends in gender proportions can be prone to sudden year-over-year changes. For example, in 2022 Bulgaria produced 24
total PhDs, Latvia 12, and Estonia 26.

Gender of new informatics, CS, CE, and IT bachelor’s graduates (% of total) in Europe, 2011–22
Austria Belgium Bulgaria Czech Republic

100% 100% 100% 100%
92.07%
80.26% 83.78%
64.77%
50% 50% 50% 50%
35.23%
19.74% 16.22%
7.93%
0% 0% 0% 0%
2013 2016 2019 2022 2013 2016 2019 2022 2013 2016 2019 2022 2013 2016 2019 2022
Estonia Finland France Germany

100% 100% 100% 100%
85.25%
76.00% 75.35% 78.51%
New informatics, CS, CE, and IT bachelor’s graduates (% of total)
50% 50% 50% 50%
24.00% 24.65% 21.49%

14.75%
0% 0% 0% 0%
2013 2016 2019 2022 2013 2016 2019 2022 2013 2016 2019 2022 2013 2016 2019 2022
Ireland Italy Latvia Lithuania

100% 100% 100% 100%
83.71% 85.95% 87.02%
78.97%
50% 50% 50% 50%
16.29% 21.03%
14.05% 12.98%
0% 0% 0% 0%
2013 2016 2019 2022 2013 2016 2019 2022 2013 2016 2019 2022 2013 2016 2019 2022
Netherlands Norway Poland Portugal

100% 100% 100% 100%
85.53% 85.04%
78.42% 78.63%
50% 50% 50% 50%
21.58% 21.37%
14.47% 14.96%
0% 0% 0% 0%
2013 2016 2019 2022 2013 2016 2019 2022 2013 2016 2019 2022 2013 2016 2019 2022
Romania Spain Switzerland Turkey

100% 100% 100% 100%
85.68% 86.81%
66.60% 70.73%
50% 50% 50% 50%
33.40% 29.27%
14.32% 13.19%
0% 0% 0% 0%
2013 2016 2019 2022 2013 2016 2019 2022 2013 2016 2019 2022 2013 2016 2019 2022
United Kingdom
100%
82.21%
50%
Male
17.79% Female
0%
2013 2016 2019 2022
Figure 8.1.15

Gender of new informatics, CS, CE, and IT master’s graduates (% of total) in Europe, 2011–22
Austria Belgium Bulgaria Czech Republic

100% 100% 100% 100%
86.33%
78.98%
73.32%
59.57%
50% 50% 50% 50%
40.43%
26.68%
21.02%
13.67%
0% 0% 0% 0%
2013 2016 2019 2022 2013 2016 2019 2022 2013 2016 2019 2022 2013 2016 2019 2022
Estonia Finland France Germany

100% 100% 100% 100%
72.43% 76.86% 76.49%

58.02%
50% 50% 50% 50%
New informatics, CS, CE, and IT master’s graduates (% of total)
41.98%
27.57% 23.14% 23.51%
0% 0% 0% 0%
2013 2016 2019 2022 2013 2016 2019 2022 2013 2016 2019 2022 2013 2016 2019 2022
Ireland Italy Latvia Lithuania

100% 100% 100% 100%
85.91% 81.51%
74.60%
68.83%
50% 50% 50% 50%
31.17% 25.40%
14.09% 18.49%
0% 0% 0% 0%
2013 2016 2019 2022 2013 2016 2019 2022 2013 2016 2019 2022 2013 2016 2019 2022
Netherlands Norway Poland Portugal

100% 100% 100% 100%
76.50% 81.41% 81.24%
71.43%
50% 50% 50% 50%
23.50% 28.57%
18.59% 18.76%
0% 0% 0% 0%
2013 2016 2019 2022 2013 2016 2019 2022 2013 2016 2019 2022 2013 2016 2019 2022
Romania Spain Switzerland Turkey

100% 100% 100% 100%
84.19%
78.06%
65.30%
58.09%
50% 50% 50% 50%
41.91%
34.70%
21.94% 15.81%
0% 0% 0% 0%
2013 2016 2019 2022 2013 2016 2019 2022 2013 2016 2019 2022 2013 2016 2019 2022
United Kingdom
100%
68.67%
50%
31.33% Male
Female
0%
2013 2016 2019 2022
Figure 8.1.16

Gender of new informatics, CS, CE, and IT PhD graduates (% of total) in Europe, 2011–22
Male Female
Austria Bulgaria Czech Republic Estonia

100% 100% 100% 100%
86.46% 81.33%
75.00%
61.54%
50% 50% 50% 50%
New informatics, CS, CE, and IT PhD graduates (% of total)
38.46%
25.00%
13.54% 18.67%
0% 0% 0% 0%
2013 2016 2019 2022 2013 2016 2019 2022 2013 2016 2019 2022 2013 2016 2019 2022
Finland France Germany Ireland

100% 100% 100% 100%
84.21% 82.31%
76.94% 76.92%
50% 50% 50% 50%
23.06% 17.69% 23.08%

15.79%
0% 0% 0% 0%
2013 2016 2019 2022 2013 2016 2019 2022 2013 2016 2019 2022 2013 2016 2019 2022
Latvia Netherlands Portugal Romania

100% 100% 100% 100%
85.83% 86.11%
66.67% 68.09%
50% 50% 50% 50%
33.33% 31.91%
14.17% 13.89%
0% 0% 0% 0%
2013 2016 2019 2022 2013 2016 2019 2022 2013 2016 2019 2022 2013 2016 2019 2022
Spain Switzerland Turkey United Kingdom

100% 100% 100% 100%
79.75% 82.76%
74.83% 70.75%
50% 50% 50% 50%
25.17% 29.25%
20.25% 17.24%
0% 0% 0% 0%
2013 2016 2019 2022 2013 2016 2019 2022 2013 2016 2019 2022 2013 2016 2019 2022
Figure 8.1.17

8.2 AI Conferences
Women in Machine Learning Attendance at NeurIPS Women in Machine Learning
workshop, 2010–23
Source: Women in Machine Learning, 2023 | Chart: 2024 AI Index report
(WiML) NeurIPS Workshop
1,400
Women in Machine Learning (WiML), founded in
2006, is an organization dedicated to supporting 1,200
and increasing the impact of women in machine
Number of attendees
1,000
learning. This section of the AI Index presents data
800
from the WiML annual technical workshop, hosted 714
at NeurIPS. 600
400
Workshop Participants
Despite a decline in participation over the last 200
two years, the 2023 NeurIPS WiML workshop

0
attendance of 714 was nearly eight times higher
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
than the attendance of 89 in 2010 (Figure 8.2.1).
The recent drop in WiML workshop attendance Figure 8.2.1
may be linked to the overall decrease in NeurIPS

Attendance at NeurIPS Women in Machine Learning
attendance, which could be attributed to the shift workshop (% of total), 2010–23
away from a purely virtual format.5 As a share of total
conference attendance, the 2023 WiML workshop
10%
represented 4.4% of attendees (Figure 8.2.2).
Attendees at NeurIPS WiML (% of total)
8%
6%
4.36%
4%
2%
0%
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
Figure 8.2.2
5 Figure 8.1.1 accounts for total attendance, which in some conference years comprised both in-person and virtual attendance.

Demographic Breakdown where they live. Among respondents, 56.4% hailed

The data in the subsequent figures is derived from from North America, followed by Europe (21.8%), Asia
a survey completed by participants who agreed to (11.4%), and Africa (8.9%) (Figure 8.2.3). At this year’s
aggregate their information. One component of the workshop, there was a greater proportion of North
WiML survey asked attendees at the WiML workshop American attendees than in 2022.
Continent of residence of participants at NeurIPS Women in Machine Learning workshop, 2022 vs. 2023
North 56.43%
America 41.50%
21.79%
Europe
34.20%
11.43%
Asia
17.10%
8.93%
Africa
3.40%
Australia/ 1.07%
Oceania 1.40%
South 0.36%
America 1.60%
2023
0.00% 2022
Antarctica
0.20%
0% 10% 20% 30% 40% 50% 60%

% of respondents
Figure 8.2.3

The majority of participants at the 2022 WiML workshop were female-identifying (84.2%), another 10.0% were
male-identifying, and 3.2% were nonbinary-identifying (Figure 8.2.4).
Gender breakdown of participants at NeurIPS Women in Machine Learning workshop, 2022 vs. 2023
84.23%
Female
37.00%
10.04%
Male
25.80%
Nonbinary/ 3.23%
Genderqueer/
Third gender 0.50%
1.43%
Prefer not to say
36.30%
0.72%
Questioning
Gender uid/ 0.36% 2023

Gender 2022
nonconforming 0.40%
0% 10% 20% 30% 40% 50% 60% 70% 80% 90%

% of respondents
Figure 8.2.4

Index Report 2024 8.3 K–12 Education
This section uses data from Code.org, a U.S. nonprofit dedicated to advancing CS education in K–12 schools across the
country, to paint a picture of how AI diversity trends are reflected at the K–12 level.
8.3 K–12 Education

AP Computer Science: Gender
In 2022, male students accounted for 68.9% of AP CS 8.3.1).6 While male students continue to dominate
exam-takers, female students 30.5%, and students AP CS exam participation, the proportion of female
identifying as neither male nor female 0.7% (Figure students has nearly doubled over the past decade.
AP computer science exams taken (% of total) by gender, 2007–22

80%
70%
AP computer science exams taken (% of total)
68.87%, Male
60%
50%
40%
30% 30.46%, Female
20%
10%
0% 0.67%, Other
2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 8.3.1
6 There are two types of AP CS exams: Computer Science A and Computer Science Principles. Data on computer science exams taken includes both exams. AP CS Principles was initially
offered in 2017.

On a percentage basis, the AP computer science exams taken by female students (% of total),
states with the highest number
2022
of female AP CS test-takers AK ME
26% 22%
in 2022 were Mississippi
(41%), Alabama (37%), and VT NH MA
23% 26% 31%
Washington, D.C. (37%) (Figure
8.3.2). California, Texas, and 31% 33% 18% 20% 24% 22% 29% 36% 29% 23%
Washington, states known for OR ID WY NE IA IL IN OH PA NJ
26% 27% 23% 25% 18% 33% 24% 27% 28% 31%
significant CS and AI activity,
also saw notable participation, 31% 32% 22% 25% 16% 22% 28% 30% 37% 33% 23%
with approximately 30% of AP AZ NM OK AR TN VA NC
25% 25% 26% 30% 30% 29% 30%
CS exam-takers being female.
TX LA MS AL GA SC
31% 34% 41% 37% 29% 33%
HI FL
30% 31%
Figure 8.3.2
AP Computer Science: the largest group, the participation of Asian, Hispanic/
Ethnicity Latino/Latina, and Black/African American students

in AP CS exams has grown over time (Figure 8.3.3). In
Code.org’s data highlights the evolving ethnic 2022, white students constituted the largest share of
diversity among AP CS test-takers. Similar to trends exam-takers (38.2%), followed by Asian (27.8%) and
in postsecondary CS, the ethnic diversity of AP CS Hispanic/Latino/Latina students (17.6%) (Figure 8.3.3
test-takers is increasing. While white students remain and Figure 8.3.4).

AP computer science exams taken by race/ethnicity, 2007–22

80,000
77,070, White
70,000
Number of AP computer science exams taken
60,000
56,098, Asian
50,000
40,000
35,528, Hispanic/Latino/Latina
30,000
20,000
13,577, Black/African American

10,000 9,293, Two or more races
1,285, Native American/Alaskan
295, Native Hawaiian/Paci c Islander
0 0, Other
2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 8.3.3
AP computer science exams taken (% of total responding students) by race/ethnicity, 2007–22

AP computer science exams taken (% of total responding students)
60%
50%
40%
38.23%, White
30%
27.82%, Asian
20%
17.62%, Hispanic/Latino/Latina
10% 6.73%, Black/African American

4.61%, Two or more races
0.64%, Native American/Alaskan
0.15%, Native Hawaiian/Paci c Islander
0% 0.00%, Other
2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022
Figure 8.3.4

Index Report 2024
Public
Opinion
CHAPTER 9:
Index Report 2024 Public Opinion
Preview
Overview 437
9.1 Survey Data 439

Global Public Opinion 439
AI Products and Services 439
AI and Jobs 444
AI and Livelihood 446
Attitudes on ChatGPT 448
AI Concerns 451
US Public Opinion 452
9.2 Social Media Data 454

Dominant Models 454
Highlight: AI-Related Social Media
Discussion in 2023 456

CHAPTER 9:
Overview
As AI becomes increasingly ubiquitous, it is important to understand how public
perceptions regarding the technology evolve. Understanding this public opinion is vital in
better anticipating AI’s societal impacts and how the integration of the technology may
differ across countries and demographic groups.
This chapter examines public opinion on AI through global, national, demographic, and
ethnic perspectives. It draws upon several data sources: longitudinal survey data from Ipsos
profiling global AI attitudes over time, survey data from the University of Toronto exploring
public perception of ChatGPT, and data from Pew examining American attitudes regarding
AI. The chapter concludes by analyzing mentions of significant AI models on Twitter, using
data from Quid.
437
CHAPTER 9:
Chapter Highlights
A survey from Ipsos shows that, over the last year, the proportion of those who think AI will dramatically affect
their lives in the next three to five years has increased from 60% to 66%. Moreover, 52% express nervousness
toward AI products and services, marking a 13 percentage point rise from 2022. In America, Pew data suggests
that 52% of Americans report feeling more concerned than excited about AI, rising from 38% in 2022.
2. AI sentiment in Western nations continues to be low, but is slowly improving. In 2022,

several developed Western nations, including Germany, the Netherlands, Australia, Belgium, Canada, and
the United States, were among the least positive about AI products and services. Since then, each of these
countries has seen a rise in the proportion of respondents acknowledging the benefits of AI, with the Netherlands
experiencing the most significant shift.
3. The public is pessimistic about AI’s economic impact. In an Ipsos survey, only 37% of
respondents feel AI will improve their job. Only 34% anticipate AI will boost the economy, and 32% believe it will
enhance the job market.
4. Demographic differences emerge regarding AI optimism. Significant demographic

differences exist in perceptions of AI’s potential to enhance livelihoods, with younger generations generally
more optimistic. For instance, 59% of Gen Z respondents believe AI will improve entertainment options,
versus only 40% of baby boomers. Additionally, individuals with higher incomes and education levels are more
optimistic about AI’s positive impacts on entertainment, health, and the economy than their lower-income and
less-educated counterparts.
5. ChatGPT is widely known and widely used. An international survey from the University of Toronto
suggests that 63% of respondents are aware of ChatGPT. Of those aware, around half report using ChatGPT at
least once weekly.
438
Artificial Intelligence Chapter 9: Public Opinion
Index Report 2024 9.1 Survey Data
9.1 Survey Data

Global Public Opinion near future, while 54% believe AI’s benefits surpass
its drawbacks. About half of the respondents trust AI
This section explores global differences in AI opinions companies’ data-protection capabilities.
through surveys conducted by Ipsos in 2022 and 2023.
The figure also contrasts Ipsos survey responses
These surveys reveal that public perceptions of AI vary
from 2022 and 2023, highlighting a shift in public
widely across countries and demographic groups.
AI sentiment following the release of ChatGPT—a
AI Products and Services milestone in public AI recognition. Over the last
In 2023, Ipsos ran a survey on global attitudes toward AI year, there has been a noticeable 6 percentage point
products and services. The survey consisted of interviews increase in those who think AI will dramatically affect
with 22,816 adults ages 16 to 74 in 31 countries.1 their lives in the next three to five years. Moreover,
52% now express nervousness toward AI products and
Figure 9.1.1 shows the percentage of respondents
services, marking a 13 percentage point rise from 2022.
who agree with specific statements. A significant
The public across the globe is becoming increasingly
66% anticipate AI will greatly change their lives in the
cognizant of and nervous about AI’s growing impact.
Global opinions on products and services using AI (% of total), 2022 vs. 2023
Source: Ipsos, 2022–23 | Chart: 2024 AI Index report
I have a good understanding of what 67%

arti cial intelligence is 64%
Products and services using arti cial
66%
intelligence will profoundly change
my daily life in the next 3–5 years
60%
54%
intelligence have more bene ts than
drawbacks
52%
I know which types of products and 51%

services use arti cial intelligence 50%
I trust companies that use arti cial
52%
intelligence as much as I trust other
companies
50%
49%
intelligence have profoundly changed
my daily life in the past 3–5 years
49%
Products and services using arti cial 52%

intelligence make me nervous 39%
I trust arti cial intelligence to not
56%
discriminate or show bias towards
any group of people 2023
I trust that companies that use
50% 2022
arti cial intelligence will protect
my personal data
0% 10% 20% 30% 40% 50% 60% 70%
% of respondents that “Agree”
Figure 9.1.1
1 See Appendix for more details about the survey methodology. The survey was conducted from May to June 2023.

Perceptions of AI’s benefits versus drawbacks vary developed Western nations, including Germany,
considerably by country, according to the Ipsos survey. the Netherlands, Australia, Belgium, Canada, and
78% of Indonesian, 74% of Thai, and 73% of Mexican the United States, were among the least positive
respondents view AI products and services as more about AI products and services. Since then, each of
beneficial than harmful (Figure 9.1.2). In contrast, only these countries has seen a rise in the proportion of
37% of Americans agree with this perspective. Among respondents acknowledging the benefits of AI, with
the 31 countries surveyed, the United States and France the Netherlands experiencing the most significant
exhibited the most skepticism. shift. By 2023, 43% of Dutch respondents viewed AI
products and services positively, up from 33% the
Attitudes toward AI are becoming more positive in
previous year.
countries that were previously critical. In 2022, several

‘Products and services using AI have more bene ts than drawbacks,’ by country (% of total), 2022 vs. 2023
‘Products
Source: Ipsos, and 2024
2022–23 | Chart: services
AI Indexusing
report AI have more bene ts than drawbacks,’ by country (% of total), 2022 vs. 2023
78%
Indonesia 78%
Indonesia
74% 74%
Thailand Thailand
73% 73%
Mexico Mexico 65%
65%
69%
Malaysia 69%
65%
Malaysia
65%
67%
Peru
70% 67%
Peru
67% 70%
Turkey
60%
67%
Turkey 66%
South Korea 60%
62%
65% 66%
South Korea India 62%71%
65% 65%
IndiaColombia 64% 71%
64%
Singapore 65%
Colombia
64%
64%
Brazil
57% 64%
Singapore
61%
Romania
64%
Brazil 59% 57%
South Africa
57%
61%
Romania Chile
59%
63%
57% 59%
South AfricaArgentina 55% 57%
55%
Italy
50% 59%
Chile
63%
52%
Japan
42% 57%
Argentina
50%
55%
Spain
53%
55%
Italy
Hungary
48% 50%
49%
52%
Japan Poland 42%47%
48%
46% 50%
Spain
Great Britain
38% 53%
44% 48%
New Zealand
Hungary
49%
43%
Netherlands
33% 47%
Poland
48%
42%
Germany
37%
46%
Great Britain
38%
40%
Ireland
44%
New Zealand 40%
Australia
37%
39% 43%
Netherlands Sweden
33% 40%
39% 42%
Germany Belgium 38%
37%
38%
Canada 40% 2023
Ireland 32%
2022
37%
United States
35% 40%
Australia
37%
37%
France
31%
39%
Sweden 0% 10% 20% 30% 40% 40% 50% 60% 70% 80%
% of respondents that “Agree” Figure 9.1.2
39%
Belgium
38%
38%
Canada 2023
Table of Contents Chapter 9 Preview 32% 441
2022
37%
United States
35%
37%
Figure 9.1.3 shows responses to Ipsos’ survey on Conversely, Japanese respondents show the least
AI products and services by country. Indonesian understanding of AI (43%) and also report the lowest
respondents are notably optimistic: 84% claim a solid level of nervousness about AI (23%). Meanwhile,
understanding of AI, 79% believe AI will significantly Thai respondents exhibit the highest trust in AI’s
change their lives in the next three to five years, and 75% impartiality, believing it will not discriminate or show
express excitement about AI products and services. bias toward any group.
Opinions about AI by country (% agreeing with statement), 2023

Source: Ipsos, 2023 | Chart: 2024 AI Index report
I have a good understanding of what

arti cial intelligence is
67% 59% 56% 74% 59% 70% 73% 58% 61% 64% 73% 64% 84% 58% 53% 43% 65% 75% 71% 62% 73% 69% 77% 67% 78% 76% 66% 67% 78% 73% 67%

intelligence will profoundly change 64% 62% 52% 70% 54% 71% 67% 51% 56% 58% 62% 65% 79% 53% 63% 65% 78% 71% 63% 61% 76% 59% 73% 78% 70% 82% 61% 55% 79% 81% 57%
I trust arti cial intelligence to not
discriminate or show bias towards 61% 44% 41% 66% 44% 60% 61% 37% 47% 47% 66% 66% 76% 41% 59% 43% 68% 72% 42% 45% 72% 59% 63% 63% 65% 55% 51% 33% 83% 63% 38%
any group of people
intelligence have more bene ts than 57% 40% 39% 64% 38% 59% 65% 37% 42% 46% 48% 65% 78% 40% 55% 52% 69% 73% 43% 44% 67% 47% 61% 64% 59% 66% 50% 39% 74% 67% 37%
drawbacks

intelligence make me excited
46% 40% 35% 66% 37% 51% 62% 36% 43% 42% 45% 66% 75% 38% 50% 51% 74% 74% 42% 43% 72% 50% 62% 65% 59% 76% 50% 32% 80% 74% 36%

intelligence as much as I trust other 52% 42% 39% 60% 39% 51% 56% 37% 45% 45% 46% 67% 69% 39% 53% 44% 70% 66% 44% 43% 60% 50% 62% 57% 55% 55% 49% 42% 73% 65% 36%
companies

intelligence make me nervous
46% 69% 50% 51% 63% 54% 45% 52% 46% 65% 46% 58% 48% 62% 50% 23% 55% 48% 50% 63% 47% 38% 50% 53% 53% 44% 51% 53% 57% 54% 63%
I know which types of products and

services use arti cial intelligence
44% 38% 35% 62% 38% 58% 53% 37% 39% 43% 37% 62% 76% 36% 50% 38% 68% 61% 42% 35% 65% 46% 62% 57% 60% 68% 46% 36% 73% 71% 35%
I trust that companies that use

arti cial intelligence will protect 45% 38% 41% 56% 34% 44% 53% 32% 44% 42% 60% 64% 68% 38% 59% 32% 61% 66% 47% 42% 60% 55% 61% 54% 58% 40% 49% 39% 72% 57% 32%
my personal data
intelligence have profoundly changed 45% 40% 31% 63% 34% 54% 54% 32% 33% 34% 36% 63% 72% 31% 44% 36% 71% 66% 36% 35% 65% 46% 54% 64% 57% 73% 41% 31% 72% 63% 34%
Argentina
Australia
Belgium
Brazil
Canada
Chile
Colombia
France
Germany
Great Britain
Hungary
India
Indonesia
Ireland
Italy
Japan
Malaysia
Mexico
Netherlands
New Zealand
Peru
Poland
Romania
Singapore
South Africa
South Korea
Spain
Thailand
Turkey
United States
Sweden
Figure 9.1.3

A large majority of the countries surveyed by Ipsos in (24 percentage points), France (19), Chile (18), and
2022 were surveyed again in 2023, enabling cross- Australia (18).
year comparisons. Figure 9.1.4 highlights the year-over-
Likewise, except for South Africa, all countries in the
year percentage point change in answers to particular
survey sample are now more inclined to believe that
AI-related questions. For every country surveyed
AI will significantly impact their lives in the next three
in both 2022 and 2023, an increase was reported
to five years. The highest increase of 12 percentage
in the degree to which AI products make people
points was reported in Japan, Great Britain, Germany,
nervous. The sharpest increases were reported in Italy
and Australia.
Percentage point change in opinions about AI by country (% agreeing with statement), 2022–23
I have a good understanding of what

3% 0% -4% 5% -1% -6% 2% 9% 11% 7% 6% 11% 3% 4% 1% 6% -2% 3% 0% 3% 4% 7% 5% 4%
arti cial intelligence is

intelligence will profoundly change 4% 12% 0% 10% 11% 3% 2% 6% 12% 12% 7% 11% 12% 6% 5% 10% 5% 3% -2% 6% 5% 5% 8% 11%
intelligence have more bene ts than 2% 2% 2% 7% 6% -4% 1% 6% 5% 8% -1% 4% 10% 5% 8% 10% -3% -1% 2% 4% -3% -1% 7% 2%
drawbacks
intelligence as much as I trust other -3% 6% -1% 9% 5% -5% 0% 3% 3% 9% -1% 5% 6% 9% 6% 7% 0% -1% -1% 8% 0% 3% 3% 1%
companies

14% 18% 9% 16% 14% 18% 6% 19% 9% 16% 14% 24% 3% 7% 10% 14% 12% 8% 1% 13% 3% 16% 6% 11%
intelligence make me nervous
I know which types of products

-3% -1% -3% 4% 1% -1% -9% 3% 2% 6% 0% 4% 5% 7% -1% 1% 2% -6% 2% 8% 0% -1% 11% -4%
and services use AI

intelligence have profoundly changed -8% 4% -6% 12% 2% -4% -5% -1% 2% 1% 2% 3% 5% 6% 4% -3% 0% 2% 1% 11% -8% 1% 3% -2%
Argentina
Australia
Belgium
Brazil
Canada
Chile
Colombia
France
Germany
Great Britain
Hungary
Italy
Japan
Malaysia
Mexico
Netherlands
Peru
Poland
South Africa
South Korea
Spain
Sweden
Turkey
United States
Figure 9.1.4

AI and Jobs
This year’s Ipsos survey included more questions about of respondents think AI is likely to change how they
how people perceive AI’s impact on their current jobs. perform their current job within the next five years,
Figure 9.1.5 illustrates the various global perspectives and 36% fear AI may replace their job in the same
on the expected impact of AI on employment. 57% time frame.
Global opinions on the impact of AI on current jobs, 2023

Likely Don’t know Not likely
AI will change how you do your

57% 8% 35%
current job in the next 5 years
AI will replace your current job

36% 8% 56%
in the next 5 years
0% 20% 40% 60% 80% 100%

% of respondents
Figure 9.1.5

Opinions on whether AI will significantly impact an Specifically, 66% of Gen Z compared to 46% of
individual’s job vary significantly across demographic boomer respondents agree with the statement that
groups (Figure 9.1.6). Younger generations, such as AI will likely affect their current jobs. Additionally,
Gen Z and millennials, are more inclined to agree individuals with higher incomes, more education, and
that AI will change how they do their jobs compared decision-making roles are more likely to foresee AI
to older generations like Gen X and baby boomers. impacting their current employment.
Global opinions on the impact of AI on current jobs by demographic group, 2023

70% 68%
66%
61% 60% 61%
60% 58%
55% 56% 56%
53% 54%
52% 51%
% of respondents
50% 46%
40%
30%
20%
10%
0%
Male
Female
Gen Z
Millennial
Gen X
Boomer
Lower income
Middle income
Upper income
Lower education
Medium education
Higher education
Yes
No
Gender Generation Household income Education Decision-maker
Figure 9.1.6

AI and Livelihood
The Ipsos survey explored the impact respondents and 37% think it will improve their job. Only 34%
believe AI will have on various aspects of their lives, anticipate AI will boost the economy, and just 32%
such as health and entertainment. On topics like time believe it will enhance the job market.
management and entertainment, the majority viewed
Similar to questions about AI products and services,
AI positively (Figure 9.1.7). For instance, 54% of global
responses showed intracountry consistency, with
respondents agree that AI will improve the efficiency
Japanese, Swedes, and Americans generally
of their tasks, and 51% believe AI will enhance
pessimistic about AI’s potential to improve
entertainment options like TV, movies, music, and
livelihoods, whereas Brazilians, Indonesians, and
books. However, skepticism was more prominent in
Mexicans were more optimistic.
other areas. Only 39% feel AI will benefit their health,
Global opinions on the potential of AI improving life by country, 2023

The amount of time it takes

54% 62% 44% 39% 67% 39% 60% 65% 45% 37% 46% 54% 57% 72% 42% 48% 33% 62% 68% 55% 50% 70% 48% 51% 64% 71% 61% 48% 38% 71% 62% 46%
me to get things done
My entertainment options
(television/video content, 51% 63% 43% 37% 64% 42% 63% 64% 32% 39% 45% 41% 57% 71% 44% 45% 33% 57% 68% 50% 47% 71% 40% 46% 57% 66% 54% 51% 37% 68% 60% 40%
movies, music, books)
My health 39% 47% 29% 28% 55% 29% 47% 50% 37% 25% 33% 28% 50% 58% 29% 37% 16% 49% 61% 30% 31% 55% 23% 43% 40% 49% 38% 33% 25% 56% 47% 32%
My job 37% 36% 30% 21% 55% 25% 36% 41% 26% 23% 32% 24% 48% 62% 24% 32% 19% 47% 52% 25% 30% 56% 30% 30% 41% 51% 23% 27% 28% 66% 47% 28%
The economy in my country 34% 28% 26% 18% 51% 20% 31% 40% 24% 27% 31% 25% 54% 58% 25% 29% 22% 50% 48% 27% 28% 46% 29% 34% 50% 39% 34% 25% 21% 62% 39% 23%
The job market 32% 38% 20% 18% 49% 19% 32% 38% 21% 20% 21% 24% 49% 50% 22% 30% 22% 39% 54% 23% 22% 54% 26% 33% 37% 38% 17% 23% 21% 53% 50% 21%
Global
Argentina
Australia
Belgium
Brazil
Canada
Chile
Colombia
France
Germany
Great Britain
Hungary
India
Indonesia
Ireland
Italy
Japan
Malaysia
Mexico
Netherlands
New Zealand
Peru
Poland
Romania
Singapore
South Africa
South Korea
Spain
Thailand
Turkey
United States
Sweden
Figure 9.1.7

Significant demographic differences also exist in entertainment, health, and the economy compared to
perceptions of AI’s potential to enhance livelihoods, their lower-income and less-educated counterparts.
with younger generations generally expressing greater In general, members of Gen Z, those in higher
optimism. For instance, 59% of Gen Z respondents income brackets, and those with more education are
believe AI will improve entertainment options, the most optimistic about AI’s potential to improve
versus only 40% of baby boomers. Additionally, life, while those from the boomer generation, lower
individuals with higher incomes and education levels income brackets, and with less education are the
are more optimistic about AI’s positive impacts on least optimistic.
Global opinions on the potential of AI improving life by demographic group, 2023

The amount of time it takes me

to get things done
55% 53% 63% 57% 52% 43% 47% 53% 61% 46% 53% 60% 56% 50%
My entertainment options
(television/video content, 52% 51% 59% 55% 51% 40% 47% 51% 57% 47% 51% 55% 54% 47%
movies, music, books)
My health 42% 36% 46% 42% 37% 30% 34% 40% 43% 35% 39% 41% 41% 35%
My job 38% 35% 46% 41% 31% 26% 33% 36% 41% 34% 35% 40% 0% 0%
The economy in my country 37% 31% 40% 38% 31% 26% 32% 35% 38% 30% 33% 38% 37% 29%
The job market 34% 30% 39% 36% 28% 23% 31% 32% 34% 30% 32% 33% 34% 28%
Male
Female
Gen Z
Millennial
Gen X
Boomer
Lower income
Middle income
Upper income
Lower education
Medium education
Higher education
Employed
Nonemployed
Gender Generation Household income Education Employment status
Figure 9.1.8

Attitudes on ChatGPT Toronto. In October and November 2023, researchers

Many argue that the launch of ChatGPT by OpenAI from SRI and PEARL conducted a 21-country survey
in November 2022 was a watershed moment in examining global attitudes toward AI.
familiarizing the public with AI. While AI encompasses
Figure 9.1.9 explores the extent of global public
much more than ChatGPT or LLMs, the prominence
awareness of ChatGPT. Among global respondents,
of ChatGPT as one of the most well-known AI
63% claim awareness of ChatGPT. Countries with the
tools makes gauging public sentiment toward it an
highest awareness rates include India (82%), Kenya
interesting approach for better understanding broader
(81%), Indonesia (76%), and Pakistan (76%). Poland
opinions on AI.
reported the lowest awareness, at 43%.
Global Public Opinion on Artificial Intelligence
Figure 9.1.10 highlights how frequently respondents
(GPO-AI) is a report created by the Schwartz
who report being familiar with ChatGPT use the tool.
Reisman Institute for Technology and Society (SRI)
Globally, 17% of users utilize it daily, 36% weekly, and
in collaboration with the Policy, Elections and
16% monthly. India (36%), Pakistan (28%), and Kenya
Representation Lab (PEARL) at the Munk School of
(27%) report the highest levels of daily usage.
Global Affairs and Public Policy at the University of

Global awareness of ChatGPT (% of total), 2023

Source: Global Public Opinion on Arti cial Intelligence (GPO-AI), 2024 | Chart: 2024 AI Index report
Yes Unsure No
Global 63% 7% 30%
Argentina 67% 7% 26%
Australia 60% 5% 36%
Brazil 64% 7% 30%
Canada 64% 7% 28%
Chile 52% 11% 38%
China 60% 9% 31%
France 60% 7% 33%
Germany 60% 7% 33%
India 82% 3% 15%
Indonesia 76% 6% 17%
Italy 52% 8% 41%
Japan 61% 12% 27%
Kenya 81% 17%
Mexico 58% 10% 31%
Pakistan 76% 6% 18%
Poland 43% 16% 41%
Portugal 59% 7% 34%
South Africa 69% 7% 24%
Spain 61% 6% 32%
United Kingdom 61% 6% 32%
United States 55% 7% 39%
0% 20% 40% 60% 80% 100%

% of respondents
Figure 9.1.9

Global usage frequency of ChatGPT (% of total), 2023

Daily Weekly Monthly Rarely
Global 17% 36% 16% 30%
Argentina 9% 33% 20% 37%
Australia 8% 38% 17% 36%
Brazil 21% 38% 14% 27%
Canada 13% 30% 20% 36%
Chile 9% 39% 18% 34%
China 24% 49% 15% 12%
France 12% 27% 19% 43%
Germany 9% 32% 20% 39%
India 36% 39% 10% 15%
Indonesia 20% 39% 11% 30%
Italy 9% 35% 17% 39%
Japan 6% 32% 20% 42%
Kenya 27% 42% 13% 18%
Mexico 13% 42% 19% 25%
Pakistan 28% 34% 16% 22%
Poland 10% 31% 14% 44%
Portugal 10% 27% 18% 44%
South Africa 13% 35% 21% 30%
Spain 11% 35% 16% 38%
United Kingdom 10% 28% 20% 42%
United States 18% 27% 21% 34%
0% 20% 40% 60% 80% 100%

% of respondents
Figure 9.1.10

AI Concerns purposes (49%), its impact on jobs (49%), and

GPO-AI also reported on respondents’ AI-related its potential to violate citizens’ privacy (45%). In
concerns. Figure 9.1.11 presents the percentage of contrast, global citizens were comparatively less
survey respondents who expressed concern about concerned about issues of unequal access to AI
11 specific impacts. Globally, individuals were most (26%), AI’s potential for bias and discrimination
concerned about AI being misused for nefarious (24%), and their own ability to use AI (22%).
Global concerns on the impacts of AI in the next few years, 2023

Misuse/use for
49% 63% 45% 46% 52% 56% 39% 44% 57% 34% 66% 47% 52% 34% 52% 31% 51% 62% 47% 58% 48% 43%
nefarious purposes
Impact of AI
49% 53% 45% 47% 49% 58% 39% 43% 42% 48% 59% 40% 36% 54% 54% 51% 37% 57% 68% 49% 49% 43%
on jobs
Violation of
45% 53% 45% 46% 48% 57% 39% 43% 44% 35% 46% 40% 33% 34% 43% 31% 49% 57% 55% 53% 40% 43%
citizens’ privacy
Dehumanization of
41% 52% 43% 44% 49% 54% 22% 51% 40% 28% 28% 42% 38% 37% 47% 21% 27% 59% 49% 54% 44% 39%
services
Lack of transparency
34% 35% 42% 35% 38% 35% 27% 30% 33% 36% 36% 31% 24% 36% 33% 28% 32% 41% 47% 35% 35% 32%
in decision-making
Impact of AI
33% 32% 28% 33% 34% 33% 26% 28% 27% 41% 35% 24% 25% 47% 30% 43% 28% 37% 43% 33% 27% 27%
on education
Ethical implications 30% 32% 38% 28% 37% 34% 25% 22% 27% 32% 23% 25% 24% 33% 29% 23% 25% 37% 30% 34% 35% 32%
Accuracy of results
28% 23% 38% 18% 32% 24% 24% 23% 24% 39% 30% 20% 29% 43% 25% 34% 20% 22% 38% 20% 31% 30%
and analysis
Uneven access
26% 35% 21% 35% 24% 32% 23% 23% 23% 29% 29% 19% 27% 22% 30% 19% 20% 32% 28% 28% 18% 19%
to AI
Potential for bias
24% 20% 34% 26% 30% 23% 20% 17% 22% 29% 33% 18% 18% 28% 23% 18% 16% 27% 29% 24% 32% 26%
and discrimination
My own ability
22% 22% 18% 20% 18% 24% 18% 19% 16% 36% 30% 17% 16% 33% 27% 31% 14% 22% 29% 18% 19% 19%
to use AI
Global
Argentina
Australia
Brazil
Canada
Chile
China
France
Germany
India
Indonesia
Italy
Japan
Kenya
Mexico
Pakistan
Poland
Portugal
South Africa
Spain
United Kingdom
United States
Figure 9.1.11

Americans’ feelings toward increased use of AI

US Public Opinion in daily life (% of total), 2021–23
Source: Pew Research, 2023 | Chart: 2024 AI Index report
Since 2021, Pew Research Center has been More excited than concerned Equally concerned and excited
investigating sentiment toward AI in the United More concerned than excited

100%
States. They received 11,000 responses to their most
recent 2023 survey. 37%
80% 38%
Figure 9.1.12 shows that over the last year, Americans 52%
% of respondents
have grown increasingly concerned about the use of 60%
18% 15%
AI in their daily lives. In 2021 and 2022, only 37% and
38% of Americans, respectively, reported feeling 40% 10%
more concerned than excited about AI technology.

By 2023, this figure had risen to 52%, indicating that 20%
45% 46%
36%
a majority of Americans now feel more concerned
than excited about AI technology. 0%
2021 2022 2023
Figure 9.1.12
Pew also surveyed Americans’ opinions on whether they believed AI helped or hindered in specific contexts
(Figure 9.1.13). Respondents reported that AI was more likely to be beneficial, particularly in assisting people to
find products or services online, with 49% expressing this view. However, 53% of respondents indicated that AI
was more likely to be detrimental than beneficial in safeguarding personal information privacy.
Americans’ opinions of whether AI helps or hurts in speci c settings (% of total), 2023

Helps more than it hurts Not sure Hurts more than it helps
People nding products, services

49% 35% 15%
they are interested in online
Doctors providing quality care

37% 42% 20%
to patients
Companies making safe cars

37% 44% 19%
and trucks
People nding accurate

33% 40% 27%
information online
People taking care of their health 33% 47% 19%
Companies providing quality

28% 37% 34%
customer service
Police maintaining public safety 24% 49% 26%
People keeping their personal

10% 37% 53%
information private
0% 20% 40% 60% 80% 100%

% of respondents Figure 9.1.13

Pew further segmented the data by education level degrees are more likely to report that AI can
level (Figure 9.1.14). Across various use categories, significantly aid doctors in delivering quality care to
Americans with higher education levels are more patients and assist people in discovering products
likely to believe in AI’s potential to help rather than and services online that interest them.
harm. For instance, individuals with college or higher-
Di erences in Americans’ view of AI’s impact by education level (% of total), 2023

People nding products and services they are interested in online Police maintaining public safety People keeping their personal information private
College degree+ 60% 27% 27% 49% 24% 32% 59%
Some college or less 44% 39% 17% 23% 49% 27% 39% 50%
0% 50% 100% 0% 50% 100% 0% 50% 100%
Companies making safe cars and trucks Doctors providing quality care to patients People taking care of their health
College degree+ 45% 40% 15% 46% 38% 16% 41% 42% 16%
Some college or less 33% 46% 20% 32% 44% 23% 29% 50% 20%
0% 50% 100% 0% 50% 100% 0% 50% 100%
People nding accurate information online Companies providing quality customer service
College degree+ 36% 36% 28% 34% 33% 33%

Help
Not sure
Some college or less 31% 42% 26% 25% 39% 35%
Hurt
0% 50% 100% 0% 50% 100%

% of respondents
Figure 9.1.14

Index Report 2024 9.2 Social Media Data
9.2 Social Media Data

Dominant Models sentiment score expresses the ratio of positive to
negative sentiment around a given topic. A net
Public attitudes toward AI can be assessed through
sentiment score of +100 means that all conversation
both quantitative and qualitative analyses of
is positive; a score of -100 means that all conversation
posts made on social media. Quid analyzed social
is negative. Many models released in 2023 received
conversations surrounding AI models across various
positive social media sentiment. Some of the models
sectors from January to December 2023, examining
that garnered the highest degree of positive attention
over 7 million social media posts.
were GraphCast, a new AI-powered weather
Figure 9.2.1 shows the net sentiment score of various forecasting system from DeepMind, and Claude 2.1,
AI models released throughout the year. The net one of Anthropic’s most recent LLMs.
Net sentiment score of AI models by quarter, 2023

2023/Q1 2023/Q2 2023/Q3 2023/Q4
Stable Di�usion 40 37 53 45
Copilot 56 66 73 67
GPT-4 42 48 51 44
PaLM 2 NaN 62 66 75
DALL-E 3 NaN NaN 74 72
Mistral 7B NaN NaN 92 56
Grok NaN NaN NaN 57
GPT-4 Turbo NaN NaN NaN 68
Whisper V3 NaN NaN NaN 83
GraphCast NaN NaN NaN 94
Claude 2.1 NaN NaN NaN 87
Stable Video Di�usion NaN NaN NaN 66
Orca 2 NaN NaN NaN 83
In�ection-2 NaN NaN NaN 81
Gemini NaN NaN NaN 36
Midjourney v6 NaN NaN NaN 71
Figure 9.2.1

Figure 9.2.2 highlights the proportion of AI-related the fourth quarter of 2023, GPT-4 still captured 45%
social media conversation that was dominated by the of social media attention. Other models that garnered
release of particular models. GPT-4 remained a dom-
2
significant attention included Grok, Stable Diffusion,
inant topic of consumer conversation throughout the and Gemini.
year. Despite the release of numerous new models by
Select models’ share of AI social media attention by quarter, 2023

2023/Q1 2023/Q2 2023/Q3 2023/Q4
Stable Di�usion 46% 21% 24% 12%
Copilot 0% 1% 1% 1%
GPT-4 53% 71% 62% 45%
PaLM 2 NaN% 5% 4% 2%
DALL-E 3 NaN% NaN% 3% 7%
Mistral 7B NaN% NaN% 0% 2%
Grok NaN% NaN% NaN% 16%
GPT-4 Turbo NaN% NaN% NaN% 2%
Whisper V3 NaN% NaN% NaN% 0%
GraphCast NaN% NaN% NaN% 0%
Claude 2.1 NaN% NaN% NaN% 2%
Stable Video Di�usion NaN% NaN% NaN% 0%
Orca 2 NaN% NaN% NaN% 0%
In�ection-2 NaN% NaN% NaN% 0%
Gemini NaN% NaN% NaN% 11%
Midjourney v6 NaN% NaN% NaN% 0%
Figure 9.2.2
2 The figures in this section consider all AI-related social media conversation. The percentage associated with a model in a quarter in Figure 9.2.2 represents the share of all AI-related social
media conversation in that quarter that was concerned with that model.

Highlight:
AI-Related Social Media Discussion in 2023

The following section, featuring data from Quid, In Q4 2023, discussions surrounding the release of
profiles specific narratives surrounding the GPT-4 Turbo, launched in November, saw a significant
discussion of AI that occurred on social media increase. Positive sentiment centered around its
in 2023. GPT-4 gathered most of the discussion innovative features and upgrades that could transform
volume in Q2 after its launch on March 14, programmers’ workflows. These enhancements
2023. Positive sentiment was primarily driven by included longer conversation capabilities, improved
its improvements, including faster processing contextual understanding, and multimodal ability to
speed, improved accuracy, and praise for its generate images. However, some negative feedback
ability to enhance productivity across different arose due to disappointment with the model’s
types of work tasks, such as coding, corporate knowledge cutoff in April 2023 and slower loading
collaboration, and content creation. Negative speeds compared to GPT-4. Some of the sample social
sentiment primarily stemmed from complaints media posts from this time included:
about occasional crashes of the ChatGPT
“This is just insane… My GPT-4 coding assistant
website, along with an open letter led by Elon
can now: - build and design a frontend - create a
Musk and supported by over 1,300 artificial
backend with working db - correctly hook them up
intelligence experts, urging AI laboratories to
- upload code to GitHub - deploy it to Vercel[.] I can
pause training of powerful AI systems. Moreover,
now build *complete* apps with nothing more than
there was disagreement regarding the “open
my voice. The future is here!” — @mckaywrigley
letter” and the suggestion to halt AI research,
particularly considering its potential to have “Trying to make my LinkedIn profile more
a positive impact across multiple fields. For interesting if a recruiter is using a large language
example, Andrew Ng posted: model like GPT-4 to send me a message. Looks like
it works on the public version of my profile!”
“1/The call for a 6 month moratorium
— @brdskggs
on making AI progress beyond GPT-4
is a terrible idea. I’m seeing many new “GPT-4 Turbo has knowledge of the world up
applications in education, healthcare, food, to April 2023. @sama says the team is ‘just as
... that’ll help many people. Improving GPT-4 annoyed as you, maybe more’ that the knowledge
will help. Lets balance the huge value AI is is not more updated and that @openai will work to
creating vs. realistic risks.” — @AndrewYNg make sure it never gets that outdated again.”
— @VentureBeat

Highlight:
AI-Related Social Media Discussion in 2023 (cont’d)

Discussions about Stable Diffusion were more “Stable Diffusion XL with ControlNet is insane
prominent in the first half of 2023, but decreased Discover the future of AI with Stability AI’s
toward the year’s end. More posts mentioned latest innovation: Stable Diffusion XL (SDXL) 1.0!
Stable Diffusion XL models than Stable Diffusion This powerful text-to-image generation model
2.0 (around 16 times more). Positive sentiment improves image quality and makes it easier for
was mainly driven by the tool’s rapid increase in users to create highly detailed images. Built on a
popularity, the potential benefits of AI in enhancing massive 3.5 billion-parameter base model, SDXL
creativity, and the excitement surrounding technical 1.0 boasts better accuracy and understanding of
advancements and improvements (e.g., enhanced various concepts. Want to know more? Check
accuracy, better understanding of various concepts, out my video where I delve deeper into this
and higher resolution). On the other hand, negative groundbreaking technology!” — @work.with.ai
sentiment revolved around concerns about legal and
Both Gemini (from Google) and Grok (from xAI)
ethical issues related to AI-generated content, such
saw an increase in conversations during Q4 due
as copyright violations, ownership of AI-created
to their late year launches. Positive feedback for
material, and the possible replacement of human
Gemini focused on its improved accuracy and
artists by AI. Additionally, worries were expressed
multilingual capabilities, as well as its potential
about the risks and threats linked to artificial
to enhance various Google services like Search
intelligence, like its potential harmful effects, the
and Ads. On the other hand, negative opinions
spread of misinformation, and the possibility of AI
stemmed from concerns about inaccurate results,
being used for academic cheating.
disappointment over Gemini’s delayed release, and
“Very happy about sharing smashed Stable skepticism toward the Gemini AI demo.
Diffusion models! - In one line of code, we
“WHAT IS GOOGLE GEMINI AND HOW CAN
compressed popular text-to-image Stable
YOU USE IT?” — Erik Hyrkas
Diffusion models for A100. - Evaluations across
various metrics show significant speedup “Gemini Ultra (if Google is honest) Will Blow Our
improvements, energy savings, and CO2 Minds ” — Tina Huang
emissions savings. Now looking forward [to]
sharing more compression results :) Feel free
to contact us to achieve the same on your own
models https://pruna.ai/contact ;)”
— @Bertrand_Charp

Index Report 2024 Appendix
Index Report 2024
Appendix
Chapter 1 Research and Development 460
Chapter 2 Technical Performance 465
Chapter 3 Responsible AI 472
Chapter 4 Economy 478
Chapter 5 Science and Medicine 488
Chapter 6 Education 491
Chapter 7 Policy and Governance 495
Chapter 8 Diversity 500
Chapter 9 Public Opinion 501

Artificial Intelligence Appendix
Index Report 2024 Chapter 1: Research and Development

Acknowledgments Technology Observatory’s website.1 Using CAT, users
can also interact with country bibliometric, patent,
The AI Index would like to acknowledge Ben Cottier and investment data.2
and Robi Rahman from Epoch for leading the work
Publications From CSET Merged Corpus of
analyzing machine learning training costs; Robi Rahman Scholarly Literature
for leading work regarding the national affiliation of Sources
notable systems; and James da Costa, for doing coding CSET’s merged corpus of scholarly literature
work instrumental to the sectoral and national affiliation combines distinct publications from Clarivate’s Web
analysis of foundation models. of Science, OpenAlex, The Lens, Semantic Scholar,
arXiv, and Papers With Code.
AI Conference Attendance
Updates: The source list of scholarly literature for
The AI Index reached out to the organizers of various
CSET’s merged corpus has been changed from prior
AI conferences in 2023 and asked them to provide
years, with the inclusion of OpenAlex, the Lens,
information on total attendance. Some conferences
and Semantic Scholar, and the exclusion of Digital
posted their attendance totals online; when this was the
Science’s Dimensions and the Chinese National
case, the AI Index used those reported totals and did
Knowledge Infrastructure (CNKI).
not reach out to the conference organizers.
Methodology
CSET To create the merged corpus, CSET deduplicated
Prepared by Autumn Toney across the listed sources using publication metadata,
and then combined the metadata for linked
The Center for Security and Emerging Technology
publications. For analysis of AI publications, CSET
(CSET) is a policy research organization within
used an English-language subset of this corpus
Georgetown University’s Walsh School of Foreign
published since 2010. CSET researchers developed
Service that produces data-driven research at the
a classifier for identifying AI-related publications by
intersection of security and technology, providing
leveraging the arXiv repository, where authors and
nonpartisan analysis to the policy community.
editors tag papers by subject.3
For more information about how CSET analyzes
Updates: The AI classifier was updated from the
bibliometric and patent data, see the Country Activity
version used in prior years; Dunham, Melot, and
Tracker (CAT) documentation on the Emerging
Murdick4 describe the previously implemented
1 https://eto.tech/tool-docs/cat/
2 https://cat.eto.tech/
3 Christian Schoeberl, Autumn Toney, and James Dunham, “Identifying AI Research” (Center for Security and Emerging Technology, July 2023), https://doi.org/10.51593/20220030.
4 James Dunham, Jennifer Melot, and Dewey Murdick, “Identifying the Development and Application of Artificial Intelligence in Scientific Text,” arXiv preprint, arXiv:2002.07143 (2020).
Table of Contents Appendix 460

classifier; and Schoeberl, Toney, and Dunham describe publication type (e.g., academic journal articles,
the updated classifier used in this analysis. conference papers) were provided where available.
These publication types were disaggregated by
CSET matched each publication in the analytic corpus
affiliation country as described above.
with predictions from a field-of-study model derived
from Microsoft Academic Graph (MAG)’s taxonomy, CSET also provided publication affiliation sector(s)
which yields hierarchical labels describing the where, as in the country attribution analysis, sectors
published research field(s) of study and corresponding were associated with publications through authors’
scores. CSET researchers identified the most common
5
affiliations. Not all affiliations were characterized in
fields of study in our corpus of AI-relevant publications terms of sectors; CSET researchers relied primarily
since 2010 and recorded publications in all other fields on ROR for this purpose, and not all organizations can
as “Other AI.” English-language AI-relevant publications be found in or linked to ROR.6 Where the affiliation
were then tallied by their top-scoring field and sector is available, papers were counted toward these
publication year. sectors, by year.
Updates: The methodology to assign MAG fields of CSET counted cross-sector collaborations as distinct
study was updated from the methodology used in prior pairs of sectors across authors for each publication.
years. Toney and Dunham describe the field of study Collaborations are only counted once: For example,
assignment pipeline used in this analysis; prior years if a publication has two authors with an academic
used the original MAG implementation. affiliation and two with an industry affiliation, it is
counted as a single academic-industry collaboration.
CSET also provided publication counts and year-by-
year citations for AI-relevant work associated with each Patents From CSET’s AI and Robotics Patents
country. A publication is associated with a country Dataset
if it has at least one author whose organizational Source
affiliation(s) is located in that country. If there is CSET’s AI patents dataset was developed by CSET
no observed country, the publication receives an and 1790 Analytics and includes data from The Lens,
“Unknown/Missing” country label. Citation counts 1790 Analytics, and EPO’s PATSTAT. Patents relevant
aren’t available for all publications; those without to the development and application of AI and robotics
counts weren’t included in the citation analysis. Over were identified by their CPC/IPC codes and keywords.
70% of English-language AI papers published between
Methodology
2010 and 2022 have citation data available.
In this analysis, patents were grouped by year and
Additionally, publication counts by year and by country, and then counted at the “patent family”
5 These scores are based on cosine similarities between field-of-study and paper embeddings. See Autumn Toney and James Dunham, “Multi-Label Classification of Scientific Research
Documents Across Domains and Languages,” Proceedings of the Third Workshop on Scholarly Document Processing (Association for Computational Linguistics, 2022): 105–14, https://
aclanthology.org/2022.sdp-1.12/.
6 See https://ror.org/ for more information about the ROR dataset.

7 Patents are analyzed at the “patent family” level rather than “patent document” level because patent families are a collective of patent documents all associated with a single invention and/
or innovation by the same inventors/assignees. Thus, counting at the “patent family” level mitigates artificial number inflation when there are multiple patent documents in a patent family or if
a patent is filed in multiple jurisdictions.

level.7 CSET extracted year values from the first 3. All of the landmark publications are
publication date within a family. Countries are assigned aggregated within time periods (e.g., monthly
to patents based on the country or filing office where or yearly) with the national contributions
a patent is first filed (e.g., if a patent is filed with the added up to determine what each country’s
USPTO on January 1, 2020, and then with the German contribution to landmark AI research was
Patenting Office on January 2, 2020, the patent is during each time period.
classified as a patent with U.S. inventors). Note that 8
4. The contributions of different countries are
the same patent may have multiple countries (but not compared over time to identify any trends.
years) attributed to it if the inventors filed their patent
in multiple countries on the same first filing date (e.g.,
if a patent is filed with the USPTO on January 1, 2020,
Epoch Notable Models
and then with the German Patenting Office on January Analysis
1, 2020, the patent is classified as a patent with U.S.
The AI forecasting research group Epoch maintains
inventors and as a patent with German inventors).
a dataset of landmark AI and ML models, along with
Note that patents filed with supranational accompanying information about their creators and
organizations, such as patents filed under WIPO (the publications, such as the list of their (co)authors,
World Intellectual Property Organization), EP (European number of citations, type of AI task accomplished,
Patent Organization), and EA (a special area of Spain and amount of compute used in training.
not included in the European Union), also fall under the
The nationalities of the authors of these papers have
“Rest of World” category.
important implications for geopolitical AI forecasting.
As various research institutions and technology
Ecosystems Graph Analysis companies start producing advanced ML models, the
To track the distribution of AI foundation models by global distribution of future AI development may shift
country, the AI Index team took the following steps: or concentrate in certain places, which in turn affects
the geopolitical landscape because AI is expected
1. A snapshot of the Ecosystems Graph was taken in
to become a crucial component of economic and
early January 2024.
military power in the near future.
2. Authors of foundation models are attributed to
countries based on their affiliation credited on To track the distribution of AI research contributions
the paper/technical documentation associated on landmark publications by country, the Epoch
with the model. For international organizations, dataset is coded according to the following
authors are attributed to the country where the methodology:
organization is headquartered, unless a more
specific location is indicated.
8 In CSET’s data analysis for the 2022 AI Index, we used the most recent publication date for a patent family. This method has the advantage of capturing updates within a patent
family (such as amendments). However, to remain consistent with CSET’s other data products, including the Country Activity Tracker (available at https://cat.eto.tech/), we opted to
use the first filing year instead in this data analysis.

1. A snapshot of the dataset was taken on Mapping AI Projects to Geographic Areas

January 1, 2024. This includes papers Public AI projects are mapped to geographic areas
about landmark models, selected using the using IP address geolocation to determine the mode
inclusion criteria of importance, relevance, location of a project’s owners each year. Each project
and uniqueness, as described in the Compute owner is assigned a location based on their IP address
Trends dataset documentation. when interacting with GitHub. If a project owner
2. The authors are attributed to countries based changes locations within a year, the location for the
on their affiliation credited on the paper. project would be determined by the mode location
For international organizations, authors of its owners sampled daily in the year. Additionally,
are attributed to the country where the the last known location of the project owner is
organization is headquartered, unless a more carried forward on a daily basis even if no activities
specific location is indicated. were performed by the project owner that day. For
3. All of the landmark publications are example, if a project owner performed activities
aggregated within time periods (e.g., monthly within the United States and then became inactive for
or yearly) with the national contributions six days, that project owner would be considered to
added up to determine what each country’s be in the United States for that seven-day span.
contribution to landmark AI research was
during each time period.
4. The contributions of different countries are Training Cost Analysis
compared over time to identify any trends. To create the dataset of cost estimates, the Epoch
database was filtered for models released during the
GitHub large-scale ML era9 that were above the median of
training compute in a two-year window centered on
Identifying AI Projects
their release date. This filtered for the largest-scale
In partnership with researchers from Harvard Business
ML models. There were 138 qualifying systems based
School, Microsoft Research, and Microsoft’s AI for
on these criteria. Of these systems, 48 had sufficient
Good Lab, GitHub identifies public AI repositories
information to estimate the training cost.
following the methodologies of Gonzalez, Zimmerman,
and Nagappan, 2020, and Dohmke, Iansiti, and For the selected ML models, the training time and
Richards, 2023, using topic labels related to AI/ML the type, quantity, and utilization rate of the training
and generative AI, respectively, along with the topics hardware were determined from the publication,
“machine learning,” “deep learning,” or “artificial press release, or technical reports, as applicable.
intelligence.” GitHub further augments the dataset with Cloud rental prices for the computing hardware used
repositories that have a dependency on the PyTorch, by these models were collected from online historical
TensorFlow, or OpenAI libraries for Python. archives of cloud vendors’ websites.10
9 The selected cutoff date was September 1, 2015, in accordance with Compute Trends Across Three Eras of Machine Learning (Epoch, 2022).
10 Historic prices were collected from archived snapshots of Amazon Web Services, Microsoft Azure, and Google Cloud Platform price catalogs viewed through the Internet Archive
Wayback Machine.

Training costs were estimated from the hardware type,

quantity, and time by multiplying the hourly cloud rental
cost rates (at the time of training)11 by the quantity of
hardware hours. This yielded the cost to train each
model using the same hardware used by the authors
to train the same model at the time. However, some
developers purchased hardware rather than renting
cloud computers, so the true costs incurred by the
developers may vary.
Various challenges were encountered while estimating

the training cost of these models. Often, the developers
did not disclose the duration of training or the hardware
that was used. In other cases, cloud compute pricing
was not available for the hardware. The investigation
of training cost trends is continued in a forthcoming
Epoch report, including an expanded dataset with more
models and hardware prices.
11 The chosen rental cost rate was the most recent published price for the hardware and cloud vendor used by the developer of the model, at a three-year commitment rental rate, after
subtracting the training duration and two months from the publication date. If this price was not available, the most analogous price was used: the same hardware and vendor at a different
date, otherwise the same hardware from a different cloud vendor. If a three-year commitment rental rate was unavailable, this was imputed from other rental rates based on the empirical
average discount for the given cloud vendor. If the exact hardware type was not available, e.g., “NVIDIA A100 SXM4 40GB,” then a generalization was used, e.g., “NVIDIA A100.”

Index Report 2024 Chapter 2: Technical Performance

Acknowledgments HEIM, please read the original paper.
8. HELM: Data on HELM was taken from the HELM
The AI Index would like to acknowledge Andrew Shi for
leaderboard in January 2024. To learn more about
his work doing a literature review on the environmental
HELM, please read the original paper.
impact of AI models; Emily Capstick for her work studying
9. HumanEval: Data on HumanEval was taken from
the use of RLHF in machine learning models; Sukrut Oak
the HumanEval Papers With Code leaderboard in
for his work generating sample Midjourney generations;
January 2024. To learn more about HumanEval,
and Emma Williamson for her work identifying significant
please read the original paper.
AI technical advancements for the timeline.
10. M
ATH: Data on MATH was taken from the MATH
Benchmarks Papers With Code leaderboard in January 2024.

To learn more about MATH, please read the
1. AgentBench: Data on AgentBench was taken from original paper.
the AgentBench paper in January 2024. To learn more 11. M
LAgentBench: Data on MLAgentBench was
about AgentBench, please read the original paper. taken from the MLAgentBench paper in January
2. BigToM: Data on BigToM was taken from the BigToM 2024. To learn more about MLAgentBench, please
paper in January 2024. To learn more about BigToM, read the original paper.
please read the original paper. 12. MMLU: Data on MMLU was taken from the
3. C
hatbot Arena Leaderboard: Data on the Chatbot MMLU Papers With Code leaderboard in January
Arena Leaderboard was taken from the Chatbot 2024. To learn more about MMLU, please read the
Arena Leaderboard in January 2024. To learn more original paper.
about the Chatbot Arena Leaderboard, please read 13. MMMU: Data on MMMU was taken from the
the original paper. MMMU leaderboard in January 2024. To learn
4. EditVal: Data on EditVal was taken from the EditVal more about MMMU, please read the original paper.
paper in January 2024. To learn more about EditVal, 14. MoCa: Data on MoCa was taken from the MoCa
please read the original paper. paper in January 2024. To learn more about
5. G
PQA: Data on GPQA was taken from the GPQA MoCa, please read the original paper.
paper in January 2024. To learn more about GPQA, 15. PlanBench: Data on PlanBench was taken from the
please read the original paper. PlanBench paper in January 2024. To learn more
6. GSM8K: Data on GSM8K was taken from the about PlanBench, please read the original paper.
GSM8K Papers With Code leaderboard in January 16. S
WE-bench: Data on SWE-bench was taken from
2024. To learn more about GSM8K, please read the the SWE-bench leaderboard in January 2024. To
original paper. learn more about SWE-bench, please read the
7. HEIM: Data on HEIM was taken from the HEIM original paper.

17. TruthfulQA: Data on TruthfulQA was taken from

the TruthfulQA Papers With Code leaderboard in
January 2024. To learn more about TruthfulQA,
please read the original paper.
18. UCF101: Data on UCF101 was taken from the
UCF101 Papers With Code leaderboard in January
2024. To learn more about UCF101, please read the
original paper.
19. VCR: Data on VCR was taken from the VCR
VCR, please read the original paper.
20. VisIT-Bench: Data on VisIT-Bench was taken from
the VisIT-Bench leaderboard in January 2024. To
learn more about VisIT-Bench, please read the
original paper.
Environmental Impact
To assess the environmental impact of AI models, the
AI Index team surveyed technical reports of prominent
foundation models to determine whether the model
developers disclosed carbon emissions. The Index
also reviewed papers by researchers that estimated
the carbon footprint of various models. The technical
reports surveyed, as well as the papers estimating the
carbon impact of various models, are included in the
works cited for this chapter.
RLHF
To identify foundation models using RLHF, the AI Index
team reviewed the technical documentation of every
foundation model included in the Ecosystem Graph,
and searched for evidence that RLHF had been used in
the model’s development process. The year in which a
model is said to have used RLHF refers to the year the
model was released.

Works Cited
Agostinelli, A., Denk, T. I., Borsos, Z., Engel, J., Verzetti, M., Caillon, A., Huang, Q., Jansen, A., Roberts, A., Tagliasacchi, M.,
Sharifi, M., Zeghidour, N. & Frank, C. (2023). MusicLM: Generating Music From Text (arXiv:2301.11325). arXiv.
http://arxiv.org/abs/2301.11325.
Ahn, M., Brohan, A., Brown, N., Chebotar, Y., Cortes, O., David, B., Finn, C., Fu, C., Gopalakrishnan, K., Hausman, K., Herzog,
A., Ho, D., Hsu, J., Ibarz, J., Ichter, B., Irpan, A., Jang, E., Ruano, R. J., Jeffrey, K., … Zeng, A. (2022). Do As I Can, Not As I Say:
Grounding Language in Robotic Affordances (arXiv:2204.01691). arXiv. https://doi.org/10.48550/arXiv.2204.01691.
Bai, Y., Kadavath, S., Kundu, S., Askell, A., Kernion, J., Jones, A., Chen, A., Goldie, A., Mirhoseini, A., McKinnon, C., Chen, C.,
Olsson, C., Olah, C., Hernandez, D., Drain, D., Ganguli, D., Li, D., Tran-Johnson, E., Perez, E., … Kaplan, J. (2022). Constitutional
AI: Harmlessness From AI Feedback (arXiv:2212.08073). arXiv. https://doi.org/10.48550/arXiv.2212.08073.
Bairi, R., Sonwane, A., Kanade, A., C, V. D., Iyer, A., Parthasarathy, S., Rajamani, S., Ashok, B. & Shet, S. (2023). CodePlan:
Repository-Level Coding Using LLMs and Planning (arXiv:2309.12499). arXiv. https://doi.org/10.48550/arXiv.2309.12499.
Basu, S., Saberi, M., Bhardwaj, S., Chegini, A. M., Massiceti, D., Sanjabi, M., Hu, S. X. & Feizi, S. (2023). EditVal: Benchmarking
Diffusion Based Text-Guided Image Editing Methods (arXiv:2310.02426). arXiv. http://arxiv.org/abs/2310.02426.
Besta, M., Blach, N., Kubicek, A., Gerstenberger, R., Podstawski, M., Gianinazzi, L., Gajda, J., Lehmann, T., Niewiadomski,
H., Nyczyk, P. & Hoefler, T. (2024). Graph of Thoughts: Solving Elaborate Problems with Large Language Models
(arXiv:2308.09687). arXiv. http://arxiv.org/abs/2308.09687.
Bitton, Y., Bansal, H., Hessel, J., Shao, R., Zhu, W., Awadalla, A., Gardner, J., Taori, R. & Schmidt, L. (2023). VisIT-Bench:
A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use (arXiv:2308.06595). arXiv.
Blattmann, A., Rombach, R., Ling, H., Dockhorn, T., Kim, S. W., Fidler, S. & Kreis, K. (2023). Align Your Latents: High-Resolution
Video Synthesis With Latent Diffusion Models (arXiv:2304.08818). arXiv. http://arxiv.org/abs/2304.08818.
Brohan, A., Brown, N., Carbajal, J., Chebotar, Y., Chen, X., Choromanski, K., Ding, T., Driess, D., Dubey, A., Finn, C.,
Florence, P., Fu, C., Arenas, M. G., Gopalakrishnan, K., Han, K., Hausman, K., Herzog, A., Hsu, J., Ichter, B., … Zitkovich, B.
(2023). RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control. (arXiv:2307.15818). arXiv.
https://arxiv.org/abs/2307.15818.
Castaño, J., Martínez-Fernández, S., Franch, X. & Bogner, J. (2023). Exploring the Carbon Footprint of Hugging Face’s ML
Models: A Repository Mining Study. 2023 ACM/IEEE International Symposium on Empirical Software Engineering and
Measurement (ESEM), 1–12. https://doi.org/10.1109/ESEM56168.2023.10304801.
Chen, L., Chen, Z., Zhang, Y., Liu, Y., Osman, A. I., Farghali, M., Hua, J., Al-Fatesh, A., Ihara, I., Rooney, D. W. & Yap, P.-S. (2023).
“Artificial Intelligence-Based Solutions for Climate Change: A Review.” Environmental Chemistry Letters 21, no. 5: 2525–57.
https://doi.org/10.1007/s10311-023-01617-y.
Chen, L., Zaharia, M. & Zou, J. (2023). How Is ChatGPT’s Behavior Changing Over Time? (arXiv:2307.09009). arXiv.
Chen, M., Tworek, J., Jun, H., Yuan, Q., Pinto, H. P. de O., Kaplan, J., Edwards, H., Burda, Y., Joseph, N., Brockman, G., Ray, A.,
Puri, R., Krueger, G., Petrov, M., Khlaaf, H., Sastry, G., Mishkin, P., Chan, B., Gray, S., … Zaremba, W. (2021). Evaluating Large
Language Models Trained on Code (arXiv:2107.03374; Version 2). arXiv. https://doi.org/10.48550/arXiv.2107.03374.
Christiano, P., Leike, J., Brown, T. B., Martic, M., Legg, S. & Amodei, D. (2023). Deep Reinforcement Learning From Human
Preferences (arXiv:1706.03741). arXiv. https://doi.org/10.48550/arXiv.1706.03741.

Cobbe, K., Kosaraju, V., Bavarian, M., Chen, M., Jun, H., Kaiser, L., Plappert, M., Tworek, J., Hilton, J., Nakano, R., Hesse, C. &
Schulman, J. (2021). Training Verifiers to Solve Math Word Problems (arXiv:2110.14168). arXiv. http://arxiv.org/abs/2110.14168.
Copet, J., Kreuk, F., Gat, I., Remez, T., Kant, D., Synnaeve, G., Adi, Y. & Défossez, A. (2024). Simple and Controllable Music
Generation (arXiv:2306.05284). arXiv. https://doi.org/10.48550/arXiv.2306.05284.
Dettmers, T., Pagnoni, A., Holtzman, A. & Zettlemoyer, L. (2023). QLoRA: Efficient Finetuning of Quantized LLMs
Driess, D., Xia, F., Sajjadi, M. S. M., Lynch, C., Chowdhery, A., Ichter, B., Wahid, A., Tompson, J., Vuong, Q., Yu, T., Huang,
W., Chebotar, Y., Sermanet, P., Duckworth, D., Levine, S., Vanhoucke, V., Hausman, K., Toussaint, M., Greff, K., … Florence, P.
(2023). PaLM-E: An Embodied Multimodal Language Model (arXiv:2303.03378). arXiv. http://arxiv.org/abs/2303.03378.
Gandhi, K., Fränken, J.-P., Gerstenberg, T. & Goodman, N. D. (2023). Understanding Social Reasoning in Language Models
With Language Models (arXiv:2306.15448). arXiv. http://arxiv.org/abs/2306.15448.
Gemini Team: Anil, R., Borgeaud, S., Wu, Y., Alayrac, J.-B., Yu, J., Soricut, R., Schalkwyk, J., Dai, A. M., Hauth, A., Millican, K.,
Silver, D., Petrov, S., Johnson, M., Antonoglou, I., Schrittwieser, J., Glaese, A., Chen, J., Pitler, E., … Vinyals, O. (2023). Gemini:
A Family of Highly Capable Multimodal Models (arXiv:2312.11805). arXiv. http://arxiv.org/abs/2312.11805.
Girdhar, R., Singh, M., Brown, A., Duval, Q., Azadi, S., Rambhatla, S. S., Shah, A., Yin, X., Parikh, D. & Misra, I. (2023). Emu Video:
Factorizing Text-to-Video Generation by Explicit Image Conditioning (arXiv:2311.10709). arXiv. http://arxiv.org/abs/2311.10709.
Guha, N., Nyarko, J., Ho, D. E., Ré, C., Chilton, A., Narayana, A., Chohlas-Wood, A., Peters, A., Waldon, B., Rockmore, D. N.,
Zambrano, D., Talisman, D., Hoque, E., Surani, F., Fagan, F., Sarfaty, G., Dickinson, G. M., Porat, H., Hegland, J., … Li, Z. (2023).
LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models (arXiv:2308.11462).
arXiv. http://arxiv.org/abs/2308.11462.
Haque, A., Tancik, M., Efros, A. A., Holynski, A. & Kanazawa, A. (2023). Instruct-NeRF2NeRF: Editing 3D Scenes With
Instructions (arXiv:2303.12789). arXiv. http://arxiv.org/abs/2303.12789.
Hendrycks, D., Burns, C., Basart, S., Zou, A., Mazeika, M., Song, D. & Steinhardt, J. (2021). Measuring Massive Multitask
Language Understanding (arXiv:2009.03300). arXiv. http://arxiv.org/abs/2009.03300.
Hendrycks, D., Burns, C., Kadavath, S., Arora, A., Basart, S., Tang, E., Song, D. & Steinhardt, J. (2021). Measuring Mathematical
Problem Solving With the MATH Dataset (arXiv:2103.03874). arXiv. http://arxiv.org/abs/2103.03874.
Hu, W., Fey, M., Zitnik, M., Dong, Y., Ren, H., Liu, B., Catasta, M. & Leskovec, J. (2021). Open Graph Benchmark:
Datasets for Machine Learning on Graphs (arXiv:2005.00687). arXiv. https://doi.org/10.48550/arXiv.2005.00687.
Huang, J., Chen, X., Mishra, S., Zheng, H. S., Yu, A. W., Song, X. & Zhou, D. (2024). Large Language Models Cannot
Self-Correct Reasoning Yet (arXiv:2310.01798). arXiv. http://arxiv.org/abs/2310.01798.
Huang, Q., Vora, J., Liang, P. & Leskovec, J. (2023). Benchmarking Large Language Models as AI Research Agents
Jimenez, C. E., Yang, J., Wettig, A., Yao, S., Pei, K., Press, O. & Narasimhan, K. (2023). SWE-bench: Can Language Models
Resolve Real-World GitHub Issues? (arXiv:2310.06770). arXiv. https://doi.org/10.48550/arXiv.2310.06770.
Jin, D., Pan, E., Oufattole, N., Weng, W.-H., Fang, H. & Szolovits, P. (2020). What Disease Does This Patient Have?
A Large-Scale Open Domain Question Answering Dataset From Medical Exams (arXiv:2009.13081). arXiv.
Kıcıman, E., Ness, R., Sharma, A. & Tan, C. (2023). Causal Reasoning and Large Language Models: Opening a New Frontier
for Causality (arXiv:2305.00050). arXiv. http://arxiv.org/abs/2305.00050.

Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A. C., Lo, W.-Y., Dollár, P. &
Girshick, R. (2023). Segment Anything (arXiv:2304.02643). arXiv. http://arxiv.org/abs/2304.02643.
Kočiský, T., Schwarz, J., Blunsom, P., Dyer, C., Hermann, K. M., Melis, G. & Grefenstette, E. (2018). “The NarrativeQA
Reading Comprehension Challenge.” Transactions of the Association for Computational Linguistics 6: 317–28.
https://doi.org/10.1162/tacl_a_00023.
Krizhevsky, A. (2009). Learning Multiple Layers of Features From Tiny Images. https://www.semanticscholar.org/paper/
Learning-Multiple-Layers-of-Features-from-Tiny-Krizhevsky/5d90f06bb70a0a3dced62413346235c02b1aa086.
Kwiatkowski, T., Palomaki, J., Redfield, O., Collins, M., Parikh, A., Alberti, C., Epstein, D., Polosukhin, I., Devlin, J., Lee, K.,
Toutanova, K., Jones, L., Kelcey, M., Chang, M.-W., Dai, A. M., Uszkoreit, J., Le, Q. & Petrov, S. (2019). “Natural Questions:
A Benchmark for Question Answering Research.” Transactions of the Association for Computational Linguistics 7: 452–66.
https://doi.org/10.1162/tacl_a_00276.
Lee, H., Phatale, S., Mansoor, H., Mesnard, T., Ferret, J., Lu, K., Bishop, C., Hall, E., Carbune, V., Rastogi, A. & Prakash, S.
(2023). RLAIF: Scaling Reinforcement Learning From Human Feedback With AI Feedback (arXiv:2309.00267). arXiv.
Lee, T., Yasunaga, M., Meng, C., Mai, Y., Park, J. S., Gupta, A., Zhang, Y., Narayanan, D., Teufel, H. B., Bellagente, M., Kang, M.,
Park, T., Leskovec, J., Zhu, J.-Y., Fei-Fei, L., Wu, J., Ermon, S. & Liang, P. (2023). Holistic Evaluation of Text-to-Image Models
(arXiv:2311.04287). arXiv. https://doi.org/10.48550/arXiv.2311.04287.
Li, J., Cheng, X., Zhao, W. X., Nie, J.-Y. & Wen, J.-R. (2023). HaluEval: A Large-Scale Hallucination Evaluation Benchmark
for Large Language Models (arXiv:2305.11747). arXiv. https://doi.org/10.48550/arXiv.2305.11747.
Liang, P., Bommasani, R., Lee, T., Tsipras, D., Soylu, D., Yasunaga, M., Zhang, Y., Narayanan, D., Wu, Y., Kumar, A., Newman,
B., Yuan, B., Yan, B., Zhang, C., Cosgrove, C., Manning, C. D., Ré, C., Acosta-Navas, D., Hudson, D. A., … Koreeda, Y. (2023).
Holistic Evaluation of Language Models (arXiv:2211.09110). arXiv. https://doi.org/10.48550/arXiv.2211.09110.
Lin, S., Hilton, J. & Evans, O. (2022). TruthfulQA: Measuring How Models Mimic Human Falsehoods (arXiv:2109.07958). arXiv.
https://doi.org/10.48550/arXiv.2109.07958.
Liu, X., Yu, H., Zhang, H., Xu, Y., Lei, X., Lai, H., Gu, Y., Ding, H., Men, K., Yang, K., Zhang, S., Deng, X., Zeng, A., Du, Z., Zhang,
C., Shen, S., Zhang, T., Su, Y., Sun, H., … Tang, J. (2023). AgentBench: Evaluating LLMs as Agents (arXiv:2308.03688). arXiv.
Luccioni, A. S., Jernite, Y. & Strubell, E. (2023). Power Hungry Processing: Watts Driving the Cost of AI Deployment?
Luo, J., Paduraru, C., Voicu, O., Chervonyi, Y., Munns, S., Li, J., Qian, C., Dutta, P., Davis, J. Q., Wu, N., Yang, X., Chang, C.-
M., Li, T., Rose, R., Fan, M., Nakhost, H., Liu, T., Kirkman, B., Altamura, F., … Mankowitz, D. J. (2022). Controlling Commercial
Cooling Systems Using Reinforcement Learning (arXiv:2211.07357). arXiv. https://doi.org/10.48550/arXiv.2211.07357.
Maas, A. L., Daly, R. E., Pham, P. T., Huang, D., Ng, A. Y. & Potts, C. (2011). “Learning Word Vectors for Sentiment Analysis.” In D.
Lin, Y. Matsumoto & R. Mihalcea, eds., Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics:
Human Language Technologies: 142–50. Association for Computational Linguistics. https://aclanthology.org/P11-1015.
Melas-Kyriazi, L., Rupprecht, C., Laina, I. & Vedaldi, A. (2023). RealFusion: 360° Reconstruction of Any Object From a Single
Image (arXiv:2302.10663). arXiv. http://arxiv.org/abs/2302.10663.
Mihaylov, T., Clark, P., Khot, T. & Sabharwal, A. (2018). “Can a Suit of Armor Conduct Electricity? A New Dataset for
Open Book Question Answering.” In E. Riloff, D. Chiang, J. Hockenmaier & J. Tsujii, eds., Proceedings of the 2018
Conference on Empirical Methods in Natural Language Processing: 2381–91. Association for Computational Linguistics.
https://doi.org/10.18653/v1/D18-1260.

Mirchandani, S., Xia, F., Florence, P., Ichter, B., Driess, D., Arenas, M. G., Rao, K., Sadigh, D. & Zeng, A. (2023). Large Language
Models as General Pattern Machines (arXiv:2307.04721). arXiv. https://doi.org/10.48550/arXiv.2307.04721.
Mitchell, M., Palmarini, A. B. & Moskvichev, A. (2023). Comparing Humans, GPT-4, and GPT-4V On Abstraction and Reasoning
Tasks (arXiv:2311.09247). arXiv. http://arxiv.org/abs/2311.09247.
Mokady, R., Hertz, A., Aberman, K., Pritch, Y. & Cohen-Or, D. (2022). Null-Text Inversion for Editing Real Images Using Guided
Diffusion Models (arXiv:2211.09794). arXiv. https://doi.org/10.48550/arXiv.2211.09794.
Mooij, J. M., Peters, J., Janzing, D., Zscheischler, J. & Schölkopf, B. (2016). “Distinguishing Cause From Effect Using
Observational Data: Methods and Benchmarks.” The Journal of Machine Learning Research 17, no. 1: 1103–1204.
Nie, A., Zhang, Y., Amdekar, A., Piech, C., Hashimoto, T. & Gerstenberg, T. (2023). MoCa: Measuring Human-Language Model
Alignment on Causal and Moral Judgment Tasks (arXiv:2310.19677). arXiv. http://arxiv.org/abs/2310.19677.
Olabi, A. G., Abdelghafar, A. A., Maghrabie, H. M., Sayed, E. T., Rezk, H., Radi, M. A., Obaideen, K. & Abdelkareem, M. A.
(2023). “Application of Artificial Intelligence for Prediction, Optimization, and Control of Thermal Energy Storage Systems.”
Thermal Science and Engineering Progress, 39: 101730. https://doi.org/10.1016/j.tsep.2023.101730.
OpenAI, Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F. L., Almeida, D., Altenschmidt, J., Altman, S.,
Anadkat, S., Avila, R., Babuschkin, I., Balaji, S., Balcom, V., Baltescu, P., Bao, H., Bavarian, M., Belgum, J., … Zoph, B. (2024).
GPT-4 Technical Report (arXiv:2303.08774). arXiv. https://doi.org/10.48550/arXiv.2303.08774.
Rafailov, R., Sharma, A., Mitchell, E., Ermon, S., Manning, C. D. & Finn, C. (2023). Direct Preference Optimization: Your
Language Model Is Secretly a Reward Model (arXiv:2305.18290). arXiv. http://arxiv.org/abs/2305.18290.
Rein, D., Hou, B. L., Stickland, A. C., Petty, J., Pang, R. Y., Dirani, J., Michael, J. & Bowman, S. R. (2023). GPQA: A Graduate-
Level Google-Proof Q&A Benchmark (arXiv:2311.12022). arXiv. http://arxiv.org/abs/2311.12022.
Rustia, D. J. A., Chiu, L.-Y., Lu, C.-Y., Wu, Y.-F., Chen, S.-K., Chung, J.-Y., Hsu, J.-C. & Lin, T.-T. (2022). “Towards Intelligent and
Integrated Pest Management Through an AIoT-Based Monitoring System.” Pest Management Science 78, no. 10: 4288–4302.
https://doi.org/10.1002/ps.7048.
Schaeffer, R., Miranda, B. & Koyejo, S. (2023). Are Emergent Abilities of Large Language Models a Mirage? (arXiv:2304.15004).
Schneider, F., Kamal, O., Jin, Z. & Schölkopf, B. (2023). Moûusai: Text-to-Music Generation With Long-Context Latent Diffusion
Shams, S. R., Jahani, A., Kalantary, S., Moeinaddini, M. & Khorasani, N. (2021). “Artificial Intelligence Accuracy Assessment in NO2
Concentration Forecasting of Metropolises Air.” Scientific Reports 11, no. 1: 1805. https://doi.org/10.1038/s41598-021-81455-6.
Shi, Y., Wang, P., Ye, J., Long, M., Li, K. & Yang, X. (2024). MVDream: Multi-View Diffusion for 3D Generation (arXiv:2308.16512).
Soomro, K., Zamir, A. R. & Shah, M. (2012). UCF101: A Dataset of 101 Human Actions Classes From Videos in the Wild
(arXiv:1212.0402; Version 1). arXiv. http://arxiv.org/abs/1212.0402.
Stone, A., Xiao, T., Lu, Y., Gopalakrishnan, K., Lee, K.-H., Vuong, Q., Wohlhart, P., Kirmani, S., Zitkovich, B., Xia, F., Finn, C. &
Hausman, K. (2023). Open-World Object Manipulation Using Pre-trained Vision-Language Models (arXiv:2303.00905). arXiv.
Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi, A., Babaei, Y., Bashlykov, N., Batra, S., Bhargava, P., Bhosale, S., Bikel, D.,
Blecher, L., Ferrer, C. C., Chen, M., Cucurull, G., Esiobu, D., Fernandes, J., Fu, J., Fu, W., … Scialom, T. (2023). Llama 2: Open
Foundation and Fine-Tuned Chat Models (arXiv:2307.09288). arXiv. https://doi.org/10.48550/arXiv.2307.09288.

Valmeekam, K., Marquez, M., Olmo, A., Sreedharan, S. & Kambhampati, S. (2023). PlanBench: An Extensible Benchmark
for Evaluating Large Language Models on Planning and Reasoning About Change. Thirty-Seventh Conference on Neural
Information Processing Systems Datasets and Benchmarks Track. https://openreview.net/forum?id=YXogl4uQUO.
Voynov, O., Bobrovskikh, G., Karpyshev, P., Galochkin, S., Ardelean, A.-T., Bozhcnko, A., Karmanova, E., Kopanev, P., Labutin-
Rymsho, Y., Rakhimov, R., Safin, A., Serpiva, V., Artemov, A., Burnaev, E., Tsetserukou, D. & Zorin, D. (2023). Multi-sensor Large-
Scale Dataset for Multi-view 3D Reconstruction. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition
(CVPR), 21392–403. https://doi.org/10.1109/CVPR52729.2023.02049.
Walker, C. M. & Gopnik, A. (2014). “Toddlers Infer Higher-Order Relational Principles in Causal Learning.” Psychological Science
25, no. 1: 161–69.
Wang, G., Xie, Y., Jiang, Y., Mandlekar, A., Xiao, C., Zhu, Y., Fan, L. & Anandkumar, A. (2023). Voyager: An Open-Ended
Embodied Agent With Large Language Models (arXiv:2305.16291). arXiv. http://arxiv.org/abs/2305.16291.
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q. & Zhou, D. (2023). Chain-of-Thought Prompting
Elicits Reasoning in Large Language Models (arXiv:2201.11903). arXiv. https://doi.org/10.48550/arXiv.2201.11903.
Xiao, T., Chan, H., Sermanet, P., Wahid, A., Brohan, A., Hausman, K., Levine, S. & Tompson, J. (2023). Robotic Skill Acquisition
via Instruction Augmentation With Vision-Language Models (arXiv:2211.11736). arXiv. https://doi.org/10.48550/arXiv.2211.11736.
Yang, C., Wang, X., Lu, Y., Liu, H., Le, Q. V., Zhou, D. & Chen, X. (2023). Large Language Models as Optimizers
Yang, D., Tian, J., Tan, X., Huang, R., Liu, S., Chang, X., Shi, J., Zhao, S., Bian, J., Wu, X., Zhao, Z., Watanabe, S. & Meng, H.
(2023). UniAudio: An Audio Foundation Model Toward Universal Audio Generation (arXiv:2310.00704). arXiv.
Yao, S., Yu, D., Zhao, J., Shafran, I., Griffiths, T. L., Cao, Y. & Narasimhan, K. (2023). Tree of Thoughts: Deliberate Problem
Solving With Large Language Models (arXiv:2305.10601). arXiv. http://arxiv.org/abs/2305.10601.
Zellers, R., Bisk, Y., Farhadi, A. & Choi, Y. (2019). From Recognition to Cognition: Visual Commonsense Reasoning
Zhang, L., Rao, A. & Agrawala, M. (2023). Adding Conditional Control to Text-to-Image Diffusion Models (arXiv:2302.05543).
Zhang, Z., Han, L., Ghosh, A., Metaxas, D. & Ren, J. (2022). SINE: SINgle Image Editing With Text-to-Image Diffusion Models

Index Report 2024 Chapter 3: Responsible AI
Acknowledgments
The AI Index would like to acknowledge Amelia Hardy The keywords searched include:
for her work contributing as a research assistant to
visualizations and supplementary analysis for this Fairness and bias: algorithmic fairness, bias detection,
chapter, and Andrew Shi for his work spearheading bias mitigation, discrimination, equity in AI, ethical
the analysis of responsible AI-related conference algorithm design, fair data practices, fair ML, fairness
submissions. The AI Index also acknowledges that and bias, group fairness, individual fairness, justice,
the Global State of Responsible AI analysis was non-discrimination, representational fairness, unfair,
done in collaboration with Accenture. The AI Index unfairness.
specifically wants to highlight the contributions of Arnab
Privacy and data governance: anonymity,
Chakraborty, Patrick Connolly, Jakub Wiatrak, Ray Eitel-
confidentiality, data breach, data ethics, data
Porter, Dikshita Venkatesh, and Shekhar Tewari to the
governance, data integrity, data privacy, data
data collection and analysis.
protection, data transparency, differential privacy,
inference privacy, machine unlearning, privacy by
Conference Submissions design, privacy-preserving, secure data storage,
Analysis trustworthy data curation.
For the analysis on responsible AI-related conference Security: adversarial attack, adversarial learning, AI
submissions, the AI Index examined the number of incident, attacks, audits, cybersecurity, ethical hacking,
responsible AI–related academic submissions at the forensic analysis, fraud detection, red teaming, safety,
following conferences: AAAI, AIES, FAccT, ICML, security, security ethics, threat detection, vulnerability
ICLR, and NeurIPS. Specifically, the team scraped the assessment.
conference websites or repositories of conference
submissions for papers containing relevant keywords Transparency and explainability: algorithmic
indicating they could fall into a particular responsible transparency, audit, auditing, causal reasoning,
AI category. The papers were then manually verified causality, explainability, explainable AI, explainable
by a human team to confirm their categorization. It is models, human-understandable decisions,
possible that a single paper could belong to multiple interpretability, interpretable models, model
responsible AI categories. explainability, outcome explanation, transparency, xAI.

Consistency of Responsible AI Given the limited scalability of user interviews, the

researchers opted for a questionnaire-based approach
Benchmark Reporting to ensure broad coverage of organizations in different
countries and industries. They contracted McGuire
For each of the analyzed models (GPT-4, Gemini,
Research to run the recruitment and data collection.
Claude 2, Llama 2, Mistral 7B), the AI Index reviewed
The team received more than 15,897 responses from
the official papers published by the model developers
22 countries and 19 industries. The respondents were
at the time of model release for reported academic
asked 10 qualifier questions in the survey. Companies
benchmarks. The AI Index did not consider subsequent
were excluded if their global annual revenue was less
benchmark reports by the model developers or external
than 500 million USD and/or the respondent had no
parties. The AI Index also did not include benchmarks
visibility into the RAI decision-making process of the
on academic or professional exams (e.g., the GRE),
company. Included in the final sample were more
benchmarks for modalities other than text, or internal
than 1,000 organizations. The survey had a total of 38
evaluation metrics.
questions, including the 10 qualifier questions.
Global Responsible State of Below is the full list of measures respondents were
AI Survey asked about in the survey and which were referenced

in the AI Index subchapters. The organizations could
Researchers from Stanford conducted a global answer on a scale from Not applied, Ad-hoc, Rolling
responsible AI (RAI) survey in collaboration with out, or Fully operationalized. The companies were
Accenture. The objective of the questionnaire was further given the option to select Other and provide
to gain an understanding of the current level of RAI information on mitigation measures not listed.
adoption globally and allow for a comparison of RAI
activities across 19 industries and 22 countries. The Fairness measures:
survey is further used to develop an early snapshot •C
ollection of representative data based on the
of current perceptions around the responsible anticipated user demographics
development, deployment, and use of generative AI •M
aking methodology and data sources accessible
and how this might affect RAI adoption and mitigation to third parties (auditors/general public) for
techniques. The survey covers a total of 10 RAI independent oversight
dimensions: Reliability; Privacy and Data Governance; • Involvement of diverse stakeholders in model
Fairness and Nondiscrimination; Transparency and development and/or review process
Explainability; Human Interaction; Societal and • Assessment of performance across different
Environmental Well-Being; Accountability; Leadership/ demographic groups
Principles/Culture; Lawfulness and Compliance; and • Use of technical bias mitigation techniques during
Organizational Governance. Only some of the survey model development
findings are presented in the AI Index, with a more • Other (selection of this option opened an optional
detailed report, the Global State of Responsible AI free-text field)
Report, coming out in May/June 2024.

Data governance measures: Reliability measures:

• Checks to ensure that the data complies with all • Mitigation measures for model errors and handling
relevant laws and regulations and is used with low confidence outputs
consent, where applicable • Failover plans or other measures to ensure the
• Data collection and preparation include assessment system’s/model’s availability
of the completeness, uniqueness, consistency, and • Evaluation of models/systems for vulnerabilities or
accuracy of the data harmful behavior (i.e., red teaming)
• Checks to ensure that the data is representative with • Measures to prevent adversarial attacks
respect to the demographic setting within which the • Confidence scoring for model outputs
final model/system is used • Comprehensive test cases that cover a wide range of
• Regular data audits and updates to ensure the scenarios and metrics
relevancy of the data • Other (selection of this option opened an optional
• Process for dataset documentation and traceability free-text field)
throughout the AI life cycle
• Remediation plans for and documentation of datasets Security measures:
with shortcomings • Basic cybersecurity hygiene practices (e.g.,
• Other (selection of this option opened an optional multifactor authentication, access controls, and
free-text field) employee training)
• Vetting and validation of cybersecurity measures of
Transparency and explainability: third parties in the supply chain
• Documentation of the development process, detailing • Dedicated AI cybersecurity team and/or personnel
algorithm design choices, data sources, intended use explicitly trained for AI-specific cybersecurity
cases, and limitations • Technical AI-specific cybersecurity checks and
• Training programs for stakeholders (incl. users) measures, e.g., adversarial testing, vulnerability
covering the intended use cases and limitations of the assessments, and data security measures
model • Resources dedicated to research and monitoring
• Prioritization of simpler models where high of evolving AI-specific cybersecurity risks and
interpretability is crucial, even if it sacrifices some integration in existing cybersecurity processes
performance • Other (selection of this option opened an optional
• Use model explainability tools (e.g., saliency maps) to free-text field)
elucidate model decisions
• Other (selection of this option opened an optional
free-text field)

Works Cited
Agarwal, A. & Agarwal, H. (2023). “A Seven-Layer Model With Checklists for Standardising Fairness Assessment Throughout
the AI Lifecycle.” AI Ethics. https://doi.org/10.1007/s43681-023-00266-9.
Alawida, M., Abu Shawar, B., Abiodun, O. I., Mehmood, A., Omolara, A. E. & Al Hwaitat, A. K. (2024). “Unveiling the
Dark Side of ChatGPT: Exploring Cyberattacks and Enhancing User Awareness.” Information 15, no. 1: 27.
https://doi.org/10.3390/info15010027.
Andreotta, A. J., Kirkham, N. & Rizzi, M. (2022). “AI, Big Data, and the Future of Consent.” AI & Society 37, no. 4: 1715–28.
https://doi.org/10.1007/s00146-021-01262-5.
Arous, A., Guesmi, A., Hanif, M. A., Alouani, I. & Shafique, M. (2023). Exploring Machine Learning Privacy/Utility Trade-Off
From a Hyperparameters Lens (arXiv:2303.01819). arXiv. http://arxiv.org/abs/2303.01819.
Balasubramaniam, N., Kauppinen, M., Rannisto, A., Hiekkanen, K. & Kujala, S. (2023). “Transparency and Explainability
of AI Systems: From Ethical Guidelines to Requirements.” Information and Software Technology 159: 107197.
https://doi.org/10.1016/j.infsof.2023.107197.
Bommasani, R., Klyman, K., Longpre, S., Kapoor, S., Maslej, N., Xiong, B., Zhang, D. & Liang, P. (2023). The Foundation Model
Transparency Index (arXiv:2310.12941). arXiv. https://doi.org/10.48550/arXiv.2310.12941.
Dhamala, J., Sun, T., Kumar, V., Krishna, S., Pruksachatkun, Y., Chang, K.-W. & Gupta, R. (2021). “BOLD: Dataset and Metrics
for Measuring Biases in Open-Ended Language Generation.” Proceedings of the 2021 ACM Conference on Fairness,
Accountability, and Transparency, 862–72. https://doi.org/10.1145/3442188.3445924.
Durmus, E., Nyugen, K., Liao, T. I., Schiefer, N., Askell, A., Bakhtin, A., Chen, C., Hatfield-Dodds, Z., Hernandez, D.,
Joseph, N., Lovitt, L., McCandlish, S., Sikder, O., Tamkin, A., Thamkul, J., Kaplan, J., Clark, J. & Ganguli, D. (2023).
Towards Measuring the Representation of Subjective Global Opinions in Language Models (arXiv:2306.16388). arXiv.
Ganguli, D., Lovitt, L., Kernion, J., Askell, A., Bai, Y., Kadavath, S., Mann, B., Perez, E., Schiefer, N., Ndousse, K., Jones, A.,
Bowman, S., Chen, A., Conerly, T., DasSarma, N., Drain, D., Elhage, N., El-Showk, S., Fort, S., Clark, J. (2022).
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned (arXiv:2209.07858).
Gehman, S., Gururangan, S., Sap, M., Choi, Y. & Smith, N. A. (2020). RealToxicityPrompts: Evaluating Neural Toxic Degeneration
in Language Models (arXiv:2009.11462). arXiv. https://doi.org/10.48550/arXiv.2009.11462.
Grinbaum, A. & Adomaitis, L. (2024). “Dual Use Concerns of Generative AI and Large Language Models.”
Journal of Responsible Innovation 11, no. 1. https://doi.org/10.1080/23299460.2024.2304381.
Hartvigsen, T., Gabriel, S., Palangi, H., Sap, M., Ray, D. & Kamar, E. (2022). ToxiGen: A Large-Scale Machine-Generated
Dataset for Adversarial and Implicit Hate Speech Detection (arXiv:2203.09509v4). arXiv. http://arxiv.org/abs/2203.09509.
Ippolito, D., Tramèr, F., Nasr, M., Zhang, C., Jagielski, M., Lee, K., Choquette-Choo, C. A. & Carlini, N. (2023).
Preventing Verbatim Memorization in Language Models Gives a False Sense of Privacy (arXiv:2210.17546v3). arXiv.
Janssen, M., Brous, P., Estevez, E., Barbosa, L. S. & Janowski, T. (2020). “Data Governance: Organizing Data for Trustworthy
Artificial Intelligence.” Government Information Quarterly 37, no. 3: 101493. https://doi.org/10.1016/j.giq.2020.101493.
Li, B., Sun, J. & Poskitt, C. M. (2023). How Generalizable Are Deepfake Detectors? An Empirical Study (arXiv:2308.04177). arXiv.

Masood, M., Nawaz, M., Malik, K. M., Javed, A., Irtaza, A. & Malik, H. (2023). “Deepfakes Generation and Detection:
State-of-the-Art, Open Challenges, Countermeasures, and Way Forward.” Applied Intelligence 53, no. 4: 3974–4026.
https://doi.org/10.1007/s10489-022-03766-z.
Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K. & Galstyan, A. (2022). “A Survey on Bias and Fairness in Machine Learning.”
ACM Computing Surveys 54, no. 6: 1–35. https://doi.org/10.1145/3457607.
Morreale, F., Bahmanteymouri, E., Burmester, B., Chen, A. & Thorp, M. (2023). “The Unwitting Labourer: Extracting Humanness
in AI Training.” AI & Society. https://doi.org/10.1007/s00146-023-01692-3.
Motoki, F., Pinho Neto, V. & Rodrigues, V. (2024). “More Human Than Human: Measuring ChatGPT Political Bias.”
Public Choice 198, no. 1: 3–23. https://doi.org/10.1007/s11127-023-01097-2.
Nasr, M., Carlini, N., Hayase, J., Jagielski, M., Cooper, A. F., Ippolito, D., Choquette-Choo, C. A., Wallace, E., Tramèr,
F. & Lee, K. (2023). Scalable Extraction of Training Data From (Production) Language Models (arXiv:2311.17035). arXiv.
Omiye, J. A., Lester, J. C., Spichak, S., Rotemberg, V. & Daneshjou, R. (2023). “Large Language Models Propagate Race-Based
Medicine.” npj Digital Medicine 6, no. 1: 1–4. https://doi.org/10.1038/s41746-023-00939-z.
P, D., Simoes, S. & MacCarthaigh, M. (2023). “AI and Core Electoral Processes: Mapping the Horizons.” AI Magazine 44, no. 3:
218–39. https://doi.org/10.1002/aaai.12105.
Pan, A., Chan, J. S., Zou, A., Li, N., Basart, S., Woodside, T., Ng, J., Zhang, H., Emmons, S. & Hendrycks, D. (2023).
Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI
Benchmark (arXiv:2304.03279). arXiv. https://doi.org/10.48550/arXiv.2304.03279.
Parrish, A., Chen, A., Nangia, N., Padmakumar, V., Phang, J., Thompson, J., Htut, P. M. & Bowman, S. R. (2022). BBQ:
A Hand-Built Bias Benchmark for Question Answering (arXiv:2110.08193). arXiv. https://doi.org/10.48550/arXiv.2110.08193.
Pessach, D. & Shmueli, E. (2023). “Algorithmic Fairness.” In L. Rokach, O. Maimon & E. Shmueli (eds.), Machine Learning for Data
Science Handbook: Data Mining and Knowledge Discovery Handbook: 867–86. https://doi.org/10.1007/978-3-031-24628-9_37.
Petrov, A., La Malfa, E., Torr, P. H. S. & Bibi, A. (2023). Language Model Tokenizers Introduce Unfairness Between Languages
Rudin, C. (2019). “Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models
Instead.” Nature Machine Intelligence 1, no. 5: 206–15. https://doi.org/10.1038/s42256-019-0048-x.
Senavirathne, N. & Torra, V. (2020). “On the Role of Data Anonymization in Machine Learning Privacy.” 2020 IEEE 19th
International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom): 664–75.
https://doi.org/10.1109/TrustCom50675.2020.00093.
Sheth, A., Roy, K. & Gaur, M. (2023). Neurosymbolic AI — Why, What, and How (arXiv:2305.00813). arXiv.
Shevlane, T., Farquhar, S., Garfinkel, B., Phuong, M., Whittlestone, J., Leung, J., Kokotajlo, D., Marchal, N., Anderljung, M., Kolt,
N., Ho, L., Siddarth, D., Avin, S., Hawkins, W., Kim, B., Gabriel, I., Bolina, V., Clark, J., Bengio, Y., … Dafoe, A. (2023). Model
Evaluation for Extreme Risks (arXiv:2305.15324). arXiv. http://arxiv.org/abs/2305.15324.
Steinke, T., Nasr, M. & Jagielski, M. (2023). Privacy Auditing With One (1) Training Run (arXiv:2305.08846). arXiv.
Sun, X., Yang, D., Li, X., Zhang, T., Meng, Y., Qiu, H., Wang, G., Hovy, E. & Li, J. (2021). Interpreting Deep Learning Models in
Natural Language Processing: A Review (arXiv:2110.10470). arXiv. http://arxiv.org/abs/2110.10470.

Trinh, L. & Liu, Y. (2021). An Examination of Fairness of AI Models for Deepfake Detection (arXiv:2105.00558). arXiv.
Wang, B., Chen, W., Pei, H., Xie, C., Kang, M., Zhang, C., Xu, C., Xiong, Z., Dutta, R., Schaeffer, R., Truong, S. T., Arora, S.,
Mazeika, M., Hendrycks, D., Lin, Z., Cheng, Y., Koyejo, S., Song, D. & Li, B. (2024). DecodingTrust: A Comprehensive Assessment
of Trustworthiness in GPT Models (arXiv:2306.11698). arXiv. https://doi.org/10.48550/arXiv.2306.11698.
Wang, W., Bai, H., Huang, J., Wan, Y., Yuan, Y., Qiu, H., Peng, N. & Lyu, M. R. (2024). New Job, New Gender?
Measuring the Social Bias in Image Generation Models (arXiv:2401.00763). arXiv. http://arxiv.org/abs/2401.00763.
Wang, Y., Li, H., Han, X., Nakov, P. & Baldwin, T. (2023). Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs
Zou, A., Wang, Z., Carlini, N., Nasr, M., Kolter, J. Z. & Fredrikson, M. (2023). Universal and Transferable Adversarial Attacks on
Aligned Language Models (arXiv:2307.15043). arXiv. http://arxiv.org/abs/2307.15043.
Table of Contents Appendix 47 7

Index Report 2024 Chapter 4: Economy
Chapter 4: Economy
Acknowledgments removes duplicates, and extracts data from job
postings text. This includes information on job title,
The AI Index would like to acknowledge James da Costa
employer, industry, and region, as well as required
for his work collecting information on significant AI-
experience, education, and skills.
related economic events, and Emma Williamson for her
work collecting data from the Stack Overflow survey. Job postings are useful for understanding trends in
the labor market because they allow for a detailed,
International Federation of real-time look at the skills employers seek. To assess
Robotics (IFR) the representativeness of job postings data, Lightcast

conducts a number of analyses to compare the
Data presented in the Robot Installations section was distribution of job postings to the distribution of
sourced from the “World Robotics 2023” report. official government and other third-party sources in
the United States. The primary source of government
Lightcast data on U.S. job postings is the Job Openings
Prepared by Cal McKeever, Julia Nitschke, and and Labor Turnover Survey (JOLTS) program,
Layla O’Kane conducted by the Bureau of Labor Statistics. Based
Lightcast delivers job market analytics that empower on comparisons between JOLTS and Lightcast, the
employers, workers, and educators to make data- labor market demand captured by Lightcast data
driven decisions. The company’s artificial intelligence represents over 99% of the total labor demand. Jobs
technology analyzes hundreds of millions of job not posted online are usually in small businesses (the
postings and real-life career transitions to provide classic example being the “Help Wanted” sign in the
insight into labor market patterns. This real-time restaurant window) and union hiring halls.
strategic intelligence offers crucial insights, such

Measuring Demand for AI
as what jobs are most in demand, the specific skills
In order to measure the demand by employers of
employers need, and the career directions that offer
AI skills, Lightcast uses its skills taxonomy of over
the highest potential for workers. For more information,
30,000 skills. The list of AI skills from Lightcast data
visit www.lightcast.io.
are shown below, with associated skill clusters. While
Job Postings Data some skills are considered to be in the AI cluster
To support these analyses, Lightcast mined its dataset specifically, for the purposes of this report, all skills
of millions of job postings collected since 2010. below were considered AI skills. A job posting was
Lightcast collects postings from over 51,000 online job considered an AI job if it mentioned any of these skills
sites to develop a comprehensive, real-time portrait in the job text.
of labor market demand. It aggregates job postings,

Artificial Intelligence: AI/ML Inference, AIOps Natural Language Understanding, Optical Character
(Artificial Intelligence for IT Operations), Applications Recognition (OCR), Screen Reader, Semantic
of Artificial Intelligence, Artificial General Intelligence, Analysis, Semantic Parsing, Semantic Search,
Artificial Intelligence, Artificial Intelligence Sentiment Analysis, Seq2Seq, Speech Recognition,
Development, Artificial Intelligence Markup Language Speech Recognition Software, Speech Synthesis,
(AIML), Artificial Intelligence Systems, Azure Cognitive Statistical Language Acquisition, Text Mining, Text-
Services, Baidu, Cognitive Automation, Cognitive to-Speech, Tokenization, Voice Assistant Technology,
Computing, Computational Intelligence, Cortana, Voice Interaction, Voice User Interface, Word
Ethical AI, Expert Systems, Explainable AI (XAI), Embedding, Word2Vec Models, fastText
IPSoft Amelia, Intelligent Control, Intelligent Systems,
Neural Networks: Apache MXNet, Artificial
Interactive Kiosk, Knowledge Engineering, Knowledge-
Neural Networks, Autoencoders, Caffe2, Chainer
Based Configuration, Knowledge-Based Systems,
(Deep Learning Framework), Convolutional Neural
Multi-agent Systems, Open Neural Network Exchange
Networks, Cudnn, Deep Learning, Deep Learning
(ONNX), OpenAI Gym, Operationalizing AI, Reasoning
Methods, Deeplearning4j, Evolutionary Acquisition
Systems, Watson Conversation, Watson Studio, Weka
of Neural Topologies, Fast.ai, Keras (Neural Network
Autonomous Driving: Advanced Driver Assistance Library), Long Short-Term Memory (LSTM),
Systems, Autonomous Cruise Control Systems, OpenVINO, PaddlePaddle, Recurrent Neural Network
Autonomous System, Autonomous Vehicles, Guidance (RNN), TensorFlow
Navigation and Control Systems, Light Detection and
Machine Learning: AdaBoost (Adaptive Boosting),
Ranging (LiDAR), OpenCV, Path Analysis, Path Finding,
Adversarial Machine Learning, Apache MADlib,
Remote Sensing, Unmanned Aerial Systems (UAS)
Apache Mahout, Apache SINGA, Apache Spark,
Generative Artificial Intelligence: ChatGPT, Association Rule Learning, Automated Machine
Generative Adversarial Networks, Generative Artificial Learning, Autonomic Computing, AWS SageMaker,
Intelligence, Large Language Modeling, Prompt Azure Machine Learning, Boosting, CHi-Squared
Engineering, Variational Autoencoders Automatic Interaction Detection (CHAID),
Classification and Regression Tree (CART), Cluster
Natural Language Processing (NLP): AI Copywriting,
Analysis, Collaborative Filtering, Confusion Matrix,
ANTLR, Amazon Textract, Apache OpenNLP, BERT
Cyber-Physical Systems, Dask (Software), Data
(NLP Model), Chatbot, Computational Linguistics,
Classification, Dbscan, Decision Models, Decision
Conversational AI, Dialog Systems, Fuzzy Logic,
Tree Learning, Dimensionality Reduction, Dlib (C++
Handwriting Recognition, Hugging Face (NLP
Library), Ensemble Methods, Feature Engineering,
Framework), Hugging Face Transformers, Intelligent
Feature Extraction, Feature Learning, Feature
Agent, Intelligent Virtual Assistant, Kaldi, Language
Selection, Gaussian Process, Genetic Algorithm,
Model, Latent Dirichlet Allocation, Lexalytics, Machine
Google AutoML, Gradient Boosting, H2O.ai, Hidden
Translation, Microsoft LUIS, Natural Language
Markov Model, Hyperparameter Optimization,
Generation, Natural Language Processing, Natural
Inference Engine, K-Means Clustering, Kernel
Language Programming, Natural Language Toolkits,

Methods, Kubeflow, Loss Functions, Machine Learning,

Machine Learning Algorithms, Machine Learning
LinkedIn
Prepared by Murat Erer, Carl Shan, and Akash Kaura
Methods, Machine Learning Model Monitoring And
Evaluation, Machine Learning Model Training, Markov Country Sample
Chain, Matrix Factorization, Meta Learning, Microsoft Included countries represent a select sample of
Cognitive Toolkit (CNTK), MLflow, MLOps (Machine eligible countries with at least 40% labor force
Learning Operations), mlpack (C++ Library), ModelOps, coverage by LinkedIn and at least 10 AI hires in
Naive Bayes Classifier, Perceptron, Predictive any given month. India, despite not reaching 40%
Modeling, PyTorch (Machine Learning Library), PyTorch coverage, was included in this sample because of its
Lightning, Random Forest Algorithm, Recommender increasing importance in the global economy.
Systems, Reinforcement Learning, Scikit-Learn (Python
Skills (AI Engineering and AI Literacy skills)
Package), Semi-supervised Learning, Soft Computing,
LinkedIn members self-report their skills on their
Sorting Algorithm, Supervised Learning, Support
LinkedIn profiles. Currently, more than 41,000
Vector Machine, Test Datasets, Theano (Software),
distinct, standardized skills are identified by LinkedIn.
Torch (Machine Learning), Training Datasets, Transfer
These have been coded and classified by taxonomists
Learning, Transformer (Machine Learning Model),
at LinkedIn into 249 skill groupings, which are the skill
Unsupervised Learning, Vowpal Wabbit, Xgboost
groups represented in the dataset.
Robotics: Advanced Robotics, Bot Framework,
Skill groupings are derived by expert taxonomists
Cognitive Robotics, Motion Planning, Nvidia Jetson,
through a similarity-index methodology that
Robot Framework, Robot Operating Systems, Robotic
measures skill composition at the industry level.
Automation Software, Robotic Liquid Handling
LinkedIn’s industry taxonomy and their corresponding
Systems, Robotic Programming, Robotic Systems,
NAICS codes can be found here.
Servomotor, SLAM Algorithms (Simultaneous
Localization and Mapping) This year LinkedIn made updates to the AI skills list
and categorized them into “AI Engineering” and “AI
Visual Image Recognition: 3D Reconstruction,
Literacy” skills. See “AI Skills List Update Compared
Activity Recognition, Computer Vision, Contextual
to Last Year” section for more details.
Image Classification, Digital Image Processing, Eye
Tracking, Face Detection, Facial Recognition, Gesture Skills Genome
Recognition, Image Analysis, Image Matching, Image For any entity (occupation or job, country, sector,
Recognition, Image Segmentation, Image Sensor, etc.), the skill genome is an ordered list (a vector) of
Imagenet, Machine Vision, Motion Analysis, Object the 50 “most characteristic skills” of that entity. These
Recognition, OmniPage, Pose Estimation most characteristic skills are identified using a TF-IDF
algorithm to identify the most representative skills of
the target entity, while down-ranking ubiquitous skills
that add little information about that specific entity
(e.g., Microsoft Word).

TF-IDF is a statistical measure that evaluates how members’ profiles. If four of the skills that engineers
representative a word (in this case, a skill) is to a selected possess belong to the AI skill group, this measure
entity). This is done by multiplying two metrics: indicates that the penetration of AI skills is estimated
1. The term frequency of a skill in an entity (“TF”). to be 8% among engineers (e.g., 4/50).
2. The logarithmic inverse entity frequency of the
Jobs or Occupations
skill across a set of entities (“IDF”). This indicates
LinkedIn member titles are standardized and grouped
how common or rare a word is in the entire entity
into approximately 15,000 occupations. These are
set. The closer IDF is to 0, the more common a
not sector- or country-specific. These occupations
word is.
are further standardized into approximately
So, if the skill is very common across LinkedIn entities, 3,600 occupation representatives. Occupation
and appears in many job or member descriptions, the representatives group occupations with a common
IDF will approach 0. If, on the other hand, the skill is role and specialty, regardless of seniority.
unique to specific entities, the IDF will approach 1.
AI Jobs or Occupations
Details are available at LinkedIn’s Skills Genome and
An “AI” job is an occupation representative
LinkedIn–World Bank Methodology note.
that requires AI skills to perform the job. Skills
AI Skills Penetration penetrations are used as a signal for whether AI skills
The aim of this indicator is to measure the intensity of are prevalent in an occupation representative, in any
AI skills in an entity (in a particular country, industry, sector where the occupation representative may
gender, etc.) through the following methodology: exist. Examples of such occupations include (but are
• Compute frequencies for all self-added skills by not limited to): Machine Learning Engineer, Artificial
LinkedIn members in a given entity (occupation, Intelligence Specialist, Data Scientist, and Computer
industry, etc.) in 2015–2023. Vision Engineer.
• Re-weight skill frequencies using a TF-IDF model AI Talent
to get the top 50 most representative skills in A LinkedIn member is considered AI talent if they
that entity. These 50 skills compose the “skill have explicitly added AI skills to their profile
genome” of that entity. and/or they are occupied in an AI occupation
• Compute the share of skills that belong to representative. The counts of AI talent are used to
the AI skill group out of the top skills in the calculate talent concentration metrics. For example,
selected entity. to calculate the country-level AI talent concentration,
Interpretation: The AI skill penetration rate signals we use the counts of AI talent at the country level
the prevalence of AI skills across occupations, or the vis-a-vis the counts of LinkedIn members in the
intensity with which LinkedIn members utilize AI skills respective countries. Note that concentration metrics
in their jobs. For example, the top 50 skills for the may be influenced by LinkedIn coverage in these
occupation of engineer are calculated based on the countries and should be utilized with caution.
weighted frequency with which they appear in LinkedIn

Relative AI Skills Penetration countries’ male/female AI skill penetrations to the

To allow for skills penetration comparisons across global average of the same gender. Since the global
countries, the skills genomes are calculated and a averages are distinct for each gender, this metric
relevant benchmark is selected (e.g., global average). should only be used to compare country rankings
A ratio is then constructed between a country’s and within each gender, and not for cross-gender
the benchmark’s AI skills penetrations, controlling comparisons within countries.
for occupations.
Interpretation: A country’s AI skills penetration for
Interpretation: A country’s relative AI skills penetration women of 1.5 means that female members in that
of 1.5 indicates that AI skills are 1.5 times as frequent country are 1.5 times more likely to list AI skills than
as in the benchmark, for an overlapping set of the average female member in all countries pooled
occupations. together across the same set of occupations that
exist in the country/gender combination.
Global Comparison
For cross-country comparison, we present the relative Global Comparison: Across Genders
penetration rate of AI skills, measured as the sum of The “Relative AI Skills Penetration Across Genders”
the penetration of each AI skill across occupations metric allows for cross-gender comparisons within
in a given country, divided by the average global and across countries globally, since we compare the
penetration of AI skills across the overlapping countries’ male/female AI skill penetrations to the
occupations in a sample of countries. same global average regardless of gender.
Interpretation: A relative penetration rate of 2 means Relative AI Talent Hiring Rate YoY Ratio
that the average penetration of AI skills in that country • L
inkedIn Hiring Rate or Overall Hiring
is two times the global average across the same set Rate is a measure of hires normalized by
of occupations. LinkedIn membership. It is computed as the
percentage of LinkedIn members who added
Global Comparison: By Industry
a new employer in the same period the job
The relative AI skills penetration by country for industry
began, divided by the total number of LinkedIn
provides an in-depth sectoral decomposition of AI skill
members in the corresponding location.
penetration across industries and sample countries.
• A
I Hiring Rate is computed following the
Interpretation: A country’s relative AI skill penetration
overall hiring rate methodology, but only
rate of 2 in the education sector means that the average
considering members classified as AI talent.
penetration of AI skills in that country is two times the
global average across the same set of occupations in • R
elative AI Talent Hiring Rate YoY Ratio is
that sector. the year-over-year change in AI Hiring Rate
relative to Overall Hiring Rate in the same
Global Comparison: By Gender
country. For each month, we first calculate AI
The “Relative AI Skills Penetration by Gender”
Hiring rate in the country, then divide AI Hiring
metric provides a cross-country comparison of AI
Rate by Overall Hiring Rate in that country,
skill penetrations within each gender, comparing

then calculate YoY change of this ratio, and then and destination countries, are normalized based on
take the 12-month moving average using the last LinkedIn membership in country A at the end of each
12 months. year and multiplied by 10,000. Hence, this metric
indicates relative talent migration from all countries to
Interpretation: In 2023 in India, the ratio of AI talent
and from country A.
hiring relative to overall hiring has grown 16.8% year
over year. AI Skills List Update Compared to Last Year
1. LinkedIn introduced “AI Literacy” skills.
AI Talent Migration
a. The following skills were added to the list and
Data on migration comes from the World Bank Group–
categorized as “AI Literacy” skills: ChatGPT,
LinkedIn “Digital Data for Development” partnership
DALL-E, GPT-3, GPT-4, Generative Art, Github
(please see Zhu et al. [2018]).
Copilot, Google Bard, Midjourney, Prompt
LinkedIn migration rates are derived from the self- Engineering, and Stable Diffusion.
identified locations of LinkedIn member profiles. For
2. LinkedIn updated the former AI skills list and
example, when a LinkedIn member updates his or
categorized them as “AI Engineering” skills:
her location from Paris to London, this is counted as
a. The following skills were excluded from
a migration. Migration data is available from 2019
the list: Alexa, Common Lisp, Data
onward. LinkedIn data provides insights into countries
Structures, Gaussian 03, Graph Theory, IBM
on the AI Talent gained or lost due to migration trends.
Watson, Information Retrieval, Jena, Julia
AI Talent migration is considered for all members
(Programming Language), Linked Data, Lisp,
with AI Skills/holding AI jobs at time t for country A
Pandas (Software), Parallel Algorithms, Perl
as the country of interest and country B as the source
Automation, Resource Description Framework,
of inflows and destination for outflows. Thus, net AI
Smalltalk, and dSPACE.
Talent migration between country A and country B—for
country A—is calculated as follows: b. The following skills were added to the
list: Apache Spark ML, Applied Machine
Learning, Audio Synthesis, Autoencoders,
Automated Clustering, Automated Feature
Engineering, Automated Machine Learning
Net flows are defined as total arrivals minus departures
(AutoML), Autoregressive Models, Chatbot
within the given time period. LinkedIn membership
Development, Chatbots, Concept Drift
varies considerably between countries, which makes
Adaptation, Conditional Generation,
interpreting absolute movements of members from
Conditional Image Generation, Decision
one country to another difficult. To compare migration
Trees, Deep Convolutional Generative
flows between countries fairly, migration flows are
Adversarial Networks (DCGAN), Deep Neural
normalized for the country of interest. For example,
Networks (DNN), Generative AI, Generative
if country A is the country of interest, all absolute net
Adversarial Imitation Learning, Generative
flows into and out of country A, regardless of origin
Adversarial Networks (GANs), Generative

Design Optimization, Generative Flow Models, PostgreSQL database delivery, and so on.
Generative Modeling, Generative Neural Quid applies best-in-class AI and NLP to reveal
Networks, Generative Optimization, Generative hidden patterns in large datasets, enabling users
Pre-training, Generative Query Networks to make data-driven decisions accurately, quickly,
(GQNs), Generative Replay Memory, Generative and efficiently.
Synthesis, Google Cloud AutoML, Graph
Search, Data Sources, and Scope
Embeddings, Graph Networks, Hyperparameter
Over 8 million global public and private company
Optimization, Hyperparameter Tuning, Image
profiles from multiple data sources are indexed to
Generation, Image Inpainting, Image Synthesis,
search across company descriptions, while filtering
Image-to-Image Translation, Large Language
and including metadata ranging from investment
Models (LLM), MLOps, Machine Learning
information to firmographic information, such as
Algorithms, Machine Translation, Meta-learning,
founded year, HQ location, and more. Company
Model Compression, Model Interpretation,
information is updated on a weekly basis. The Quid
Model Training, Music Generation, Neural
algorithm reads a large amount of text data from
Network Architecture Design, Predictive
each document to make links between different
Modeling, Probabilistic Generative Models,
documents based on their similar language. This
Probabilistic Programming, Random Forest,
process is repeated at an immense scale, which
Recurrent Neural Networks (RNN), Responsible
produces a network with different clusters identifying
AI, Style Transfer, StyleGAN, Synthetic
distinct topics or focus areas. Trends are identified
Data Generation, Text Generation, Text-to-
based on keywords, phrases, people, companies,
Image Generation, Time Series Forecasting,
and institutions that Quid identifies, and the other
Transformer Models, Variational Autoencoders,
metadata that is put into the software.
Variational Autoencoders (VAEs), Video
Generation, and k-means clustering. Data
Companies
Quid Organization data is embedded from Capital IQ and

Crunchbase. These companies include all types of
Quid Insights prepared by Bill Valle and Heather English organizations (private, public, operating, operating as
a subsidiary, out of business) throughout the world.
Quid uses its own in-house LLM and other smart
The investment data includes private investments,
search features, as well as traditional Boolean query,
M&A, public offerings, minority stakes made by PE/
to search for focus areas, topics, and keywords within
VCs, corporate venture arms, governments, and
many datasets: social media, news, forums and blogs,
institutions both within and outside the United States.
companies, patents, as well as other custom feeds of
Some data is simply unreachable—for instance, when
data (e.g., survey data). Quid has many visualization
investors’ names or funding amounts are undisclosed.
options and data delivery endpoints, including network
graphs based on semantic similarity, in-platform Quid embeds Capital IQ data as a default and
dashboarding capabilities, as well as programmatic adds in data from Crunchbase for the data points

that are not captured in Capital IQ. This not only AI” OR “generative artificial intelligence”) for
yields comprehensive and accurate data on all global companies that have received over $1.5M for
organizations, but it also captures early-stage startups the last 10 years (January 1, 2013, to December
and funding events data. Company information is 31, 2023).
updated on a weekly basis.
Target Event Definitions
Earnings Calls • P
rivate investments: A private placement is a
Quid leverages earnings call transcript data embedded private sale of newly issued securities (equity
from Seeking Alpha. For this report, Quid has analyzed or debt) by a company to a selected investor or
mentions of AI-related keywords across all earnings call a selected group of investors. The stakes that
transcripts from Fortune 500 companies from January buyers take in private placements are often
2018 through December 2023. New earnings call minority stakes (under 50%), although it is
transcript data is updated in Quid on the 1st and 15th of possible to take control of a company through
every month. a private placement as well, in which case the
private placement would be a majority stake
Search Parameters
investment.
Boolean query is used to search for focus areas, topics,
• M
inority investment: These refer to minority
and keywords within the archived company database,
stake acquisitions in Quid, which take place
within their business descriptions and websites. We can
when the buyer acquires less than 50% of
filter out the search results by HQ regions, investment
the existing ownership stake in entities, asset
amount, operating status, organization type (private/
products, and business divisions.
public), and founding year. Quid then visualizes these
• M
&A: This refers to a buyer acquiring more
companies by semantic similarity. If there are more than
than 50% of the existing ownership stake in
7,000 companies from the search result, Quid selects
entities, asset products, and business divisions.
the 7,000 most relevant companies for visualization
based on the language algorithm.
Boolean search: “artificial intelligence” or “AI” or

McKinsey & Company
“machine learning” or “deep learning” Data used in the Corporate Activity–Industry
Adoption section was sourced from the McKinsey
Companies
Global Survey “The State of AI in 2023: Generative
• Global AI and ML companies that have received
AI’s Breakout Year.”
investments (private, IPO, M&A) from January 1,
2013, to December 31, 2023. The online survey was in the field April 11, 2023, to
• Global AI and ML companies that have received April 21, 2023, and garnered responses from 1,684
over $1.5M for the last 10 years (January 1, 2013, participants representing the full range of regions,
to December 31, 2023). industries, company sizes, functional specialties,
• Global data was also pulled for a Generative AI and tenures. Of those respondents, 913 said their
query (Boolean search: “generative AI” OR “gen organizations had adopted AI in at least one function

and were asked questions about their organization’s AI

use. To adjust for differences in response rates, the data
is weighted by the contribution of each respondent’s
nation to global GDP.
The AI Index also considered data from previous

iterations of the survey. More specifically, the AI index
made use of data from:
The State of AI in 2022—and a Half Decade in Review
The State of AI in 2021
The State of AI in 2020
AI Proves Its Worth, But Few Scale Impact (2019)
AI Adoption Advances, But Foundational Barriers

Remain (2018)
Stack Overflow
Data on the use of AI by developers was sourced from
the 2023 Developer Survey. The survey was conducted
from May 8, 2023, to May 19, 2023, and incorporates
the insights of 89,184 software developers from 185
countries around the world.

Works Cited
Brynjolfsson, E., Li, D. & Raymond, L. R. (2023). Generative AI at Work (Working Paper 31161). National Bureau of Economic
Research. https://doi.org/10.3386/w31161.
Cambon, A., Hecht, B., Edelman, B., Ngwe, D., Jaffe, S., Heger, A., Peng, S., Hofman, J., Farach, A., Bermejo-Cano, M.,
Knudsen, E., Sanghavi, H., Spatharioti, S., Rothschild, D., Goldstein, D. G., Kalliamvakou, E., Cihon, P., Demirer, M., Schwarz, M.
& Teevan, J. (2023). Early LLM-Based Tools for Enterprise Information Workers Likely Provide Meaningful Boosts to Productivity.
https://www.microsoft.com/en-us/research/uploads/prod/2023/12/AI-and-Productivity-Report-First-Edition.pdf.
Choi, J. H., Monahan, A. & Schwarcz, D. (2023). “Lawyering in the Age of Artificial Intelligence.” Minnesota Law Review
(forthcoming). https://doi.org/10.2139/ssrn.4626276.
Chui, M., Hazan, E., Roberts, R., Singla, A., Smaje, K., Sukharevsky, A., Yee, L. & Zemmel, R. (2023). The Economic Potential
of Generative AI: The Next Productivity Frontier. McKinsey & Company. https://www.mckinsey.com/capabilities/mckinsey-
digital/our-insights/the-economic-potential-of-generative-ai-the-next-productivity-frontier.
Chui, M., Yee, L., Hall, B., Singla, A. & Sukharevsky, A. (2023). The State of AI in 2023: Generative AI’s Breakout Year.
McKinsey & Company. https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai-in-2023-
generative-ais-breakout-year#widespreadhttp://ceros.mckinsey.com/commentary-ai-2023-lareina-ye-desktop.
Dell’Acqua, F. (2022). “Falling Asleep at the Wheel: Human/AI Collaboration in a Field Experiment on HR Recruiters.”
Laboratory for Innovation Science, Harvard Business School, working paper. https://static1.squarespace.com/
static/604b23e38c22a96e9c78879e/t/62d5d9448d061f7327e8a7e7/1658181956291/Falling+Asleep+at+the+Wheel+-
+Fabrizio+DellAcqua.pdf.
Dell’Acqua, F., McFowland, E., Mollick, E. R., Lifshitz-Assaf, H., Kellogg, K., Rajendran, S., Krayer, L., Candelon, F. & Lakhani,
K. R. (2023). “Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on Knowledge
Worker Productivity and Quality.” Harvard Business School Technology & Operations Mgt. Unit Working Paper No. 24-013.
https://doi.org/10.2139/ssrn.4573321.
Hatzius, J., Briggs, J., Kodnani, D. & Pierdomenico, G. (2023). The Potentially Large Effects of Artificial Intelligence on Economic
Growth. Goldman Sachs. https://www.gspublishing.com/content/research/en/reports/2023/03/27/d64e052b-0f6e-45d7-
967b-d7be35fabd16.html.

Index Report 2024 Chapter 5: Science and Medicine

Acknowledgments
The AI Index would like to acknowledge Emma
Williamson for her work surveying the literature on
significant AI-related science and medicine trends.
Benchmarks
1. MedQA: Data on MedQA was taken from the
MedQA Papers With Code leaderboard in January
2024. To learn more about MedQA, please read the
original paper.
FDA-Approved AI-Medical
Devices
Data on FDA-approved AI-medical devices is
sourced from the FDA website that tracks artificial
intelligence and machine learning (AI/ML)–enabled
medical devices.

Works Cited
Cao, K., Xia, Y., Yao, J., Han, X., Lambert, L., Zhang, T., Tang, W., Jin, G., Jiang, H., Fang, X., Nogues, I., Li, X., Guo, W., Wang, Y.,
Fang, W., Qiu, M., Hou, Y., Kovarnik, T., Vocka, M., Lu, J. (2023). “Large-Scale Pancreatic Cancer Detection via Non-contrast
CT and Deep Learning.” Nature Medicine 29, no. 12: 3033–3043. https://doi.org/10.1038/s41591-023-02640-w.
Chen, Z., Cano, A. H., Romanou, A., Bonnet, A., Matoba, K., Salvi, F., Pagliardini, M., Fan, S., Köpf, A., Mohtashami, A.,
Sallinen, A., Sakhaeirad, A., Swamy, V., Krawczuk, I., Bayazit, D., Marmet, A., Montariol, S., Hartley, M.-A., Jaggi, M. & Bosselut,
A. (2023). MEDITRON-70B: Scaling Medical Pretraining for Large Language Models (arXiv:2311.16079). arXiv.
Cheng, J., Novati, G., Pan, J., Bycroft, C., Žemgulytė, A., Applebaum, T., Pritzel, A., Wong, L. H., Zielinski, M., Sargeant, T.,
Schneider, R. G., Senior, A. W., Jumper, J., Hassabis, D., Kohli, P. & Avsec, Ž. (2023). “Accurate Proteome-Wide Missense
Variant Effect Prediction With AlphaMissense.” Science 381. https://doi.org/10.1126/science.adg7492.
Cid, Y. D., Macpherson, M., Gervais-Andre, L., Zhu, Y., Franco, G., Santeramo, R., Lim, C., Selby, I., Muthuswamy, K., Amlani,
A., Hopewell, H., Indrajeet, D., Liakata, M., Hutchinson, C. E., Goh, V. & Montana, G. (2024). “Development and Validation
of Open-Source Deep Neural Networks for Comprehensive Chest X-Ray Reading: A Retrospective, Multicentre Study.”
The Lancet Digital Health 6, no. 1: e44–e57. https://doi.org/10.1016/S2589-7500(23)00218-2.
Fleming, S. L., Lozano, A., Haberkorn, W. J., Jindal, J. A., Reis, E. P., Thapa, R., Blankemeier, L., Genkins, J. Z., Steinberg,
E., Nayak, A., Patel, B. S., Chiang, C.-C., Callahan, A., Huo, Z., Gatidis, S., Adams, S. J., Fayanju, O., Shah, S. J., Savage, T.,
… Shah, N. H. (2023). MedAlign: A Clinician-Generated Dataset for Instruction Following With Electronic Medical Records
Ha, T., Lee, D., Kwon, Y., Park, M. S., Lee, S., Jang, J., Choi, B., Jeon, H., Kim, J., Choi, H., Seo, H.-T., Choi, W., Hong, W., Park,
Y. J., Jang, J., Cho, J., Kim, B., Kwon, H., Kim, G., … Choi, Y.-S. (2023). “AI-Driven Robotic Chemist for Autonomous Synthesis
of Organic Molecules.” Science Advances 9, no. 44. https://doi.org/10.1126/sciadv.adj0461.
Iglesias, J. E., Billot, B., Balbastre, Y., Magdamo, C., Arnold, S. E., Das, S., Edlow, B. L., Alexander, D. C., Golland, P. & Fischl, B.
(2023). “SynthSR: A Public AI Tool to Turn Heterogeneous Clinical Brain Scans into High-Resolution T1-Weighted Images for 3D
Morphometry.” Science Advances 9, no. 5. https://doi.org/10.1126/sciadv.add3607.
Jin, D., Pan, E., Oufattole, N., Weng, W.-H., Fang, H. & Szolovits, P. (2020). What Disease Does This Patient Have?
A Large-Scale Open Domain Question Answering Dataset From Medical Exams (arXiv:2009.13081; Version 1). arXiv.
Kavungal, D., Magalhães, P., Kumar, S. T., Kolla, R., Lashuel, H. A. & Altug, H. (2023). “Artificial Intelligence–Coupled Plasmonic
Infrared Sensor for Detection of Structural Protein Biomarkers in Neurodegenerative Diseases.” Science Advances 9, no. 28.
https://doi.org/10.1126/sciadv.adg9644.
Lam, R., Sanchez-Gonzalez, A., Willson, M., Wirnsberger, P., Fortunato, M., Alet, F., Ravuri, S., Ewalds, T., Eaton-Rosen, Z., Hu,
W., Merose, A., Hoyer, S., Holland, G., Vinyals, O., Stott, J., Pritzel, A., Mohamed, S. & Battaglia, P. (2023). “Learning Skillful
Medium-Range Global Weather Forecasting.” Science 382. https://doi.org/10.1126/science.adi2336.
Liao, W.-W., Asri, M., Ebler, J., Doerr, D., Haukness, M., Hickey, G., Lu, S., Lucas, J. K., Monlong, J., Abel, H. J., Buonaiuto, S.,
Chang, X. H., Cheng, H., Chu, J., Colonna, V., Eizenga, J. M., Feng, X., Fischer, C., Fulton, R. S., … Paten, B. (2023). “A Draft
Human Pangenome Reference.” Nature 617: 312–24. https://doi.org/10.1038/s41586-023-05896-x.
Mankowitz, D. J., Michi, A., Zhernov, A., Gelmi, M., Selvi, M., Paduraru, C., Leurent, E., Iqbal, S., Lespiau, J.-B., Ahern, A., Köppe,
T., Millikin, K., Gaffney, S., Elster, S., Broshear, J., Gamble, C., Milan, K., Tung, R., Hwang, M., … Silver, D. (2023). “Faster Sorting
Algorithms Discovered Using Deep Reinforcement Learning.” Nature 618: 257–63. https://doi.org/10.1038/s41586-023-06004-9.
Merchant, A., Batzner, S., Schoenholz, S. S., Aykol, M., Cheon, G. & Cubuk, E. D. (2023). “Scaling Deep Learning for Materials

Discovery.” Nature 624: 80–85. https://doi.org/10.1038/s41586-023-06735-9.
Nearing, G., Cohen, D., Dube, V., Gauch, M., Gilon, O., Harrigan, S., Hassidim, A., Klotz, D., Kratzert, F., Metzger, A., Nevo, S.,
Pappenberger, F., Prudhomme, C., Shalev, G., Shenzis, S., Tekalign, T., Weitzner, D. & Matias, Y. (2023). AI Increases Global
Access to Reliable Flood Forecasts (arXiv:2307.16104). arXiv. http://arxiv.org/abs/2307.16104.
Nori, H., Lee, Y. T., Zhang, S., Carignan, D., Edgar, R., Fusi, N., King, N., Larson, J., Li, Y., Liu, W., Luo, R., McKinney, S. M.,
Ness, R. O., Poon, H., Qin, T., Usuyama, N., White, C. & Horvitz, E. (2023a). Can Generalist Foundation Models Outcompete
Special-Purpose Tuning? Case Study in Medicine (arXiv:2311.16452; Version 1). arXiv. http://arxiv.org/abs/2311.16452.
Schopf, C. M., Ramwala, O. A., Lowry, K. P., Hofvind, S., Marinovich, M. L., Houssami, N., Elmore, J. G., Dontchos, B. N., Lee, J.
M. & Lee, C. I. (2024). “Artificial Intelligence-Driven Mammography-Based Future Breast Cancer Risk Prediction: A Systematic
Review.” Journal of the American College of Radiology 21, no. 2: 319–28. https://doi.org/10.1016/j.jacr.2023.10.018.
Shen, T., Munkberg, J., Hasselgren, J., Yin, K., Wang, Z., Chen, W., Gojcic, Z., Fidler, S., Sharp, N. & Gao, J. (2023).
“Flexible Isosurface Extraction for Gradient-Based Mesh Optimization.” ACM Transactions on Graphics 42, no. 4: 1–16.
https://doi.org/10.1145/3592430.
Thadani, N. N., Gurev, S., Notin, P., Youssef, N., Rollins, N. J., Ritter, D., Sander, C., Gal, Y. & Marks, D. S. (2023).
“Learning From Prepandemic Data to Forecast Viral Escape.” Nature 622: 818–25. https://doi.org/10.1038/s41586-023-06617-0.

Index Report 2024 Chapter 6: Education
Code.org
State-Level Data Computing Research Association (CRA) members
The following link includes a full description of the are 200-plus North American organizations active
methodology used by Code.org to collect its data. The in computing research: academic departments
staff at Code.org also maintains a database of the state of computer science and computer engineering;
of American K–12 education and, in this policy primer, laboratories and centers in industry, government, and
provides a greater amount of detail on the state of academia; and affiliated professional societies (AAAI,
American K–12 education in each state. ACM, CACS/AIC, IEEE Computer Society, SIAM
USENIX). CRA’s mission is to enhance innovation
AP Computer Science Data
by joining with industry, government, and academia
The AP Computer Science data is provided to Code.org
to strengthen research and advance education in
as per an agreement the College Board maintains with
computing. Learn more about CRA here.
Code.org. The AP Computer Science data comes from
the college board’s national and state summary reports. The CRA Taulbee Survey gathers survey data during the
fall of each academic year by reaching out to over 200
Access to Computer Science Education
PhD-granting departments. Details about the Taulbee
Data on access to computer science education was
Survey can be found here. Taulbee doesn’t directly
drawn from Code.org’s State of Computer Science
survey the students. The department identifies each
Education 2023 report.
new PhD’s area of specialization as well as their type
of employment. Data is collected from September to
Computing Research January of each academic year for PhDs awarded in the
Association previous academic year. Results are published in May

after data collection closes.
(CRA Taulbee Survey)
The CRA Taulbee Survey is sent only to doctoral
Note: This year’s AI Index reused the methodological departments of computer science, computer
notes that were submitted by the CRA for previous engineering, and information science/systems.
editions of the AI Index. For more complete delineations Historically, (a) Taulbee covers one-quarter to one-third
of the methodology used by the CRA, please consult the of total BS CS recipients in the United States; (b) the
individual CRA surveys that are linked below. percentage of women earning bachelor’s degrees
is lower in the Taulbee schools than overall; and (c)
Taulbee tracks the trends in overall CS production.

The AI Index used data from the following iterations of and concepts provided by these national agencies
the CRA survey: and reflects the national situation in the countries
considered. Those aspects that are not exposed by the
CRA, 2022
consulted agencies are not part of the dataset. A full list
CRA, 2021
of definitions and concepts used can be found in the
CRA, 2020
footnotes shown at the bottom of the Statistics section.
CRA, 2019
CRA, 2018 Since each national data repository has its own
CRA, 2017 structure and quite often provides all supporting
CRA, 2016 information in the national language, Informatics
CRA, 2015 Europe consults with its members—academics,
CRA, 2014 active and knowledgeable in the informatics field
CRA, 2013 from respective countries—who help to interpret the
CRA, 2012 statistics available and who understand the specificities
CRA, 2011 of these countries’ higher education systems. One
of the main challenges in integrating the statistical
Impact Research data is the identification of terms used to define the

informatics discipline in different countries. Informatics
Data on the usage of ChatGPT in schools among is known under different names in different European
teachers and students came from two Impact Research languages and countries, and in English as well. A good
surveys released in 2023. To learn more about the dozen terms (presented in the Data Portal section
methodology employed for the surveys, please visit the “Subjects”) are used to denote what is fundamentally
following links: March 2023 and July 2023. the same discipline, and the role of national experts
here is to help with screening the terms and programs
Informatics Europe and identifying which part of them is pertinent to the
The statistics are annually collected by Informatics informatics field.
Europe and published on the Informatics Europe The data covers the degrees delivered by both
Higher Education Data Portal, which is updated with traditional Research Universities (RU) and University of
the most recent data at the end of the year (typically Applied Science (UAS) for the countries where these
in December). In the interest of reliability, Informatics institutions also offer bachelor’s and master’s studies
Europe collects the data from countries where a solid in informatics. The full list of institutions covered can
and reasonably complete picture could be drawn be found in the Data Portal section “Institutions &
from official sources such as national statistical Academic Units.”
offices, educational agencies, or ministries. The full
list of sources can be found in the Data Portal section
“Data Sources.” The Data Portal follows the definitions

Studyportals Studyportals categorizes the study programs on its

portals into disciplines and subdisciplines. The 15
Studyportals is the world’s most comprehensive study disciplines are broad categories of educational fields
choice platform. It lists over 200,000 English-taught to help navigate the portals. The 284 subdisciplines
programs from more than 3,500 institutions, helping are narrower topics, subdivisions, or specialized
over 50 million students per year. The Studyportals areas of disciplines. Paying clients can provide input,
analytics and consulting team uses the resulting data but ultimately, data processors manually choose the
to provide higher education organizations with real- one-to-three closest fitting subdisciplines according
time market insights. to the following scenarios, listed in decreasing order
of likelihood:
1. Classic scenario: When the study name closely matches one subdiscipline name.
a. “Chemistry” -> subdiscipline Chemistry.
2. C
lassic interdisciplinary scenario: When the study name closely matches two or three subdiscipline names.
“International Fashion Management and Marketing” -> subdisciplines Fashion Management + Marketing.
a.
3. Specializations scenario: When not all the subdisciplines are mentioned in the study name, but they are
listed as specific specializations, concentrations, or tracks.
“Business Administration with specializations in Finance and International Business” -> subdisciplines
a.
Business Administration + Finance + International Business
4. Mixed scenario: When the study name does not closely match any specific subdiscipline but can be
represented by combining two or three subdisciplines.
“Financial Economics” -> subdisciplines Finance + Economics
a.
5. Last resort scenario: When the study name does not closely match any specific subdiscipline and/or
combination, it is instead approximated as closely as possible.
a. “UK, EU, and US Copyright Law” -> subdisciplines Patent & Intellectual Property Law + International Law
+ European Law

Example assigned
Scenario Example study name
subdisciplines
Study name closely matches one BSc Chemistry Chemistry

subdiscipline name.
Study name closely matches two or BA International Fashion Fashion Management +

three subdiscipline names. Management and Marketing Marketing
Not all the subdisciplines are mentioned MBA Business Administration with Business
in the study name, but they are listed as specializations in Finance and Administration +
specific specializations, concentrations, International Business Finance + International
or tracks. Business
Study name does not closely match MSc Financial Economics Finance + Economics
any specific subdiscipline but can be
represented by combining two or three
subdisciplines.
Study name does not closely match LLM UK, EU, and US Copyright Law Patent and Intellectual
any specific subdiscipline and/ Property Law +
or combination and is instead International Law +
approximated as closely as possible. European Law

Index Report 2024 Chapter 7: Policy and Governance

Acknowledgments SAR, China, Madagascar, Malaysia, Maldives, Malta,
Mauritius, Mexico, Moldova, Netherlands, New
The AI Index would like to acknowledge Simba Jonga Zealand, Northern Mariana Islands, Norway, Pakistan,
for his work collecting information on significant Panama, Papua New Guinea, Philippines, Poland,
AI policy events and conducting a survey of AI Portugal, Romania, Russia, Samoa, San Marino,
national strategies. Additionally, the Index would like Seychelles, Sierra Leone, Singapore, Slovenia, South
to acknowledge the efforts of Ethan Duncan He-Li Africa, South Korea, Spain, Sri Lanka, Sweden,
Hellman, Julia Betts Lotufo, Alexandra Rome, and Switzerland, Tanzania, Trinidad and Tobago, Ukraine,
Emma Williamson in collecting, coding, and analyzing United Kingdom, United States, Uruguay, Zambia,
AI-related legislation and regulations. The Index is Zimbabwe
also grateful for the guidance provided by Caroline
Meinhardt on AI legislation and regulation tracking.
Global Legislation Records
Global AI Mentions on AI
For mentions of AI in AI-related legislative proceedings For AI-related bills passed into laws, the AI Index
around the world, the AI Index performed searches of performed searches of the keyword “artificial
the keyword “artificial intelligence” on the websites intelligence” on the websites of 128 countries’
of 82 countries’ congresses or parliaments (in the congresses or parliaments (in the respective
respective languages), usually under sections named languages) in the full text of bills. Note that only laws
“minutes,” “hansard,” etc. In some cases, databases passed by state-level legislative bodies and signed
were only searchable by title, so site search functions into law (i.e., by presidents or through royal assent)
were deployed. The AI Index team surveyed the from 2016 to 2023 are included. Laws that were
following databases: approved but then repealed are not included in the
Andorra, Angola, Armenia, Australia, Azerbaijan, analysis. In some cases, there were databases that
Barbados, Belgium, Bermuda, Bhutan, Brazil, Cabo, were only searchable by title, so site search functions
Verde, Canada, Cayman Islands, China,12 Czech were deployed. Future AI Index reports hope to
Republic, Denmark, Dominican Republic, Ecuador, El include analysis on other types of legal documents,
Salvador, Estonia, Fiji, Finland, France, The Gambia, such as regulations and standards, adopted by state-
Germany, Gibraltar, Greece, Hong Kong, Iceland, India, or supranational-level legislative bodies, government
Ireland, Isle of Man, Israel, Italy, Japan, Kenya, Kosovo, agencies, etc. The AI Index team surveyed databases
Latvia, Lesotho, Liechtenstein, Luxembourg, Macao for the following countries:
12 The National People’s Congress is held once per year and does not provide full legislative proceedings. Hence, the counts included in the analysis only searched mentions of “artificial
intelligence” in the only public document released from the Congress meetings, the Report on the Work of the Government, delivered by the premier.

Albania, Algeria, American Samoa, Andorra, Angola,

EU AI Regulation
Antigua and Barbuda, Argentina, Armenia, Australia
Austria, Azerbaijan The Bahamas, Bahrain, Bangladesh, The AI Index also gathered information on AI-related
Barbados, Belarus, Belgium, Belize, Bermuda, regulations enacted in the European Union between
Bhutan, Bolivia, Brazil, Brunei, Bulgaria, Burkina Faso, 2017 and 2023. To compile this data, the Index
Cameroon, Canada, Cayman Islands, Chile, China, team conducted a keyword search for “artificial
Colombia, Croatia, Cuba, Curacao, Cyprus, Czech intelligence” on EUR-Lex, a comprehensive database
Republic, Denmark, Estonia, Faroe Islands, Fiji, Finland, of EU legislation, regulations, and case law. EUR-
France, The Gambia, Georgia, Germany, Gibraltar, Lex provides access to a wide range of regulatory
Greece, Greenland, Grenada, Guam, Guatemala, documents, such as legal acts, consolidated texts,
Guyana, Hong Kong, Hungary, Iceland, India, Iran international agreements, preparatory documents,
Islamic Republic, Iraq, Ireland, Isle of Man, Israel, Italy, and legislative procedures. The analysis in this
Jamaica, Japan, Kazakhstan, Kenya, Kiribati, Korea section focused exclusively on documents with
Republic, Kosovo, Kyrgyz Republic, Latvia, Lebanon, binding regulatory authority. The search for AI-related
Liechtenstein, Lithuania, Luxembourg, Macao SAR regulation in the European Union was limited to legal
China, Malawi, Malaysia, Malta, Mauritius, Mexico, acts, international agreements, and consolidated texts.
Monaco, Montenegro, Morocco, Mozambique, Nauru,
The Netherlands, New Zealand, Nicaragua, Niger, The regulation was then coded by a team of two
Northern Marina Islands, Norway, Panama, Papua New human coders for: (1) relevance to AI, (2) regulatory
Guinea, Philippines, Poland, Portugal, Romania, Russia, approach, and (3) subject matter. The relevance
Samoa, Saudi Arabia, Serbia, Seychelles, Sierra Leone, to AI categories were low, medium, and high. The
Singapore, Slovak Republic, Slovenia, South Africa, regulatory approach categories were expansive or
Spain, Sri Lanka, St. Kitts and Nevis, Suriname, Sweden, restrictive. For the subject matter categories, the
Switzerland, Tajikistan, Tanzania, Togo, Tongo, Turkey, Index employed the Congress policy typology. In
Tuvalu, Uganda, Ukraine, United Arab Emirates, United cases where there were disagreements on the coding
Kingdom, United States, Uruguay, Vietnam, Yemen, schemas, a third coder was brought in to settle the
Zambia, Zimbabwe differences.
The legislation was then coded by a team of two Federal Budget for
human coders for: (1) relevance to AI, (2) regulatory
approach, and (3) subject matter. The relevance to AI Nondefense AI R&D
categories were low, medium, and high. The regulatory Data on the federal U.S. budget for nondefense
approach categories were expansive or restrictive. For AI R&D was taken from previous editions of the
the subject matter categories, the Index employed the AI Index (namely the 2021 and 2022 versions) and
Congress policy typology. In cases where there were from the following National Science and Technology
disagreements on the coding schemas, a third coder Council reports:
was brought in to settle the differences.

Supplement to the President’s FY 2024 Budget terms that yield erroneous results, Govini delivers
Supplement to the President’s FY 2023 Budget a comprehensive yet discriminant taxonomy of
Supplement to the President’s FY2022 Budget subsegments that are mutually exclusive. Repeated
keyword searches and filters allow a consensus,
data-driven taxonomy to come into focus. Govini
Govini SMEs conduct final review of taxonomic structure to
Govini is a defense technology company. Ark, Govini’s complement this iterative, data-driven process.
flagship software, is a suite of AI-enabled applications,
powered by integrated government and commercial The use of artificial intelligence (AI) and supervised
data, that accelerate the Defense Acquisition Process. ML models enables analysis of the large volumes
of irregular data contained in federal contracts—
With Ark, the acquisition community eliminates data that often is inaccessible through regular
slow, manual processes and gains the ability government reporting processes or human-intensive
to rapidly imagine, produce, and field critical analytical approaches.
warfighting capabilities. Analysts and decision-
makers are equipped to solve challenges across the Moreover, beyond simply making usable an
entire spectrum of Defense Acquisition, including expansive body of data sources, Govini’s Ark
Supply Chain, Science and Technology, Production, platform and National Security Knowledge Graph
Sustainment, and Modernization. establishes high-fidelity standards in categorized and
fused data to produce a comprehensive and accurate
Govini curated USG AI spend data from their annual depiction of federal spending, and the supporting
Scorecard Taxonomy by applying supervised machine vendor ecosystem, over time.
learning (ML) and natural language processing (NLP)
techniques to parse, analyze, and categorize large
volumes of federal contracts data, including prime
National AI Strategies
contracts, grants, and other transaction authority (OTA) The AI Index did a web search to identify national
awards. Govini’s most recent Scorecard focused on strategies on AI. Below is a list of countries that were
Critical Technologies, of which AI/ML Technologies and identified as having a national AI strategy, including a
Microelectronics were segments. The AI/ML segment link to said strategy. For certain counties, noted with
consisted of five subsegments: Data Integration, an asterisk (*), the actual strategy was not found, and
Computer Vision, Machine Learning, Autonomy, a news article confirming the launch of the strategy
and Natural Language Processing. Microelectronics was linked instead.
is divided into two subsegments: Memory and
Processing, and Semiconductors. By initially generating
search terms and then subsequently excluding specific

Countries with AI Strategies in Place The regulation was then coded by a team of two
Algeria,* Argentina, Azerbaijan,* Australia, Austria, human coders for: (1) relevance to AI, (2) regulatory
Bahrain, Bangladesh, Benin,* Botswana,* Brazil, approach, and (3) subject matter. The relevance
Belgium,* Bulgaria, Canada, Chile, China, Colombia, to AI categories were low, medium, and high. The
Croatia, Cyprus, Czech Republic, Denmark, Dominican regulatory approach categories were expansive or
Republic,* Egypt, Arab Republic, Ethiopia, Estonia, restrictive. For the subject matter categories, the
Finland, France, Germany, Ghana, Greece, Hong Kong, Index employed the Congress policy typology.
Hungary, India, Indonesia, Iran,* Iraq,* Ireland, Israel,* In cases where there were disagreements on the
Italy, Japan, Jordan,* Kenya, Korea Republic, Latvia, coding schemas, a third coder was brought in to
Lithuania, Luxembourg, Malta, Malaysia, Mauritius, settle differences.
Mexico, The Netherlands, North Korea, Norway, Peru,
Philippines, Poland, Portugal, Qatar, Romania, Russia,
Rwanda, Saudi Arabia, Serbia, Sierra Leone, Singapore,
US Department of Defense
Slovak Republic, Slovenia, Spain, Sweden, Switzerland, Budget Requests
Thailand, Tunisia,* Turkey, Ukraine, United Arab
Data on the DoD nonclassified AI-related budget
Emirates, United Kingdom, United States, Uruguay,
requests was taken from previous editions of the AI
Vietnam
Index (namely the 2021 and 2022 versions) and from
the following reports:
Countries with AI Strategies in Development
Andorra,* Antigua and Barbuda,* Barbados,* Armenia,*
Defense Budget Overview United States Department
Belarus,* Costa Rica,* Cuba,* Iceland, Jamaica,*
of Defense Fiscal Year 2024 Budget Request
Kenya, Morocco, New Zealand,* Nigeria,* Pakistan,*
Senegal,* Uzbekistan
of Defense Fiscal Year 2023 Budget Request
US AI Regulation of Defense Fiscal Year 2022 Budget Request
This section examines AI-related regulations enacted

by American regulatory agencies between 2016 and US State-Level AI Legislation
2023. It provides an analysis of the total number of
For AI-related bills passed into law, the AI Index
regulations, as well as their topics, scope, regulatory
performed searches of the keyword “artificial
intent, and originating agencies. To compile this
intelligence” on the legislative websites of all 50 U.S.
data, the AI Index team performed a keyword search
states in the full text of bills. Bills are only counted
for “artificial intelligence” on the Federal Register, a
as passed into law if the final version of the bill
comprehensive repository of government documents
includes the keyword, not just the introduced version.
from nearly all branches of the American government,
Note that only laws passed from 2015 to 2022 are
encompassing more than 436 agencies.
included. The count for proposed laws includes both

laws that were proposed and eventually passed as well

as laws that were proposed that have not yet been
passed, or are now inactive. In some cases, databases
were only searchable by title, so site search functions
were deployed. The AI Index team surveyed the
following databases:
Alabama, Alaska, Arizona, Arkansas, California,

Colorado, Connecticut, Delaware, Florida, Georgia,
Hawaii, Idaho, Illinois, Indiana, Iowa, Kansas, Kentucky,
Louisiana, Maine, Maryland, Massachusetts, Michigan,
Minnesota, Mississippi, Missouri, Montana, Nebraska,
Nevada, New Hampshire, New Jersey, New Mexico,
New York, North Carolina, North Dakota, Ohio,
Oklahoma, Oregon, Pennsylvania, Rhode Island,
South Carolina, South Dakota, Tennessee, Texas,
Utah, Vermont, Virginia, Washington, West Virginia,
Wisconsin, Wyoming
US Committee Mentions
In order to research trends on the United States’
committee mentions of AI, the following search
was conducted:
Website: Congress.gov
Keyword: artificial intelligence
Filters: Committee Reports

Index Report 2024 Chapter 8: Diversity
Code.org
To learn more about the diversity data from Code.org,
please read the methodological note on Code.org’s
data included in the Chapter 6 subsection of
the Appendix.
Computing Research
Association
(CRA Taulbee Survey)
To learn more about the diversity data from the CRA,
please read the methodological note on the CRA’s data
included in the Chapter 6 subsection of the Appendix.
Informatics Europe
To learn more about the diversity data from Informatics
Europe, please read the methodological note on
Informatics Europe’s data included in the Chapter 6
subsection of the Appendix.

Index Report 2024 Chapter 9: Public Opinion

Global Public Opinion on Quid Social Media Data
Artificial Intelligence (GPO-AI) Quid collects social media data from over 500 million
sources in real time and analyzes this data through
In October and November 2023, researchers at the
AI-powered natural language processing. This process
Schwartz Reisman Institute for Technology and Society
parses out language and breaks out posts by filters
(SRI) and the Policy, Elections, and Representation Lab
such as drivers of positive and negative sentiment,
(PEARL) at the Munk School of Global Affairs and Public
emotions, and behaviors, allowing for deeper insights
Policy at the University of Toronto completed a survey
to be reached. Quid analyzed 6.69 million social media
project on public perceptions of and attitudes toward
posts for 2023 to assess perceptions of AI model
AI. The survey was administered to census-targeted
releases. The substantial increase in new models in
samples of over 1,000 people in each of 21 countries,
2023 unveiled reactions to the technical advancements
for a total of 23,882 surveys conducted in 12 languages.
in AI, the impact on efficiency in business operations,
The countries sampled represent a majority of the
and ethical considerations as AI continues to be
world’s population. To learn more about the survey,
adopted into society.
please visit the survey website. The following authors
contributed to the GPO-AI survey: Peter John Loewen,
Blake Lee-Whiting, Maggie Arai, Thomas Bergeron,
Thomas Galipeau, Isaac Gazendam, Hugh Needham,
Lee Slinger, Sofiya Yusypovych.
Ipsos
For brevity, the 2024 AI Index does not republish the
methodology used by the Ipsos survey that features
in the report. More details about the Ipsos survey’s
methodology can be found in the actual survey.
Pew Research
For brevity, the 2024 AI Index does not republish the
methodology used by the Pew surveys that feature in
the report. Data was taken from the 2023 Pew Research
Center survey.

Index Report 2024

HAI AI Index Report 2024

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

HAI AI Index Report 2024

Uploaded by

Copyright:

Available Formats

Artificial

Message From the

Ray Perrault and Jack Clark

2. Industry continues to dominate frontier AI research. In 2023, industry produced 51 notable

6. Generative AI investment skyrockets. Despite a decline in overall AI private investment last

Top 10 Takeaways (cont’d)

Staff and Researchers

How to Cite This Report

Public Data and Tools

AI Index and Stanford HAI

Chapter 1: Research and Development

Chapter 2: Technical Performance

Chapter 5: Science and Medicine

Chapter 7: Policy and Governance

Chapter 9: Public Opinion

Chapter 1 Research and Development 27

Chapter 2 Technical Performance 73

Chapter 3 Responsible AI 159

Chapter 4 Economy 213

Chapter 5 Science and Medicine 296

Chapter 6 Education 325

Chapter 7 Policy and Governance 366

Chapter 8 Diversity 411

Chapter 9 Public Opinion 435

ACCESS THE PUBLIC DATA

3. Harder benchmarks emerge. AI models have reached performance saturation on established

Chapter 2: Technical Performance (cont’d)

3. Researchers discover more complex vulnerabilities in LLMs. Previously, most efforts to

Chapter 3: Responsible AI (cont’d)

Chapter 4: Economy (cont’d)

2. The migration of AI PhDs to industry continues at an accelerating pace. In 2011, roughly

Chapter 6: Education (cont’d)

2. AI sentiment in Western nations continues to be low, but is slowly improving. In 2022,

4. Demographic differences emerge regarding AI optimism. Significant demographic

1.1 Publications 31 1.5 Open-Source AI Software 69

1.3 Frontier AI Research 45

Number of AI publications in the world, 2010–22

Table of Contents Chapter 1 Preview 31

Number of AI publications by type, 2010–22

Table of Contents Chapter 1 Preview 32

Number of AI publications by eld of study (excluding Other AI), 2010–22

72.23, Machine learning

Table of Contents Chapter 1 Preview 33

AI publications (% of total) by sector, 2010–22

80% 81.07%, Education

10% 7.89%, Industry

Table of Contents Chapter 1 Preview 34

AI publications (% of total) by sector and geographic area, 2022

0% 10% 20% 30% 40% 50% 60% 70% 80%

Table of Contents Chapter 1 Preview 35

Number of AI journal publications, 2010–22

Table of Contents Chapter 1 Preview 36

AI Conference Publications years, climbing from 22,727 in 2020 to 31,629 in

Number of AI conference publications, 2010–22

Table of Contents Chapter 1 Preview 37

Number of AI patents granted, 2010–22

Table of Contents Chapter 1 Preview 38

By Filing Status and Region

AI patents by application status, 2010–22

128.95, not granted

Table of Contents Chapter 1 Preview 39

AI patents by application status by geographic area, 2010–22

Not granted 80.46

Table of Contents Chapter 1 Preview 40