
CHAPTER 5: AI POLICY AND GOVERNANCE

5.1 AI and Policymaking

By State
In the United States, AI lawmaking has been relatively widespread across all states. As of 2021, 41 out of 50 states have proposed at least one AI-related bill, but certain states have been particularly active in generating AI legislation. Figure 5.1.5 shows that Massachusetts has proposed the most AI bills, with 40 since 2012, followed by Hawaii (35) and New Jersey (32). Focusing on just 2021 in Figure 5.1.6, Massachusetts was the state that proposed the most AI-related bills, with 20, followed by Illinois (15) and Alabama (12).

NUMBER of STATE-LEVEL PROPOSED AI-RELATED BILLS in the UNITED STATES by STATE, 2012–21 (SUM)
Source: Bloomberg Government, 2021 | Chart: 2022 AI Index Report
[State-by-state map; leading states include Massachusetts (40), Hawaii (35), New Jersey (32), New York (31), and California (29).]
Figure 5.1.5

NUMBER of STATE-LEVEL PROPOSED AI-RELATED BILLS in the UNITED STATES by STATE, 2021
Source: Bloomberg Government, 2021 | Chart: 2022 AI Index Report
[State-by-state map; leading states include Massachusetts (20), Illinois (15), Alabama (12), New York (8), and Hawaii (7).]
Figure 5.1.6




Sponsorship by Political Party
State-level AI legislation data reveals that there is a partisan dynamic to AI lawmaking. Figure 5.1.7 plots the number of AI-related bills sponsored at the state level by Democratic and Republican lawmakers. Although there has been an increase in AI bills proposed by members of both parties since 2012, in the past four years the data suggests Democrats were more likely to sponsor AI-related legislation. Whereas Democrats sponsored only two more AI bills than Republicans in 2018, they sponsored 39 more in 2021.

NUMBER of STATE-LEVEL PROPOSED AI-RELATED BILLS in the UNITED STATES by SPONSOR PARTY, 2012–21
Source: Bloomberg Government, 2021 | Chart: 2022 AI Index Report
[Line chart; in 2021, Democratic lawmakers sponsored 79 AI-related bills and Republican lawmakers 40.]
Figure 5.1.7




MENTIONS OF AI IN LEGISLATIVE RECORDS
Another barometer of legislative interest in AI is the number of mentions of "artificial intelligence" in governmental and parliamentary proceedings. This subsection considers data on mentions of AI both in U.S. congressional records and the parliamentary proceedings of other countries, based on AI Index and Bloomberg Government data.

AI Mentions in U.S. Congressional Records
In the last five years, and especially in 2021, U.S. congressional sessions have devoted increasing amounts of time to discussions of AI. This section presents data from Bloomberg Government concerning mentions of AI-related keywords in congressional proceedings, broken down by legislation, congressional committee reports, and congressional research service reports.

According to Figure 5.1.8, the current congressional session (the 117th) is on track (as of the end of 2021) to record the greatest number of AI-related mentions since 2001. The most recently completed congressional session, the 116th (2019–2020), saw 506 AI mentions, nearly 3.4 times as many mentions as there were during the 115th session (2017–2018), and 30 times as many as the 114th session (2015–2016).
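As a rough arithmetic check against the session totals shown in Figure 5.1.8 (149 mentions in the 115th session and 17 in the 114th):

$$\frac{506}{149} \approx 3.4, \qquad \frac{506}{17} \approx 30$$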

MENTIONS of AI in the U.S. CONGRESSIONAL RECORD by LEGISLATIVE SESSION, 2001–21
Source: Bloomberg Government, 2021 | Chart: 2022 AI Index Report
[Stacked bar chart of mentions in legislation, committee reports, and Congressional Research Service reports, from the 107th session (2001–02) through the 117th session (2021–); the 116th session recorded 506 mentions.]
Figure 5.1.8




AI Mentions in Global Legislative Proceedings
AI mentions in governmental proceedings are on the rise not only in the United States but also in many other countries across the world. The AI Index conducted an analysis on the minutes or proceedings of legislative sessions in 25 countries that contain the keyword "artificial intelligence" from 2016 to 2021. Figure 5.1.9 shows that the mentions of AI in legislative proceedings in 25 select countries grew 7.7 times in the past six years.2

NUMBER of MENTIONS of AI in LEGISLATIVE PROCEEDINGS in 25 SELECT COUNTRIES, 2016–21
Source: AI Index, 2021 | Chart: 2022 AI Index Report
[Line chart; mentions rose to 1,323 in 2021.]
Figure 5.1.9

2 See the appendix for the methodology. Countries included: Australia, Belgium, Brazil, Canada, China, Denmark, Finland, France, Germany, India, Ireland, Italy, Japan, the Netherlands, New Zealand,
Norway, Russia, Singapore, South Africa, South Korea, Spain, Sweden, Switzerland, the United Kingdom, and the United States.




By Geographic Area
Figure 5.1.10a shows the number of legislative proceedings containing mentions of AI in 2021. Similar to the trend in the number of AI mentions in bills passed into laws, Spain, the United Kingdom, and the United States topped the list. Figure 5.1.10b shows the total number of AI mentions in the past six years. The United Kingdom dominated the list with 939 mentions, followed by Spain, Japan, the United States, and Australia.

NUMBER of MENTIONS of AI in LEGISLATIVE PROCEEDINGS in SELECT COUNTRIES, 2021
Source: AI Index, 2021 | Chart: 2022 AI Index Report
Spain: 269; United Kingdom: 185; United States: 132; Australia: 122; Japan: 95; Ireland: 76; Brazil: 72; Italy: 64; Singapore: 60; Belgium: 47; Germany: 46; France: 25; Canada: 22; Norway: 20; Sweden: 16; Finland: 12; Russia: 12; South Africa: 11; Netherlands: 10; India: 7; New Zealand: 6; South Korea: 6; Denmark: 5; Switzerland: 3
Figure 5.1.10a

NUMBER of MENTIONS of AI in LEGISLATIVE PROCEEDINGS in SELECT COUNTRIES, 2016–2021 (SUM)
Source: AI Index, 2021 | Chart: 2022 AI Index Report
United Kingdom: 939; Spain: 559; Japan: 466; United States: 422; Australia: 410; Singapore: 282; Ireland: 222; Italy: 164; Germany: 158; France: 155; Brazil: 123; Belgium: 120; Canada: 111; Finland: 78; Sweden: 71; Netherlands: 67; Russia: 58; Norway: 52; India: 34; South Africa: 33; Denmark: 27; New Zealand: 27; South Korea: 21; Switzerland: 15
Figure 5.1.10b




U.S. AI POLICY PAPERS
To estimate activities outside national governments that are also informing AI-related rulemaking, the AI Index tracks 55 U.S.-based organizations that published policy papers in the past four years. Those organizations include: think tanks and policy institutes (19); university institutes and research programs (14); civil society organizations, associations, and consortiums (9); industry and consultancy organizations (9); and government agencies (4).3 A policy paper in this section is defined as a research paper, research report, brief, or blog post that addresses issues related to AI and makes specific recommendations to policymakers. Topics of those papers are divided into primary and secondary categories: A primary topic is the main focus of the paper, while a secondary topic is a subtopic of the paper or an issue that was briefly explored.

Figure 5.1.11 plots the total number of U.S.-based AI-related policy papers that have been published from 2018 to 2021, which can proxy the general interest in AI within the U.S. policymaking space. The total number of policy papers has tripled since 2018, peaking in 2020 with 273, and decreasing slightly in 2021, with 210.

NUMBER of AI-RELATED POLICY PAPERS by U.S.-BASED ORGANIZATIONS, 2018–21
Source: AI Index, 2021 | Chart: 2022 AI Index Report
[Bar chart; 210 policy papers were published in 2021, down from the 2020 peak of 273.]
Figure 5.1.11

3 The complete list of organizations the Index followed can be found in the Appendix.




By Topic
In 2021, the leading primary topics were Privacy, Safety, and Security; Innovation and Technology; and Ethics (Figure 5.1.12). Certain topics, such as government and public administration, education and skills, as well as democracy, did not feature prominently as primary topics, but they were reported on more frequently as secondary topics. Among the AI topics to receive comparatively little attention from tracked organizations are those that relate to energy and the environment, humanities, physical sciences, and social and behavioral sciences.

NUMBER of AI-RELATED POLICY PAPERS by U.S.-BASED ORGANIZATIONS by TOPIC, 2021
Source: AI Index, 2021 | Chart: 2022 AI Index Report
Topic (primary topic / secondary topic):
Privacy, Safety, and Security: 62 / 45
Innovation and Technology: 62 / 63
Ethics: 59 / 57
Int'l Affairs and Int'l Security: 51 / 45
Industry and Regulation: 51 / 58
Equity and Inclusion: 36 / 51
Workforce and Labor: 34 / 30
Gov't and Public Administration: 33 / 58
Justice and Law Enforcement: 29 / 17
Education and Skills: 23 / 36
Communications and Media: 15 / 17
Health and Biological Sciences: 7 / 13
Social and Behavioral Sciences: 4 / 1
Democracy: 2 / 29
Physical Sciences: 1 / 3
Energy and Environment: 1 / 3
Humanities: 0 / 1
Figure 5.1.12




5.2 U.S. PUBLIC INVESTMENT IN AI

This section examines public AI investment in the United States, based on data from the U.S. government and Bloomberg Government.


FEDERAL BUDGET FOR NONDEFENSE AI R&D
In December 2021, the National Science and Technology Council published a report on the public-sector AI R&D budget across departments and agencies participating in the Networking and Information Technology Research and Development (NITRD) program and the National Artificial Intelligence Initiative. The report does not include information on classified AI R&D investment by the defense and intelligence agencies.

In fiscal year (FY) 2021, nondefense U.S. government agencies allocated a total of $1.53 billion to AI R&D spending, approximately 2.7 times what was spent in FY 2018 (Figure 5.2.1). This figure is projected to rise 8.8% for FY 2022, with a total of $1.67 billion requested.4 The increasing amount spent on AI R&D by nondefense departments indicates the U.S. government's continued strong interest in public sector funding for AI research and development spanning a wide range of federal agencies.
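Both growth figures follow directly from the enacted and requested totals plotted in Figure 5.2.1:

$$\frac{\$1.53\text{B (FY 2021)}}{\$0.56\text{B (FY 2018)}} \approx 2.7, \qquad \frac{\$1.67\text{B (FY 2022 requested)}}{\$1.53\text{B (FY 2021)}} \approx 1.088\ (+8.8\%)$$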
U.S. FEDERAL BUDGET for NONDEFENSE AI R&D, FY 2018–22
Source: U.S. NITRD Program, 2022 | Chart: 2022 AI Index Report
[Bar chart, in billions of U.S. dollars: FY18 0.56 (enacted), FY19 1.11 (enacted), FY20 1.43 (enacted), FY21 1.53 (enacted), FY22 1.67 (requested).]
Figure 5.2.1

4 See NITRD website for details on AI R&D investment FY 2018-22 with the breakdown of core AI vs AI crosscut. Note that AI crosscutting budget data is not available for FY 2018.




U.S. DEPARTMENT OF DEFENSE BUDGET REQUEST
Spending on AI by the U.S. Department of Defense (DOD) can be proxied by looking at the publicly available requests made by the DOD for research, development, test, and evaluation (RDT&E) relating to AI. In FY 2021, DOD allocated $9.26 billion across 500 AI R&D programs (Figure 5.2.2), a 6.68% increase from the $8.68 billion spent in 2020. For FY 2022, the department has requested $10 billion so far, which is likely to grow once additional requests and congressional appropriations are taken into account.

U.S. DOD BUDGET for AI-SPECIFIC RESEARCH, DEVELOPMENT, TEST and EVALUATION (RDT&E), FY 2020–22
Source: Bloomberg Government and U.S. Department of Defense, 2021 | Chart: 2022 AI Index Report
[Bar chart, in billions of U.S. dollars: FY20 8.68, FY21 9.26, FY22 10.00 (requested); DOD's own reported budget on AI R&D was 0.93, 0.84, and 0.87 for the same years.]
Figure 5.2.2

Important data caveat: This chart is indicative of one of the challenges of quantifying public AI spending. Bloomberg Government's analysis, which searches for AI-relevant keywords in DOD budgets, shows that the department is requesting $10.0 billion for AI-specific R&D in FY 2022. However, DOD's own measurement produces a much smaller number: $874 million. The discrepancy may result from differences in defining AI-related budget items. For example, a research project that uses AI for cyber defense may count human, hardware, and operations-related expenditures within the AI-related budget request, though the AI software component will be much smaller.




DOD Top Five Highest-Funded Programs
This highlight offers a more qualitative look at some of the AI-related research projects the DOD prioritizes. Table 5.2.1 presents the five DOD-related AI programs that received the greatest funding in 2021. In the past year, the DOD was interested in deploying AI for a number of purposes, from geospatial monitoring to reducing the threat posed by weapons of mass destruction.

Funds received in millions of U.S. dollars:

1. Rapid Capability Development and Maturation (Army, 257): Fund the development, engineering, acquisition, and operation of various AI-related technological prototypes that could be used for military purposes.

2. Counter Weapons of Mass Destruction Advanced Technology Development (Defense Threat Reduction Agency, 254): Develop technologies that could "deny, defeat and disrupt" weapons of mass destruction (WMD).

3. Algorithmic Warfare Cross-Functional Teams – Software Pilot Program (Office of the Secretary of Defense, 230): Accelerate the integration of AI technologies in DOD systems to "improve warfighting speed and lethality."

4. Joint Artificial Intelligence Center (Defense Information Systems Agency, 137): Develop, test, prototype, and demonstrate various AI and machine learning capabilities with the intention of integrating these capabilities across numerous domains which include "supply chain, personal recovery, infrastructure assessment, geospatial monitoring during disaster and cyber sense making."

5. High Performance Computing Modernization Program (Army, 96): Investigate, demonstrate, and mature both general and special-purpose supercomputing environments that are used to satisfy wide-ranging DOD priorities.

Table 5.2.1




DOD AI R&D Spending by Department
DOD spending on AI R&D can also be broken down at a subdepartmental level, which reveals how individual defense agencies (the Army and the Navy, for instance) compare in their AI spending (Figure 5.2.3). The U.S. Navy was the top-spending DOD agency in FY 2021 and is poised to maintain that position in 2022: It has requested a total of $1.86 billion in FY 2022 for AI-related projects, followed by the Army ($1.77 billion), the Office of the Secretary of Defense ($1.1 billion), and the Air Force ($883 million).

U.S. DOD BUDGET for AI-SPECIFIC RESEARCH, DEVELOPMENT, TEST and EVALUATION (RDT&E) by DEPARTMENT, FY 2020–22
Source: Bloomberg Government, 2021 | Chart: 2022 AI Index Report
[Stacked bar chart, in billions of U.S. dollars, broken down by Air Force, Army, DARPA, DISA, Navy, OSD, and Other.]
Figure 5.2.3




U.S. GOVERNMENT AI-RELATED CONTRACT SPENDING
Public investment in AI can also be measured by federal government spending on AI-related contracts. U.S. government agencies often award contracts to private companies for the supply of various goods and services that typically occupy the largest share of an agency's budget. Bloomberg Government built a model to classify whether a U.S. government contract was AI-related by adding up all contracting transactions that contain a set of more than 100 AI-specific keywords in their titles or descriptions.5
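The sketch below illustrates that kind of keyword matching. It is illustrative only: the keyword set and record fields are stand-ins, since Bloomberg Government's actual model and its 100-plus keywords are not reproduced in this report.

```python
# Minimal sketch of keyword-based contract tagging (illustrative stand-ins;
# Bloomberg Government's actual model and keyword list are not public here).
AI_KEYWORDS = {"artificial intelligence", "machine learning", "neural network",
               "computer vision", "natural language processing"}

def is_ai_related(contract: dict) -> bool:
    """Flag a contract if its title or description mentions any AI keyword."""
    text = f"{contract.get('title', '')} {contract.get('description', '')}".lower()
    return any(keyword in text for keyword in AI_KEYWORDS)

def total_ai_spending(contracts: list[dict]) -> float:
    """Sum the dollar value of all transactions flagged as AI-related."""
    return sum(c["amount"] for c in contracts if is_ai_related(c))

contracts = [
    {"title": "Computer vision for geospatial monitoring", "description": "", "amount": 2.5e6},
    {"title": "Office furniture", "description": "Desks and chairs", "amount": 1.0e5},
]
print(total_ai_spending(contracts))  # 2500000.0
```

Substring matching over titles and descriptions is deliberately coarse, which is why footnote 5 cautions that some flagged contracts may have only a small AI component.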

Total Contract Spending
In 2021, federal departments and agencies spent a total of $1.79 billion on AI-related contracts. Although this amount is nearly double what was spent on AI-related contracts in 2018 (roughly $920 million), it represents a slight decrease from the amount spent on AI-related contracts in 2020, which peaked at $1.97 billion (Figure 5.2.4).

U.S. GOVERNMENT TOTAL CONTRACT SPENDING on AI, FY 2000–21
Source: Bloomberg Government, 2021 | Chart: 2022 AI Index Report
[Line chart, in billions of U.S. dollars; spending reached 1.79 in 2021.]
Figure 5.2.4

5 Note that contractors may add a number of keywords into their applications during the procurement process, so some of the projects included may have a relatively small AI component relative to
other parts of technology.




Contract Spending by Department and Agency
Figures 5.2.5 and 5.2.6 report AI-related contract spending by the top federal agencies in 2021 and from 2000 to 2021, respectively. The DOD outspent the rest of the U.S. government on both charts by a significant margin. In 2021, it spent $1.14 billion on AI-related contracts, roughly five times what was spent by the next highest department, the Department of Health and Human Services ($234 million).

Aggregate spending on AI contracts in the last four years tells a similar story. Since 2018, the DOD has spent $5.20 billion on AI contracts, approximately seven times the next highest spender, NASA ($1.41 billion). In fact, since 2018, the DOD has spent twice as much on AI-related contracts as all other government agencies combined. Following the DOD and NASA are the Department of Health and Human Services ($700 million), the Department of Homeland Security ($362 million), and the Department of the Treasury ($156 million).

TOP CONTRACT SPENDING on AI by U.S. GOVERNMENT DEPARTMENT and AGENCY, 2021
Source: Bloomberg Government, 2021 | Chart: 2022 AI Index Report
Contract spending in millions of U.S. dollars:
Department of Defense (DOD): 1,138
Department of Health and Human Services (HHS): 234
National Aeronautics and Space Administration (NASA): 159
Department of Homeland Security (DHS): 81
Department of Commerce (DOC): 49
Department of the Treasury (TREAS): 38
Department of Veterans Affairs (VA): 25
Department of Transportation (DOT): 12
Securities and Exchange Commission (SEC): 12
Department of Agriculture (USDA): 12
Department of Energy (DOE): 8
Agency for International Development (USAID): 6
Department of Justice (DOJ): 4
Department of State (DOS): 3
National Science Foundation (NSF): 2
Figure 5.2.5




TOP CONTRACT SPENDING on AI by U.S. GOVERNMENT DEPARTMENT and AGENCY, 2000–21 (SUM)
Source: Bloomberg Government, 2021 | Chart: 2022 AI Index Report
Contract spending in billions of U.S. dollars:
Department of Defense (DOD): 5.20
National Aeronautics and Space Administration (NASA): 1.41
Department of Health and Human Services (HHS): 0.70
Department of Homeland Security (DHS): 0.45
Department of the Treasury (TREAS): 0.32
Department of Veterans Affairs (VA): 0.15
Department of Commerce (DOC): 0.15
Department of Agriculture (USDA): 0.07
Department of Energy (DOE): 0.06
Securities and Exchange Commission (SEC): 0.06
General Services Administration (GSA): 0.06
Department of State (DOS): 0.06
Social Security Administration (SSA): 0.05
Department of Transportation (DOT): 0.05
Figure 5.2.6




Largest Contract for Five Top-Spending Departments in 2021
To paint a better picture of how different U.S. government departments use AI, Table 5.2.2 shows the
most expensive AI-related contract that the five highest AI-related-spending departments signed in
2021. Last year, the U.S. government invested in AI to build autonomous vehicle prototypes, develop an
AI imaging system that could assist with burn classification, and create robots capable of higher-level
lunar navigation.

Contract amounts in millions of U.S. dollars:

Prototype Services in the Objective Areas of Automotive Cybersecurity, Vehicle Safety Technologies, Vehicle Light Weighting, Autonomous Vehicles and Intelligent Systems, Connected Vehicles, and Advanced Energy Storage Technologies (DOD, 70): To acquire prototypes in the domain of automotive cybersecurity, vehicle safety technologies, and autonomous vehicles and intelligent systems.

Biomedical Advanced Research and Development Authority (BARDA) (HHS, 20): To develop optical imaging devices and machine learning algorithms to assist in classifying and healing wounds and conventional burns.

Commercial Lunar Payload Services (NASA, 14): To develop lunar robots capable of navigating the moon's south pole to acquire lunar resources and engage in lunar-based scientific activities.

SBIR-Autonomous Surveillance Towers-Delivery Order (DHS, 37): To construct towers capable of autonomous surveillance.

Schedule 70: Information Technology (DOC, 13): To develop a prototype using AI technology that can improve patent search.

Table 5.2.2



Appendix
CHAPTER 1 Research & Development
CHAPTER 2 Technical Performance
CHAPTER 3 Technical AI Ethics
CHAPTER 4 The Economy and Education
CHAPTER 5 AI Policy and Governance


CHAPTER 1: RESEARCH & DEVELOPMENT


CENTER FOR SECURITY AND EMERGING TECHNOLOGY, GEORGETOWN UNIVERSITY
Prepared by Sara Abdulla and James Dunham

The Center for Security and Emerging Technology (CSET) is a policy research organization within Georgetown University's Walsh School of Foreign Service that produces data-driven research at the intersection of security and technology, providing nonpartisan analysis to the policy community.

Publications from CSET Merged Corpus of Scholarly Literature
Source
CSET's merged corpus of scholarly literature combines distinct publications from Digital Science's Dimensions, Clarivate's Web of Science, Microsoft Academic Graph, China National Knowledge Infrastructure, arXiv, and Papers With Code.1

Methodology
To create the merged corpus, CSET deduplicated across the listed sources using publication metadata, and then combined the metadata for linked publications. To identify AI publications, CSET used an English-language subset of this corpus: publications since 2010 that appear AI-relevant.2 CSET researchers developed a classifier for identifying AI-related publications by leveraging the arXiv repository, where authors and editors tag papers by subject.

To provide a publication's field of study, CSET matches each publication in the analytic corpus with predictions from Microsoft Academic Graph (MAG)'s field-of-study model, which yields hierarchical labels describing the published research field(s) of study and corresponding scores.3 CSET researchers identified the most common fields of study in their corpus of AI-relevant publications since 2010 and recorded publications in all other fields as "Other AI." English-language AI-relevant publications were then tallied by their top-scoring field and publication year.

CSET also provided year-by-year citations for AI-relevant work associated with each country. A publication is associated with a country if it has at least one author whose organizational affiliation(s) is located in that country. Citation counts aren't available for all publications; those without counts weren't included in the citation analysis. Over 70% of English-language AI papers published between 2010 and 2020 have citation data available.

CSET counted cross-country collaborations as distinct pairs of countries across authors for each publication. Collaborations are only counted once: For example, if a publication has two authors from the United States and two authors from China, it is counted as a single United States-China collaboration (see the sketch below).

Additionally, publication counts by year and by publication type (e.g., academic journal articles, conference papers) were provided where available. These publication types were disaggregated by affiliation country as described above.

CSET also provided publication affiliation sector(s) where, as in the country attribution analysis, sectors were associated with publications through authors' affiliations. Not all affiliations were characterized in terms of sectors; CSET researchers relied primarily on GRID from Digital Science for this purpose, and not all organizations can be found in or linked to GRID.4 Where the affiliation sector is available, papers were counted toward these sectors, by year. Cross-sector collaborations on academic publications were calculated using the same method as in the cross-country collaborations analysis.

1 All CNKI content is furnished for CSET by East View Information Services, Minneapolis, MN, USA.
2 For more information, see James Dunham, Jennifer Melot, and Dewey Murdick, “Identifying the Development and Application of Artificial Intelligence in Scientific Text,” arXiv [cs.DL], May 28, 2020,
https://arxiv.org/abs/2002.07143.
3 These scores are based on cosine similarities between field-of-study and paper embeddings. See Zhihong Shen, Hao Ma, and Kuansan Wang, “A Web-Scale System for Scientific Knowledge
Exploration,” arXiv [cs.CL], May 30, 2018, https://arxiv.org/abs/1805.12216.
4 See https://www.grid.ac/ for more information about the GRID dataset from Digital Science.
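A minimal sketch of the cross-country pair counting described above, assuming a simple author-affiliation record format (the field names are hypothetical, not CSET's actual schema):

```python
from collections import Counter
from itertools import combinations

def collaboration_counts(publications):
    """Count cross-country collaborations: each unordered pair of distinct
    countries is counted at most once per publication, regardless of how
    many authors share those affiliations."""
    counts = Counter()
    for pub in publications:
        countries = sorted({author["country"] for author in pub["authors"]})
        for pair in combinations(countries, 2):
            counts[pair] += 1
    return counts

# Two U.S. authors and two Chinese authors yield one U.S.-China collaboration.
pubs = [{"authors": [{"country": "United States"}, {"country": "United States"},
                     {"country": "China"}, {"country": "China"}]}]
print(collaboration_counts(pubs))  # Counter({('China', 'United States'): 1})
```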




Patents from CSET's AI Patents Dataset
Source
CSET's AI patents dataset was developed by CSET and 1790 Analytics. It includes patents relevant to the development and application of AI, as indicated by CPC/IPC codes and keywords.

Methodology
In this analysis, patents were grouped by year and country, and then counted at the "patent family" level.5 CSET extracted year values from the most recent publication date within a family. This method has the advantage of capturing updates within a patent family (such as amendments).

The country of origin for a patent is derived from the first country in which a patent was filed.6
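A sketch of the family-level counting described in the methodology above, under an assumed record format with family and publication-year fields (illustrative, not CSET's pipeline):

```python
from collections import defaultdict

def patent_family_years(documents):
    """Group patent documents by family and date each family by its most
    recent publication, so later filings or amendments update the year."""
    families = defaultdict(list)
    for doc in documents:
        families[doc["family_id"]].append(doc["publication_year"])
    return {family: max(years) for family, years in families.items()}

docs = [{"family_id": "F1", "publication_year": 2017},
        {"family_id": "F1", "publication_year": 2019},  # amendment in 2019
        {"family_id": "F2", "publication_year": 2020}]
print(patent_family_years(docs))  # {'F1': 2019, 'F2': 2020}
```

Counting at the family level avoids inflating totals when one invention produces several documents or is filed in multiple jurisdictions.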

GITHUB STARS
Source
GitHub: star-history (available at the star-history website) was used to retrieve the data.

Methodology
The visual in the report shows the number of stars for various GitHub repositories over time. The repositories include the following: apachecn/ailearning, apache/incubator-mxnet, Avik-Jain/100-Days-Of-ML-Code, aymericdamien/TensorFlow-Examples, BVLC/caffe, caffe2/caffe2, CorentinJ/Real-Time-Voice-Cloning, deepfakes/faceswap, dmlc/mxnet, exacity/deeplearningbook-chinese, fchollet/keras, floodsung/Deep-Learning-Papers-Reading-Roadmap, iperov/DeepFaceLab, Microsoft/CNTK, opencv/opencv, pytorch/pytorch, scikit-learn/scikit-learn, scutan90/DeepLearning-500-questions, tensorflow/tensorflow, Theano/Theano, Torch/Torch7.

Nuance
The GitHub Archive currently does not provide a way to count when users remove a star from a repository. Therefore, the reported data slightly overestimates the number of stars. A comparison with the actual number of stars for the repositories on GitHub reveals that the numbers are fairly close and that the trends remain unchanged.
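The overestimation described under Nuance follows from counting only star additions. A small sketch of that cumulative counting, assuming star events already grouped by year (an assumed format, not the GitHub Archive schema):

```python
from collections import Counter

def cumulative_stars(star_events):
    """Running total of star events per year; removals are invisible,
    so the series can only overestimate the live star count."""
    per_year = Counter(event["year"] for event in star_events)
    running, series = 0, {}
    for year in sorted(per_year):
        running += per_year[year]
        series[year] = running
    return series

events = [{"year": 2019}, {"year": 2019}, {"year": 2020}, {"year": 2021}]
print(cumulative_stars(events))  # {2019: 2, 2020: 3, 2021: 4}
```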

5 Patents are analyzed at the “patent family” level rather than “patent document” level because patent families are a collective of patent documents all associated with a single invention and/or
innovation by the same inventors/assignees. Thus, counting at the “patent family” level mitigates artificial number inflation when there are multiple patent documents in a patent family or if a patent is
filed in multiple jurisdictions.
6 For more details on CSET’s approach and experimentation for assigning country values, see footnote 26 in “Patents and Artificial Intelligence: A Primer,” by Dewey Murdick and Patrick Thomas. (Center
for Security and Emerging Technology, September 2020), https://doi.org/10.51593/20200038.




CHAPTER 2: TECHNICAL PERFORMANCE


ImageNet
Data on ImageNet accuracy was retrieved through a detailed arXiv literature review cross-referenced by technical progress reported on Papers with Code. The reported dates correspond to the year during which a paper was first published to arXiv, and the reported results (top-1 or top-5 accuracy) correspond to the result reported in the most recent version of each paper. The estimate of human-level performance is from Russakovsky et al., 2015. Learn more about the LSVRC ImageNet competition and the ImageNet data set.

ImageNet: Top-1 Accuracy
To highlight progress on top-1 accuracy without the use of extra training data, scores were taken from the following papers:
Adversarial Examples Improve Image Recognition
Billion-Scale Semi-Supervised Learning for Image Classification
Dual Path Networks
Densely Connected Convolutional Networks
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
Fixing the Train-Test Resolution Discrepancy: FixEfficientNet
ImageNet Classification with Deep Convolutional Neural Networks
Masked Autoencoders Are Scalable Vision Learners

To highlight progress on top-1 accuracy with the use of extra training data, scores were taken from the following papers:
Big Transfer (BiT): General Visual Representation Learning
CoAtNet: Marrying Convolution and Attention for All Data Sizes
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
Self-Training with Noisy Student Improves ImageNet Classification
Sharpness-Aware Minimization for Efficiently Improving Generalization
Xception: Deep Learning with Depthwise Separable Convolutions

ImageNet: Top-5 Accuracy
To highlight progress on top-5 accuracy without the use of extra training data, scores were taken from the following papers:
Adversarial Examples Improve Image Recognition
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
Exploring the Limits of Weakly Supervised Pretraining
Fixing the Train-Test Resolution Discrepancy: FixEfficientNet
GPipe: Efficient Training of Giant Neural Networks Using Pipeline Parallelism
High-Performance Large-Scale Image Recognition Without Normalization
ImageNet Classification with Deep Convolutional Neural Networks
Learning Transferable Architectures for Scalable Image Recognition
Squeeze-and-Excitation Networks

To highlight progress on top-5 accuracy with the use of extra training data, scores were taken from the following papers:
Big Transfer (BiT): General Visual Representation Learning
Deep Residual Learning for Image Recognition
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
Florence: A New Foundation Model for Computer Vision
Self-Training with Noisy Student Improves ImageNet Classification
Xception: Deep Learning with Depthwise Separable Convolutions



STL-10
Data on STL-10 FID scores was retrieved through a detailed arXiv literature review cross-referenced by technical progress reported on Papers with Code. The reported dates correspond to the year during which a paper was first published to arXiv, and the reported results (FID score) correspond to the result reported in the most recent version of each paper. Details on the STL-10 benchmark can be found in the STL-10 paper.

To highlight progress on STL-10, scores were taken from the following papers:
DEGAS: Differentiable Efficient Generator Search
Dist-GAN: An Improved GAN Using Distance Constraints
Off-Policy Reinforcement Learning for Efficient and Effective GAN Architecture Search
Score Matching Model for Unbounded Data Score

CIFAR-10
Data on CIFAR-10 FID scores was retrieved through a detailed arXiv literature review cross-referenced by technical progress reported on Papers with Code. The reported dates correspond to the year during which a paper was first published to arXiv, and the reported results (FID score) correspond to the result reported in the most recent version of each paper. Details on the CIFAR-10 benchmark can be found in the CIFAR-10 paper.

To highlight progress on CIFAR-10, scores were taken from the following papers:
AutoGAN: Neural Architecture Search for Generative Adversarial Networks
Denoising Diffusion Probabilistic Models
Improved Training of Wasserstein GANs
Large Scale GAN Training for High Fidelity Natural Image Synthesis
Score-Based Generative Modeling in Latent Space
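For reference, the FID score reported for STL-10 and CIFAR-10 is the standard Fréchet inception distance between Gaussian fits to Inception-feature distributions of real and generated images (lower is better):

$$\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \operatorname{Tr}\!\left(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\right)$$

where $(\mu_r, \Sigma_r)$ and $(\mu_g, \Sigma_g)$ are the feature means and covariances for real and generated samples.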
FaceForensics++
Data on FaceForensics++ accuracy was retrieved through a detailed arXiv literature review. The reported dates correspond to the year during which a paper was first published to arXiv or a method was introduced. With FaceForensics, recent researchers have tested previously existing deepfake detection methodologies. The year in which a method was introduced, even if it was subsequently tested, is the year in which it is included in the report. The reported results (accuracy) correspond to the result reported in the most recent version of each paper. Details on the FaceForensics++ benchmark can be found in the FaceForensics++ paper.

To highlight progress on FaceForensics++, scores were taken from the following papers:
A Deep Learning Approach to Universal Image Manipulation Detection Using a New Convolutional Layer
Detection of Deepfake Videos Using Long Distance Attention
FakeCatcher: Detection of Synthetic Portrait Videos Using Biological Signals
FaceForensics++: Learning to Detect Manipulated Facial Images
Learning Spatiotemporal Features with 3D Convolutional Networks
Recasting Residual-Based Local Descriptors as Convolutional Neural Networks
Rich Models for Steganalysis of Digital Images
Thinking in Frequency: Face Forgery Detection by Mining Frequency-Aware Clues
Xception: Deep Learning with Depthwise Separable Convolutions

Celeb-DF
Data on Celeb-DF AUC was retrieved through a detailed arXiv literature review. The reported dates correspond to the year during which a paper was first published to arXiv or a method was introduced. With Celeb-DF, recent researchers have tested previously existing deepfake detection methodologies. The year in which a method was introduced, even if it was subsequently tested, is the year in which it is included in the report. The reported results (AUC) correspond to the result reported in the most recent version of each paper. Details on the Celeb-DF benchmark can be found in the Celeb-DF paper.

To highlight progress on Celeb-DF, scores were taken from the following papers:
Exposing DeepFake Videos by Detecting Face Warping Artifacts



FaceForensics++: Learning to Detect Manipulated Facial Images
Face X-Ray for More General Face Forgery Detection
Spatial-Phase Shallow Learning: Rethinking Face Forgery Detection in Frequency Domain

Leeds Sports Poses
Data on Leeds Sports Poses percentage of correct keypoints (PCK) was retrieved through a detailed arXiv literature review cross-referenced by technical progress reported on Papers with Code. The reported dates correspond to the year during which a paper was first published to arXiv, and the reported results (PCK) correspond to the result reported in the most recent version of each paper. Details on the Leeds Sports Poses benchmark can be found in the Leeds Sports Poses paper.

To highlight progress on Leeds Sports Poses, scores were taken from the following papers:
Articulated Pose Estimation by a Graphical Model with Image Dependent Pairwise Relations
Human Pose Estimation via Convolutional Part Heatmap Regression
Jointly Optimize Data Augmentation and Network Training: Adversarial Data Augmentation in Human Pose Estimation
Knowledge-Guided Deep Fractal Neural Networks for Human Pose Estimation
OmniPose: A Multi-Scale Framework for Multi-Person Pose Estimation
Toward Fast and Accurate Human Pose Estimation via Soft-Gated Skip Connections

Human 3.6M
Data on Human3.6M average mean per joint position error was retrieved through a detailed arXiv literature review cross-referenced by technical progress reported on Papers with Code. The reported dates correspond to the year during which a paper was first published to arXiv, and the reported results (MPJPE) correspond to the result reported in the most recent version of each paper. Details on the Human3.6M benchmark can be found in the Human3.6M paper.

To highlight progress on Human3.6M without the use of extra training data, scores were taken from the following papers:
3D Human Pose Estimation in Video with Temporal Convolutions and Semi-Supervised Training
Conditional Directed Graph Convolution for 3D Human Pose Estimation
Cross View Fusion for 3D Human Pose Estimation
Epipolar Transformers
Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments
Learning 3D Human Pose from Structure and Motion
Robust Estimation of 3D Human Poses from a Single Image

To highlight progress on Human3.6M with the use of extra training data, scores were taken from the following papers:
Epipolar Transformers
Learnable Triangulation of Human Pose
TesseTrack: End-to-End Learnable Multi-Person Articulated 3D Pose Tracking
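For reference, MPJPE as reported for Human3.6M is, in its standard form, the average Euclidean error over the $N$ body joints (lower is better):

$$\mathrm{MPJPE} = \frac{1}{N} \sum_{j=1}^{N} \lVert \hat{p}_j - p_j \rVert_2$$

where $\hat{p}_j$ and $p_j$ are the predicted and ground-truth 3D joint positions.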
Cityscapes Challenge, Pixel-Level Semantic Labeling Task
Data on the Cityscapes challenge, pixel-level semantic labeling task mean IoU was taken from the Cityscapes dataset, more specifically their pixel-level semantic labeling leaderboard. More details about the Cityscapes dataset and other corresponding semantic segmentation challenges can be accessed at the Cityscapes dataset webpage.

CVC-ClinicDB and Kvasir-SEG
Data on CVC-ClinicDB and Kvasir-SEG mean dice was retrieved through a detailed arXiv literature review cross-referenced by technical progress reported on Papers with Code (CVC-ClinicDB and Kvasir-SEG). The reported dates correspond to the year during which a paper was first published to arXiv, and the reported results (mean dice) correspond to the result reported in the most recent version of each paper. Details on the CVC-ClinicDB benchmark can be found in the CVC-ClinicDB database page. Details on the Kvasir-SEG benchmark can be found in the Kvasir-SEG paper.
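The two segmentation metrics used here are, in their standard forms (for a predicted mask and a ground-truth mask, with true positives $TP$, false positives $FP$, and false negatives $FN$):

$$\mathrm{IoU} = \frac{TP}{TP + FP + FN}, \qquad \mathrm{Dice} = \frac{2\,TP}{2\,TP + FP + FN}$$

Mean IoU averages the IoU over classes; mean dice averages the Dice coefficient over test images.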




To highlight progress on CVC-ClinicDB, scores were taken from the following papers:
DoubleU-Net: A Deep Convolutional Neural Network for Medical Image Segmentation
Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation
MSRF-Net: A Multi-Scale Residual Fusion Network for Biomedical Image Segmentation
ResUNet++: An Advanced Architecture for Medical Image Segmentation
U-Net: Convolutional Networks for Biomedical Image Segmentation

To highlight progress on Kvasir-SEG, scores were taken from the following papers:
Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation
MSRF-Net: A Multi-Scale Residual Fusion Network for Biomedical Image Segmentation
PraNet: Parallel Reverse Attention Network for Polyp Segmentation
U-Net: Convolutional Networks for Biomedical Image Segmentation

National Institute of Standards and Technology (NIST) Face Recognition Vendor Test (FRVT) and NIST FRVT Face Mask Effects
Data on NIST FRVT 1:1 verification accuracy by dataset was obtained from the FRVT 1:1 verification leaderboard. Data on NIST FRVT face mask effects was obtained from the FRVT face mask effects leaderboard. The face mask effects leaderboard contains results of the testing of 319 face recognition algorithms that were submitted to FRVT prior to and post mid-March 2020, when the COVID pandemic began.

Visual Question Answering (VQA)
Data on VQA was taken from recent iterations of the VQA challenge. To learn more about the VQA challenge in general, please consult the following link. To learn more about the 2021 iteration of the VQA challenge, please consult the following link. More specifically, the Index makes use of data from the following iterations of the VQA challenge:
VQA Challenge 2016
VQA Challenge 2017
VQA Challenge 2018
VQA Challenge 2019
VQA Challenge 2020
VQA Challenge 2021

Kinetics-400, Kinetics-600, and Kinetics-700
Data on Kinetics-400, Kinetics-600, and Kinetics-700 was retrieved through a detailed arXiv literature review cross-referenced by technical progress reported on Papers with Code (Kinetics-400, Kinetics-600, and Kinetics-700). The reported dates correspond to the year during which a paper was first published to arXiv, and the reported results (accuracy) correspond to the result reported in the most recent version of each paper. Details on the Kinetics-400 benchmark can be found in the Kinetics-400 paper. Details on the Kinetics-600 benchmark can be found in the Kinetics-600 paper. Details on the Kinetics-700 benchmark can be found in the Kinetics-700 paper.

To highlight progress on Kinetics-400, scores were taken from the following papers:
Co-Training Transformer with Videos and Images Improves Action Recognition
Large-Scale Weakly-Supervised Pre-training for Video Action Recognition
Multiview Transformers for Video Recognition
Non-Local Neural Networks
Omni-Sourced Webly-Supervised Learning for Video Recognition
SlowFast Networks for Video Recognition
Temporal Segment Networks: Towards Good Practices for Deep Action Recognition



To highlight progress on Kinetics-600, scores were taken from the following papers:
Masked Feature Prediction for Self-Supervised Visual Pre-Training
Multiview Transformers for Video Recognition
Learning Spatio-Temporal Representation with Local and Global Diffusion
Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification
SlowFast Networks for Video Recognition

To highlight progress on Kinetics-700, scores were taken from the following papers:
Learn to Cycle: Time-Consistent Feature Discovery for Action Recognition
Masked Feature Prediction for Self-Supervised Visual Pre-Training
Multiview Transformers for Video Recognition

ActivityNet: Temporal Action Localization Task
In the challenge, there are three separate tasks, but they focus on the main problem of temporally localizing where activities happen in untrimmed videos from the ActivityNet benchmark. To source information on the state-of-the-art results for TALT, the Index did a detailed survey of arXiv papers in addition to reports of yearly ActivityNet challenge results. More specifically, the Index made use of the following sources of information:
TALT 2016
TALT 2017
TALT 2018
TALT 2019
TALT 2020
TALT 2021

Common Object in Context (COCO)
Data on COCO mean average precision (mAP50) was retrieved through a detailed arXiv literature review cross-referenced by technical progress reported on Papers with Code. The reported dates correspond to the year during which a paper was first published to arXiv, and the reported results (mAP50) correspond to the result reported in the most recent version of each paper. Details on the COCO benchmark can be found in the COCO paper.
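For reference, mAP50 is the standard detection metric in which a predicted box counts as correct when its IoU with a ground-truth box is at least 0.5; average precision is the area under the precision-recall curve, averaged over the $C$ object classes:

$$\mathrm{AP}_c = \int_0^1 p_c(r)\,dr, \qquad \mathrm{mAP50} = \frac{1}{C} \sum_{c=1}^{C} \mathrm{AP}_c \,\Big|_{\mathrm{IoU} \geq 0.5}$$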
To highlight progress on COCO without the use of extra training data, scores were taken from the following papers:
An Analysis of Scale Invariance in Object Detection – SNIP
Deformable ConvNets v2: More Deformable, Better Results
Dynamic Head: Unifying Object Detection Heads with Attentions
Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks
Mish: A Self Regularized Non-Monotonic Activation Function
Scaled-YOLOv4: Scaling Cross Stage Partial Network

To highlight progress on COCO with the use of extra training data, scores were taken from the following papers:
EfficientDet: Scalable and Efficient Object Detection
Grounded Language-Image Pre-Training

You Only Look Once (YOLO)
Data on YOLO mean average precision (mAP50) was retrieved through a detailed arXiv literature review and survey of GitHub repositories. The reported dates correspond to the year during which a paper was first published to arXiv or a method was introduced. More specifically, the Index made use of the following sources of information:
YOLO 2016
YOLO 2018
YOLO 2020
YOLO 2021

YOLO results for 2017 and 2019 were not included in the index as no state-of-the-art improvements in YOLO for those years were uncovered during the literature review and survey of GitHub repositories.

Visual Commonsense Reasoning (VCR)
Technical progress for VCR is taken from the VCR leaderboard; the VCR leaderboard webpage further delineates the methodology behind the VCR challenge.




Human performance on VCR is taken from Zellers et al. (2018). Details on the VCR benchmark can be found in the VCR
paper.

SuperGLUE
The SuperGLUE benchmark data was pulled from the SuperGLUE leaderboard. Details about the SuperGLUE benchmark
are in the SuperGLUE paper and SuperGLUE software toolkit. The tasks and evaluation metrics for SuperGLUE are:

NAME IDENTIFIER METRIC

Broad Coverage Diagnostics AX-b Matthews Corr.
CommitmentBank CB Avg. F1/Accuracy
Choice of Plausible Alternatives COPA Accuracy
Multi-Sentence Reading Comprehension MultiRC F1a/EM
Recognizing Textual Entailment RTE Accuracy
Words in Context WiC Accuracy
The Winograd Schema Challenge WSC Accuracy
BoolQ BoolQ Accuracy
Reading Comprehension with Commonsense Reasoning ReCoRD F1/Accuracy
Winogender Schema Diagnostic AX-g Gender Parity/Accuracy

SQuAD 1.1 and SQuAD 2.0
Data on SQuAD 1.1 performance was taken from Papers with Code. Data on SQuAD 2.0 performance was taken from the SQuAD 2.0 leaderboard. Details about the SQuAD 1.1 benchmark are in the SQuAD 1.1 paper. Details about the SQuAD 2.0 benchmark are in the SQuAD 2.0 paper.

Reading Comprehension Dataset Requiring Logical Reasoning (ReClor)
Data on ReClor performance was taken from the ReClor leaderboard. Details about the ReClor benchmark are in the ReClor paper.

arXiv
Data on arXiv recall-oriented understudy for gisting evaluation (ROUGE-1) was retrieved through a detailed arXiv literature review cross-referenced by technical progress reported on Papers with Code. The reported dates correspond to the year during which a paper was first published to arXiv, and the reported results (ROUGE-1) correspond to the result reported in the most recent version of each paper. Details about the arXiv benchmark are in the arXiv dataset webpage.
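For reference, ROUGE-1 in its standard recall form counts unigram overlap between a generated summary and a reference summary:

$$\text{ROUGE-1} = \frac{\text{number of overlapping unigrams}}{\text{number of unigrams in the reference summary}}$$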
To highlight progress on arXiv without the use of extra training data, scores were taken from the following papers:
A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents
Extractive Summarization of Long Documents by Combining Global and Local Context
Get to the Point: Summarization with Pointer-Generator Networks
Systematically Exploring Redundancy Reduction in Summarizing Long Documents
Sparsifying Transformer Models with Trainable Representation Pooling




To highlight progress on arXiv with the use of extra training data, scores were taken from the following papers:
Big Bird: Transformers for Longer Sequences
Hierarchical Learning for Generation with Long Source Sequences
PEGASUS: Pre-Training with Extracted Gap-Sentences for Abstractive Summarization

PubMed
Data on PubMed recall-oriented understudy for gisting evaluation (ROUGE-1) was retrieved through a detailed arXiv literature review cross-referenced by technical progress reported on Papers with Code. The reported dates correspond to the year during which a paper was first published to arXiv, and the reported results (ROUGE-1) correspond to the result reported in the most recent version of each paper. Details about the PubMed benchmark are in the PubMed paper.

To highlight progress on PubMed without the use of extra training data, scores were taken from the following papers:
A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents
Extractive Summarization of Long Documents by Combining Global and Local Context
Get to the Point: Summarization with Pointer-Generator Networks
Sparsifying Transformer Models with Trainable Representation Pooling

To highlight progress on PubMed with the use of extra training data, scores were taken from the following papers:
A Divide-and-Conquer Approach to the Summarization of Long Documents
Hierarchical Learning for Generation with Long Source Sequences
PEGASUS: Pre-Training with Extracted Gap-Sentences for Abstractive Summarization

Stanford Natural Language Inference (SNLI)
Data on Stanford Natural Language Inference (SNLI) accuracy was retrieved through a detailed arXiv literature review cross-referenced by technical progress reported on Papers with Code. The reported dates correspond to the year during which a paper was first published to arXiv, and the reported results (accuracy) correspond to the result reported in the most recent version of each paper. Details on the SNLI benchmark can be found in the SNLI paper.

To highlight progress on SNLI, scores were taken from the following papers:
Compare, Compress and Propagate: Enhancing Neural Architectures with Alignment Factorization for Natural Language Inference
Convolutional Neural Networks for Sentence Classification
Enhanced LSTM for Natural Language Inference
Entailment as Few-Shot Learner
Explicit Contextual Semantics for Text Comprehension
Semantics-Aware BERT for Language Understanding
Self-Explaining Structures Improve NLP Models

Abductive Natural Language Inference (aNLI)
Data on Abductive Natural Language Inference (aNLI) was sourced from the Allen Institute for AI's aNLI leaderboard. Details on the aNLI benchmark can be found in the aNLI paper.

SemEval 2014 Task 4 Sub Task 2
Data on SemEval 2014 Task 4 Sub Task 2 accuracy was retrieved through a detailed arXiv literature review cross-referenced by technical progress reported on Papers with Code. The reported dates correspond to the year during which a paper was first published to arXiv, and the reported results (accuracy) correspond to the result reported in the most recent version of each paper. Details on the SemEval benchmark can be found in the SemEval 2014 paper.

To highlight progress on SemEval, scores were taken from the following papers:
A Multi-Task Learning Model for Chinese-Oriented Aspect Polarity Classification and Aspect Term Extraction




Aspect Level Sentiment Classification with Deep Memory Network
Back to Reality: Leveraging Pattern-Driven Modeling to Enable Affordable Sentiment Dependency Learning
Effective LSTMs for Target-Dependent Sentiment Classification
Hierarchical Attention Based Position-Aware Network for Aspect-Level Sentiment Analysis
Investigating Typed Syntactic Dependencies for Targeted Sentiment Classification Using Graph Attention Neural Network
Recurrent Attention Network on Memory for Aspect Sentiment Analysis

WMT2014, English-French and English-German
Data on WMT2014 English-French and English-German BLEU score was retrieved through a detailed arXiv literature review cross-referenced by technical progress reported on Papers with Code (English-French and English-German). The reported dates correspond to the year during which a paper was first published to arXiv, and the reported results (BLEU score) correspond to the result reported in the most recent version of each paper. Details about both the WMT2014 English-French and English-German benchmarks can be found in the WMT2014 paper.
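For reference, the BLEU score in its standard form combines modified $n$-gram precisions $p_n$ (typically up to $n = 4$, with uniform weights $w_n = 1/4$) with a brevity penalty $\mathrm{BP}$ that punishes translations shorter than the reference:

$$\mathrm{BLEU} = \mathrm{BP} \cdot \exp\!\left(\sum_{n=1}^{4} w_n \log p_n\right), \qquad \mathrm{BP} = \min\!\left(1,\; e^{\,1 - r/c}\right)$$

where $r$ is the reference length and $c$ the candidate length.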
To highlight progress on WMT2014 English-French without the use of extra training data, scores were taken from the following papers:
Addressing the Rare Word Problem in Neural Machine Translation
Google's Neural Machine Translation System: Bridging the Gap Between Human and Machine Translation
MUSE: Parallel Multi-Scale Attention for Sequence to Sequence Learning
R-Drop: Regularized Dropout for Neural Networks
Scaling Neural Machine Translation
Understanding the Difficulty of Training Transformers
Weighted Transformer Network for Machine Translation

To highlight progress on WMT2014 English-French with the use of extra training data, scores were taken from the following papers:
Understanding Back-Translation at Scale
Very Deep Transformers for Neural Machine Translation

To highlight progress on WMT2014 English-German without the use of extra training data, scores were taken from the following papers:
BERT, mBERT, or BiBERT? A Study on Contextualized Embeddings for Neural Machine Translation
Effective Approaches to Attention-based Neural Machine Translation
Data Diversification: A Simple Strategy for Neural Machine Translation
Fast and Simple Mixture of Softmaxes with BPE and Hybrid-LightRNN for Language Generation
Google's Neural Machine Translation System: Bridging the Gap Between Human and Machine Translation
Incorporating BERT into Neural Machine Translation
Weighted Transformer Network for Machine Translation

To highlight progress on WMT2014 English-German with the use of extra training data, scores were taken from the following papers:
Lessons on Parameter Sharing across Layers in Transformers
Understanding Back-Translation at Scale

Number of Commercially Available MT Systems
Details about the number of commercially available MT systems were sourced from Intento's report The State of Machine Translation, 2021. Intento is a San Francisco-based startup that analyzes commercially available MT services.

LibriSpeech (Test-Clean and Other Dataset)
Data on LibriSpeech (Test-Clean and Other) word error rate was retrieved through a detailed arXiv literature review cross-referenced by technical progress reported on Papers with Code (Test-Clean and Other). The reported dates correspond to the year during which a paper was first published to arXiv, and the reported results (word error rate) correspond to the result reported in the most recent version of each paper. Details about both the LibriSpeech Test-Clean and Test-Other benchmarks can be found in the LibriSpeech paper.



To highlight progress on LibriSpeech Test-Clean without the use of extra training data, scores were taken from the following papers:
ASAPP-ASR: Multistream CNN and Self-Attentive SRU for SOTA Speech Recognition
Letter-Based Speech Recognition with Gated ConvNets
Neural Network Language Modeling with Letter-Based Features and Importance Sampling
SpeechStew: Simply Mix All Available Speech Recognition Data to Train One Large Neural Network
State-of-the-Art Speech Recognition Using Multi-Stream Self-Attention With Dilated 1D Convolutions

To highlight progress on LibriSpeech Test-Clean with the use of extra training data, scores were taken from the following papers:
Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
End-to-End ASR: From Supervised to Semi-Supervised Learning with Modern Architectures
Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition

To highlight progress on LibriSpeech Test-Other without the use of extra training data, scores were taken from the following papers:
Conformer: Convolution-Augmented Transformer for Speech Recognition
Neural Network Language Modeling with Letter-Based Features and Importance Sampling
SpeechStew: Simply Mix All Available Speech Recognition Data to Train One Large Neural Network
Transformer-Based Acoustic Modeling for Hybrid Speech Recognition

To highlight progress on LibriSpeech Test-Other with the use of extra training data, scores were taken from the following papers:
Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
End-to-End ASR: From Supervised to Semi-Supervised Learning with Modern Architectures
Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition
W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training

VoxCeleb
VoxCeleb is an audio-visual dataset consisting of short clips of human speech, extracted from interview videos uploaded to YouTube. VoxCeleb contains speech from 7,000-plus speakers spanning a wide range of ethnicities, accents, professions, and ages, amounting to over a million utterances (face-tracks are captured "in the wild," with background chatter, laughter, overlapping speech, pose variation, and different lighting conditions) recorded over a period of 2,000 hours (both audio and video). Each segment is at least three seconds long. The data contains an audio dataset based on celebrity voices, shorts, films, and conversational pieces (e.g., talk shows). The initial VoxCeleb 1 (100,000 utterances taken from 1,251 celebrities on YouTube) was expanded to VoxCeleb 2 (1 million utterances from 6,112 celebrities).

For the sake of consistency, the AI Index reported scores on the initial VoxCeleb dataset. Specifically, the Index made use of the following sources of information:
The IDLAB VoxSRC-20 Submission: Large Margin Fine-Tuning and Quality-Aware Score Calibration in DNN Based Speaker Verification
The SpeakIn System for VoxCeleb Speaker Recognition Challenge 2021
VoxCeleb: A Large-Scale Speaker Identification Dataset
VoxCeleb2: Deep Speaker Recognition
VoxCeleb: Large-Scale Speaker Verification in the Wild


MovieLens 20M
Data on MovieLens 20M normalized discounted cumulative gain@100 (nDCG@100) was retrieved through a detailed arXiv literature review cross-referenced by technical progress reported on Papers with Code. The reported dates correspond to the year during which a paper was first published to arXiv, and the reported results (nDCG@100) correspond to the result reported in the most recent version of each paper. Details on the MovieLens series of benchmarks can be found in Harper et al. 2015.

To highlight progress on MovieLens 20M, scores were taken from the following papers:
Deep Variational Autoencoder with Shallow Parallel Path for Top-N Recommendation (VASP)
Enhancing VAEs for Collaborative Filtering: Flexible Priors & Gating Mechanisms
RaCT: Toward Amortized Ranking-Critical Training for Collaborative Filtering
Variational Autoencoders for Collaborative Filtering
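nDCG@100 compares the discounted cumulative gain of the predicted ranking to that of the ideal ranking of the same items, truncated at rank 100. A minimal sketch, assuming binary relevance labels and the common log2 position discount (gain conventions vary across papers):

import math

def dcg_at_k(relevances, k):
    # Discounted cumulative gain with the standard log2 position discount.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_relevances, k=100):
    # nDCG@k: DCG of the predicted ranking divided by the DCG of the ideal
    # (descending-relevance) ordering of the same items.
    ideal_dcg = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Toy example: binary relevance labels in the order the model ranked the items.
print(ndcg_at_k([1, 0, 1, 0, 0], k=100))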
Criteo
Data on Criteo area under curve score (AUC) was retrieved through a detailed arXiv literature review cross-referenced by technical progress reported on Papers with Code. The reported dates correspond to the year during which a paper was first published to arXiv, and the reported results (AUC) correspond to the result reported in the most recent version of each paper. Details on the Criteo benchmark can be found on the Criteo Kaggle Challenge page.
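The AUC tracked here is the area under the ROC curve, which equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one. A minimal rank-based sketch (toy labels and scores are our own):

def auc(labels, scores):
    # AUC via the Mann-Whitney U statistic, with ranks averaged over ties.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average 1-indexed rank over a tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    pos = [r for r, y in zip(ranks, labels) if y == 1]
    n_pos, n_neg = len(pos), len(labels) - len(pos)
    return (sum(pos) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75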
To highlight progress on Criteo, scores were taken from the following papers:
AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks
DeepFM: A Factorization-Machine Based Neural Network for CTR Prediction
DeepLight: Deep Lightweight Feature Interactions for Accelerating CTR Predictions in Ad Serving
FAT-DeepFFM: Field Attentive Deep Field-aware Factorization Machine
MaskNet: Introducing Feature-Wise Multiplication to CTR Ranking Models by Instance-Guided Mask
Product-Based Neural Networks for User Response Prediction

Arcade Learning Environment: Atari-57
Data on Arcade Learning Environment: Atari-57 mean human-normalized score was retrieved through a detailed arXiv literature review cross-referenced by technical progress reported on Papers with Code. The reported dates correspond to the year during which a paper was first published to arXiv, and the reported results (mean human-normalized score) correspond to the result reported in the most recent version of each paper. Details on the Arcade Learning Environment: Atari-57 benchmark can be found in the Arcade Learning Environment paper.

To highlight progress on Arcade Learning Environment: Atari-57, scores were taken from the following papers:
Dueling Network Architectures for Deep Reinforcement Learning
Distributional Reinforcement Learning with Quantile Regression
GDI: Rethinking What Makes Reinforcement Learning Different From Supervised Learning
Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
Recurrent Experience Replay in Distributed Reinforcement Learning
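The mean human-normalized score rescales each game's raw score so that 0 corresponds to a random agent and 1 (100%) to the human reference score, then averages over the 57 games. A minimal sketch with made-up placeholder numbers:

def human_normalized(agent, random_score, human_score):
    # 0.0 = random-agent level; 1.0 (100%) = human reference level.
    return (agent - random_score) / (human_score - random_score)

# Placeholder per-game numbers (invented): (agent, random, human).
games = [(8000.0, 200.0, 7000.0), (900.0, 150.0, 42000.0)]
mean = sum(human_normalized(a, r, h) for a, r, h in games) / len(games)
print(f"mean human-normalized score: {mean:.2%}")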


Procgen
Data on Procgen mean-normalized score was retrieved through a detailed arXiv literature review. The reported dates correspond to the year during which a paper was first published to arXiv, and the reported results (mean-normalized score) correspond to the result reported in the most recent version of each paper. Details on the Procgen benchmark can be found in the Procgen paper.

To highlight progress on Procgen, scores were taken from the following papers:
Automatic Data Augmentation for Generalization in Reinforcement Learning
Leveraging Procedural Generation to Benchmark Reinforcement Learning
Procedural Generalization by Planning with Self-Supervised World Models

Chess
Data on the performance of chess software engines was taken from the Swedish Chess Computer Association's ranking of top chess software engines. The Swedish Chess Computer Association tests computer chess software systems against one another and releases a ranking list of the top-performing systems. The ranking list produced by the Swedish Chess Computer Association is a statistically significant and meaningful measurement of chess engine performance because engines are pitted against one another in thousands of tournament-like games and each employs the same underlying hardware. Data on Magnus Carlsen's top Elo score was taken from the International Chess Federation.

Training Time and Number of Accelerators
Data on training time and number of accelerators for AI systems was taken from the MLPerf Training benchmark competitions. More specifically, the AI Index made use of data from the following MLPerf training competitions:
MLPerf Training v0.5, 2018
MLPerf Training v0.6, 2019
MLPerf Training v0.7, 2020
MLPerf Training v1.0, 2021
MLPerf Training v1.1, 2021

Details on the MLPerf Training benchmark can be found in the MLPerf Training benchmark paper. Details on the current benchmark categories as well as technical information about submission and competition subdivisions can be found on the MLPerf Training webpage.

ImageNet Training Cost
Data on ImageNet training cost was based on research from DAWNBench and the individual research of Deepak Narayanan. DAWNBench is a benchmark suite for end-to-end deep-learning training and inference. DAWNBench provides a reference set of common deep-learning workloads for quantifying training time, training cost, inference latency, and inference cost across different optimization strategies, model architectures, software frameworks, clouds, and hardware. More details are available at DAWNBench.

Because DAWNBench was deprecated after March 2020, data on the training cost for the most recent round of MLPerf submissions was manually collected by Deepak Narayanan.


AI Index Robotics Survey

The survey was distributed online to 509 professors who specialize in robotics at 67 universities, over three waves from December 2021 to February 2022. The selection of universities was based on the World University Rankings 2021, with geographic representation across the globe. Please view the complete survey raw data in our public data folder here. The survey was completed by 101 professors from 43 universities, including:

Aalborg University, Denmark
Ain Shams University, Egypt
Carnegie Mellon University, United States
Columbia University, United States
Cornell University, United States
Delft University of Technology, Netherlands
ETH Zurich, Switzerland
Hong Kong University of Science and Technology, Hong Kong
Korea Advanced Institute of Science and Technology, South Korea
KU Leuven, Belgium
Massachusetts Institute of Technology, United States
Nanyang Technological University, Singapore
National Polytechnic Institute, Mexico
National University of Singapore, Singapore
Peking University, China
Politecnico di Milano, Italy
Pontificia Universidad Católica de Chile, Chile
Princeton University, United States
Purdue University, United States
RWTH Aachen University, Germany
Seoul National University, South Korea
Stanford University, United States
Stellenbosch University, South Africa
Swiss Federal Institute of Technology Lausanne, Switzerland
Tokyo Institute of Technology, Japan
University of British Columbia, Canada
University of California Berkeley, United States
University of California Los Angeles, United States
University of California San Diego, United States
University of Cambridge, United Kingdom
University of Cape Town, South Africa
University College London, United Kingdom
University of Hong Kong, Hong Kong
University of Illinois at Urbana-Champaign, United States
University of Malaya, Malaysia
University of Manchester, United Kingdom
University of Michigan, United States
Universitat Politècnica de Catalunya, Spain
University of Texas at Austin, United States
University of Tokyo, Japan
University of Toronto, Canada
University of Waterloo, Canada
Zhejiang University, China


CHAPTER 3: TECHNICAL AI ETHICS


AI ETHICS TRENDS AT FACCT AND NEURIPS
To understand trends at the ACM Conference on Fairness, Accountability, and Transparency, this section tracks FAccT papers published in conference proceedings from 2018 to 2021. We categorize author affiliations into academic, industry, nonprofit, government, and independent categories, while also tracking the location of their affiliated institution. Authors with multiple affiliations are counted once in each category (academic and industry), but multiple affiliations of the same type (i.e., authors belonging to two academic institutions) are counted once in the category.

For the analysis conducted on NeurIPS publications, we identify workshops themed around real-world impact and label papers with a single main category in "healthcare," "climate," "finance," "developing world," or "other," where "other" denotes a paper related to a real-world use case but not in one of the other categories.

We tally the number of papers in each category to reach the numbers found in Figure 3.3.3. Papers are not double-counted in multiple categories. We note that this data may not be as accurate for data pre-2018, as societal impacts work at NeurIPS has historically been categorized under a broad "AI for social impact" umbrella,7 but it has recently been split into more granular research areas. Examples include workshops dedicated to machine learning for health,8 climate,9 policy & governance,10 disaster response,11 and the developing world.12

To track trends around specific technical topics at NeurIPS as in Figures 3.3.4–3.3.7, we count the number of papers accepted to the NeurIPS main track with titles containing keywords (e.g., "counterfactual" or "causal" for tracking papers related to causal effect), as well as papers submitted to related workshops. See the list of workshops considered for analysis here.

7 See 2018 Workshop on Ethical, Social and Governance Issues in AI, 2018 AI for Social Good Workshop, 2019 Joint Workshop on AI for Social Good, 2020 Resistance AI Workshop, 2020 Navigating the Broader Impacts of AI Research Workshop.
8 See 2014 Machine Learning for Clinical Data Analysis, Healthcare and Genomics, 2015 Machine Learning for Healthcare, 2016 Machine Learning for Health, 2017 Machine Learning for Health.
9 See 2013 Machine Learning for Sustainability, 2020 AI for Earth Sciences, 2019, 2020, 2021 Tackling Climate Change with ML.
10 See 2016 People and Machines, 2019 Joint Workshop on AI for Social Good–Public Policy, 2021 Human-Centered AI.
11 See 2019 AI for Humanitarian Assistance and Disaster Response, 2020 Second Workshop on AI for Humanitarian Assistance and Disaster Response, 2021 Third Workshop on AI for Humanitarian Assistance and Disaster Response.
12 See 2017–2021 Machine Learning for the Developing World Workshops.

META-ANALYSIS OF FAIRNESS AND BIAS METRICS
For the analysis conducted on fairness and bias metrics in AI, we identify and report on benchmark and diagnostic metrics which have been consistently cited in the academic community, reported on a public leaderboard, or reported for publicly available baseline models (e.g., GPT-3, BERT, ALBERT). We note that research paper citations are a lagging indicator of adoption, and metrics which have been very recently adopted may not be reflected in the data for 2021.

For Figures 3.1.1 and 3.1.2, we track metrics from the following papers and projects:
Aligning AI with Shared Human Values
Assessing Social and Intersectional Biases in Contextualized Word Representations
Bias in Bios: A Case Study of Semantic Representation Bias in a High-Stakes Setting
BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation
Certifying and Removing Disparate Impact
CivilComments: Jigsaw Unintended Bias in Toxicity Classification
CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models


Detecting Emergent Intersectional Biases: Contextualized Word Embeddings Contain a Distribution of Human-Like Biases
Equality of Opportunity in Supervised Learning
Evaluating Gender Bias in Machine Translation
Evaluating Gender Bias in Natural Language Inference
Examining Gender Bias in Languages with Grammatical Gender
Fairness Through Awareness
Gender Bias in Coreference Resolution
Gender Bias in Coreference Resolution: Evaluation and Debiasing Methods
Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer
Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification
Image Representations Learned with Unsupervised Pretraining Contain Human-Like Biases
Measuring and Reducing Gendered Correlations in Pretrained Models
Measuring Bias in Contextualized Word Representations
Measuring Bias with Wasserstein Distance
Nuanced Metrics for Measuring Unintended Bias with Real Data for Text Classification
On Formalizing Fairness in Prediction with Machine Learning
On Measuring Social Biases in Sentence Encoders
Perspective API
RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models
Scaling Language Models: Methods, Analysis & Insights from Training Gopher
Semantics Derived Automatically from Language Corpora Contain Human-Like Biases
StereoSet: Measuring Stereotypical Bias in Pretrained Language Models
The Woman Worked as a Babysitter: On Biases in Language Generation
When Worlds Collide: Integrating Different Counterfactual Assumptions in Fairness

NATURAL LANGUAGE PROCESSING BIAS METRICS
In Section 3.2, we track citations of the Perspective API created by Jigsaw at Google. The Perspective API has been adopted widely by researchers and engineers in natural language processing. Its creators define toxicity as "a rude, disrespectful, or unreasonable comment that is likely to make someone leave a discussion," and the tool is powered by machine learning models trained on a proprietary dataset of comments from Wikipedia and news websites. We include the following papers in our analysis:

#ContextMatters: Advantages and Limitations of Using Machine Learning to Support Women in Politics
A General Language Assistant as a Laboratory for Alignment
A Machine Learning Approach to Comment Toxicity Classification
A Novel Preprocessing Technique for Toxic Comment Classification
Adversarial Text Generation for Google's Perspective API
Avoiding Unintended Bias in Toxicity Classification with Neural Networks
Bad Characters: Imperceptible NLP Attacks
Challenges in Detoxifying Language Models
Classification of Online Toxic Comments Using Machine Learning Algorithms
Context Aware Text Classification and Recommendation Model for Toxic Comments Using Logistic Regression
Detecting Cross-Geographic Biases in Toxicity Modeling on Social Media
Detoxifying Language Models Risks Marginalizing Minority Voices
Fighting Hate Speech, Silencing Drag Queens? Artificial Intelligence in Content Moderation and Risks to LGBTQ Voices Online
HATEMOJI: A Test Suite and Adversarially Generated Dataset for Benchmarking and Detecting Emoji-Based Hate
HotFlip: White-Box Adversarial Examples for Text Classification
Identifying Latent Toxic Features on YouTube Using Non-Negative Matrix Factorization


Interpreting Social Respect: A Normative Lens for ML Models
Knowledge-Based Neural Framework for Sexism Detection and Classification
Large Pretrained Language Models Contain Human-Like Biases of What Is Right and Wrong to Do
Leveraging Multilingual Transformers for Hate Speech Detection
Limitations of Pinned AUC for Measuring Unintended Bias
Machine Learning Suites for Online Toxicity Detection
Mitigating Harm in Language Models with Conditional-Likelihood Filtration
On-the-Fly Controlled Text Generation with Experts and Anti-Experts
Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets
Racial Bias in Hate Speech and Abusive Language Detection Datasets
RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models
Scaling Language Models: Methods, Analysis & Insights from Training Gopher
Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
Social Bias Frames: Reasoning About Social and Power Implications of Language
Social Biases in NLP Models as Barriers for Persons with Disabilities
Stereotypical Bias Removal for Hate Speech Detection Task Using Knowledge-Based Generalizations
The Risk of Racial Bias in Hate Speech Detection
Towards Measuring Adversarial Twitter Interactions Against Candidates in the US Midterm Elections
Toxic Comment Classification Using Hybrid Deep Learning Model
Toxicity-Associated News Classification: The Impact of Metadata and Content Features
Understanding BERT Performance in Propaganda Analysis
White-to-Black: Efficient Distillation of Black-Box Adversarial Attacks
Women, Politics and Twitter: Using Machine Learning to Change the Discourse

While the Perspective API is used widely within machine learning research and also for measuring online toxicity, toxicity in the specific domains used to train the models undergirding Perspective (e.g., news, Wikipedia) may not be broadly representative of all forms of toxicity (e.g., trolling). Other known caveats include biases against text written by minority voices: The Perspective API has been shown to disproportionately assign high toxicity scores to text that contains mentions of minority identities (e.g., "I am a gay man"). As a result, detoxification techniques built with labels sourced from the Perspective API result in models that are less capable of modeling language used by minority groups, and they avoid mentioning minority identities.

We note that the effect size metric reported in the Word Embeddings Association Test (WEAT) section is highly sensitive to rare words, as it has been shown that removing less than 1% of relevant documents in a corpus can significantly impact the WEAT effect size. This means that effect size is not guaranteed to be a robust metric for assessing bias in embeddings. While we report on a subset of embedding association tasks measuring bias along gender and racial axes, these embedding association tests have been extended to quantify the effect across intersectional axes (e.g., EuropeanAmerican+male, AfricanAmerican+male, AfricanAmerican+female).

In the analysis of embeddings from over 100 years of U.S. Census data, embedding bias was measured by computing the difference between average embedding distances. For example, gender bias is calculated as the average distance of embeddings of words associated with women (e.g., she, female) compared to embeddings of words for occupations (e.g., teacher, lawyer), minus the same average distance calculated for words associated with men.
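As a rough illustration of the Census embedding analysis described above, the sketch below computes gender bias as the average embedding distance from women-associated words to occupation words, minus the same quantity for men-associated words. The toy 2-D vectors are invented, and we assume Euclidean distance, which the report does not specify:

import numpy as np

def avg_distance(words_a, words_b, emb):
    # Mean Euclidean distance between every pair drawn from the two word sets.
    return float(np.mean([np.linalg.norm(emb[a] - emb[b])
                          for a in words_a for b in words_b]))

def gender_occupation_bias(emb, women, men, occupations):
    # Positive values mean occupation words sit farther, on average,
    # from women-associated words than from men-associated words.
    return avg_distance(women, occupations, emb) - avg_distance(men, occupations, emb)

# Toy embeddings; the real analysis uses embeddings trained per decade of text.
emb = {"she": np.array([0.0, 1.0]), "female": np.array([0.1, 0.9]),
       "he": np.array([1.0, 0.0]), "male": np.array([0.9, 0.1]),
       "teacher": np.array([0.4, 0.6]), "lawyer": np.array([0.7, 0.3])}
print(gender_occupation_bias(emb, ["she", "female"], ["he", "male"],
                             ["teacher", "lawyer"]))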


FACTUALITY AND TRUTHFULNESS

Definitions
The concepts of factuality, factual correctness, factual accuracy, and veracity are all used to refer to conformity with facts or truth. Recent work in AI aims to assess factual correctness within language models and characterize their limitations.

While human truthfulness is a relatively well-understood concept, truthfulness is not a well-characterized concept within the context of AI. A group of researchers has proposed frameworks for what it means for a system to be truthful—for example, a broadly truthful system should avoid lying or using true statements to mislead or misdirect; should be clear, informative, and (mostly) cooperative in conversation; and should be well-calibrated, self-aware, and open about the limits of its knowledge. A definition of narrow truthfulness may simply refer to systems which avoid stating falsehoods. The authors of TruthfulQA define a system as truthful only if it avoids asserting a false statement; refusing to answer a question, expressing uncertainty, or giving a true but irrelevant answer may be considered truthful but not informative.

Truthfulness is related to alignment: A truthful system is aligned with human values and goals. In one definition of alignment, an aligned system is one that is helpful, honest, and harmless. Since we cannot yet measure honesty within a system, truthfulness can be used as a proxy.

An honest system is one that asserts only what it "believes" or one that never contradicts its own beliefs. A system can be honest but not truthful—for example, if an honest system believes that vaccines are unsafe, it can claim this honestly, despite the statement being factually incorrect. Conversely, a system can be truthful but not honest: It may believe vaccines are unsafe but assert they are safe to pass a test. Another work proposes that an honest system should give accurate information, not mislead users, be calibrated (e.g., it should be correct 80% of the time when it claims 80% confidence), and express appropriate levels of uncertainty.

Hallucination refers to language models fabricating statements not found in factually correct supporting evidence or input documents. In closed-form dialog, summarization, or question answering, a system that hallucinates is considered untruthful.

Language Diversity in Training Data
Imbalanced language distribution in training data impacts the performance of general-purpose language models. For example, the Gopher family of models is trained on MassiveText (10.5TB), a dataset made up of 99% English. Similarly, only 7.4% of GPT-3 training data is in non-English languages. In contrast, XGLM, a recent model family from Meta AI, is trained on data covering 30 languages and upsamples low-resource languages to create a more balanced language representation. See Figure 1 in the XGLM paper, which compares the language distribution of XGLM and GPT-3.

In addition, Figure 7 of the XGLM paper highlights the extent to which language models can effectively store factual knowledge by comparing the performance of XGLM (a multilingual language model) with GPT-3, a monolingual model. Performance was evaluated on knowledge triplet completion using the mLAMA dataset, which was translated from the English benchmark LAMA using Google Translate. GPT-3 outperforms in English, but XGLM outperforms in non-English languages. Further results show that more diverse language representation improves language model performance in tasks such as translation.

In 2021, Congress inquired into the content moderation practices of social media companies in non-English languages, and emphasized the importance of equal access to truthful and trustworthy information regardless of language. As these companies start to adopt language models into their fact-checking and content moderation processes for languages around the world, it is critical to be able to measure the disproportionate negative impact of using models which underperform on non-English languages.


CHAPTER 4: THE ECONOMY AND EDUCATION

EMSI BURNING GLASS
Prepared by Julia Nitschke, Summer Jasinski, Bledi Taska, and Rucha Vankudre

Emsi Burning Glass delivers job market analytics that empower employers, workers, and educators to make data-driven decisions. The company's artificial intelligence technology analyzes hundreds of millions of job postings and real-life career transitions to provide insight into labor market patterns. This real-time strategic intelligence offers crucial insights, such as what jobs are most in demand, the specific skills employers need, and the career directions that offer the highest potential for workers. For more information, visit burning-glass.com.

Job Posting Data
To support these analyses, Emsi Burning Glass mined its dataset of millions of job postings collected since 2010. Emsi Burning Glass collects postings from over 45,000 online job sites to develop a comprehensive, real-time portrait of labor market demand. It aggregates job postings, removes duplicates, and extracts data from job posting text. This includes information on job title, employer, industry, and region, as well as required experience, education, and skills.

Job postings are useful for understanding trends in the labor market because they allow for a detailed, real-time look at the skills employers seek. To assess the representativeness of job posting data, Emsi Burning Glass conducts a number of analyses to compare the distribution of job postings to the distribution of official government and other third-party sources in the United States. The primary source of government data on U.S. job postings is the Job Openings and Labor Turnover Survey (JOLTS) program, conducted by the Bureau of Labor Statistics. Based on comparisons between JOLTS and Emsi Burning Glass, the labor market demand captured by Emsi Burning Glass data represents over 95% of the total labor demand. Jobs not posted online are usually in small businesses (the classic example being the "Help Wanted" sign in the restaurant window) and union hiring halls.

Measuring Demand for AI
In order to measure employers' demand for AI skills, Emsi Burning Glass uses its skills taxonomy of over 17,000 skills. The list of AI skills from Emsi Burning Glass data is shown below, with associated skill clusters. While some skills are considered to be in the AI cluster specifically, for the purposes of this report, all skills below were considered AI skills. A job posting was considered an AI job if it requested one or more of these skills.

Artificial Intelligence: Expert System, IBM Watson, IPSoft Amelia, Ithink, Virtual Agents, Autonomous Systems, Lidar, OpenCV, Path Planning, Remote Sensing

Natural Language Processing (NLP): ANTLR, Automatic Speech Recognition (ASR), Chatbot, Computational Linguistics, Distinguo, Latent Dirichlet Allocation, Latent Semantic Analysis, Lexalytics, Lexical Acquisition, Lexical Semantics, Machine Translation (MT), Modular Audio Recognition Framework (MARF), MoSes, Natural Language Processing, Natural Language Toolkit (NLTK), Nearest Neighbor Algorithm, OpenNLP, Sentiment Analysis/Opinion Mining, Speech Recognition, Text Mining, Text to Speech (TTS), Tokenization, Word2Vec

Neural Networks: Caffe Deep Learning Framework, Convolutional Neural Network (CNN), Deep Learning, Deeplearning4j, Keras, Long Short-Term Memory (LSTM), MXNet, Neural Networks, Pybrain, Recurrent Neural Network (RNN), TensorFlow


Machine Learning: AdaBoost algorithm, Boosting (Machine Learning), Chi Square Automatic Interaction Detection (CHAID), Classification Algorithms, Clustering Algorithms, Decision Trees, Dimensionality Reduction, Google Cloud Machine Learning Platform, Gradient boosting, H2O (software), Libsvm, Machine Learning, Madlib, Mahout, Microsoft Cognitive Toolkit, MLPACK (C++ library), Mlpy, Random Forests, Recommender Systems, Scikit-learn, Semi-Supervised Learning, Supervised Learning (Machine Learning), Support Vector Machines (SVM), Semantic Driven Subtractive Clustering Method (SDSCM), Torch (Machine Learning), Unsupervised Learning, Vowpal, Xgboost

Robotics: Blue Prism, Electromechanical Systems, Motion Planning, Motoman Robot Programming, Robot Framework, Robotic Systems, Robot Operating System (ROS), Robot Programming, Servo Drives / Motors, Simultaneous Localization and Mapping (SLAM)

Visual Image Recognition: Computer Vision, Image Processing, Image Recognition, Machine Vision, Object Recognition
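The posting-level rule described above—a job posting counts as an AI job if it requests one or more skills from these clusters—reduces to a simple set-membership test. A minimal sketch with a hypothetical, heavily truncated skill set:

# Hypothetical skill strings; the real taxonomy has 17,000+ entries.
AI_SKILLS = {"machine learning", "computer vision", "tensorflow",
             "natural language processing", "speech recognition"}

def is_ai_posting(posting_skills):
    # A posting counts as an AI job if it requests one or more AI skills.
    return any(skill.lower() in AI_SKILLS for skill in posting_skills)

print(is_ai_posting(["SQL", "TensorFlow"]))   # True
print(is_ai_posting(["SQL", "Excel"]))        # False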
LINKEDIN
Prepared by Akash Kaura and Murat Erer

Country Sample
Included countries represent a select sample of eligible countries with at least 40% labor force coverage by LinkedIn and at least 10 AI hires in any given month. China and India were included in this sample because of their increasing importance in the global economy, but LinkedIn coverage in these countries does not reach 40% of the workforce. Insights for these countries may not provide as full a picture as other countries, and should be interpreted accordingly.

Skills (and AI Skills)
LinkedIn members self-report their skills on their LinkedIn profiles. Currently, more than 38,000 distinct, standardized skills are identified by LinkedIn. These have been coded and classified by taxonomists at LinkedIn into 249 skill groupings, which are the skill groups represented in the dataset. The top skills that make up the AI skill grouping are machine learning, natural language processing, data structures, artificial intelligence, computer vision, image processing, deep learning, TensorFlow, Pandas (software), and OpenCV, among others.

Skill groupings are derived by expert taxonomists through a similarity-index methodology that measures skill composition at the industry level. Industries are classified according to the ISIC 4 industry classification (Zhu et al., 2018).

Skills Genome
For any entity (occupation or job, country, sector, etc.), the skill genome is an ordered list (a vector) of the 50 "most characteristic skills" of that entity. These most characteristic skills are identified using a TF-IDF algorithm to identify the most representative skills of the target entity, while down-ranking ubiquitous skills that add little information about that specific entity (e.g., Microsoft Word).

TF-IDF is a statistical measure that evaluates how representative a word (in this case a skill) is of a selected entity. This is done by multiplying two metrics:
1. The term frequency of a skill in an entity (TF).
2. The logarithmic inverse entity frequency of the skill across a set of entities (IDF). This indicates how common or rare a word is in the entire entity set. The closer IDF is to 0, the more common a word is.
So, if the skill is very common across LinkedIn entities, and appears in many job or member descriptions, the IDF will approach 0. If, on the other hand, the skill is unique to specific entities, the IDF will approach 1. Details are available at LinkedIn's Skills Genome and LinkedIn-World Bank Methodology note.
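A minimal sketch of this TF-IDF re-weighting, with invented function names and a toy three-entity corpus; LinkedIn's production implementation is described in the methodology notes linked above:

import math
from collections import Counter

def skill_genome(entity_skills, all_entities, top_k=50):
    # entity_skills: skill mentions for one entity (e.g., one occupation).
    # all_entities: one skill list per entity, used for the IDF term.
    tf = Counter(entity_skills)
    n = len(all_entities)
    scores = {}
    for skill, count in tf.items():
        df = sum(1 for skills in all_entities if skill in skills)
        idf = math.log(n / df)            # 0 when the skill appears in every entity
        scores[skill] = (count / len(entity_skills)) * idf
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

entities = [["python", "machine learning", "excel"],
            ["excel", "accounting"],
            ["python", "excel", "tensorflow"]]
# "excel" appears in every entity, so its IDF (and score) is 0: it is down-ranked.
print(skill_genome(entities[0], entities))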


AI Skills Penetration
The aim of this indicator is to measure the intensity of AI skills in an entity (in a particular country, industry, gender, etc.) through the following methodology:
• Compute frequencies for all self-added skills by LinkedIn members in a given entity (occupation, industry, etc.) in 2015–2021.
• Re-weight skill frequencies using a TF-IDF model to get the top 50 most representative skills in that entity. These 50 skills compose the "skill genome" of that entity.
• Compute the share of skills that belong to the AI skill group out of the top skills in the selected entity.

Interpretation: The AI skill penetration rate signals the prevalence of AI skills across occupations, or the intensity with which LinkedIn members utilize AI skills in their jobs. For example, the top 50 skills for the occupation of engineer are calculated based on the weighted frequency with which they appear in LinkedIn members' profiles. If four of the skills that engineers possess belong to the AI skill group, this measure indicates that the penetration of AI skills is estimated to be 8% among engineers (e.g., 4/50).

Jobs or Occupations
LinkedIn member titles are standardized and grouped into approximately 15,000 occupations. These are not sector or country specific. These occupations are further standardized into approximately 3,600 occupation representatives. Occupation representatives group occupations with a common role and specialty, regardless of seniority.

AI Jobs or Occupations
An AI job (technically, occupation representative) is an occupation representative that requires AI skills to perform the job. Skills penetration is used as a signal for whether AI skills are prevalent in an occupation representative, in any sector where the occupation representative may exist. Examples of such occupations include (but are not limited to): machine learning engineer, artificial intelligence specialist, data scientist, computer vision engineer, etc.

AI Talent
A LinkedIn member is considered AI talent if they have explicitly added AI skills to their profile and/or they are occupied in an AI occupation representative. The counts of AI talent are used to calculate talent concentration metrics (e.g., to calculate the country-level AI talent concentration, we use the counts of AI talent at the country level vis-à-vis the counts of LinkedIn members in the respective countries).

Relative AI Skills Penetration
To allow for skills penetration comparisons across countries, the skills genomes are calculated and a relevant benchmark is selected (e.g., global average). A ratio is then constructed between a country's and the benchmark's AI skills penetrations, controlling for occupations.

Interpretation: A country's relative AI skills penetration of 1.5 indicates that AI skills are 1.5 times as frequent as in the benchmark, for an overlapping set of occupations.

Global Comparison
For cross-country comparison, we present the relative penetration rate of AI skills, measured as the sum of the penetration of each AI skill across occupations in a given country, divided by the average global penetration of AI skills across the overlapping occupations in a sample of countries.

Interpretation: A relative penetration rate of 2 means that the average penetration of AI skills in that country is two times the global average across the same set of occupations.

Global Comparison: By Industry
The relative AI skills penetration by country for industry provides an in-depth sectoral decomposition of AI skill penetration across industries and sample countries.

Interpretation: A country's relative AI skill penetration rate of 2 in the education sector means that the average penetration of AI skills in that country is two times the global average across the same set of occupations in that sector.
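Numerically, the penetration metrics above reduce to simple ratios. A minimal sketch with invented names and toy numbers chosen to mirror the interpretations in the text:

def ai_skill_penetration(genome, ai_skill_group):
    # Share of the top-50 genome skills that belong to the AI skill group.
    return sum(1 for skill in genome if skill in ai_skill_group) / len(genome)

def relative_penetration(country_rate, benchmark_rate):
    # Ratio against a benchmark such as the global average,
    # computed over the same (overlapping) set of occupations.
    return country_rate / benchmark_rate

genome = ["machine learning", "computer vision", "python", "sql"] + ["other"] * 46
ai_group = {"machine learning", "computer vision", "deep learning", "nlp"}
rate = ai_skill_penetration(genome, ai_group)   # 2/50 = 0.04
print(rate, relative_penetration(rate, 0.02))   # 0.04, 2.0 -> twice the benchmark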


Relative AI Hiring Index
• LinkedIn Hiring Rate or Overall Hiring Rate is a measure of hires normalized by LinkedIn membership. It is computed as the percentage of LinkedIn members who added a new employer in the same period the job began, divided by the total number of LinkedIn members in the corresponding location.
• AI Hiring Rate is computed following the overall hiring rate methodology, but only considering members classified as AI talent.
• Relative AI Hiring Index is the pace of change in AI Hiring Rate normalized by the pace of change in Overall Hiring Rate, providing a picture of whether hiring of AI talent is growing at a higher, equal, or lower rate than overall hiring in a market. The Relative AI Hiring Index is equal to 1.0 when AI hiring and overall hiring are growing at the same rate year on year.

Interpretation: The Relative AI Hiring Index shows how fast each country is experiencing growth in AI talent hiring relative to growth in overall hiring in the country. A ratio of 1.2 means the growth in AI talent hiring has outpaced the growth in overall hiring by 20%.
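Numerically, the index is a ratio of year-over-year growth rates. A minimal sketch with invented toy rates, chosen so the result matches the 1.2 example in the interpretation above:

def hiring_rate(new_hires, members):
    # Members who added a new employer in the period, over total members.
    return new_hires / members

def relative_ai_hiring_index(ai_now, ai_prior, overall_now, overall_prior):
    # YoY growth in the AI hiring rate, normalized by YoY growth in the
    # overall hiring rate; 1.0 means both are growing at the same pace.
    return (ai_now / ai_prior) / (overall_now / overall_prior)

# Toy numbers: AI hiring grew 32% YoY while overall hiring grew 10%.
index = relative_ai_hiring_index(0.0132, 0.0100, 0.0550, 0.0500)
print(round(index, 2))  # 1.2 -> AI hiring outpaced overall hiring by 20%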
NETBASE QUID
Prepared by Julie Kim and Tejas Sirohi

NetBase Quid delivers AI-powered consumer and market intelligence to enable business reinvention in a noisy and unpredictable world. The software applies artificial intelligence to reveal patterns in large, unstructured datasets and to generate visualizations that enable users to make smart, data-driven decisions accurately, quickly, and efficiently. NetBase Quid uses Boolean queries to search for focus areas, topics, and keywords within social, news, forums and blogs, companies, and patents data sources, as well as other custom datasets. NetBase Quid then visualizes these data points based on semantic similarity.

Search, Data Sources, and Scope
Over 6 million global public and private company profiles from multiple data sources are indexed in order to search across company descriptions, while filtering and including metadata ranging from investment information to firmographic information, such as founded year, HQ location, and more. Company information is updated on a weekly basis. The NetBase Quid algorithm reads large amounts of text data from each document to make links between different documents based on their similar language. This process is repeated at an immense scale, producing a network with different clusters that identify distinct topics or focus areas. Trends are identified based on keywords, phrases, people, companies, and institutions that NetBase Quid identifies, along with the other metadata that is put into the software.

Data
Organization data is embedded from Capital IQ and Crunchbase. These companies include all types of companies (private, public, operating, operating as a subsidiary, out of business) throughout the world. The investment data includes private investments, M&A, public offerings, and minority stakes made by PE/VCs, corporate venture arms, governments, and institutions both within and outside the United States. Some data is simply unreachable—for instance, when the investors or the funding amounts by investors are undisclosed. NetBase Quid also embeds firmographic information such as founding year and HQ location.

NetBase Quid embeds Capital IQ data as a default and adds in data from Crunchbase for the ones that are not captured in Capital IQ. This not only yields comprehensive and accurate data on all global organizations, but it also captures early-stage startups and funding events data. Company information is uploaded on a weekly basis.

Search Parameters
A Boolean query is used to search for focus areas, topics, and keywords within the archived company database, within their business descriptions and websites. We can filter the search results by HQ region, investment


amount, operating status, organization type (private/public), and founding year. NetBase Quid then visualizes these companies by semantic similarity. If there are more than 7,000 companies in the search result, NetBase Quid selects the 7,000 most relevant companies for visualization based on the language algorithm.

Boolean Search: "artificial intelligence" OR "AI" OR "machine learning" OR "deep learning"

Companies:
• Chart 4.2.1: Global AI & ML companies that have received investment (private, IPO, M&A) from 01/01/2012 to 12/31/2021.
• Charts 4.2.2–4.2.12: Global AI & ML companies that have received investment of over $1.5M over the last 10 years (January 1, 2012 to December 31, 2021)—7,000 companies out of 7,500 companies have been selected through Quid's relevance algorithm.

Target Event Definitions
• Private investments: A private placement is a private sale of newly issued securities (equity or debt) by a company to a selected investor or a selected group of investors. The stakes that buyers take in private placements are often minority stakes (under 50%), although it is possible to take control of a company through a private placement as well, in which case the private placement would be a majority stake investment.
• Minority investment: These refer to minority stake acquisitions in Quid, which take place when the buyer acquires less than 50% of the existing ownership stake in entities, asset products, and business divisions.
• M&A: This refers to a buyer acquiring more than 50% of the existing ownership stake in entities, asset products, and business divisions.

COMPUTING RESEARCH ASSOCIATION (CRA TAULBEE SURVEY)
Prepared by Betsy Bizot (CRA senior research associate)

Source
Computing Research Association (CRA) members are 200-plus North American organizations active in computing research: academic departments of computer science and computer engineering; laboratories and centers in industry, government, and academia; and affiliated professional societies (AAAI, ACM, CACS/AIC, IEEE Computer Society, SIAM, USENIX). CRA's mission is to enhance innovation by joining with industry, government, and academia to strengthen research and advanced education in computing. Learn more about CRA here.

Methodology
The CRA Taulbee Survey gathers survey data during the fall of each academic year by reaching out to over 200 PhD-granting departments. Details about the Taulbee Survey can be found here. Taulbee doesn't directly survey the students; the department identifies each new PhD's area of specialization as well as their type of employment. Data is collected from September to January of each academic year for PhDs awarded in the previous academic year. Results are published in May after data collection closes. So the 2020 data points were newly available last spring, and the numbers provided for 2021 will be available in May 2022.

The CRA Taulbee Survey is sent only to doctoral departments of computer science, computer engineering, and information science/systems. Historically, (a) Taulbee covers 1/4 to 1/3 of total BS CS recipients in the United States; (b) the percent of women earning bachelor's degrees is lower in the Taulbee schools than overall; and (c) Taulbee tracks the trends in overall CS production.


Nuances
• Of particular interest in PhD job market trends are
the metrics on the AI PhD area of specialization. The
categorization of specialty areas changed in 2008 and
was clarified in 2016. From 2004 to 2007, AI and robotics were grouped; from 2008 onward, AI has been separate; and in 2016 the survey clarified to respondents that AI includes ML.
• Notes about the trends in new tenure-track hires
(overall and particularly at AAU schools): In the 2018
Taulbee Survey, for the first time we asked how many
new hires had come from the following sources: new
PhD, postdoc, industry, and other academic. Results
indicated that 29% of new assistant professors came
from another academic institution.
• Some may have been teaching or research faculty
rather than tenure-track, but there is probably some
movement between institutions, meaning the total
number hired overstates the total who are actually
new.


CHAPTER 5: AI POLICY AND GOVERNANCE

BLOOMBERG GOVERNMENT
Prepared by Amanda Allen

Bloomberg Government is a premium, subscription-based service that provides comprehensive information and analytics for professionals who interact with—or are affected by—the government. Delivering news, analytics, and data-driven decision tools, Bloomberg Government's digital workspace gives an intelligent edge to government affairs and contracting professionals. For more information or a demo, visit about.bgov.com.

Methodology
Contract Spending: Bloomberg Government's Contracts Intelligence Tool structures all contracts data from www.fpds.gov. The CIT includes a model of government spending on artificial intelligence-related contracts that is based on a combination of government-defined product service codes and more than 100 AI-related keywords. For the section "U.S. Government Contract Spending," Bloomberg Government analysts used contract spending data from fiscal year 2000 through fiscal year 2021.

Defense RDT&E Budget: Bloomberg Government organized all the RDT&E budget request line items available from the Defense Department Comptroller. For the section "U.S. Department of Defense (DOD) Budget," Bloomberg Government used a set of AI-specific keywords to identify 500 unique budget activities related to artificial intelligence and machine learning worth a combined $5.9 billion in FY 2021.

Legislative Documents: Bloomberg Government maintains a repository of congressional documents, including bills, Congressional Budget Office assessments, and reports published by congressional committees, the Congressional Research Service, and other offices. Bloomberg Government also ingests state legislative bills. For the section "AI Policy and Governance," Bloomberg Government analysts identified all legislation, congressional committee reports, and CRS reports that referenced one or more AI-specific keywords.


GLOBAL LEGISLATION RECORDS ON AI
For AI-related bills passed into laws, the AI Index performed searches of the keyword "artificial intelligence," in the respective languages, in the full text of bills on the websites of 25 countries' congresses or parliaments. Note that only laws passed by state-level legislative bodies and signed into law (i.e., signed by presidents or granted royal assent) from 2015 to 2021 are included. Future AI Index reports hope to include analysis of other types of legal documents, such as regulations and standards, adopted by state- or supranational-level legislative bodies, government agencies, etc.

Australia
Website: www.legislation.gov.au
Keyword: artificial intelligence
Filters:
• Legislation types: Acts
• Portfolios: Department of House of Representatives, Department of Senate
Note: Texts in the explanatory memorandum are not counted.

Belgium
Website: http://www.ejustice.just.fgov.be/loi/loi.htm
Keyword: intelligence artificielle

Brazil
Website: https://www.camara.leg.br/legislacao
Keyword: inteligência artificial
Filter:
• Federal legislation
• Type: Law

Canada
Website: https://www.parl.ca/legisinfo/
Keyword: artificial intelligence
Note: Results were investigated to determine how many of the bills introduced were eventually passed (i.e., received royal assent), and bill status was recorded.

China
Website: https://flk.npc.gov.cn/
Keyword: 人工智能
Filters:
• Legislative body: Standing Committee of the National People's Congress

Denmark
Website: https://www.retsinformation.dk/
Keyword: kunstig intelligens
Filter:
• Document Type: Laws

Finland
Website: https://www.finlex.fi/
Keyword: tekoäly
Note: Searches were conducted under the Current Legislation section.

France
Website: https://www.legifrance.gouv.fr/
Keyword: intelligence artificielle
Filter:
• texte consolidé (consolidated text)
• Document Type: Law

Germany
Website: http://www.gesetze-im-internet.de/index.html
Keyword: künstliche Intelligenz
Filter:
• All federal codes, statutes, and ordinances that are currently in force
• Volltextsuche (full-text search)
• Und-Verknüpfung der Wörter (AND combination of words)

India
Website: https://www.indiacode.nic.in
Keyword: artificial intelligence
Note: The website used allows for a search of keywords in legislation titles but not in the full text, and as such it is not useful for this particular research. Therefore, a Google search using the "site" function to search the site with the keyword "artificial intelligence" was conducted.


Ireland
Website: www.irishstatutebook.ie
Keyword: artificial intelligence

Italy
Website: https://www.normattiva.it/
Keyword: intelligenza artificiale
Filter:
• Document Type: law

Japan
Website: https://elaws.e-gov.go.jp/
Keyword: 人工知能
Filter:
• Full text
• Law

Netherlands
Website: https://www.overheid.nl/
Keyword: kunstmatige intelligentie
Filter:
• Document Type: Wetten (laws)

New Zealand
Website: www.legislation.govt.nz
Keyword: artificial intelligence
Filter:
• Document type: acts
• Status option: For the status option (example: acts in force, current bills, etc.)

Norway
Website: https://lovdata.no/
Keyword: kunstig intelligens

Russia
Website: http://graph.garant.ru:8080/SESSION/PILOT/main.htm (database "The Federal Laws" on the official website of the Federation Council of the Federal Assembly of the Russian Federation)
Keyword: искусственный интеллект
Filter:
• Words in text

Singapore
Website: https://sso.agc.gov.sg/
Keyword: artificial intelligence
Filter:
• Document Type: Current acts and subsidiary legislation

South Africa
Website: www.gov.za
Keyword: artificial intelligence
Filter:
• Document: acts
Note: This search function seemingly does not search within the context of the full text, and so no results were returned. Therefore, a Google search using the "site" function to search the site with the keyword "artificial intelligence" was conducted.

South Korea
Website: https://law.go.kr/eng/; https://elaw.klri.re.kr/
Keyword: artificial intelligence or 인공 지능
Filter:
• Type: Act
Note: The sites cannot search combined words, so individual analysis was conducted.

Spain
Website: https://www.boe.es/
Keyword: inteligencia artificial
Filter:
• Type: law
• Head of state (for passed laws)

Sweden
Website: https://www.riksdagen.se/
Keyword: artificiell intelligens
Filter: Swedish Code of Statutes


Switzerland
Website: https://www.fedlex.admin.ch/
Keyword: intelligence artificielle
Filter:
• Text category: federal constitution, federal acts, and
federal decrees, miscellaneous texts, orders, and
other forms of legislation.
• Publication period for legislation was limited to 2015-
2021.

United Kingdom
Website: https://www.legislation.gov.uk/
Keyword: artificial intelligence
Filter:
• Legislation Type: U.K. Public General Acts & U.K.
Statutory Instruments

United States
Website: https://www.congress.gov/
Keyword: artificial intelligence
Filter:
• Source: Legislation
• Status of legislation: Became law


MENTIONS OF AI IN AI-RELATED LEGISLATION PROCEEDINGS
For mentions of AI in AI-related legislative proceedings around the world, the AI Index performed searches of the keyword "artificial intelligence," in the respective languages, on the websites of 25 countries' congresses or parliaments, usually under sections named "minutes," "hansard," etc.

Australia
Website: https://www.aph.gov.au/Parliamentary_Business/Hansard
Keyword: artificial intelligence

Belgium
Website: http://www.parlement.brussels/search_form_fr/
Keyword: intelligence artificielle
Filter:
• Document Type: all

Brazil
Website: https://www2.camara.leg.br/atividade-legislativa/discursos-e-notas-taquigraficas
Keyword: inteligência artificial
Filter:
• Federal legislation
• Type: Law

Canada
Website: https://www.ourcommons.ca/PublicationSearch/en/?PubType=37
Keyword: artificial intelligence

China
Website: Various reports on the work of the government
Keyword: 人工智能
Note: The National People's Congress is held once per year and does not provide full legislative proceedings. Hence, the counts included in the analysis only cover mentions of artificial intelligence in the only public document released from the Congress meetings, the Report on the Work of the Government, delivered by the Premier.

Denmark
Website: https://www.retsinformation.dk/
Keyword: kunstig intelligens
Filter:
• Minutes

Finland
Website: https://www.eduskunta.fi/
Keyword: tekoäly
Filter:
• Parliamentary Affairs and Documents
• Public document: Minutes
• Actor: Plenary sessions

France
Website: https://www.assemblee-nationale.fr/
Keyword: intelligence artificielle
Filter:
• Reports of the debates in session
Note: Such documents were only prepared starting in 2017.

Germany
Website: https://dip.bundestag.de/
Keyword: künstliche Intelligenz
Filter:
• Speeches, requests to speak in the plenum

India
Website: http://loksabhaph.nic.in/
Keyword: artificial intelligence
Filter:
• Exact word/phrase

Ireland
Website: https://www.oireachtas.ie/
Keyword: artificial intelligence
Filter: Content of parliamentary debates


Italy
Website: https://aic.camera.it/aic/search.html
Keyword: intelligenza artificiale
Filter:
• Type: All
• Search by exact phrase

Japan
Website: https://kokkai.ndl.go.jp/#/
Keyword: 人工知能
Filter:
• Full text
• Law

Netherlands
Website: https://www.tweedekamer.nl/kamerstukken?pk_campaign=breadcrumb
Keyword: kunstmatige intelligentie
Filter:
• Parliamentary papers - Plenary reports

New Zealand
Website: https://www.parliament.nz/en/pb/hansard-debates/
Keyword: artificial intelligence

Norway
Website: https://www.stortinget.no/no/Saker-og-publikasjoner/Publikasjoner/Referater/
Keyword: kunstig intelligens
Note: This search function does not directly allow searching for the keyword within the minutes. Therefore, a Google search using the “site” function to search the site with the keyword “artificial intelligence” was conducted (see the sketch after this list).

Russia
Website: http://transcript.duma.gov.ru/
Keyword: искусственный интеллект
Filter:
• Words in text

Singapore
Website: https://sprs.parl.gov.sg/search/home
Keyword: artificial intelligence

South Africa
Website: https://www.parliament.gov.za/hansard
Keyword: artificial intelligence
Note: This search function does not search within the full text, and so no results were returned. Therefore, a Google search using the “site” function to search https://www.parliament.gov.za/storage/app/media/Docs/hansard/ with the keyword “artificial intelligence” was conducted.

South Korea
Website: http://likms.assembly.go.kr/
Keyword: 인공 지능
Filter:
• Meeting Type: All

Spain
Website: https://www.congreso.es/
Keyword: inteligencia artificial
Filter:
• Official publications of parliamentary proceedings

Sweden
Website: https://www.riksdagen.se/sv/global/sok/?q=&doktyp=prot
Keyword: artificiell intelligens
Filter:
• Minutes

Switzerland
Website: https://www.parlament.ch/
Keyword: intelligence artificielle
Filter:
• Parliamentary proceedings
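The “site”-function fallback noted for Norway and South Africa is simply a site-restricted Google query. The strings below are a hedged reconstruction of what such queries would look like, mirroring the notes above; the exact queries the AI Index ran are not published.

```python
# Hedged reconstruction of the site-restricted Google queries described
# in the Norway and South Africa notes; illustrative, not the exact ones.
NORWAY_QUERY = 'site:stortinget.no "artificial intelligence"'
SOUTH_AFRICA_QUERY = (
    'site:parliament.gov.za/storage/app/media/Docs/hansard/ '
    '"artificial intelligence"'
)
```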


United Kingdom
Website: https://hansard.parliament.uk/
Keyword: artificial intelligence
Filter:
• References

United States
Website: https://www.congress.gov/
Keyword: artificial intelligence
Filter:
• Source: Congressional record
• Congressional record section: Senate, House of Representatives, and Extensions of Remarks


U.S. AI POLICY PAPERS
Organizations
To develop a more nuanced understanding of the thought leadership that motivates AI policy, we tracked policy papers published by 55 organizations in the United States or with a strong presence in the United States (expanded from the list of 36 organizations last year) across six broad categories:
• Civil Society, Associations & Consortiums: Algorithmic Justice League, Alliance for Artificial Intelligence in Healthcare, Amnesty International, EFF, Future of Privacy Forum, Human Rights Watch, IJIS Institute, Institute of Electrical and Electronics Engineers, Partnership on AI
• Consultancy: Accenture, Bain & Company, Boston Consulting Group, Deloitte, McKinsey & Company
• Government Agencies: Congressional Research Service, Library of Congress, Defense Technical Information Center, Government Accountability Office, Pentagon Library
• Private Sector Companies: Google AI, Microsoft AI, Nvidia, OpenAI
• Think Tanks & Policy Institutes: American Enterprise Institute, Aspen Institute, Atlantic Council, Brookings Institution, Carnegie Endowment for International Peace, Cato Institute, Center for a New American Security, Center for Strategic and International Studies, Council on Foreign Relations, Heritage Foundation, Hudson Institute, MacroPolo, National Security Institute, New America Foundation, RAND Corporation, Rockefeller Foundation, Stimson Center, Urban Institute, Wilson Center
• University Institutes & Research Programs: AI and Humanity, Cornell University; AI Now Institute, New York University; AI Pulse, UCLA Law; Belfer Center for Science and International Affairs, Harvard University; Berkman Klein Center, Harvard University; Center for Information Technology Policy, Princeton University; Center for Long-Term Cybersecurity, UC Berkeley; Center for Security and Emerging Technology, Georgetown University; CITRIS Policy Lab, UC Berkeley; Hoover Institution; Institute for Human-Centered Artificial Intelligence, Stanford University; Internet Policy Research Initiative, Massachusetts Institute of Technology; MIT Lincoln Laboratory; Princeton School of Public and International Affairs

Methodology
Each broad topic area is based on a collection of underlying keywords that describe the content of the specific paper. We included 17 topics that represented the majority of discourse related to AI between 2018 and 2021. These topic areas and the associated keywords are listed below; a minimal sketch of the keyword-matching step follows the list:
• Health & Biological Sciences: medicine, healthcare systems, drug discovery, care, biomedical research, insurance, health behaviors, COVID-19, global health
• Physical Sciences: chemistry, physics, astronomy, earth science
• Energy & Environment: energy costs, climate change, energy markets, pollution, conservation, oil and gas, alternative energy
• International Affairs & International Security: international relations, international trade, developing countries, humanitarian assistance, warfare, regional security, national security, autonomous weapons
• Justice & Law Enforcement: civil justice, criminal justice, social justice, police, public safety, courts


• Communications & Media: social media, disinformation, media markets, deepfakes
• Government & Public Administration: federal
government, state government, local government,
public sector efficiency, public sector effectiveness,
government services, government benefits,
government programs, public works, public
transportation
• Democracy: elections, rights, freedoms, liberties,
personal freedoms
• Industry & Regulation: economy, antitrust, M&A,
competition, finance, management, supply chain,
telecom, economic regulation, technical standards,
autonomous vehicle industry and regulation
• Innovation & Technology: advancements and
improvements in AI technology, R&D, intellectual
property, patents, entrepreneurship, innovation
ecosystems, startups, computer science, engineering
• Education & Skills: early childhood, K-12, higher
education, STEM, schools, classrooms, reskilling
• Workforce & Labor: labor supply and demand, talent,
immigration, migration, personnel economics, future
of work
• Social & Behavioral Sciences: sociology, linguistics,
anthropology, ethnic studies, demography,
geography, psychology, cognitive science
• Humanities: arts, music, literature, language,
performance, theater, classics, history, philosophy,
religion, cultural studies
• Equity & Inclusion: biases, discrimination, gender,
race, socioeconomic inequality, disabilities,
vulnerable populations
• Privacy, Safety & Security: anonymity, GDPR,
consumer protection, physical safety, human
control, cybersecurity, encryption, hacking
• Ethics: transparency, accountability, human
values, human rights, sustainability, explainability,
interpretability, decision-making norms
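As a loose illustration of the methodology above, the sketch below tags a paper with every topic whose keyword list it matches. It is a minimal Python sketch under simplifying assumptions (plain substring matching and an abbreviated topic-to-keyword map); `TOPIC_KEYWORDS` and `tag_topics` are illustrative names, not part of the AI Index’s published pipeline.

```python
# Minimal sketch of keyword-based topic tagging, assuming case-insensitive
# substring matching. The keyword map is abbreviated from the bullets
# above; names here are illustrative, not official.
TOPIC_KEYWORDS = {
    "Communications & Media": ["social media", "disinformation",
                               "media markets", "deepfakes"],
    "Physical Sciences": ["chemistry", "physics", "astronomy",
                          "earth science"],
    "Ethics": ["transparency", "accountability", "explainability",
               "interpretability"],
}

def tag_topics(paper_text: str) -> list[str]:
    """Return every topic whose keyword list matches the paper text.

    Substring matching is a simplification: "physics" would also match
    "astrophysics", so a real pipeline would tokenize the text first.
    """
    text = paper_text.lower()
    return [
        topic
        for topic, keywords in TOPIC_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    ]

# A paper mentioning deepfakes and explainability falls under two topics:
print(tag_topics("This paper audits deepfakes and the explainability of detectors."))
# -> ['Communications & Media', 'Ethics']
```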
