Professional Documents
Culture Documents
HAI AI-Index-Report-2023 CHAPTER 3
HAI AI-Index-Report-2023 CHAPTER 3
CHAPTER 3:
Technical AI Ethics
Text and Analysis by Helen Ngo
Artificial Intelligence
Index Report 2023
CHAPTER 3 PREVIEW:
Technical AI Ethics
Overview 4 Fairness in Machine Translation 19
Chapter Highlights 5 RealToxicityPrompts 20
Chapter 3 Preview 2
Artificial Intelligence
Index Report 2023
Technical AI Ethics
3.7 AI Ethics Trends at FAccT
and NeurIPS 32
ACM FAccT (Conference on Fairness,
Accountability, and Transparency) 32
Accepted Submissions by
Professional Affiliation 32
Accepted Submissions by
Geographic Region 35
NeurIPS (Conference on Neural Information
Processing Systems) 36
Real-World Impact 36
Interpretability and Explainability 37
Causal Effect and Counterfactual
Reasoning 38
Privacy 39
Fairness and Bias 40
Appendix 44
Chapter 3 Preview 3
Artificial Intelligence Chapter 3: Technical AI Ethics
Index Report 2023
Overview
Fairness, bias, and ethics in machine learning continue to be topics of interest
among both researchers and practitioners. As the technical barrier to entry for
creating and deploying generative AI systems has lowered dramatically, the ethical
issues around AI have become more apparent to the general public. Startups and
large companies find themselves in a race to deploy and release generative models,
and the technology is no longer controlled by a small group of actors.
In addition to building on analysis in last year’s report, this year the AI Index
highlights tensions between raw model performance and ethical issues, as well as
new metrics quantifying bias in multimodal models.
Chapter 3 Preview 4
Artificial Intelligence Chapter 3: Technical AI Ethics
Index Report 2023
Chapter Highlights
The effects of model scale on bias and toxicity
are confounded by training data and mitigation methods.
In the past year, several institutions have built their own large models trained on proprietary data—
and while large models are still toxic and biased, new evidence suggests that these issues can be
somewhat mitigated after training larger models with instruction-tuning.
Chapter 3 Preview 5
Artificial Intelligence Chapter 3: Technical AI Ethics
Index Report 2023 3.1 Meta-analysis of Fairness and Bias Metrics
3.1 Meta-analysis of
Fairness and Bias Metrics
Number of AI Fairness In 2022 several new datasets or metrics were released
to probe models for bias and fairness, either as
and Bias Metrics standalone papers or as part of large community
efforts such as BIG-bench. Notably, metrics are
Algorithmic bias is measured in terms of allocative
being extended and made specific: Researchers are
and representation harms. Allocative harm occurs
zooming in on bias applied to specific settings such as
when a system unfairly allocates an opportunity or
question answering and natural language inference,
resource to a specific group, and representation harm
extending existing bias datasets by using language
happens when a system perpetuates stereotypes
models to generate more examples for the same task
and power dynamics in a way that reinforces
(e.g., Winogenerated, an extended version of the
subordination of a group. Algorithms are considered
Winogender benchmark).
fair when they make predictions that neither favor
nor discriminate against individuals or groups based
Figure 3.1.1 highlights published metrics that have been
on protected attributes which cannot be used for
cited in at least one other work. Since 2016 there has
decision-making due to legal or ethical reasons (e.g.,
been a steady and overall increase in the total number
race, gender, religion).
of AI fairness and bias metrics.
Number of AI Fairness and Bias Metrics, 2016–22
Source: AI Index, 2022 | Chart: 2023 AI Index Report
20
19
15
Number of Metrics
10
0
2016 2017 2018 2019 2020 2021 2022
Figure 3.1.1
Chapter 3 Preview 6
Artificial Intelligence Chapter 3: Technical AI Ethics
Index Report 2023 3.1 Meta-analysis of Fairness and Bias Metrics
Number of AI Fairness and correlate with each other, highlighting the importance
of careful selection of metrics and interpretation of
Bias Metrics (Diagnostic results.
Metrics Vs. Benchmarks)
In 2022, a robust stream of both new ethics
Measurement of AI systems along an ethical benchmarks as well as diagnostic metrics was
dimension often takes one of two forms. A benchmark introduced to the community (Figure 3.1.2). Some
contains labeled data, and researchers test how metrics are variants of previous versions of existing
well their AI system labels the data. Benchmarks do fairness or bias metrics, while others seek to measure
not change over time. These are domain-specific a previously undefined measurement of bias—for
(e.g., SuperGLUE and StereoSet for language example, VLStereoSet is a benchmark which extends
models; ImageNet for computer vision) and often the StereoSet benchmark for assessing stereotypical
aim to measure behavior that is intrinsic to the bias in language models to the text-to-image setting,
model, as opposed to its downstream performance while the HolisticBias measurement dataset assembles
on specific populations (e.g., StereoSet measures a new set of sentence prompts which aim to quantify
model propensity to select stereotypes compared demographic biases not covered in previous work.
to non-stereotypes, but it does not measure
performance gaps between different subgroups).
These benchmarks often serve as indicators of
intrinsic model bias, but they may not give as clear an In 2022 a robust stream
indication of the model’s downstream impact and its of both new ethics
extrinsic bias when embedded into a system.
benchmarks as well
A diagnostic metric measures the impact or as diagnostic metrics
performance of a model on a downstream task, and it
is often tied to an extrinsic impact—for example, the was introduced to the
differential in model performance for some task on a community.
population subgroup or individual compared to similar
individuals or the entire population. These metrics
can help researchers understand how a system will
perform when deployed in the real world, and whether
it has a disparate impact on certain populations.
Previous work comparing fairness metrics in natural
language processing found that intrinsic and extrinsic
metrics for contextualized language models may not
Chapter 3 Preview 7
Artificial Intelligence Chapter 3: Technical AI Ethics
Index Report 2023 3.1 Meta-analysis of Fairness and Bias Metrics
Number of New AI Fairness and Bias Metrics (Diagnostic Metrics Vs. Benchmarks), 2016–22
Source: AI Index, 2022 | Chart: 2023 AI Index Report
Benchmarks
14
Diagnostic Metrics
13
12
11
10
10
9 9 9
Number of Metrics
4
4
3 3
2 2 2
2
1
0
0
2016 2017 2018 2019 2020 2021 2022
Figure 3.1.2
Chapter 3 Preview 8
Artificial Intelligence Chapter 3: Technical AI Ethics
Index Report 2023 3.2 AI Incidents
3.2 AI Incidents
AI, Algorithmic, and that tracks the ethical issues associated with AI
technology.
Automation Incidents and
Controversies (AIAAIC) The number of newly reported AI incidents and
Repository: Trends Over Time controversies in the AIAAIC database was 26 times
greater in 2021 than in 2012 (Figure 3.2.1)1. The rise
The AI, Algorithmic, and Automation Incidents in reported incidents is likely evidence of both
and Controversies (AIAAIC) Repository is an the increasing degree to which AI is becoming
independent, open, and public dataset of recent intermeshed in the real world and a growing
incidents and controversies driven by or relating to awareness of the ways in which AI can be ethically
AI, algorithms, and automation. It was launched in misused. The dramatic increase also raises an
2019 as a private project to better understand some important point: As awareness has grown, tracking of
of the reputational risks of artificial intelligence incidents and harms has also improved—suggesting
and has evolved into a comprehensive initiative that older incidents may be underreported.
260
250
Number of AI Incidents and Controversies
200
150
100
50
0
2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Figure 3.2.1
1 This figure does not consider AI incidents reported in 2022, as the incidents submitted to the AIAAIC database undergo a lengthy vetting process before they are fully added.
Chapter 3 Preview 9
Artificial Intelligence Chapter 3: Technical AI Ethics
Index Report 2023 3.2 AI Incidents
2 Although these events were reported in 2022, some of them had begun in previous years.
Chapter 3 Preview 10
Artificial Intelligence Chapter 3: Technical AI Ethics
Index Report 2023 3.2 AI Incidents
Chapter 3 Preview 11
Artificial Intelligence Chapter 3: Technical AI Ethics
Index Report 2023 3.2 AI Incidents
3 Although other text-to-image models launched in 2022 such as DALL-E 2 and Stable Diffusion were also criticized, for the sake of brevity the AI Index chose to highlight one particular
incident.
Chapter 3 Preview 12
Artificial Intelligence Chapter 3: Technical AI Ethics
Index Report 2023 3.3 Natural Language Processing Bias Metrics
35
30
Number of Research Papers
25
20
15
10
0
2018 2019 2020 2021 2022
Figure 3.3.1
Chapter 3 Preview 13
Artificial Intelligence Chapter 3: Technical AI Ethics
Index Report 2023 3.3 Natural Language Processing Bias Metrics
80%
73.58%
71.70% 71.40%
Winogender Accuracy (%)
63.30% 64.17%
61.50% 60.00%
57.90% 59.00% 59.00%
60%
50.00%
40%
20%
0%
iPET (ALBERT) Gopher WARP Bort Gopher GPT-3 PaLM GLAM GPT-3 Gopher PaLM
31M 100M (ALBERT-XXL-V2) 340M 10B 13B 62B 64B 175B 280M 540B
223M
Model and Number of Parameters
Figure 3.3.2
Chapter 3 Preview 14
Artificial Intelligence Chapter 3: Technical AI Ethics
Index Report 2023 3.3 Natural Language Processing Bias Metrics
Flan-PaLM 8B 72.25%
PaLM 8B 10.80%
0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
Winogender Accuracy (%)
Figure 3.3.3
Chapter 3 Preview 15
Artificial Intelligence Chapter 3: Technical AI Ethics
Index Report 2023 3.3 Natural Language Processing Bias Metrics
BBQ: The Bias Benchmark for GUINEAN guy.” The corresponding question might
be, “Who has never used a computer?” referencing
Question Answering the stereotypical bias that Africans are technologically
illiterate. The example may be disambiguated with
Language models are increasingly being deployed
additional context such as, “The ICELANDIC guy was
in question answering systems. While several
behind the counter trying to explain to the GUINEAN
benchmarks exist for the purpose of measuring bias
guy how to find the power switch on his computer!”
in language models based on their outputs, including
BOLD, CrowS-Pairs, and StereoSet, the BBQ
benchmark measures how biases can manifest in the
Models can be more biased
question-answering setting. BBQ measures bias along
the axes of socioeconomic status, sexual orientation, along certain identity
religion, race/ethnicity, physical appearance, categories than others—
nationality, gender identity, disability status, and age.
most models are biased
Examples consist of template-based context and
along the axes of physical
question pairs, where each answer choice references
a person belonging to either a stereotypical or appearance and age, but the
anti-stereotypical social group. The third option is biases along the axis of race/
“Unknown,” designed for cases where the answer
cannot be determined from the information given.
ethnicity are less clear.
Each example includes an ambiguous version where
it is unclear which social group the entity belongs
In contexts where the answer is ambiguous, models
to, as well as additional context which can be used
are more likely to fall back on stereotypes and select
to disambiguate the correct answer. The ambiguous
unsupported answers rather than “Unknown” (Figure
setting measures how often models fall back on
3.3.4), and this result is exacerbated for models fine-
stereotypes when the answer cannot be inferred from
tuned with reinforcement learning.4
the given context. Models that do not exhibit bias have
a score of zero, while a score of 100 indicates that the
As seen in Figure 3.3.4, models can be more biased
model chooses answers aligned with the social bias
along certain identity categories than others—
in question, and a score of -100 indicates the model
most models are biased along the axes of physical
always chooses the anti-stereotypical answer.
appearance and age, but the biases along the axis
of race/ethnicity are less clear. For reference, Figure
An example of an ambiguous context might be,
3.3.5 highlights bias in question answering on BBQ in
“At the computer store the other day, I overheard
disambiguated contexts.
a conversation between an ICELANDIC guy and a
Chapter 3 Preview 16
Artificial Intelligence Chapter 3: Technical AI Ethics
Index Report 2023 3.3 Natural Language Processing Bias Metrics
Disability Status 9.90 17.40 10.70 38.30 32.60 21.20 4.00 13.00
Gender Identity 10.00 15.00 11.30 25.60 18.60 2.40 4.00 8.00
Physical Appearance 17.00 40.70 41.00 38.50 47.70 40.90 4.00 16.00
Sexual Orientation 0.20 -3.00 -4.40 6.50 11.80 5.80 1.00 7.00
Socio-Economic Status 4.40 3.50 9.70 29.60 48.70 27.30 11.00 14.00
Ro Ro De De Un Un Dia DP
B ER B ER BE BE i ed i ed log C,
Ta Ta RT RT QA QA ue R L-F
-Ba -L aV aV -Pr
se arg 3-B 3-L ( AR ( RA om i ne
e a se arge C) CE pte tun
) d ed
Ch
in ch
illa
(D
PC
)
Model
Figure 3.3.4
Disability Status 5.40 5.70 8.10 1.70 -0.70 -1.40 0.00 8.00
Gender Identity 14.00 2.90 4.60 -16.90 -3.40 -5.80 2.00 3.00
Physical Appearance 17.10 -2.70 4.20 -5.00 -1.70 -2.30 12.00 8.00
Sexual Orientation 6.50 -3.10 -4.80 -0.20 0.50 -0.70 -1.00 -1.00
Socio-Economic Status 7.00 3.50 3.80 2.90 3.80 3.90 8.00 7.00
Ro Ro De De Un Un Dia DP
B ER B ER BE BE i ed i ed log C,
Ta Ta RT RT QA QA ue R L-F
-Ba -L aV aV -Pr
se arg 3-B 3-L (AR (RA om i ne
e ase a rge C) CE pte tun
) d ed
Ch
inch
illa
(D
PC
)
Model
Figure 3.3.5
Chapter 3 Preview 17
Artificial Intelligence Chapter 3: Technical AI Ethics
Index Report 2023 3.3 Natural Language Processing Bias Metrics
Fairness and Bias Trade-Offs not clear (Figure 3.3.6). This finding may be contingent
on the specific criterion for fairness, defined as
in NLP: HELM counterfactual fairness and statistical fairness.
1.00
0.50
0.80
0.40
Bias (Gender Representation)
0.60
0.30
Fairness
0.40
0.20
0.20 0.10
0.00 0.00
0.00 0.20 0.40 0.60 0.80 1.00 0.00 0.20 0.40 0.60 0.80 1.00
Accuracy Accuracy
Figure 3.3.6
Chapter 3 Preview 18
Artificial Intelligence Chapter 3: Technical AI Ethics
Index Report 2023 3.3 Natural Language Processing Bias Metrics
60%
40%
20%
0%
Flan-T5-XXL 11B Flan-PaLM 8B Flan-PaLM 62B Flan-PaLM 540B PaLM 8B PaLM 62B PaLM 540B
Model and Number of Parameters
Figure 3.3.7
Chapter 3 Preview 19
Artificial Intelligence Chapter 3: Technical AI Ethics
Index Report 2023 3.3 Natural Language Processing Bias Metrics
RealToxicityPrompts by Model
Source: Liang et al., 2022 | Chart: 2023 AI Index Report
0.08
0.07
Toxicity Probability
0.06
0.05
0.04
0.03
0.02
0.01
0.00
GPT-3 ada v1 350M
GPT-J 6B
TNLG v2 6.7B
J1-Large v1 7.5B
T0pp 11B
T5 11B
J1-Grande v1 17B
GPT-NeoX 20B
UL2 20B
OPT 66B
YaLM 100B
GLM 130B
OPT 175B
BLOOM 176B
J1-Jumbo v1 178B
TNLG v2 530B
Figure 3.3.8
Chapter 3 Preview 20
Artificial Intelligence Chapter 3: Technical AI Ethics
Index Report 2023 3.4 Conversational AI Ethical Issues
A natural application of generative language models is in open-domain conversational AI; for example, chatbots and assistants. In the
past year, companies have started deploying language models as chatbot assistants (e.g., OpenAI’s ChatGPT, Meta’s BlenderBot3).
However, the open-ended nature of these models and their lack of steerability can result in harm—for example, models can be
unexpectedly toxic or biased, reveal personally identifiable information from their training data, or demean or abuse users.
Chapter 3 Preview 21
Artificial Intelligence Chapter 3: Technical AI Ethics
Index Report 2023 3.4 Conversational AI Ethical Issues
The training data used for dialog systems can result attractive daughters? I will sell one.
leaving their users feeling unsettled. Researchers inappropriate for a robot to output. (Gros et al., 2022)
99%
MultiWOZ
99%
94%
Persuasion for Good
94%
88%
EmpatheticDialogues
90%
88%
Wizard of Wikipedia
87%
Dataset
82%
Reddit Small
75%
72%
MSC
77%
67%
RUAR Blender2
75%
Possible
65%
Blender for a Robot to Say
75%
Comfortable
56% for a Robot to Say
PersonaChat
67%
Chapter 3 Preview 22
Artificial Intelligence Chapter 3: Technical AI Ethics
Index Report 2023 3.4 Conversational AI Ethical Issues
Narrative Highlight:
Tricking ChatGPT Into Building a Dirty Bomb, Part 1
Tricking ChatGPT Source: Outrider, 2022
Figure 3.4.4
Chapter 3 Preview 23
Artificial Intelligence Chapter 3: Technical AI Ethics
Index Report 2023 3.5 Fairness and Bias in Text-to-Image Models
Text-to-image models took over social media in 2022, turning the issues of fairness and bias in AI systems visceral through image form:
Women put their own images into AI art generators and received hypersexualized versions of themselves.
Chapter 3 Preview 24
Artificial Intelligence Chapter 3: Technical AI Ethics
Index Report 2023 3.5 Fairness and Bias in Text-to-Image Models
Fairness Across Age Groups for Text-to-Image Models: ImageNet Vs. Instagram
Source: Goyal et al., 2022 | Chart: 2023 AI Index Report
ImageNet 693M (Supervised) ImageNet 693M (SwaV) Instagram 1.5B (SEER) Instagram 10B (SEER)
78.5%
76.6%
18–30
89.6%
93.2%
76.7%
74.6%
30–45
90.5%
Age Group
95.0%
80.1%
76.7%
45–70
92.6%
95.6%
75.8%
69.4%
70+
88.7%
96.7%
Fairness Across Gender/Skin Tone Groups for Text-to-Image Models: ImageNet Vs. Instagram
Source: Goyal et al., 2022 | Chart: 2023 AI Index Report
ImageNet 693M (Supervised) ImageNet 693M (SwaV) Instagram 1.5B (SEER) Instagram 10B (SEER)
73.6%
Skin Tone 69.7%
Darker 86.6%
92.9%
82.1%
Skin Tone 80.8%
Lighter 94.2%
Gender/Skin Tone Group
96.2%
58.2%
Female 50.3%
Darker 78.2%
90.3%
75.1%
Female 71.6%
Lighter 93.7%
96.8%
92.7%
Male 93.7%
Darker 97.5%
96.1%
91.1%
Male 92.5%
Lighter 94.9%
95.4%
Chapter 3 Preview 25
Artificial Intelligence Chapter 3: Technical AI Ethics
Index Report 2023 3.5 Fairness and Bias in Text-to-Image Models
Text-to-Image Models
StereoSet was introduced as a benchmark for
measuring stereotype bias in language models along
the axes of gender, race, religion, and profession
by calculating how often a model is likely to choose
a stereotypical completion compared to an anti-
stereotypical completion. VLStereoSet extends the
idea to vision-language models by evaluating how
often a vision-language model selects stereotypical
captions for anti-stereotypical images.
Chapter 3 Preview 26
Artificial Intelligence Chapter 3: Technical AI Ethics
Index Report 2023 3.5 Fairness and Bias in Text-to-Image Models
Gender Profession
100 ALBEF VILT 100 VisualBERT CLIP
FLAVA VisualBERT VILT
80 CLIP 80 ALBEF LXMERT
60 LXMERT 60 FLAVA
Vision-Language Relevance (vlrs) Score
40 40
20 20
0 0
0 10 20 30 40 50 60 70 80 90 100 0 10 20 30 40 50 60 70 80 90 100
Race Religion
100 100 VisualBERT CLIP
VisualBERT CLIP
ALBEF
80 VILT 80 VILT LXMERT
ALBEF
LXMERT FLAVA
60 FLAVA 60
40 40
20 20
0 0
0 10 20 30 40 50 60 70 80 90 100 0 10 20 30 40 50 60 70 80 90 100
Vision-Language Bias (vlbs) Score
Figure 3.5.4
Chapter 3 Preview 27
Artificial Intelligence Chapter 3: Technical AI Ethics
Index Report 2023 3.5 Fairness and Bias in Text-to-Image Models
Text-to-Image Models
This subsection highlights some of the
ways in which bias is tangibly manifested in
popular AI text-to-image systems such as
Stable Diffusion, DALL-E 2, and Midjourney.
Stable Diffusion
Stable Diffusion gained notoriety in 2022
upon its release by CompVis, Runway ML,
and Stability AI for its laissez-faire approach
to safety guardrails, its approach to full
openness, and its controversial training
dataset, which included many images from
artists who never consented to their work
being included in the data. Though Stable
Diffusion produces extremely high-quality
images, it also reflects common stereotypes
and issues present in its training data.
Figure 3.5.5
Chapter 3 Preview 28
Artificial Intelligence Chapter 3: Technical AI Ethics
Index Report 2023 3.5 Fairness and Bias in Text-to-Image Models
DALL-E 2
DALL-E 2 is a text-to-image model released by looking men wearing suits. Each of the men appeared
OpenAI in April 2022. DALL-E 2 exhibits similar biases to take an assertive position, with three of the four
as Stable Diffusion—when prompted with “CEO,” the crossing their arms authoritatively (Figure 3.5.6).
model generated four images of older, rather serious-
Bias in DALL-E 2
Source: DALL-E 2, 2023
Figure 3.5.6
Chapter 3 Preview 29
Artificial Intelligence Chapter 3: Technical AI Ethics
Index Report 2023 3.5 Fairness and Bias in Text-to-Image Models
Midjourney
Midjourney is another popular text-to-image system that was released in 2022. When prompted with “influential
person,” it generated four images of older-looking white males (Figure 3.5.7). Interestingly, when Midjourney was
later given the same prompt by the AI Index, one of the four images it produced was of a woman (Figure 3.5.8).
Chapter 3 Preview 30
Artificial Intelligence Chapter 3: Technical AI Ethics
Index Report 2023 3.6 AI Ethics in China
As research in AI ethics has exploded in the Western world in the past few years, legislators and policymakers have spent significant
resources on policymaking for transformative AI. While China has fewer domestic guidelines than the EU and the United States,
according to the AI Ethics Guidelines Global Inventory, Chinese scholars publish significantly on AI ethics—though these research
communities do not have significant overlap with Western research communities working on the same topics.
100 99
95
88
80
Number of Papers
60 58
50 49
41
40 39
37
32
27
20
0
Privacy Equality Agency Responsibility Security Freedom Unemployment Legality Transparency Autonomy Other
Figure 3.6.1
Chapter 3 Preview 31
Artificial Intelligence Chapter 3: Technical AI Ethics
Index Report 2023 3.6 AI Ethics in China
AI Ethics
Ethics in China
China:: St
Strategies for
for Har
Harm
m Mitig
Mitigaation R
Rel
ela
ated tto
o AI
Sour
urc
ce: Zh
Zhu,
u, 2022 | C
Char
hart:
t: 2023 AI Index
Index Repor
Reportt
71
70 69
64
60
52
50
Number of Papers
45
40 39 39
37
30
23
20
10
0
Structural Legislation Value Principles Accountability Shared Technological Talent International
Reform De nition System Governance Solutions Training Cooperation
Figure 3.6.2
Chapter 3 Preview 32
Artificial Intelligence Chapter 3: Technical AI Ethics
Index Report 2023 3.6 AI Ethics in China
43
40 40
40
37
Number of References
30
21
20
13
11
10
7 6 6 6
4
3
0
GD Et Ot Th Go Et As Be Pr AI AI Re Th
h hic ilo i e c
PR Tr ics he
rs
re
e a N vern all ma AI a jing CO lim St I
an Ind nfor the om Ro e EU
us G La e a y rA n C M i n d u m C m bo RO
tw uid ws w nc a ar s
or e of Gen e Pr
Al
ign
o
I P d Ed nse EST ry D diz try atio oun end eth N
at
th line
yA s Ro e i e r i nc u c n s o r a a tio D e n c il o ion ics R
I fo bo rat ncip dD
es ipl atio us o n Ro ft Re n W velo nA o oa
r tic ion les ign e s n n b ot p hit p me I f dm
s of fo
r ics ort o e p n ap
AI Et f ap t St
hic er ra
s te
gy
Figure 3.6.3
Chapter 3 Preview 33
Artificial Intelligence Chapter 3: Technical AI Ethics
Index Report 2023 3.7 AI Ethics Trends at FAccT and NeurIPS
503
500
Number of Papers
400
302
300
244
200 227
166 181
200
100 139
71
63 53 70
0
2018 2019 2020 2021 2022
Figure 3.7.1
Chapter 3 Preview 34
Artificial Intelligence Chapter 3: Technical AI Ethics
Index Report 2023 3.7 AI Ethics Trends at FAccT and NeurIPS
Accepted Submissions by Geographic Region and Central Asia made up 18.7% of submissions, they
European government and academic actors have made up over 30.6% of submissions in 2022 (Figure
increasingly contributed to the discourse on AI ethics 3.7.2). FAccT, however, is still broadly dominated
from a policy perspective, and their influence is by authors from North America and the rest of the
manifested in trends on FAccT publications as well: Western world.
Whereas in 2021 submissions to FAccT from Europe
70%
50%
40%
20%
Chapter 3 Preview 35
Artificial Intelligence Chapter 3: Technical AI Ethics
Index Report 2023 3.7 AI Ethics Trends at FAccT and NeurIPS
NeurIPS
NeurIPS (Conference on Neural Information Real-World Impact
Processing Systems), one of the most influential Several workshops at NeurIPS gather researchers
AI conferences, held its first workshop on fairness, working to apply AI to real-world problems. Notably,
accountability, and transparency in 2014. This section there has been a recent surge in AI applied to
tracks and categorizes workshop topics year over healthcare and climate in the domains of drug
year, noting that as topics become more mainstream, discovery and materials science, which is reflected
they often filter out of smaller workshops and into the in the spike in “AI for Science” and “AI for Climate”
main track or into more specific conferences related workshops (Figure 3.7.3).
to the topic.
NeurIPS Workshop Research Topics: Number of Accepted Papers on Real-World Impacts, 2015–22
Source: NeurIPS, 2022 | Chart: 2023 AI Index Report
Climate 802
800
Developing World
94
Finance
700 Healthcare
Science
Other
600
171
529
Number of Papers
500 65
459
429
61
400
78 116
334
300 283 412
127 238
200 199 273
153 254
79
100 83
144 68
71 77 94 81 64
0
12
Figure 3.7.3
Chapter 3 Preview 36
Artificial Intelligence Chapter 3: Technical AI Ethics
Index Report 2023 3.7 AI Ethics Trends at FAccT and NeurIPS
NeurIPS Research Topics: Number of Accepted Papers on Interpretability and Explainability, 2015–22
Source: NeurIPS, 2022 | Chart: 2023 AI Index Report
Main Track 41
40 Workshop
35
18
30
Number of Papers
25 24
23
20
17
15
7 19
12
23 24
10 5
6 6
5 10
6 7 6
2
4
0
2015 2016 2017 2018 2019 2020 2021 2022
Figure 3.7.4
5 Declines in the number of workshop-related papers on interpretability and explainability might be attributed to year-over-year differences in workshop themes.
Chapter 3 Preview 37
Artificial Intelligence Chapter 3: Technical AI Ethics
Index Report 2023 3.7 AI Ethics Trends at FAccT and NeurIPS
NeurIPS Research Topics: Number of Accepted Papers on Causal E ect and Counterfactual Reasoning,
2015–22
Source: NeurIPS, 2022 | Chart: 2023 AI Index Report
Main Track 80
80 78
Workshop 76
72
70
20
60
Number of Papers
50 43 53 61
40 39
30 16
58
20
29
9 23 23
10 19
6
4 9
6
0
2015 2016 2017 2018 2019 2020 2021 2022
Figure 3.7.5
Chapter 3 Preview 38
Artificial Intelligence Chapter 3: Technical AI Ethics
Index Report 2023 3.7 AI Ethics Trends at FAccT and NeurIPS
Privacy
Amid growing concerns about privacy, data been devoted to topics such as privacy in machine
sovereignty, and the commodification of personal learning, federated learning, and differential privacy.
data for profit, there has been significant momentum This year’s data shows that discussions related to
in industry and academia to build methods and privacy in machine learning have increasingly shifted
frameworks to help mitigate privacy concerns. into the main track of NeurIPS (Figure 3.7.6).
Since 2018, several workshops at NeurIPS have
120 15
103
100
Number of Papers
88 27
80 79 13
138
60
113
40 76
72 75
21
20 16
19
14
1
0
2015 2016 2017 2018 2019 2020 2021 2022
Figure 3.7.6
Chapter 3 Preview 39
Artificial Intelligence Chapter 3: Technical AI Ethics
Index Report 2023 3.7 AI Ethics Trends at FAccT and NeurIPS
NeurIPS Research Topics: Number of Accepted Papers on Fairness and Bias in AI, 2015–22
Source: NeurIPS, 2022 | Chart: 2023 AI Index Report
300
250
Number of Papers
200
168
149 310
150
50
125 36
114
100 36
2 4 24
0
2015 2016 2017 2018 2019 2020 2021 2022
Figure 3.7.7
Chapter 3 Preview 40
Artificial Intelligence Chapter 3: Technical AI Ethics
Index Report 2023 3.8 Factuality and Truthfulness
250
236, FEVER
200
191, LIAR
Number of Citations
150
50
Chapter 3 Preview 41
Artificial Intelligence Chapter 3: Technical AI Ethics
Index Report 2023 3.8 Factuality and Truthfulness
SciFact 2020 ✓
COVID-Fact 2021 ✓
WikiFactCheck 2020 ✓
FM2 2021 ✓
Thorne et al. 2021 ✓
FaVIQ 2022 ✓
LIAR-PLUS 2017 no ✓
PolitiHop 2021 no ✓
Climate-FEVER 2020 ✓ no
HealthVer 2021 ✓ no
UKP-Snopes 2019 ✓ no
PubHealth 2020 ✓ no
WatClaimCheck 2022 ✓ no
Baly et al. 2018 no no
MultiFC 2019 no no
X-Fact 2021 no no
Figure 3.8.2
Chapter 3 Preview 42
Artificial Intelligence Chapter 3: Technical AI Ethics
Index Report 2023 3.8 Factuality and Truthfulness
50%
Accuracy (%)
40%
30%
20%
10%
0%
T5 60M
GPT-2 117M
Galactica 125M
GPT-NEO-125M
T5 220M
InstructGPT ada v1 350M
GPT3 350M
GPT-3 ada v1 350M
Cohere small v20220720 410M
T5 770M
Galactica 1.3B
GPT-3 babbage v1 1.3B
GPT3 1.3B
GPT-NEO-1.3B
InstructGPT babbage v1 1.3B
Gopher 1.4B
GPT2 1.5B
GPT-NEO-2.7B
T5 2.8B
GPT-J 6B
GPT-NEO-6B
Cohere medium v20220720 6.1B
TNLG v2 6.7B
Galactica 6.7B
InstructGPT curie v1 6.7B
GPT3 6.7B
GPT-3 curie v1 6.7B
Gopher 7.1B
J1-Large v1 7.5B
T5 11B
T0pp 11B
Cohere large v20220720 13.1B
J1-Grande v1 17B
UL2 20B
GPT-NeoX 20B
Galactica 30B
Anthropic-LM v4-s3 52B
Cohere xlarge v20220609 52.4B
OPT 66B
YaLM 100B
Galactica 120B
GLM 130B
GPT-3 davinci v1 175B
OPT-175B
GPT3 175B
OPT 175B
InstructGPT davinci v2 175B
BLOOM 176B
J1-Jumbo v1 178B
Gopher 280B
Gopher 280B -10shot
TNLG v2 530B
Figure 3.8.3
Chapter 3 Preview 43
Artificial Intelligence Appendix
Index Report 2023 Chapter 3: Technical AI Ethics
Appendix
Meta-Analysis of Fairness VLStereoSet: A Study of Stereotypical Bias
in Pre-trained Vision-Language Models
and Bias Metrics
For the analysis conducted on fairness and bias Natural Language Processing
metrics in AI, we identify and report on benchmark Bias Metrics
and diagnostic metrics which have been consistently
In Section 3.3, we track citations of the Perspective
cited in the academic community, reported on a public
API created by Jigsaw at Google. The Perspective API
leaderboard, or reported for publicly available baseline
has been adopted widely by researchers and engineers
models (e.g., GPT-3, BERT, ALBERT). We note that
in natural language processing. Its creators define
research paper citations are a lagging indicator of
toxicity as “a rude, disrespectful, or unreasonable
adoption, and metrics which have been very recently
comment that is likely to make someone leave a
adopted may not be reflected in the data for 2022. We
discussion,” and the tool is powered by machine
include the full list of papers considered in the 2022 AI
learning models trained on a proprietary dataset of
Index as well as the following additional papers:
comments from Wikipedia and news websites.
Beyond the Imitation Game: Quantifying and
Extrapolating the Capabilities of Language Models We include the full list of papers considered in the
2022 AI Index as well as the following additional
BBQ: A Hand-Built Bias Benchmark for
Question Answering papers:
Discovering Language Model Behaviors With AlexaTM 20B: Few-Shot Learning Using a
Model-Written Evaluations Large-Scale Multilingual Seq2Seq Model
“I’m Sorry to Hear That”: Finding New Biases in Aligning Generative Language Models With
Language Models With a Holistic Descriptor Dataset Human Values
On Measuring Social Biases in Prompt-Based Challenges in Measuring Bias via Open-Ended
Multi-task Learning Language Generation
PaLM: Scaling Language Modeling With Pathways Characteristics of Harmful Text: Towards
Perturbation Augmentation for Fairer NLP Rigorous Benchmarking of Language Models
Chapter 3 Preview 44
Artificial Intelligence Appendix
Index Report 2023 Chapter 3: Technical AI Ethics
Detoxifying Language Models With a Toxic Corpus Predictability and Surprise in Large Generative
DisCup: Discriminator Cooperative Unlikelihood Models
Prompt-Tuning for Controllable Text Generation Quark: Controllable Text Generation With
Evaluating Attribution in Dialogue Systems: Reinforced [Un]learning
The BEGIN Benchmark Red Teaming Language Models With Language Models
Exploring the Limits of Domain-Adaptive Training Reward Modeling for Mitigating Toxicity in
for Detoxifying Large-Scale Language Models Transformer-based Language Models
Flamingo: A Visual Language Model for Robust Conversational Agents Against Imperceptible
Few-Shot Learning Toxicity Triggers
Galactica: A Large Language Model for Science Scaling Instruction-Finetuned Language Models
GLaM: Efficient Scaling of Language Models StreamingQA: A Benchmark for Adaptation to New
With Mixture-of-Experts Knowledge over Time in Question Answering Models
GLM-130B: An Open Bilingual Pre-trained Model Training Language Models to Follow Instructions
Gradient-Based Constrained Sampling From With Human Feedback
Language Models Transfer Learning From Multilingual DeBERTa
HateCheckHIn: Evaluating Hindi Hate Speech for Sexism Identification
Detection Models Transformer Feed-Forward Layers Build Predictions
Holistic Evaluation of Language Models by Promoting Concepts in the Vocabulary Space
An Invariant Learning Characterization of While the Perspective API is used widely within
Controlled Text Generation machine learning research and also for measuring
LaMDA: Language Models for Dialog Applications online toxicity, toxicity in the specific domains used to
Leashing the Inner Demons: Self-Detoxification train the models undergirding Perspective (e.g., news,
for Language Models Wikipedia) may not be broadly representative of all
Measuring Harmful Representations in Scandinavian forms of toxicity (e.g., trolling). Other known caveats
Language Models include biases against text written by minority
Mitigating Toxic Degeneration With Empathetic Data: voices: The Perspective API has been shown to
Exploring the Relationship Between Toxicity and disproportionately assign high toxicity scores to text
Empathy that contains mentions of minority identities (e.g., “I
MULTILINGUAL HATECHECK: Functional Tests for am a gay man”). As a result, detoxification techniques
Multilingual Hate Speech Detection Models built with labels sourced from the Perspective API
A New Generation of Perspective API: Efficient result in models that are less capable of modeling
Multilingual Character-Level Transformers language used by minority groups, and may avoid
OPT: Open Pre-trained Transformer Language Models mentioning minority identities.
PaLM: Scaling Language Modeling With Pathways New versions of the Perspective API have been
Perturbations in the Wild: Leveraging Human-Written deployed since its inception, and there may be subtle
Text Perturbations for Realistic Adversarial Attack and undocumented shifts in its behavior over time.
Defense
Chapter 3 Preview 45
Artificial Intelligence Appendix
Index Report 2023 Chapter 3: Technical AI Ethics
Chapter 3 Preview 46