
Artificial Intelligence Index Report 2022

CHAPTER 2: TECHNICAL PERFORMANCE
CHAPTER 2: CHAPTER PREVIEW

Overview
Chapter Highlights
2.1 COMPUTER VISION–IMAGE
  Image Classification
    ImageNet
      ImageNet: Top-1 Accuracy
      ImageNet: Top-5 Accuracy
  Image Generation
    STL-10: Fréchet Inception Distance (FID) Score
    CIFAR-10: Fréchet Inception Distance (FID) Score
  Deepfake Detection
    FaceForensics++
    Celeb-DF
  Human Pose Estimation
    Leeds Sports Poses: Percentage of Correct Keypoints (PCK)
    Human3.6M: Average Mean Per Joint Position Error (MPJPE)
  Semantic Segmentation
    Cityscapes
  Medical Image Segmentation
    CVC-ClinicDB and Kvasir-SEG
  Face Detection and Recognition
    National Institute of Standards and Technology (NIST) Face Recognition Vendor Test (FRVT)
    Face Detection: Effects of Mask-Wearing
      Face Recognition Vendor Test (FRVT): Face-Mask Effects
    Highlight: Masked Labeled Faces in the Wild (MLFW)
  Visual Reasoning
    Visual Question Answering (VQA) Challenge
2.2 COMPUTER VISION–VIDEO
  Activity Recognition
    Kinetics-400, Kinetics-600, Kinetics-700
    ActivityNet: Temporal Action Localization Task
  Object Detection
    Common Objects in Context (COCO)
    You Only Look Once (YOLO)
  Visual Commonsense Reasoning (VCR)


2.3 LANGUAGE
  English Language Understanding
    SuperGLUE
    Stanford Question Answering Dataset (SQuAD)
    Reading Comprehension Dataset Requiring Logical Reasoning (ReClor)
  Text Summarization
    arXiv
    PubMed
  Natural Language Inference
    Stanford Natural Language Inference (SNLI)
    Abductive Natural Language Inference (aNLI)
  Sentiment Analysis
    SemEval 2014 Task 4 Sub Task 2
  Machine Translation (MT)
    WMT 2014, English-German and English-French
    Number of Commercially Available MT Systems
2.4 SPEECH
  Speech Recognition
    Transcribe Speech: LibriSpeech (Test-Clean and Other Datasets)
    VoxCeleb
2.5 RECOMMENDATION
  Commercial Recommendation: MovieLens 20M
  Click-Through Rate Prediction: Criteo
2.6 REINFORCEMENT LEARNING
  Reinforcement Learning Environments
    Arcade Learning Environment: Atari-57
    Procgen
  Human Games: Chess
2.7 HARDWARE
  MLPerf: Training Time
  MLPerf: Number of Accelerators
  IMAGENET: Training Cost
2.8 ROBOTICS
  Price Trends in Robotic Arms
  AI Skills Employed by Robotics Professors
AI Skills Employed by Robotics Professors 99

Computer vision is the subfield of AI that teaches machines to understand images and videos. Computer vision tasks include image classification,
object recognition, semantic segmentation, and face detection. As of 2021, computers can outperform humans on many computer vision tasks.
Computer vision technologies have a variety of important real-world applications, such as autonomous driving, crowd surveillance, sports
analytics, and video-game creation.

2.1 COMPUTER VISION—IMAGE


IMAGE CLASSIFICATION

[Figure 2.1.1: A demonstration of image classification. Source: Krizhevsky, 2020]

ImageNet

ImageNet: Top-1 Accuracy

ImageNet: Top-5 Accuracy
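Top-1 accuracy counts a prediction as correct only when the model's highest-scoring class is the true label; top-5 accuracy accepts the true label anywhere among the five highest-scoring classes. A minimal sketch of both, assuming each model output is a list of per-class scores (the function name `top_k_accuracy` is illustrative, not taken from any benchmark toolkit):

```python
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int = 1) -> float:
    """Fraction of examples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score; keep the first k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Two examples over four classes; the second example's true class (3) ranks second.
scores = [[0.1, 0.7, 0.1, 0.1], [0.5, 0.1, 0.0, 0.4]]
labels = [1, 3]
print(top_k_accuracy(scores, labels, k=1))  # 0.5
print(top_k_accuracy(scores, labels, k=2))  # 1.0
```

With 1,000 ImageNet classes, the same function with `k=5` gives the top-5 metric.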


IMAGE GENERATION

[Figure 2.1.4: GAN progress on face generation. Source: Goodfellow et al., 2014; Radford et al., 2016; Liu & Tuzel, 2016; Karras et al., 2018; Karras et al., 2019; Goodfellow, 2019; Karras et al., 2020; AI Index, 2021; Vahdat et al., 2021]

STL-10: Fréchet Inception Distance (FID) Score

[Figure 2.1.5: STL-10: Fréchet Inception Distance (FID) score, 2018–21; latest: 7.71. Source: Papers with Code, 2021; arXiv, 2021]
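FID compares the mean μ and covariance Σ of feature vectors (conventionally Inception-v3 activations) extracted from real and generated images: FID = ‖μ_r − μ_g‖² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^(1/2)), so lower is better. The sketch below is a simplification that assumes both covariance matrices are diagonal, in which case the matrix square root reduces to per-dimension square roots; a full implementation needs a general matrix square root (for example `scipy.linalg.sqrtm`) and real Inception features, both omitted here.

```python
import math
from typing import List, Sequence, Tuple


def _mean_and_variance(features: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-dimension sample mean and unbiased variance (needs at least two vectors)."""
    n, dims = len(features), len(features[0])
    means = [sum(f[j] for f in features) / n for j in range(dims)]
    variances = [sum((f[j] - means[j]) ** 2 for f in features) / (n - 1) for j in range(dims)]
    return means, variances


def fid_diagonal(real: Sequence[Sequence[float]], generated: Sequence[Sequence[float]]) -> float:
    """FID under the simplifying assumption that both covariance matrices are diagonal."""
    mu_r, var_r = _mean_and_variance(real)
    mu_g, var_g = _mean_and_variance(generated)
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_r, mu_g))
    # With diagonal covariances, Tr(S_r + S_g - 2 (S_r S_g)^(1/2)) is a per-dimension sum.
    cov_term = sum(vr + vg - 2 * math.sqrt(vr * vg) for vr, vg in zip(var_r, var_g))
    return mean_term + cov_term


# Toy one-dimensional "features": equal spread, means shifted by 1, so FID = 1.
print(fid_diagonal([[0.0], [2.0]], [[1.0], [3.0]]))  # 1.0
```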


CIFAR-10: Fréchet Inception Distance (FID) Score

[Figure 2.1.6: CIFAR-10: Fréchet Inception Distance (FID) score, 2017–21; latest: 2.10. Source: Papers with Code, 2021; arXiv, 2021]


DEEPFAKE DETECTION

FaceForensics++

[Figure 2.1.7: FaceForensics++: accuracy, 2012–21; latest: Face2Face 99.98%, DeepFake 99.47%, FaceSwap 98.27%, NeuralTextures 93.25%. Source: arXiv, 2021]


Celeb-DF

[Figure 2.1.8: Celeb-DF: area under curve score (AUC), 2018–21; latest: 76.88. Source: arXiv, 2021]
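AUC can be read as the probability that a randomly chosen positive example (here, a fake video) receives a higher detector score than a randomly chosen negative one; the chart above reports it on a 0–100 scale. A minimal pairwise sketch, assuming label 1 marks fakes and that ties count as half a win. It runs in O(P·N) time, which is fine for illustration but not for large evaluations.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Label 1 = fake, 0 = real; one real video outscores one fake, so 3 of 4 pairs are ordered correctly.
print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))  # 0.75
```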

HUMAN POSE ESTIMATION

[Figure 2.1.9: A demonstration of human pose estimation. Source: Cao et al., 2019]


Leeds Sports Poses: Percentage of Correct Keypoints (PCK)

[Figure 2.1.10: Leeds Sports Poses: percentage of correct keypoints (PCK), 2014–21; latest: 99.50%. Source: Papers with Code, 2021; arXiv, 2021]
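PCK counts a predicted keypoint as correct when it lies within a threshold distance of the ground-truth location, where the threshold is a fraction α of a reference length. The sketch below assumes α = 0.2 of a torso-size reference (the PCK@0.2 convention commonly used with Leeds Sports Poses); the exact normalization varies between papers, so treat the parameters as illustrative.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def pck(pred: Sequence[Point], gt: Sequence[Point], ref_length: float, alpha: float = 0.2) -> float:
    """Fraction of keypoints whose prediction lies within alpha * ref_length of the ground truth."""
    if len(pred) != len(gt) or not gt:
        raise ValueError("need matching, non-empty keypoint lists")
    threshold = alpha * ref_length
    correct = sum(1 for p, g in zip(pred, gt) if math.dist(p, g) <= threshold)
    return correct / len(gt)


# Threshold = 0.2 * 10 = 2: the first keypoint is 1 unit off (correct), the second 10 units off (wrong).
print(pck([(0, 0), (10, 0)], [(0, 1), (0, 0)], ref_length=10))  # 0.5
```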


Human3.6M: Average Mean Per Joint Position Error (MPJPE)

[Figure 2.1.11: Human3.6M: average mean per joint position error (MPJPE, mm), 2013–21; latest: 22.70 without extra training data, 18.70 with extra training data. Source: Papers with Code, 2021; arXiv, 2021]
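MPJPE averages the Euclidean distance between predicted and ground-truth 3D joint positions, in millimeters, so lower is better. Evaluation protocols on Human3.6M typically first align both skeletons at a root joint. The sketch includes an optional root-centering helper as an assumption; the exact alignment behind each result in the chart is not specified here.

```python
import math
from typing import List, Sequence, Tuple

Joint = Tuple[float, float, float]


def root_relative(joints: Sequence[Joint], root_index: int = 0) -> List[Joint]:
    """Translate a skeleton so that the chosen root joint sits at the origin."""
    rx, ry, rz = joints[root_index]
    return [(x - rx, y - ry, z - rz) for x, y, z in joints]


def mpjpe(pred: Sequence[Joint], gt: Sequence[Joint]) -> float:
    """Mean Euclidean distance between corresponding predicted and ground-truth joints."""
    if len(pred) != len(gt) or not gt:
        raise ValueError("need matching, non-empty joint lists")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


# A prediction shifted by (10, 10, 10) mm overall, plus a 5 mm error on the second joint.
pred = [(10.0, 10.0, 10.0), (13.0, 14.0, 10.0)]
gt = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
print(mpjpe(root_relative(pred), root_relative(gt)))  # 2.5
```

Root alignment removes the global offset, so only the 5 mm error on one of two joints remains.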


SEMANTIC SEGMENTATION

[Figure 2.1.12: A demonstration of semantic segmentation. Source: Visual Object Classes Challenge, 2012]

Cityscapes

[Figure 2.1.13: Cityscapes Challenge, pixel-level semantic labeling task: mean intersection-over-union (mIoU), 2014–21; latest: 86.20% with extra training data, 84.30% without extra training data. Source: Cityscapes Challenge, 2021]
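Mean intersection-over-union computes, for each class, the overlap between predicted and ground-truth pixels divided by their union, then averages over classes. A minimal sketch on flattened label maps, assuming every class that appears in either map is scored. Benchmark implementations typically accumulate counts over the whole test set and ignore unlabeled pixels, which this sketch does not do.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], gt: Sequence[int]) -> float:
    """Mean IoU over classes present in either the predicted or the ground-truth label map."""
    if len(pred) != len(gt):
        raise ValueError("label maps must have the same number of pixels")
    ious = []
    for c in sorted(set(pred) | set(gt)):
        intersection = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        ious.append(intersection / union)
    return sum(ious) / len(ious)


# Four pixels, two classes: IoU is 1/2 for class 0 and 2/3 for class 1.
print(mean_iou([0, 1, 1, 1], [0, 0, 1, 1]))  # (1/2 + 2/3) / 2, about 0.583
```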


MEDICAL IMAGE SEGMENTATION

[Figure 2.1.14: A demonstration of kidney segmentation. Source: Kidney and Kidney Tumor Segmentation, 2021]

CVC-ClinicDB and Kvasir-SEG

[Figure 2.1.15a: CVC-ClinicDB: mean Dice, 2015–21; latest: 94.20%. Source: Papers with Code, 2021; arXiv, 2021]
[Figure 2.1.15b: Kvasir-SEG: mean Dice, 2015–21; latest: 92.17%. Source: Papers with Code, 2021; arXiv, 2021]
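Mean Dice averages, over test images, the Dice coefficient 2|A∩B| / (|A| + |B|) between the predicted mask A and the ground-truth mask B. For a single image it is a monotone function of IoU: Dice = 2·IoU / (1 + IoU). A minimal sketch on flattened binary masks; treating two empty masks as a perfect match (Dice = 1) is an assumption, since conventions differ.

```python
from typing import Sequence, Tuple


def dice(pred: Sequence[int], gt: Sequence[int]) -> float:
    """Dice coefficient for binary masks given as flat sequences of 0/1 values."""
    if len(pred) != len(gt):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, g in zip(pred, gt) if p and g)
    total = sum(1 for p in pred if p) + sum(1 for g in gt if g)
    return 1.0 if total == 0 else 2 * intersection / total


def mean_dice(pairs: Sequence[Tuple[Sequence[int], Sequence[int]]]) -> float:
    """Average per-image Dice over (prediction, ground truth) pairs."""
    return sum(dice(p, g) for p, g in pairs) / len(pairs)


# Image 1: prediction covers 2 pixels, truth 1 pixel, overlap 1 -> 2/3. Image 2: exact match -> 1.
print(mean_dice([([1, 1, 0, 0], [1, 0, 0, 0]), ([1], [1])]))  # about 0.833
```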


FACE DETECTION AND RECOGNITION

National Institute of Standards and Technology (NIST) Face Recognition Vendor Test (FRVT)

[Figure 2.1.16: NIST Face Recognition Vendor Test (FRVT): verification accuracy by dataset, 2017–21, measured as false non-match rate (FNMR, log scale) at a fixed false match rate (FMR). Latest: WILD photos 0.0297 (FMR = 0.00001); BORDER photos 0.0044 (FMR = 0.000001); VISABORDER photos 0.0023 (FMR = 0.000001); MUGSHOT photos 0.0022 (FMR = 0.00001); MUGSHOT photos, DT >= 12 yrs, 0.0021 (FMR = 0.00001); VISA photos 0.0013 (FMR = 0.000001). Source: National Institute of Standards and Technology, 2021]
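In verification, a comparison is declared a match when its similarity score exceeds a threshold. The FRVT chart fixes the threshold so that the false match rate (FMR) over impostor comparisons stays at a target such as 0.00001, then reports the false non-match rate (FNMR) over genuine comparisons at that threshold. A minimal sketch of that procedure, assuming higher scores mean "more similar" and using toy score lists in place of real comparisons:

```python
import math
from typing import Sequence, Tuple


def fnmr_at_fmr(genuine: Sequence[float], impostor: Sequence[float],
                target_fmr: float) -> Tuple[float, float]:
    """Return (FNMR, threshold), where a pair matches if score > threshold and FMR <= target_fmr."""
    if not genuine or not impostor:
        raise ValueError("need both genuine and impostor scores")
    ranked = sorted(impostor, reverse=True)
    # At most this many impostor comparisons may be (falsely) accepted.
    allowed = math.floor(target_fmr * len(ranked))
    threshold = ranked[allowed] if allowed < len(ranked) else -math.inf
    # Genuine pairs at or below the threshold are false non-matches.
    fnmr = sum(1 for s in genuine if s <= threshold) / len(genuine)
    return fnmr, threshold


impostor = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
genuine = [0.5, 0.92, 0.97, 0.99]
print(fnmr_at_fmr(genuine, impostor, target_fmr=0.1))  # (0.25, 0.9)
```

With only 10 impostor scores, an FMR target of 0.00001 would allow zero false matches; meaningful estimates at such targets require on the order of 100,000 or more impostor comparisons.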


FACE DETECTION: EFFECTS OF MASK-WEARING

Face Recognition Vendor Test (FRVT): Face-Mask Effects

Although facial recognition technology has existed for several decades, technical progress in the last few years has been significant. Some of today’s top-performing facial recognition algorithms achieve near-100% accuracy on challenging datasets.

[Figure 2.1.17: NIST FRVT face-mask effects: false non-match rate (FNMR), 2019–21; latest: 0.014 for masked faces, 0.002 for non-masked faces. Source: National Institute of Standards and Technology, 2021]


Masked Labeled Faces in the Wild (MLFW)

In 2021, researchers from the Beijing University of Posts and Telecommunications released a facial recognition dataset of 6,000 masked faces in response to the new recognition challenges posed by large-scale mask-wearing.

[Figure 2.1.18: Examples of masked faces in the Masked Labeled Faces in the Wild (MLFW) database. Source: Wang et al., 2021]

As part of the dataset release, the researchers ran a series of existing state-of-the-art face recognition algorithms on a variety of facial
recognition datasets, including their own, to measure how much performance decreased when faces were masked. Their estimates suggest
that top methods perform 5 to 16 percentage points worse on masked faces than on unmasked ones. These findings broadly corroborate
the FRVT face-mask results: performance deteriorates when masks are present, but not drastically.

[Figure 2.1.19: State-of-the-art face detection methods on Masked Labeled Faces in the Wild (MLFW): accuracy (%) of ArcFace1, ArcFace3, ArcFace4, CosFace2, CurricularFace5, and SFace6 on the CALFW11, CPLFW10, LFW7, SLLFW8, and MLFW datasets. Source: Wang et al., 2021]


VISUAL REASONING

[Figure 2.1.20: An example of a visual reasoning task. Source: Goyal et al., 2021]

Visual Question Answering (VQA) Challenge

[Figure 2.1.21: Sample questions in the Visual Question Answering (VQA) Challenge. Source: Goyal et al., 2017]


[Figure 2.1.22: Visual Question Answering (VQA) Challenge: accuracy, 2015–21; latest: 79.78%, against a human baseline of 80.80%. Source: VQA Challenge, 2021]
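The VQA Challenge scores each predicted answer against ten human answers per question; an answer counts as fully correct when at least three annotators gave it. A simplified sketch of that rule, min(matches / 3, 1), assuming only lowercase and whitespace normalization. The official evaluation also normalizes punctuation, number words, and articles and averages over subsets of the ten annotators, which this sketch omits.

```python
from typing import Sequence


def vqa_accuracy(answer: str, human_answers: Sequence[str]) -> float:
    """Simplified VQA accuracy: full credit once three or more humans gave the same answer."""
    normalized = answer.strip().lower()
    matches = sum(1 for h in human_answers if h.strip().lower() == normalized)
    return min(matches / 3, 1.0)


humans = ["cat", "cat", "dog"] + ["kitten"] * 7
print(vqa_accuracy("Cat", humans))     # two matches -> 2/3 credit
print(vqa_accuracy("kitten", humans))  # 1.0
```

The benchmark score is the mean of this per-question value over all questions.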


Video analysis concerns reasoning about, or performing tasks across, sequential frames (videos) rather than single frames (images). Video
computer vision has a wide range of use cases, including assisting criminal surveillance efforts, sports analytics, autonomous driving,
robot navigation, and crowd monitoring.

2.2 COMPUTER VISION—VIDEO


ACTIVITY RECOGNITION

Kinetics-400, Kinetics-600, Kinetics-700

[Figure 2.2.1: Example classes from the Kinetics dataset. Source: Kay et al., 2017]


[Figure 2.2.2: Kinetics-400, Kinetics-600, Kinetics-700: top-1 accuracy, 2016–22; latest: Kinetics-600 89.60%, Kinetics-400 89.10%, Kinetics-700 82.20%. Source: Papers with Code, 2021; arXiv, 2021]


ActivityNet: Temporal Action Localization Task

[Figure 2.2.3: ActivityNet, temporal action localization task: mean average precision (mAP), 2016–21; latest: 44.67%. Source: ActivityNet, 2021]
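Temporal action localization scores each predicted segment (start, end) against ground-truth segments by temporal intersection-over-union (tIoU); a prediction counts as a true positive only above a tIoU threshold. ActivityNet's headline number is assumed here to be mAP averaged over thresholds from 0.5 to 0.95 in steps of 0.05. The sketch computes only tIoU and that threshold grid; the per-threshold AP would follow the same rank-and-match procedure used for object detection.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def temporal_iou(a: Segment, b: Segment) -> float:
    """Overlap of two time segments divided by the length of their union."""
    intersection = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union if union > 0 else 0.0


def tiou_thresholds() -> List[float]:
    """Thresholds 0.50, 0.55, ..., 0.95; rounding avoids floating-point drift."""
    return [round(0.5 + 0.05 * i, 2) for i in range(10)]


# 5 s of overlap out of a 15 s union.
print(temporal_iou((0.0, 10.0), (5.0, 15.0)))  # about 0.333
print(tiou_thresholds())
```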


OBJECT DETECTION

[Figure 2.2.4: A demonstration of how object detection appears to AI systems. Source: COCO, 2020]


Common Objects in Context (COCO)

[Figure 2.2.5: COCO-test-dev: mean average precision (mAP50), 2015–21; latest: 79.50% with extra training data, 77.10% without extra training data. Source: Papers with Code, 2021; arXiv, 2021]
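mAP50 is the average precision (AP) at an intersection-over-union threshold of 0.5, averaged over object classes. For one class, detections are ranked by confidence, each is greedily matched to the unmatched ground-truth box with the highest IoU, it counts as a true positive if that IoU is at least 0.5, and AP is the area under the resulting precision-recall curve. The sketch uses all-point interpolation; the official COCO tool samples 101 recall points instead, so values can differ slightly.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections: Sequence[Tuple[Box, float]], ground_truths: Sequence[Box],
                      iou_threshold: float = 0.5) -> float:
    """AP for a single class, using greedy matching and all-point interpolation."""
    if not ground_truths:
        raise ValueError("AP is undefined without ground-truth boxes")
    matched = [False] * len(ground_truths)
    is_tp: List[bool] = []
    # Highest-confidence detections claim ground-truth boxes first.
    for box, _score in sorted(detections, key=lambda d: d[1], reverse=True):
        best_iou, best_idx = 0.0, -1
        for i, gt in enumerate(ground_truths):
            iou = box_iou(box, gt)
            if not matched[i] and iou > best_iou:
                best_iou, best_idx = iou, i
        hit = best_idx >= 0 and best_iou >= iou_threshold
        if hit:
            matched[best_idx] = True
        is_tp.append(hit)
    # Precision and recall after each ranked detection.
    recalls, precisions, tp = [], [], 0
    for rank, hit in enumerate(is_tp, start=1):
        tp += hit
        recalls.append(tp / len(ground_truths))
        precisions.append(tp / rank)
    # Make precision non-increasing in recall, then integrate over recall.
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    return sum((mrec[i] - mrec[i - 1]) * mpre[i] for i in range(1, len(mrec)))


gts = [(0.0, 0.0, 10.0, 10.0)]
good_first = [((0.0, 0.0, 10.0, 10.0), 0.9), ((20.0, 20.0, 30.0, 30.0), 0.8)]
bad_first = [((0.0, 0.0, 10.0, 10.0), 0.8), ((20.0, 20.0, 30.0, 30.0), 0.9)]
print(average_precision(good_first, gts))  # 1.0
print(average_precision(bad_first, gts))   # 0.5
```

Ranking a false positive above the only true positive halves the AP, which is why confidence calibration matters for this metric.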


You Only Look Once (YOLO)

[Figure 2.2.6: State of the art (SOTA) vs. You Only Look Once (YOLO): mean average precision (mAP50), 2016–21; latest: SOTA 79.50%, YOLO 72.40%. Source: arXiv, 2021; GitHub, 2021]


Visual Commonsense Reasoning (VCR)

[Figure 2.2.7: A sample question of the Visual Commonsense Reasoning (VCR) challenge. Source: Zellers et al., 2018]

[Figure 2.2.8: Visual Commonsense Reasoning (VCR) task: Q->AR score, 2018–21; latest: 72.00, against a human baseline of 85.00. Source: VCR Leaderboard, 2021]


Natural language processing (NLP) is a subfield of AI with roots that stretch back as far as the 1950s. NLP involves research into systems that
can read, generate, and reason about natural language. In its early years, NLP relied on handwritten rules and statistical methods; today it
combines computational linguistics, rule-based modeling, statistical learning, and deep learning.

This section looks at progress in NLP across several language task domains: (1) English language understanding; (2) text summarization;
(3) natural language inference; (4) sentiment analysis; and (5) machine translation. Technical progress in NLP over the last decade has been
significant: the adoption of deep neural network–style machine learning methods means that many AI systems can now execute complex
language tasks better than many human baselines.

2.3 LANGUAGE

ENGLISH LANGUAGE UNDERSTANDING

SuperGLUE

[Figure 2.3.1: A set of SuperGLUE tasks³. Source: Wang et al., 2019]


[Figure 2.3.2: SuperGLUE: score, 2019–21; latest: 91.00, against a human performance of 89.80. Source: SuperGLUE Leaderboard, 2021]

Stanford Question Answering Dataset (SQuAD)

[Figure 2.3.3: Harder questions added to Stanford Question Answering Dataset (SQuAD) 2.0. Source: Rajpurkar et al., 2018]


[Figure 2.3.4: SQuAD 1.1 and SQuAD 2.0: F1 score, 2016–21; latest: SQuAD 1.1 95.72 (human baseline 91.20); SQuAD 2.0 93.21 (human baseline 89.50). Source: SQuAD 1.1 and SQuAD 2.0, 2021]
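SQuAD's F1 score treats the predicted and gold answers as bags of tokens and computes the harmonic mean of token precision and recall, taking the best match over the available gold answers. The sketch below closely follows the normalization used by the official evaluation script (lowercasing, stripping punctuation and the articles "a", "an", "the", collapsing whitespace). When either answer is empty, as with SQuAD 2.0's unanswerable questions, it scores 1 only if both are empty.

```python
import re
import string
from collections import Counter
from typing import Sequence

_PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and articles, and collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in _PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def token_f1(prediction: str, gold: str) -> float:
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def squad_f1(prediction: str, gold_answers: Sequence[str]) -> float:
    """Best token F1 against any of the gold answers."""
    return max(token_f1(prediction, g) for g in gold_answers)


print(token_f1("the cat sat", "a cat sat down"))  # precision 1, recall 2/3 -> 0.8
print(squad_f1("Paris.", ["paris", "in Paris"]))  # 1.0
```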

Reading Comprehension Dataset Requiring Logical Reasoning (ReClor)

[Figure 2.3.5: A sample question in Reading Comprehension Dataset Requiring Logical Reasoning (ReClor). Source: Yu et al., 2020]


Although AI systems now achieve relatively high performance on the easy set of questions, they struggle on the hard set.

[Figure 2.3.6: Reading Comprehension Dataset Requiring Logical Reasoning (ReClor): accuracy, 2020–21; latest: 91.82% on Test Easy, 69.29% on Test Hard. Source: ReClor Leaderboard, 2021]


TEXT SUMMARIZATION

arXiv

[Figure 2.3.7: arXiv: ROUGE-1, 2017–21; latest: 47.15 without extra training data, 46.74 with extra training data. Source: Papers with Code, 2021; arXiv, 2021]
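ROUGE-1 measures unigram overlap between a generated summary and a reference summary; papers usually report its F-measure, scaled to 0–100 as in the chart. The sketch uses plain lowercase whitespace tokenization with no stemming, whereas standard ROUGE packages can apply stemming, so its numbers are illustrative only.

```python
from collections import Counter
from typing import Tuple


def rouge_1(candidate: str, reference: str) -> Tuple[float, float, float]:
    """Unigram precision, recall, and F1 between a candidate summary and one reference."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    # Clipped overlap: each reference token can be matched at most as often as it occurs.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return precision, recall, 2 * precision * recall / (precision + recall)


# Five of six unigrams overlap ("lay" vs. "sat"), so precision = recall = F1 = 5/6.
print(rouge_1("the cat lay on the mat", "the cat sat on the mat"))
```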


PubMed

[Figure 2.3.8: PubMed: ROUGE-1, 2017–21; latest: 48.25 with extra training data, 47.81 without extra training data. Source: Papers with Code, 2021; arXiv, 2021]


NATURAL LANGUAGE INFERENCE

Stanford Natural Language Inference (SNLI)

[Figure 2.3.9: Questions and labels in Stanford Natural Language Inference (SNLI). Source: Bowman et al., 2015]


[Figure 2.3.10: Stanford Natural Language Inference (SNLI): accuracy, 2014–21; latest: 93.10%. Source: Papers with Code, 2021; arXiv, 2021]

Abductive Natural Language Inference (aNLI)

[Figure 2.3.11: Example questions in Abductive Natural Language Inference (aNLI). Source: Allen Institute for AI, 2021]


[Figure 2.3.12: Abductive Natural Language Inference (aNLI): accuracy, 2019–21; latest: 91.87%, against a human baseline of 92.90%. Source: Allen Institute for AI, 2021]

SENTIMENT ANALYSIS

SemEval 2014 Task 4 Sub Task 2

[Figure 2.3.13: A sample SemEval task. Source: Pontiki et al., 2014]


[Figure 2.3.14: SemEval 2014 Task 4 Sub Task 2: accuracy, 2015–21; latest: 88.64%. Source: Papers with Code, 2021; arXiv, 2021]

MACHINE TRANSLATION (MT)


WMT 2014, English-German and English-French

[Figure 2.3.15: WMT2014, English-French and English-German: BLEU score, 2015–21. English-French latest: 46.40 with extra training data, 43.95 without. English-German latest: 35.14 with extra training data, 31.26 without. Source: Papers with Code, 2021; arXiv, 2021]
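BLEU combines clipped n-gram precisions (n = 1 to 4) through a geometric mean and multiplies by a brevity penalty that punishes translations shorter than the reference. A minimal corpus-level sketch, assuming a single reference per sentence, whitespace tokenization, and no smoothing. Published WMT results are computed with standardized tooling and tokenization (for example, sacreBLEU), so this sketch will not reproduce them exactly.

```python
import math
from collections import Counter
from typing import List, Sequence


def _ngrams(tokens: List[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(candidates: Sequence[str], references: Sequence[str], max_n: int = 4) -> float:
    """Corpus BLEU in [0, 1] with one reference per candidate (multiply by 100 for reported scores)."""
    if len(candidates) != len(references):
        raise ValueError("need one reference per candidate")
    clipped, totals = [0] * max_n, [0] * max_n
    cand_len = ref_len = 0
    for cand, ref in zip(candidates, references):
        c, r = cand.split(), ref.split()
        cand_len += len(c)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            ref_counts = _ngrams(r, n)
            # Clip each candidate n-gram count by its count in the reference.
            clipped[n - 1] += sum(min(cnt, ref_counts[g]) for g, cnt in _ngrams(c, n).items())
            totals[n - 1] += max(len(c) - n + 1, 0)
    if min(clipped) == 0:
        return 0.0  # without smoothing, any zero precision makes BLEU zero
    log_precision = sum(math.log(clipped[i] / totals[i]) for i in range(max_n)) / max_n
    brevity_penalty = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return brevity_penalty * math.exp(log_precision)


# Every n-gram of the candidate is correct, but it is 4 tokens against a 6-token reference.
print(corpus_bleu(["a b c d"], ["a b c d e f"]))  # exp(1 - 6/4), about 0.607
```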


Number of Commercially Available MT Systems

[Figure 2.3.16: Number of independent machine translation services, 05/2017–10/2021, by type (commercial, open-source pre-trained, preview); latest commercial count: 46. Source: Intento, 2021]


Another important domain of AI research is the analysis, recognition, and synthesis of human speech. In this subfield, AI systems are
typically rated on their ability to recognize spoken words and transcribe them into text, and to recognize and identify individual speakers.
Modern voice assistants, such as Siri, are one of the many commercial applications of AI speech technology.

2.4 SPEECH

SPEECH RECOGNITION

Transcribe Speech: LibriSpeech (Test-Clean and Other Datasets)

[Figure 2.4.1: LibriSpeech, test-clean and test-other: word error rate (WER), 2015–21. Test-clean latest: 1.40 with extra training data, 1.70 without. Test-other latest: 2.50 with extra training data, 3.30 without. Source: Papers with Code, 2021; arXiv, 2021]
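Word error rate is the minimum number of word substitutions, deletions, and insertions needed to turn the system's transcript into the reference transcript, divided by the number of reference words; the chart reports it as a percentage. A minimal sketch using the standard Levenshtein dynamic program over words, assuming whitespace tokenization and no text normalization:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # delete a reference word
                          curr[j - 1] + 1,     # insert a hypothesis word
                          prev[j - 1] + cost)  # match or substitute
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("the cat sat", "the cat sit"))  # one substitution in three words
print(word_error_rate("a b c d", "a c d e"))          # one deletion + one insertion -> 0.5
```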


VoxCeleb

[Figure 2.4.2: VoxCeleb: equal error rate (EER), 2017–21; latest: 0.42%. Source: VoxCeleb, 2021]
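In speaker verification, the equal error rate is the operating point at which the false acceptance rate (impostor trials accepted) equals the false rejection rate (genuine trials rejected). A minimal sketch that tries each observed score as a threshold (accepting scores at or above it) and returns the average of the two rates where they are closest. Evaluation toolkits typically interpolate the ROC curve more finely.

```python
from typing import Sequence


def equal_error_rate(genuine: Sequence[float], impostor: Sequence[float]) -> float:
    """Approximate EER by sweeping the observed scores as acceptance thresholds."""
    if not genuine or not impostor:
        raise ValueError("need both genuine and impostor scores")
    best_gap, best_eer = float("inf"), 1.0
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(1 for s in impostor if s >= t) / len(impostor)  # impostors wrongly accepted
        frr = sum(1 for s in genuine if s < t) / len(genuine)     # genuine speakers wrongly rejected
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer


print(equal_error_rate([0.9, 0.8, 0.7], [0.1, 0.2, 0.3]))  # perfectly separable -> 0.0
print(equal_error_rate([0.9, 0.4], [0.6, 0.1]))            # 0.5
```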


Recommendation is the task of suggesting items that might be of interest to a user, such as movies to watch, articles to read, or products
to purchase. Recommendation systems are crucial to businesses such as Amazon, Netflix, Spotify, and YouTube. One of the earliest open
recommendation competitions in AI was the Netflix Prize; running from 2006 to 2009, it challenged computer scientists to develop
algorithms that could accurately predict user ratings for films based on previously submitted ratings.

2.5 RECOMMENDATION

Commercial Recommendation: MovieLens 20M

[Figure 2.5.1: MovieLens 20M: normalized discounted cumulative gain@100 (nDCG@100), 2018–21; latest: 0.448. Source: Papers with Code, 2021; arXiv, 2021]
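nDCG@k rewards a ranked recommendation list for placing relevant items near the top: each relevant item at rank i (starting from 1) contributes 1 / log2(i + 1), and the total is divided by the best achievable total so that a perfect ranking scores 1. The sketch assumes binary relevance, where a held-out item the user interacted with counts as relevant; graded-relevance variants exist.

```python
import math
from typing import Sequence, Set


def ndcg_at_k(ranked_items: Sequence[str], relevant: Set[str], k: int = 100) -> float:
    """Binary-relevance nDCG of the top-k recommendations."""
    dcg = sum(1.0 / math.log2(rank + 2)
              for rank, item in enumerate(ranked_items[:k]) if item in relevant)
    # Ideal ranking: all relevant items (up to k) placed first.
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(min(len(relevant), k)))
    return dcg / idcg if idcg > 0 else 0.0


# Relevant items at ranks 1 and 3 instead of the ideal ranks 1 and 2.
print(ndcg_at_k(["a", "b", "c"], {"a", "c"}, k=3))  # 1.5 / (1 + 1/log2(3)), about 0.92
```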


Click-Through Rate Prediction: Criteo

[Figure 2.5.2: Criteo: area under curve score (AUC), 2016–21; latest: 0.813. Source: Papers with Code, 2021; arXiv, 2021]


In reinforcement learning, AI systems are trained to maximize performance on a given task by interactively learning from their prior
actions. Researchers train these systems by rewarding them when they achieve a desired goal and penalizing them when they fail.
Systems experiment with different sequences of actions to solve their stated problem (e.g., playing chess or navigating a maze)
and select the strategies that maximize their rewards.

Reinforcement learning makes the news whenever programs like DeepMind’s AlphaZero demonstrate superhuman performance on
games like Go and chess. However, reinforcement learning is useful in any commercial domain where computer agents need to maximize
a target goal or can benefit from learning from previous experience. Reinforcement learning can help autonomous vehicles change
lanes, robots optimize manufacturing tasks, or time-series models predict future events.
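The reward-driven trial and error described above can be illustrated with tabular Q-learning, one classic reinforcement learning algorithm (not the method behind AlphaZero or the benchmark results below). In this hypothetical toy environment, an agent in a five-cell corridor can step left or right and receives a reward of 1 only for reaching the rightmost cell. It uses no explicit penalty; instead, the discount factor γ makes shorter routes worth more. All names and hyperparameters are illustrative assumptions.

```python
import random
from typing import List


def train_q_learning(n_states: int = 5, episodes: int = 500, alpha: float = 0.5,
                     gamma: float = 0.9, epsilon: float = 0.2, seed: int = 0) -> List[List[float]]:
    """Tabular Q-learning on a toy corridor: start at state 0, reward 1 for reaching the last state."""
    rng = random.Random(seed)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]; action 0 = left, 1 = right
    for _ in range(episodes):
        state = 0
        for _step in range(100):  # cap the episode length
            # Epsilon-greedy exploration; ties between equal values are broken at random.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            # Terminal transitions have no future value; otherwise bootstrap from the best next action.
            target = reward if next_state == goal else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if state == goal:
                break
    return q


q = train_q_learning()
greedy_policy = ["right" if q[s][1] > q[s][0] else "left" for s in range(len(q) - 1)]
print(greedy_policy)
```

After training, the greedy policy steps right in every non-terminal cell, which is the strategy that maximizes discounted reward.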

2.6 REINFORCEMENT LEARNING

REINFORCEMENT LEARNING ENVIRONMENTS

Creating reinforcement learning models that are both high performing and highly efficient is an important step in the commercial deployment of reinforcement learning.

Arcade Learning Environment: Atari-57


[Figure 2.6.1: Atari-57: mean human-normalized score (in thousands), 2015–21; latest: 9.62. Source: Papers with Code, 2021; arXiv, 2021]
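A human-normalized score rescales an agent's raw game score so that random play maps to 0% and the human reference maps to 100%: HNS = 100 × (agent − random) / (human − random). The Atari-57 chart averages this over the 57 games, and a mean of this kind can be dominated by a few games with very large normalized scores. A minimal sketch with hypothetical raw scores:

```python
from typing import Sequence, Tuple


def human_normalized_score(agent: float, random_play: float, human: float) -> float:
    """Percent of the random-to-human gap closed by the agent (100 = human level)."""
    return 100.0 * (agent - random_play) / (human - random_play)


def mean_hns(results: Sequence[Tuple[float, float, float]]) -> float:
    """Mean human-normalized score over (agent, random, human) raw scores, one tuple per game."""
    scores = [human_normalized_score(a, r, h) for a, r, h in results]
    return sum(scores) / len(scores)


# Hypothetical raw scores for two games: 200% of human level on the first, 50% on the second.
print(mean_hns([(200.0, 0.0, 100.0), (50.0, 0.0, 100.0)]))  # 125.0
```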

Procgen

[Figure 2.6.2: A screenshot of the 16 game environments in Procgen. Source: Cobbe et al., 2019]


[Figure 2.6.3: Procgen: mean-normalized score, 2019–21; latest: 0.64. Source: arXiv, 2021]


Human Games: Chess

[Figure 2.6.4: Chess software engines: Elo score, 1987–2022; latest top engine: 3,581, compared with reference levels for Magnus Carlsen (2882), expert (2300), intermediate (1700), and novice (800). Source: Swedish Computer Chess Association, 2021]
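An Elo rating gap translates into an expected score (win = 1, draw = 0.5, loss = 0) through E_A = 1 / (1 + 10^((R_B − R_A) / 400)), and after each game a rating moves by K × (actual − expected). The sketch below implements both formulas; K = 32 is a hypothetical choice, since rating bodies use different K values. Engine ratings and human ratings come from different rating pools, so the engine-versus-Carlsen figure in the usage line illustrates the formula rather than predicting a real match.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_rating(rating: float, opponent: float, score: float, k: float = 32.0) -> float:
    """New rating after one game, where score is 1 (win), 0.5 (draw), or 0 (loss)."""
    return rating + k * (score - expected_score(rating, opponent))


# A 699-point gap (3,581 vs. 2,882) implies an expected score of roughly 0.98 for the stronger side.
print(round(expected_score(3581, 2882), 3))
print(update_rating(1500, 1500, 1.0))  # evenly matched players: winner gains k / 2 = 16 points
```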


In evaluating technical progress in AI, it is relevant to consider not only improvements in technical performance but also the speed of
operation. As this chapter shows, AI systems continue to improve in virtually every skill category. This performance is often achieved by
increasing the number of parameters and training systems on more data. However, all else being equal, models that use more parameters
and more data take longer to train, and longer training times mean slower real-world deployment. Because longer training times can be
offset by more powerful and robust computational infrastructure, it is important to track progress in the hardware that powers AI systems.

2.7 HARDWARE

MLPerf: Training Time

[Figure 2.7.1: MLPerf training time of top systems by task (minutes; log scale), 2018–21; latest: reinforcement learning 13.57, object detection (heavy-weight) 3.24, speech recognition 2.38, image segmentation 1.26, recommendation 0.63, object detection (light-weight) 0.34, language processing 0.23, image classification 0.23. Source: MLPerf, 2021]


Top-performing hardware systems can reach baseline levels of performance in task categories like recommendation,
light-weight object detection, image classification, and language processing in under a minute.

[Figure 2.7.2: MLPerf: scale of improvement across tasks (reinforcement learning, speech recognition, language processing, recommendation, segmentation, object detection (light-weight), object detection (heavy-weight), image classification), relative to an improvement baseline; labeled values range from 1.16 to 26.96. Source: MLPerf, 2021]


MLPerf: Number of Accelerators

[Figure 2.7.3: MLPerf hardware: accelerators, December 2018 to December 2021; latest: maximum number of accelerators used 4,320; average accelerators used by top system 1,785; mean number of accelerators 337. Source: MLPerf, 2021]


IMAGENET: Training Cost

[Figure 2.7.4: ImageNet: training cost (to 93% accuracy), in U.S. dollars (log scale), 2017–21; latest: $4.59. Source: AI Index and Narayanan, 2021]


In 2021, the AI Index developed a survey that asked professors who specialize in robotics, at top-ranked universities around the world and
in emerging economies, about changes in the pricing of robotic arms and about the uses of robotic arms in research labs. The survey was
completed by 101 professors and researchers from more than 40 universities and collected data on 117 robotic arm purchase events from
2016 to 2022. The results suggest a notable decline in the price of robotic arms since 2016.

2.8 ROBOTICS

Price Trends in Robotic Arms⁷

[Figure 2.8.1: Median price of robotic arms, 2017–21, in thousands of U.S. dollars; latest: $22.60. Source: AI Index, 2022]


[Figure 2.8.2: Distribution of robotic arm prices, 2017–21, in thousands of U.S. dollars. Source: AI Index, 2022]

AI Skills Employed by Robotics Professors

[Figure 2.8.3: AI skills employed by robotics professors (% of respondents): deep learning 67.00%, reinforcement learning 46.00%. Source: AI Index, 2022]
