You open up a database of pictures used to train artificial intelligence systems. At first, things seem straightforward. You're met with thousands of images: apples and oranges, birds, dogs, horses, mountains, clouds, houses, and street signs. But as you probe further into the dataset, people begin to appear: cheerleaders, scuba divers, welders, Boy Scouts, fire walkers, and flower girls. Things get strange: A photograph of a woman smiling in a bikini is labeled a "slattern, slut, slovenly woman, trollop." A young man drinking beer is categorized as an "alcoholic, alky, dipsomaniac, boozer, lush, soaker, souse." A child wearing sunglasses is classified as a "failure, loser, non-starter, unsuccessful person." You're looking at the "person" category in a dataset called ImageNet, one of the most widely used training sets for machine learning.

Something is wrong with this picture.

Where did these images come from? Why were the people in the photos labeled this way? What sorts of politics are at work when pictures are paired with labels, and what are the implications when they are used to train technical systems? In short, how did we get here?

There's an urban legend about the early days of machine vision, the subfield of artificial intelligence (AI) concerned with teaching machines to detect and interpret images. In 1966, Marvin Minsky was a young professor at MIT, making a name for himself in the emerging field of artificial intelligence.[1] Deciding that the ability to interpret images was a core feature of intelligence, Minsky turned to an undergraduate student, Gerald Sussman, and asked him to "spend the summer linking a camera to a computer and getting the computer to describe what it saw."[2] This became the Summer Vision Project.[3] Needless to say, the project of getting computers to "see" was much harder than anyone expected, and would take a lot longer than a single summer.

The story we've been told goes like this: brilliant men worked for decades on the problem of computer vision, proceeding in fits and starts, until the turn to probabilistic modeling and learning techniques in the 1990s accelerated progress. This led to the current moment, in which challenges such as object detection and facial recognition
have been largely solved.[4] This arc of inevitability recurs in many AI narratives, where it is assumed that ongoing technical improvements will resolve all problems and limitations.

But what if the opposite is true? What if the challenge of getting computers to "describe what they see" will always be a problem? In this essay, we will explore why the automated interpretation of images is an inherently social and political project, rather than a purely technical one. Understanding the politics within AI systems matters more than ever, as they are quickly moving into the architecture of social institutions: deciding whom to interview for a job, which students are paying attention in class, which suspects to arrest, and much else.

For the last two years, we have been studying the underlying logic of how images are used to train AI systems to "see" the world. We have looked at hundreds of collections of images used in artificial intelligence, from the first experiments with facial recognition in the early 1960s to contemporary training sets containing millions of images. Methodologically, we could call this project an archeology of datasets: we have been digging through the material layers, cataloguing the principles and values by which something was constructed, and analyzing what normative patterns of life were assumed, supported, and reproduced. By excavating the construction of these training sets and their underlying structures, many unquestioned assumptions are revealed. These assumptions inform the way AI systems work—and fail—to this day.

This essay begins with a deceptively simple question: What work do images do in AI systems? What are computers meant to recognize in an image, and what is misrecognized or even completely invisible? Next, we look at the method for introducing images into computer systems and look at how taxonomies order the foundational concepts that will become intelligible to a computer system. Then we turn to the question of labeling: how do humans tell computers which words will relate to a given image? And what is at stake in the way AI systems use these labels to classify humans, including by race, gender, emotions, ability, sexuality, and personality? Finally, we turn to the purposes that computer vision is meant to serve in our society—the judgments, choices, and consequences of providing computers with these capacities.

Training AI

Building AI systems requires data. Supervised machine-learning systems designed for object or facial recognition are trained on vast amounts of data contained within datasets made up of many discrete images. To build a computer vision system that can, for example, recognize the difference between pictures of apples and oranges, a developer has to collect, label, and train a neural network on thousands of labeled images of apples and oranges. On the software side, the algorithms conduct a statistical survey of the images and develop a model to recognize the difference between the two "classes." If all goes according to plan, the trained model will be able to distinguish between images of apples and oranges that it has never encountered before.

Training sets, then, are the foundation on which contemporary machine-learning systems are built.[5] They are central to how AI systems recognize and interpret the world. These datasets shape the epistemic boundaries governing how AI systems operate, and thus are an essential part of understanding socially significant questions about AI.

But when we look at the training images widely used in computer-vision systems, we find a bedrock composed of shaky and skewed assumptions. For reasons that are rarely discussed within the field of computer vision, and despite all that institutions like MIT and companies like
Japanese Female Facial Expression (JAFFE) Database. Image credit: M. Lyons et al. (1999)
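The supervised pipeline sketched in the Training AI section (collect labeled images, fit a model, classify examples the model has never seen) can be illustrated in miniature. This is a deliberately toy stand-in: a nearest-centroid classifier over two invented features rather than a neural network over pixels, and every name and number here is hypothetical.

```python
# Toy supervised pipeline: collect labeled examples, fit a model,
# then classify inputs the model has never encountered.
# Features are made up (e.g., redness, acidity); a real system would
# learn features from raw pixels.

def centroid(rows):
    # Mean of each feature column across a class's examples.
    return [sum(col) / len(col) for col in zip(*rows)]

def fit(examples):
    # examples maps a class label to its list of feature vectors.
    return {label: centroid(rows) for label, rows in examples.items()}

def predict(model, features):
    # Assign the class whose centroid is nearest (squared distance).
    def dist(label):
        return sum((a - b) ** 2 for a, b in zip(model[label], features))
    return min(model, key=dist)

training_set = {
    "apple":  [[0.9, 0.1], [0.8, 0.2], [0.95, 0.15]],
    "orange": [[0.6, 0.5], [0.55, 0.6], [0.65, 0.55]],
}
model = fit(training_set)
print(predict(model, [0.85, 0.12]))  # an unseen "apple-like" example
```

The statistical survey the essay describes is exactly this: the model is nothing but summary statistics of the labeled examples, so whatever assumptions went into the labels are baked into every later prediction.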
Neural Network to handily win the top prize, bringing new attention to this technique. That moment is widely considered a turning point in the development of contemporary AI.[12] The final year of the ImageNet competition was 2017, and accuracy in classifying objects in the limited subset had risen from 71.8% to 97.3%. That subset did not include the "Person" category, for reasons that will soon become obvious.

Taxonomy

The underlying structure of ImageNet is based on the semantic structure of WordNet, a database of word classifications developed at Princeton University in the 1980s. The taxonomy is organized according to a nested structure of cognitive synonyms, or "synsets." Each synset represents a distinct concept, with synonyms grouped together (for example, "auto" and "car" are treated as belonging to the same synset). Those synsets are then organized into a nested hierarchy, going from general concepts to more specific ones. For example, the concept "chair" is nested as artifact > furnishing > furniture > seat > chair. The classification system is broadly similar to those used in libraries to order books into increasingly specific categories.

While WordNet attempts to organize the entire English language,[13] ImageNet is restricted to nouns (the idea being that nouns are things that pictures can represent). In the ImageNet hierarchy, every concept is organized under one of nine top-level categories: plant, geologic formation, natural object, sport, artifact, fungus, person, animal, and miscellaneous. Below these are layers of additional nested classes.

As the fields of information science and science and technology studies have long shown, all taxonomies or classificatory systems are political.[14] In ImageNet (inherited from WordNet), for example, the
category "human body" falls under the branch Natural Object > Body > Human Body. Its subcategories include "male body"; "person"; "juvenile body"; "adult body"; and "female body." The "adult body" category contains the subclasses "adult female body" and "adult male body." We find an implicit assumption here: only "male" and "female" bodies are "natural." There is an ImageNet category for the term "Hermaphrodite" that is bizarrely (and offensively) situated within the branch Person > Sensualist > Bisexual, alongside the categories "Pseudohermaphrodite" and "Switch Hitter."[15] The ImageNet classification hierarchy recalls the old Library of Congress classification of LGBTQ-themed books under the category "Abnormal Sexual Relations, Including Sexual Crimes," which the American Library Association's Task Force on Gay Liberation finally convinced the Library of Congress to change in 1972 after a sustained campaign.[16]

If we move from taxonomy down a level, to the 21,841 categories in the ImageNet hierarchy, we see another kind of politics emerge.

Categories

There's a kind of sorcery that goes into the creation of categories. To create a category or to name things is to divide an almost infinitely complex universe into separate phenomena. To impose order onto an undifferentiated mass, to ascribe phenomena to a category—that is, to name a thing—is in turn a means of reifying the existence of that category.

In the case of ImageNet, noun categories such as "apple" or "apple butter" might seem reasonably uncontroversial, but not all nouns are created equal. To borrow an idea from linguist George Lakoff, the concept of an "apple" is more nouny than the concept of "light," which in turn is more nouny than a concept such as "health."[17] Nouns occupy various places on an axis from the concrete to the abstract, and from the descriptive to the judgmental. These gradients have been erased in the logic of ImageNet. Everything is flattened out and pinned to a label, like taxidermy butterflies in a display case. The results can be problematic, illogical, and cruel, especially when it comes to labels applied to people.

ImageNet contains 2,833 subcategories under the top-level category "Person." The subcategory with the most associated pictures is "gal" (with 1,664 images), followed by "grandfather" (1,662), "dad" (1,643), and "chief executive officer" (1,614). With these highly populated categories, we can already begin to see the outlines of a worldview. ImageNet classifies people into a huge range of types including race, nationality, profession, economic status, behavior, character, and even morality.
There are categories for racial and national
identities including Alaska Native, Anglo-
American, Black, Black African, Black
Woman, Central American, Eurasian,
German American, Japanese, Lapp, Latin
American, Mexican-American, Nicaraguan,
Nigerian, Pakistani, Papuan, South
American Indian, Spanish American, Texan,
Uzbek, White, Yemeni, and Zulu. Other
people are labeled by their careers or
hobbies: there are Boy Scouts,
cheerleaders, cognitive neuroscientists,
hairdressers, intelligence analysts,
mythologists, retailers, retirees, and so on.
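The nested synset structure described in the Taxonomy section (concepts arranged from general to specific, such as artifact > furnishing > furniture > seat > chair) can be sketched with a hand-made fragment of a hypernym map. The dictionary below is an illustrative stand-in, not the real WordNet database:

```python
# A hand-made fragment of a WordNet-style hypernym map: each concept
# points to its more general parent concept. None marks a top-level
# category. These entries mirror the "chair" example in the essay.
HYPERNYM = {
    "chair": "seat",
    "seat": "furniture",
    "furniture": "furnishing",
    "furnishing": "artifact",
    "artifact": None,  # one of ImageNet's nine top-level categories
}

def path_to_root(concept):
    # Walk upward through the nested hierarchy and render it from
    # the most general concept down to the most specific.
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = HYPERNYM[concept]
    return " > ".join(reversed(chain))

print(path_to_root("chair"))  # artifact > furnishing > furniture > seat > chair
```

Every classification a trained system can ever make must land somewhere on a path like this one, which is why the politics of the map itself matter.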
the big AI companies, where there is no way for outsiders to see how images are being ordered and classified.

Finally, there is the issue of where the thousands of images in ImageNet's Person class were drawn from. By harvesting images en masse from image search engines like Google, ImageNet's creators appropriated people's selfies and vacation photos without their knowledge, and then labeled and repackaged them as the underlying data for much of an entire field.[18] When we take a look at the bedrock layer of labeled images, we find highly questionable semiotic assumptions, echoes of nineteenth-century phrenology, and the representational harm of classifying images of people without their consent or participation.

Labeled Image

Images are laden with potential meanings, irresolvable questions, and contradictions. In trying to resolve these ambiguities, ImageNet's labels often compress and simplify images into deadpan banalities. One photograph shows a dark-skinned toddler wearing tattered and dirty clothes and clutching a soot-stained doll. The child's mouth is open. The image is completely devoid of context. Who is this child? Where are they? The photograph is simply labeled "toy."

But some labels are just nonsensical. A woman sleeps in an airplane seat, her right arm protectively curled around her pregnant stomach. The image is labeled "snob." A photoshopped picture shows a smiling Barack Obama wearing a Nazi uniform, his arm raised and holding a Nazi flag. It is labeled "Bolshevik."
[Image labels from ImageNet: ACCUSED, BOLSHEVIK, ANTI-SEMITE, KLEPTOMANIAC, JEWESS, LOSER, MIXED-BLOOD, SHARECROPPER, SNOB, TOSSER, SWINGER]
At the image layer of the training set, like everywhere else, we find assumptions, politics, and worldviews. According to ImageNet, for example, Sigourney Weaver is a "hermaphrodite," a young man wearing a straw hat is a "tosser," and a young woman lying on a beach towel is a "kleptomaniac." But the worldview of ImageNet isn't limited to the bizarre or derogatory conjoining of pictures and labels.

Other assumptions about the relationship between pictures and concepts recall physiognomy, the pseudoscientific assumption that something about a person's essential character can be gleaned by observing features of their bodies and faces. ImageNet takes this to an extreme, assuming that whether someone is a "debtor," a "snob," a "swinger," or a "slav" can be determined by inspecting their photograph. In the weird metaphysics of ImageNet, there are separate image categories for "assistant professor" and "associate professor"—as though if someone were to get a promotion, their biometric signature would reflect the change in rank.

Of course, these sorts of assumptions have their own dark histories and attendant politics.

UTK: Making Race and Gender from Your Face

In 1839, the mathematician François Arago claimed that through photographs, "objects preserve mathematically their forms."[19] Placed into the nineteenth-century context of imperialism and social Darwinism, photography helped to animate—and lend a "scientific" veneer to—various forms of phrenology, physiognomy, and eugenics.[20] Physiognomists such as Francis Galton and Cesare Lombroso created composite images of criminals, studied the feet of prostitutes, measured skulls, and compiled meticulous archives of labeled images and measurements, all in an effort to use "mechanical" processes to detect visual signals in classifications of race, criminality, and deviance from bourgeois ideals. This was done to capture and pathologize what was seen as deviant or criminal behavior, and make such behavior observable in the world.

And as we shall see, not only have the underlying assumptions of physiognomy made a comeback with contemporary training sets, but a number of training sets are designed to use algorithms and facial landmarks as latter-day calipers to conduct contemporary versions of craniometry.

For example, the UTKFace dataset (produced by a group at the University of Tennessee at Knoxville) consists of over 20,000 images of faces with annotations for age, gender, and race. The dataset's authors state that the dataset can be used for a variety of tasks, like automated face detection, age estimation, and age progression.[21]

The annotations for each image include an estimated age for each person, expressed in years from zero to 116. Gender is a binary choice: either zero for male or one for female. Race is categorized from zero to four, placing people in one of five classes: White, Black, Asian, Indian, or "Others."

The politics here are as obvious as they are troubling. At the category level, the researchers' conception of gender is a simple binary structure, with "male" and "female" the only alternatives. At the level of the image label is the assumption that someone's gender identity can be ascertained through a photograph.

The classificatory schema for race recalls many of the deeply problematic racial classifications of the twentieth century. For example, the South African apartheid regime sought to classify the entire population into four categories: Black,
UTKFace Dataset
White, Colored, or Indian.[22] Around 1970, the South African government created a unified "identity passbook" called The Book of Life, which linked to a centrally managed database created by IBM. These classifications were based on dubious and shifting criteria of "appearance and general acceptance or repute," and many people were reclassified, sometimes multiple times.[23] The South African system of racial classification was intentionally very different from the American "one-drop" rule, which stated that even one ancestor of African descent made somebody Black, likely because nearly all white South Africans had some traceable black African ancestry.[24] Above all, these systems of classification caused enormous harm to people, and the elusive classifier of a pure "race" signifier was always in dispute. However, seeking to improve matters by producing "more diverse" AI training sets presents its own complications.

IBM's Diversity in Faces

IBM's "Diversity in Faces" dataset was created as a response to critics who had shown that the company's facial-recognition software often simply did not recognize the faces of people with darker skin.[25] IBM publicly promised to improve their facial-recognition datasets to make them more "representative," and published the "Diversity in Faces" (DiF) dataset as a result.[26] Constructed to be "a computationally practical basis for ensuring fairness and accuracy in face recognition," the DiF consists of almost a million images of people pulled from the Yahoo! Flickr Creative Commons dataset, assembled specifically to achieve statistical parity among categories of skin tone, facial structure, age, and gender.[27]

The dataset itself continued the practice of collecting hundreds of thousands of images of unsuspecting people who had uploaded pictures to sites like Flickr.[28] But the dataset contains a unique set of categories not previously seen in other face-image datasets. The IBM DiF team asks whether age, gender, and skin color are truly sufficient in generating a dataset that can ensure fairness and accuracy, and concludes that even more classifications are needed. So they move into truly strange territory: including facial symmetry and skull shapes to build a complete picture of the face. The researchers claim that the use of craniofacial features is justified because it captures much more granular information about a person's face than gender, age, and skin color alone.
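Annotation schemes like UTKFace's reduce each face to a handful of integers: age in years (zero to 116), gender as zero or one, and race as zero to four. A minimal sketch of what decoding such an annotation looks like in practice (the function and field names here are hypothetical, not taken from any dataset's code; the code mappings follow the description above):

```python
# The integer codes as described in the text: gender is a binary
# choice, race is one of exactly five classes. Everything else about
# a person is discarded.
GENDER = {0: "male", 1: "female"}
RACE = {0: "White", 1: "Black", 2: "Asian", 3: "Indian", 4: "Others"}

def decode(age, gender_code, race_code):
    # A face in such a dataset is nothing more than this triple.
    assert 0 <= age <= 116, "age must fall in the dataset's fixed range"
    return {
        "age": age,
        "gender": GENDER[gender_code],
        "race": RACE[race_code],
    }

print(decode(34, 1, 4))
```

Anyone who does not fit one of the five race codes, or either gender code, simply cannot be represented: the schema has no cell for them.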
The paper accompanying the dataset specifically highlights prior work done to show that skin color is itself a weak predictor of race, but this begs the question of why moving to skull shapes is appropriate.

Craniometry was a leading methodological approach of biological determinism during the nineteenth century. As Stephen Jay Gould shows in his book The Mismeasure of Man, skull size was used by nineteenth- and twentieth-century pseudoscientists as a spurious way to claim the inherent superiority of white people over black people, and different skull shapes and weights were said to determine people's intelligence—always along racial lines.[29]

While the efforts of companies to build more diverse training sets are often put in the language of increasing "fairness" and "mitigating bias," clearly there are strong business imperatives to produce tools that will work more effectively across wider markets. However, here too the technical process of categorizing and classifying people is shown to be a political act. For example, how is a "fair" distribution achieved within the dataset?

IBM decided to use a mathematical approach to quantifying "diversity" and "evenness," so that a consistent measure of evenness exists throughout the dataset for every feature quantified. The dataset also contains subjective annotations for age and gender, which are generated by three independent Amazon Mechanical Turk workers for each image, similar to the methods used by ImageNet.[30] So people's gender and age are being "predicted" based on
MS CELEB dataset
and pilots. The picture of a man drinking beer characterized as an "alcoholic" disappeared, as did the pictures of a woman in a bikini dubbed a "slattern" and a young boy classified as a "loser." The picture of a man eating a sandwich (labeled a "selfish person") met the same fate. When you search for these images, the ImageNet website responds with a statement that it is under maintenance, and only the categories used in the ImageNet competition are still included in the search results.

But once it came back online, the search functionality on the site was modified so that it would only return results for categories that had been included in ImageNet's annual computer-vision contest. As of this writing, the "Person" category is still browsable from the dataset's online interface, but the images fail to load. The URLs for the original images are still downloadable.[31]

Over the next few months, other image collections used in computer-vision and AI research also began to disappear. In response to research published by Adam Harvey and Jules LaPlace,[32] Duke University took down a massive photo repository of surveillance-camera footage of students attending classes (called the Duke Multi-Target, Multi-Camera [MTMC] dataset). It turned out that the authors of the dataset had violated the terms of their Institutional Review Board approval by collecting images from people in public space, and by making their dataset publicly available.[33]

Similar datasets created from surveillance footage disappeared from servers at the University of Colorado Colorado Springs, and more from Stanford University, where a collection of faces culled from a webcam installed at San Francisco's iconic Brainwash Cafe was "removed from access at the request of the depositor."[34]

By early June, Microsoft had followed suit, removing their landmark "MS-CELEB" collection of approximately 10 million photos from 100,000 people scraped from the internet in 2016. It was the largest public facial-recognition dataset in the
world, and the people included were not just famous actors and politicians, but also journalists, activists, policy makers, academics, and artists.[35] Ironically, several of the people who had been included in the set without any consent are known for their work critiquing surveillance and facial recognition itself, including filmmaker Laura Poitras, digital rights activist Jillian York, critic Evgeny Morozov, and Shoshana Zuboff, author of The Age of Surveillance Capitalism. After an investigation in the Financial Times based on Harvey and LaPlace's work was published, the set disappeared.[36] A spokesperson for Microsoft claimed simply that it was removed "because the research challenge is over."[37]

On one hand, removing these problematic datasets from the internet may seem like a victory. The most obvious privacy and ethical violations are addressed by making them no longer accessible. However, taking them offline doesn't stop their work in the world: these training sets have been downloaded countless times, and have made their way into many production AI systems and academic papers. By erasing them completely, not only is a significant part of the history of AI lost, but researchers are unable to see how the assumptions, labels, and classificatory approaches have been replicated in new systems, or to trace the provenance of skews and biases exhibited in working systems. Facial-recognition and emotion-recognition AI systems are already propagating into hiring, education, and healthcare. They are part of security checks at airports and interview protocols at Fortune 500 companies. Not being able to see the basis on which AI systems are trained removes an important forensic method for understanding how they work. This has serious consequences.

For example, a recent paper led by a PhD student at the University of Cambridge introduced a real-time drone surveillance system to identify violent individuals in public areas. It is trained on datasets of "violent behavior" and uses those models for drone surveillance systems to detect and isolate violent behavior in crowds. The team created the Aerial Violent Individual (AVI) Dataset, which consists of 2,000 images of people engaged in five activities: punching, stabbing, shooting, kicking, and strangling. In order to train their AI, they asked 25 volunteers between the ages of 18 and 25 to mimic these actions. Watching the videos is almost comic: the actors stand far apart and perform strangely exaggerated gestures. It looks like a children's pantomime, or badly modeled game characters.[38] The full dataset is not available for the public to download. The lead researcher, Amarjot Singh (now at Stanford University), said he plans to test the AI system by flying drones over two major festivals, and potentially at national borders in India.[39][40]

An archeological analysis of the AVI dataset—similar to our analyses of ImageNet, JAFFE, and Diversity in Faces—could be very revealing. There is clearly a significant difference between staged performances of violence and real-world cases. The researchers are training drones to recognize pantomimes of violence, with all of the misunderstandings that might come with that. Furthermore, the AVI dataset doesn't have anything for "actions that aren't violence but might look like it"; neither do the researchers publish any details about their false-positive rate (how often their system detects nonviolent behavior as violent).[41] Until their data is released, it is impossible to do forensic testing on how they classify and interpret human bodies, actions, or inactions.

This is the problem of inaccessible or disappearing datasets. If they are, or were, being used in systems that play a role in everyday life, it is important to be able to study and understand the worldview they normalize. Developing frameworks within which future researchers can access these datasets in ways that don't perpetuate harm is a topic for further work.
END NOTE
[1] Minsky currently faces serious Hall Series in Artificial Intelligence individuals, and clearly visible in
allegations related to convicted (Upper Saddle River, NJ: Prentice observable biological mechanisms
pedophile and rapist Jeffrey Hall, 2010), 987. regardless of cultural context. But
Epstein. Minsky was one of Ekman’s work has been deeply
several scientists who met with [5] In the late 1970s, Ryszard criticized by psychologists,
Epstein and visited his island Michalski wrote an algorithm anthropologists, and other
retreat where underage girls were based on “symbolic variables” and researchers who have found his
forced to have sex with members logical rules. This language was theories do not hold up under
of Epstein’s coterie. As scholar very popular in the 1980s and sustained scrutiny. The
Meredith Broussard observed, this 1990s, but, as the rules of psychologist Lisa Feldman Barrett
was part of a broader culture of decision-making and qualification and her colleagues have argued
exclusion that became endemic in became more complex, the that an understanding of emotions
AI: “as wonderfully creative as language became less usable. At in terms of these rigid categories
Minsky and his cohort were, they the same moment, the potential of and simplistic physiological causes
also solidified the culture of tech as using large training sets triggered is no longer tenable. Nonetheless,
a billionaire boys’ club. Math, a shift from this conceptual AI researchers have taken his
physics, and the other “hard” clustering to contemporary work as fact, and used it as a basis
sciences have never been machine-learning approaches. See for automating emotion detection.”
hospitable to women and people of Ryszard Michalski, “Pattern Meredith Whitaker et al., “AI Now
color; tech followed this lead.” See Recognition as Rule-Guided Report 2018” (AI Now Institute,
Meredith Broussard, Artificial Inductive Inference.” IEEE December 2018), https://
Unintelligence: How Computers Transactions on Pattern Analysis ainowinstitute.org/
Misunderstand the World and Machine Intelligence, 2, 349– AI_Now_2018_Report.pdf. See
(Cambridge, Massachusetts, and 361, 1980. also Lisa Feldman Barrett et al.,
London: MIT Press, 2018), 174. “Emotional Expressions
[6] There are hundreds of scholarly Reconsidered: Challenges to
[2] See Daniel Crevier, AI: The books in this category, but for a Inferring Emotion From Human
Tumultuous History of the Search good place to start, see W. J. T. Facial Movements,” Psychological
for Artificial Intelligence (New York: Mitchell, Picture Theory: Essays Science in the Public Interest 20,
Basic Books, 1993), 88. on Verbal and Visual no. 1 (July 17, 2019): 1–68, https://
Representation, Paperback ed., doi.org/
[3] Minsky gets the credit for this [Nachdr.] (Chicago: University of 10.1177/1529100619832930.
idea, but clearly Papert, Sussman, Chicago Press, 2007).
and teams of “summer workers” [9] See, for example, Ruth Leys,
were all part of this early effort to [7] M. Lyons et al., “Coding Facial “How Did Fear Become a Scientific
get computers to describe objects Expressions with Gabor Wavelets,” Object and What Kind of Object Is
in the world. See Seymour A. in Proceedings Third IEEE It?”, Representations 110, no. 1
Papert, “The Summer Vision International Conference on (May 2010): 66–104, https://
Project,” July 1, 1966, https:// Automatic Face and Gesture doi.org/10.1525/rep.2010.110.1.66.
dspace.mit.edu/handle/ Recognition (Third IEEE Leys has offered a number of
1721.1/6125. As he wrote: “The International Conference on critiques of Ekman’s research
summer vision project is an Automatic Face and Gesture program, most recently in Ruth
attempt to use our summer Recognition, Nara, Japan: IEEE Leys, The Ascent of Affect:
workers effectively in the Comput. Soc, 1998), 200–205, Genealogy and Critique
construction of a significant part of https://doi.org/10.1109/ (Chicago and London: University of
a visual system. The particular AFGR.1998.670949. Chicago Press, 2017). See also
task was chosen partly because it Lisa Feldman Barrett, “Are
can be segmented into sub- Emotions Natural Kinds?”,
[8] As described in the AI Now Report Perspectives on Psychological
problems which allow individuals to 2018, this classification of
work independently and yet Science 1, no. 1 (March 2006):
emotions into six categories has its 28–58, https://doi.org/10.1111/
participate in the construction of a root in the work of the psychologist
system complex enough to be a j.1745-6916.2006.00003.x; Erika
Paul Ekman. “Studying faces, H. Siegel et al., “Emotion
real landmark in the development according to Ekman, produces an
of ‘pattern recognition’.” Fingerprints or Emotion
objective reading of authentic Populations? A Meta-Analytic
interior states—a direct window to Investigation of Autonomic
[4] Stuart J. Russell and Peter the soul. Underlying his belief was
Norvig, Artificial Intelligence: A Features of Emotion Categories.,”
the idea that emotions are fixed Psychological Bulletin, 20180201,
Modern Approach, 3rd ed, Prentice and universal, identical across
26
https://doi.org/10.1037/bul0000128.

[10] Fei-Fei Li, as quoted in Dave Gershgorn, “The Data That Transformed AI Research—and Possibly the World,” Quartz, July 26, 2017, https://qz.com/1034972/the-data-that-changed-the-direction-of-ai-research-and-possibly-the-world/. Emphasis added.

[11] John Markoff, “Seeking a Better Way to Find Web Images,” The New York Times, November 19, 2012, sec. Science, https://www.nytimes.com/2012/11/20/science/for-web-images-creating-new-technology-to-seek-and-find.html.

[12] Their paper can be found here: Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” in Advances in Neural Information Processing Systems 25, ed. F. Pereira et al. (Curran Associates, Inc., 2012), 1097–1105, http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf.

[13] Released in the mid-1980s, this lexical database for the English language can be seen as a thesaurus that defines and groups English words into synsets, i.e., sets of synonyms: https://wordnet.princeton.edu. This project takes place in a broader history of computational linguistics and natural-language processing (NLP), which developed during the same period. This subfield aims to program computers to process and analyze large amounts of natural-language data using machine-learning algorithms.

[14] See Geoffrey C. Bowker and Susan Leigh Star, Sorting Things Out: Classification and Its Consequences, first paperback edition, Inside Technology (Cambridge, Massachusetts and London: MIT Press, 2000), 44, 107; Anja Bechmann and Geoffrey C. Bowker, “Unsupervised by Any Other Name: Hidden Layers of Knowledge Production in Artificial Intelligence on Social Media,” Big Data & Society 6, no. 1 (January 2019): 205395171881956, https://doi.org/10.1177/2053951718819569.

[15] These are some of the categories that have now been entirely deleted from ImageNet as of January 24, 2019.

[16] For an account of the politics of classification in the Library of Congress, see Sanford Berman, Prejudices and Antipathies: A Tract on the LC Subject Heads Concerning People (Metuchen, NJ: Scarecrow Press, 1971).

[17] We’re drawing in part here on the work of George Lakoff in Women, Fire, and Dangerous Things: What Categories Reveal about the Mind (Chicago: University of Chicago Press, 2012).

[18] See Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei, “ImageNet: A Large-Scale Hierarchical Image Database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, 248–255.

[19] Quoted in Allan Sekula, “The Body and the Archive,” October 39 (1986): 3–64, https://doi.org/10.2307/778312.

[20] Ibid.; for a broader discussion of objectivity, scientific judgment, and a more nuanced take on photography’s role in it, see Lorraine Daston and Peter Galison, Objectivity, paperback ed. (New York: Zone Books, 2010).

[21] “UTKFace - Aicip,” accessed August 28, 2019, http://aicip.eecs.utk.edu/wiki/UTKFace.

[22] See Paul N. Edwards and Gabrielle Hecht, “History and the Technopolitics of Identity: The Case of Apartheid South Africa,” Journal of Southern African Studies 36, no. 3 (September 2010): 619–39, https://doi.org/10.1080/03057070.2010.507568. Earlier classifications used in the 1950 Population Act and Group Areas Act used four classes: “Europeans, Asiatics, persons of mixed race or coloureds, and ‘natives’ or pure-blooded individuals of the Bantu race” (Bowker and Star, 197). Black South Africans were required to carry pass books and could not, for example, spend more than 72 hours in a white area without permission from the government for a work contract (198).

[23] Bowker and Star, 208.

[24] See F. James Davis, Who Is Black? One Nation’s Definition, 10th anniversary ed. (University Park, PA: Pennsylvania State University Press, 2001).

[25] See Joy Buolamwini and Timnit Gebru, “Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification,” in Conference on Fairness, Accountability, and Transparency, 2018, 77–91, http://proceedings.mlr.press/v81/buolamwini18a.html.

[26] Michele Merler et al., “Diversity in Faces,” arXiv:1901.10436 [cs], January 29, 2019, http://arxiv.org/abs/1901.10436.

[27] “Webscope | Yahoo Labs,” accessed August 28, 2019, https://webscope.sandbox.yahoo.com/catalog.php?datatype=i&did=67&guccounter=1.

[28] Olivia Solon, “Facial Recognition’s ‘Dirty Little Secret’: Millions of Online Photos Scraped without Consent,” NBC News, March 12, 2019, https://www.nbcnews.com/tech/internet/facial-recognition-s-dirty-little-secret-millions-online-photos-scraped-n981921.

[29] Stephen Jay Gould, The Mismeasure of Man, revised and expanded (New York: Norton, 1996). The approach of measuring intelligence based on skull size was prevalent across Europe and the US; in France, for example, it was developed by Paul Broca and Gustave Le Bon. See Paul Broca, “Sur le crâne de Schiller et sur l’indice
cubique des crânes,” Bulletin de la Société d’anthropologie de Paris, I° Série, t. 5, fasc. 1, p. 253–260, 1864. Gustave Le Bon, L’homme et les sociétés. Leurs origines et leur développement (Paris: Edition J. Rothschild, 1881). In Nazi Germany, the “anthropologist” Eva Justin wrote about Sinti and Roma people, based on anthropometric and skull measurements. See Eva Justin, Lebensschicksale artfremd erzogener Zigeunerkinder und ihrer Nachkommen [Biographical destinies of Gypsy children and their offspring who were educated in a manner inappropriate for their species], doctoral dissertation, Friedrich-Wilhelms-Universität Berlin, 1943.

[30] “Figure Eight | The Essential High-Quality Data Annotation Platform,” Figure Eight, accessed August 28, 2019, https://www.figure-eight.com/.

[31] The authors made a backup of the ImageNet dataset prior to much of its deletion.

[32] Their “MegaPixels” project is here: https://megapixels.cc/

[33] Jake Satisky, “A Duke Study Recorded Thousands of Students’ Faces. Now They’re Being Used All over the World,” The Chronicle, June 12, 2019, https://www.dukechronicle.com/article/2019/06/duke-university-facial-recognition-data-set-study-surveillance-video-students-china-uyghur.

[34] “2nd Unconstrained Face Detection and Open Set Recognition Challenge,” accessed August 28, 2019, https://vast.uccs.edu/Opensetface/; Russell Stewart, Brainwash Dataset (Stanford Digital Repository, 2015), https://purl.stanford.edu/sx925dc9385.

[35] Melissa Locker, “Microsoft, Duke, and Stanford Quietly Delete Databases with Millions of Faces,” Fast Company, June 6, 2019, https://www.fastcompany.com/90360490/ms-celeb-microsoft-deletes-10m-faces-from-face-database.

[36] Madhumita Murgia, “Who’s Using Your Face? The Ugly Truth about Facial Recognition,” Financial Times, April 19, 2019, https://www.ft.com/content/cf19b956-60a2-11e9-b285-3acd5d43599e.

[37] Locker, “Microsoft, Duke, and Stanford Quietly Delete Databases.”

[38] Full video here: Amarjot Singh, Eye in the Sky: Real-Time Drone Surveillance System (DSS) for Violent Individuals Identification, 2018, https://www.youtube.com/watch?time_continue=1&v=zYypJPJipYc.

[39] Steven Melendez, “Watch This Drone Use AI to Spot Violence in Crowds from the Sky,” Fast Company, June 6, 2018, https://www.fastcompany.com/40581669/watch-this-drone-use-ai-to-spot-violence-from-the-sky.

[40] James Vincent, “Drones Taught to Spot Violent Behavior in Crowds Using AI,” The Verge, June 6, 2018, https://www.theverge.com/2018/6/6/17433482/ai-automated-surveillance-drones-spot-violent-behavior-crowds.

[41] Ibid.

[42] Gould, The Mismeasure of Man, 140.