A Methodology To Characterize Bias and Harmful Stereotypes in Natural Language Processing in Latin America

arXiv:2207.06591v3 [cs.CL] 28 Mar 2023
Abstract

Automated decision-making systems, especially those based on natural language processing, are pervasive in our lives. They are not only behind the internet search engines we use daily, but also take more critical roles: selecting candidates for a job, determining suspects of a crime, diagnosing autism and more. Such automated systems make errors, which may be harmful in many ways, be it because of the severity of the consequences (as in health issues) or because of the sheer number of people they affect. When errors made by an automated system affect one population more than others, we call the system biased.

Most modern natural language technologies are based on artifacts obtained from enormous volumes of text using machine learning, namely language models and word embeddings. Since they are created applying subsymbolic machine learning, mostly artificial neural networks, they are opaque and practically uninterpretable by direct inspection, thus making it very difficult to audit them.

In this paper we present a methodology that spells out how social scientists, domain experts, and machine learning experts can collaboratively explore biases and harmful stereotypes in word embeddings and large language models. Our methodology is based on the following principles:

1. focus on the linguistic manifestations of discrimination in word embeddings and language models, not on the mathematical properties of the models
2. reduce the technical barrier for discrimination experts
3. characterize through a qualitative exploratory process in addition to a metric-based approach
4. address mitigation as part of the training process, not as an afterthought

1 Introduction

In this section, we first describe the need to involve expert knowledge and social scientists with experience on discrimination within the core development of natural language processing (NLP) systems, and then we describe the guiding principles of our methodology to do so. We finish this section with a plan for the paper that briefly describes the content of each section.

1.1 The motivation for our methodology

Machine learning models and data-driven systems are increasingly being used to support decision-making processes. Such processes may affect fundamental rights, like the right to receive an education or the right to non-discrimination. It is important that models can be assessed and audited to guarantee that such rights are not compromised. Ideally, a wider range of actors should be able to carry out those audits, especially those who are knowledgeable of the context where systems are deployed or those who would be affected.

In this paper, we argue that social risk mitigation cannot be approached through algorithmic calculations or adjustments, as done in a large part of previous work in the area. Thus, the aim is to open up the participation of experts both on the complexities of the social world and on communities that are being directly affected by AI systems. Participation would allow processes to become transparent, accountable, and responsive to the needs of those directly affected by them.

Nowadays, data-driven systems can be audited, but such audits often require technical skills that are beyond the capabilities of most of the people with knowledge on discrimination. The technical barrier has become a major hindrance to engaging experts and communities in the assessment of the behavior of automated systems. Human-rights defenders and experts in social science recognize and criticize the dominant epistemology that sustains
AI mechanisms, but they do not master the technicalities that underlie these systems. This technical and digital barrier prevents them from having a greater influence on claims and transformations. That is why we are making an effort to reduce the technical barriers to understanding, inspecting and modifying fundamental natural language technologies. In particular, we are focusing on a key component in the automatic treatment of natural language, namely, word embeddings and large (neural) language models.

1.2 Language technologies convey harmful stereotypes

Word embeddings and large (neural) language models (LLMs) are a key component of modern natural language processing systems. They provide a representation of words and sentences that has boosted the performance of many applications, functioning as a representation of the semantics of utterances. They seem to capture a semblance of the meaning of language from raw text, but, at the same time, they also distill stereotypes and societal biases which are subsequently relayed to the final applications.

Several studies found that linguistic representations learned from corpora contain associations that produce harmful effects when brought into practice, like invisibilization, self-censorship or simply as deterrents (Blodgett et al., 2020). The effects of these associations on downstream applications have been treated as bias, that is, as systematic errors that affect some populations more than others, more than could be attributed to a random distribution of errors. This biased distribution of errors results in discrimination of those populations. Unsurprisingly, such discrimination often negatively affects populations that have been historically marginalized.

To detect and possibly mitigate such harmful behaviour, many techniques for measuring and mitigating the bias encoded in word embeddings and LLMs have been proposed by NLP researchers and machine learning practitioners (Bolukbasi et al., 2016; Caliskan et al., 2017). In contrast, social scientists have been mainly reduced to the ancillary role of providing data for labeling rather than being considered as part of the system's core (Kapoor and Narayanan, 2022).

We believe the participation of experts on social and cultural sciences in the design of AI would lead to more equitable, solid, responsible, and trustworthy AI systems. It would enable the comprehension and representation of the historically marginalized communities' needs, wills, and perspectives.

1.3 Our guiding principles and roadmap

In this paper, we subscribe to the following guiding principles to detect and characterize the potentially discriminatory behavior of NLP systems. In contrast with other approaches, we believe the part of the process that can most contribute to bias assessment is not subtle differences in metrics or technical complexities incrementally added to existing approaches. We believe what can most contribute to an effective assessment of bias in NLP is precisely the linguistic characterization of the discrimination phenomena. Thus our main goals are:

1. focus on the linguistic manifestations of discrimination, not on the mathematical properties of the inferred models
2. reduce the technical barrier
3. characterize instead of diagnose, proposing a qualitative exploratory process instead of a metric-based approach
4. address mitigation as part of the whole development process, not reduced to bias assessment after the development has been completed
5. allow the inspection of central artifacts encoding linguistic knowledge, namely, word embeddings and language models

In these principles, we keep in mind the specific needs of the Latin American region. In Latin America, we need domain experts to be able to carry out these analyses with autonomy, not relying on an interdisciplinary team or on training, since both are usually not available.

The paper is organized as follows. The next section develops the fundamental concept of bias, how bias materializes in the automated treatment of language, and some artifacts that currently concentrate much linguistic knowledge and also encode stereotypes and bias, namely, word embeddings and large (neural) language models (LLMs). Then, section 3 presents relevant work in the area of bias diagnostics and mitigation, and shows
how they fall short in integrating expert linguistic knowledge and making it the central issue in bias characterization. Section 5 explains our methodology in a worked-out user story illustrating how it puts the power to diagnose biases in the hands of people with the knowledge and the societal roles to have an impact on current technologies. We conclude with a summary of our approach, a description of our vision and next steps for this line of work.

In Annex A we provide two sets of case studies in which two groups of users with different profiles applied this methodology to carry out an exploration of biases, and the observations on usability and requirements that we obtained.

Our methodology uses the software we implemented, available at https://huggingface.co/spaces/vialibre/edia.

2 Fundamental concepts

2.1 Bias in automated decision-making systems

In automated decision making systems, the term bias refers to errors that are distributed unevenly across categories in an unintended way that cannot be attributed to chance, that is, they are not random but systematic. Such bias is considered problematic when errors create unfair outcomes, with negative effects in the world.

Although the buzzword "algorithmic bias" has been in the spotlight in the last years, most of the bias in machine learning comes from the examples used to train models. Bias can be found in any part of the machine learning life cycle: the framing of the problem, which determines which data will be collected; the actual collection, with more or less resources devoted to guarantee the quality of the data; the curation of the collected data, which can result in hugely different representations. When examples arrive at the point where different algorithms may use them, they have undergone such transformations that the room for bias left to differences in algorithms is, by comparison, residual.

2.2 Bias encoded in NLP artifacts

In NLP, bias is not only encoded in the classifier, but also in constructs that encode lexical meaning irrespective of the final application. In current applications, word embeddings and LLMs are widely used. They are a key component of applications such as text auto-completion or automatic translation, and have been shown to improve the performance of virtually any natural language processing task they have been applied to. The problem is that, even if their impact on performance is overall positive, they are systematically biased. Thus, even if they improve general performance, they may damage the communities that are discriminated against by those biases.

Word embeddings and LLMs are biased because they are obtained from large volumes of texts that have underlying societal biases and prejudices. Such biases are carried into the representations, which are thus transferred to applications. But since these are complex, opaque artifacts, working at a subsymbolic level, it is very difficult for a person to inspect them and detect possible biases. This difficulty is even more acute for people without extensive skills in this kind of technologies. In spite of that opacity, readily available pre-trained embeddings and LLMs are widely used in socio-productive solutions, like rapid technological solutions to scale and speed up the response to the COVID-19 pandemic (Aigbe and Eick, 2021).

In the last decade, the techniques used behind natural language processing systems have undergone profound transformations. In recent years, data science has learned to extract the meaning of words (using techniques such as word embeddings) through computational models in an effective way. Before these technologies were available, linguists and specialists in the field encoded large dictionaries by hand and uploaded information about the meaning of words to the net. In sociohistorical terms, this has profound implications, since it places us at the hinge of a new phase in which the generation of lexical knowledge bases no longer requires language experts, but now occurs automatically through models that learn from data.

2.3 Word Embeddings

Word embeddings are widely used natural language processing artifacts that aim to represent the behavior of words, and thus approximate their meaning, fully automatically, based on their usage in large amounts of naturally occurring text. This is why it is necessary to have large volumes of text to obtain word embeddings.

Currently, the most popular word embeddings are based on neural networks. Training a word embedding consists in obtaining a neural network
that minimizes the error in predicting the behavior of words in context. Such networks are obtained with training examples artificially created from naturally occurring text. Examples are created by removing, or masking, one word in a naturally occurring sentence, then having the neural network try to guess it. Billions and billions of such examples can be automatically generated from the text that is publicly available on the internet. With such a huge amount of examples, word embeddings have been able to model the behavior of words in quite a satisfactory manner, and have been successfully applied to improve the performance of many NLP tools.

The gist of word embeddings consists in representing word meaning as similarity between words. Words are considered similar if they often occur in similar linguistic contexts, more concretely, if they share a high proportion of contexts of co-occurrence. Contexts of co-occurrence are usually represented as the n words that occur before (and after) the target word being represented. In some more sophisticated structures, contexts may include some measure of word order or syntactic structure. However, most improvements in current word representations have been obtained not by adding explicit syntactic information but by optimizing n for the NLP task (from a few words to a few dozen words) (Lison and Kutuzov, 2017).

Figure 1: A representation of a word embedding in two dimensions, showing how words are closer in space according to the proportion of co-occurrences they share.

Once words are represented by their contexts of occurrence (in a mathematical data structure called a vector), the similarity between words can be captured and manipulated as a mathematical distance, so that words that share more contexts are closer, and words that share fewer contexts are farther apart, as seen in Figure 1. To measure distance, the cosine similarity is used (Singhal, 2001), although other distances can be used as well. The cosine similarity is useful because it is higher when words occur in the same contexts, and lower when they don't.
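As a concrete illustration of the similarity computation described above, the following Python sketch compares pairs of word vectors with cosine similarity. It is a minimal example assuming a pre-trained embedding loaded with gensim's KeyedVectors; the file name and the word pairs are placeholders.

```python
# Minimal sketch: cosine similarity between word vectors.
# Assumes a pre-trained embedding in word2vec format; the path is a placeholder.
import numpy as np
from gensim.models import KeyedVectors

def cosine_similarity(u, v):
    # Higher values mean the two words share more contexts of co-occurrence.
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

vectors = KeyedVectors.load_word2vec_format("embeddings.vec")  # placeholder path

for w1, w2 in [("nurse", "hospital"), ("nurse", "truck")]:
    sim = cosine_similarity(vectors[w1], vectors[w2])
    print(f"similarity({w1}, {w2}) = {sim:.3f}")
```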
2.4 Large Language Models (LLMs)

Just as word embeddings, large language models (LLMs) are pervasive in applications that involve the automated treatment of natural language. In fact, LLMs are used more than word embeddings nowadays. LLMs are artifacts that represent not individual words, but sequences of words, for example, sentences or dialogue turns. Thus, they are able to account for linguistic phenomena that escape the expressive power of word embeddings, like relations between words (i.e. agreement, coreference), and the behavior of words in context, not in isolation. Indeed, word embeddings represent the behavior of words just by themselves, not in relation to their context.

Just like word embeddings, LLMs are obtained using neural networks and large amounts of naturally occurring text. LLMs are usually trained using the following tasks: masking a word that needs to be guessed by the neural network (similar to a fill-in-the-blanks activity when learning a foreign language), predicting the next words in a sentence (similar to a complete-the-sentence activity when learning a foreign language), or a combination of these two. The resulting LLMs can be used for different applications, depending on their training. LLMs can be used as automatic language generators, applied to provide replies in a conversation or to complete texts after a prompt.

Thus LLMs capture a different perspective on the behaviour of words, a perspective that can account for finer-grained, subtler behaviors of words, for example, multi-word expressions and word senses.
3 A critical survey of existing methods

In recent years the academic study of biases in language technology has been gaining growing relevance, with a variety of approaches accompanied by insightful critiques (Nissim et al., 2020). Most approaches build upon the experience of early proposals (Bolukbasi et al., 2016; Gonen and Goldberg, 2019).
Framework | Reference             | Word Embeddings Analysis | Language Models Analysis | Requires NLP Knowledge | Mitigation Techniques Implemented | Counterfactuals Analysis
WordBias  | Ghai et al. (2021)    | Yes | No  | No  | No  | No
VERB      | Rathore et al. (2021) | Yes | No  | Yes | Yes | No
LIT       | Tenney et al. (2020)  | Yes | Yes | Yes | No  | Yes

Table 1: Description of frameworks with graphical interfaces available for bias analysis of embeddings or language models. The What-If Tool is not included in the table because it is not specifically targeting text data.
In this section, we present a critical survey of previous academic work in bias assessment and we situate our proposal with respect to it. The section is organized as follows. First, we present existing tools for bias exploration. Second, we review previous work that highlights the importance of the quality of benchmarks for bias assessment. The benchmarks for bias assessment are lists of words (to debias) for word embeddings and lists of stereotype and anti-stereotype sentences for language models. Moreover, we argue that most previous work has focused on a rather narrow set of biases and languages. Then, we discuss the risks and limitations of previous work which focuses on developing algorithms for measuring and mitigating biases automatically. Finally, we discuss the role that training data play in the process and review work that focuses on the data instead of focusing on the algorithms.

3.1 Existing frameworks for bias exploration

Multiple frameworks have been developed in the last years for bias analysis. Most of them require mastery of machine learning methods and programming knowledge. Others have a graphical interface and may reach a larger audience. However, having a graphical interface is certainly not enough to guarantee an analysis without relying on researchers with machine learning skills. In order to create interdisciplinary bias analyses, it is relevant that the analysis options present in the graphical interfaces, and the data visualizations and manipulation, are oriented toward a non-technical audience. In the following, we give a brief description of some of the frameworks with graphical interfaces available for bias analysis of embeddings or language models.

WordBias (Ghai et al., 2021) is a framework that aims to analyze embedding biases by defining lists of words. In WordBias, new variables and lists of words may be defined. This framework allows the analysis of intersectional bias. The bias evaluation is done by a score based on cosine distance between vectors and does not allow the incorporation of other metrics. As of October 2022, this framework is only available to analyze the word2vec embedding, without the possibility to introduce other embeddings or models.

The Visualizing of Embedding Representations for deBiasing system (VERB) (Rathore et al., 2021) is an open-source graphical interface framework that aims to study word embeddings. VERB enables users to select subsets of words and to visualize potential correlations. Also, VERB is a tool that helps users gain an understanding of the inner workings of word embedding debiasing techniques by decomposing these techniques into interpretable steps and showing how word representations change, using dimensionality reduction and interactive visual exploration. The target of this framework is, mainly, researchers with an NLP background, but it also helps NLP starters as an educational tool to understand some bias mitigation techniques for word embeddings.

The What-If Tool (Wexler et al., 2019) is a framework that enables bias analysis for diverse kinds of data. Although it is not focused on text data, it allows this type of input. The What-If Tool offers multiple kinds of analysis, visualization, and evaluation of fairness through different metrics. To use this framework, researchers with technical skills are required, because the graphical interface is accessed through Jupyter/Colab notebooks, Google Cloud, or TensorBoard, and also because multiple analysis options require some machine learning knowledge (e.g., selecting between AUC, L1 and L2 metrics). Users' own models can be evaluated, but since the tool is not text-specific, it is not clear how the evaluation of words or sentences would be done.

The Language Interpretability Tool (LIT) (Tenney et al., 2020) is an open-source platform for visualization and analysis of NLP models. It was designed mainly to understand the models' predictions, to explore in which examples the model
Figure 2: A list of 16 words in English (left) and a translation to Spanish (right) and the similarity of their word
embeddings with respect to the list of words “woman, girl, she, mother, daughter, feminine” representing the
concept "feminine", the list “man, boy, he, father, son, masculine” representing "masculine", and translations for
both to Spanish. The English word embedding data and training are described in Bolukbasi et al. (2016) and the
Spanish in Cañete et al. (2020). From the 16 words of interest, in English, 8 are more associated to the concept
of "feminine", while in Spanish only 5 of them are. In particular, "nurse" in Spanish is morphologically marked
with masculine gender in the word "enfermero", so there is some degree of gender bias that needs to be taken into
account to fully account for the behavior of the word. This figure illustrates that methodologies for bias detection
developed for English are not directly applicable to other languages. Also, the figure illustrates that the observed
biases depend completely on the list of words chosen.
underperforms, and to investigate the consistency of the behavior of the models by analyzing controlled changes in data points. LIT allows users to add new data points on the fly, to compare two models or data points, and provides local explanations and aggregated analysis. However, this tool requires extensive NLP understanding from the user.

Badilla et al. (2020) present an open source Python library called WEFE which is similar to WordBias in that it allows for the exploration of biases other than race and gender and in different languages. One of the focuses of WEFE is the comparison of different automatic metrics for bias measurement and mitigation. Like WEFE, FairLearn (Bird et al., 2020) and responsibly (Hod, 2018) are Python libraries that enable auditing and mitigating biases in machine learning systems. However, in order to use these libraries, Python programming skills are needed, as they do not provide a graphical interface.

3.2 On the importance of benchmarks

All approaches to assess bias in word embeddings rely on lists of words to define the space of bias to be explored (Badilla et al., 2021). These words have a crucial impact on how and which biases are detected and mitigated, but they are not central in the efforts devoted to this task, as argued in Antoniak and Mimno (2021). The methodologies for choosing the words to make these lists are varied: sometimes lists are crowd-sourced, sometimes hand-selected by researchers, and sometimes drawn from prior work in the social sciences. Most of them are developed in one specific context and then used in others without reflection on the domain or context shift.

Motivated by the aforementioned information gap and convinced that the origins and rationale underlying the lists' selection must be explicit, tested, and documented, Antoniak and Mimno (2021) proposed a systematic framework that enables the analysis of the sources of information and the characteristics of the instability of words in the lists, providing a detailed report about their possible vulnerabilities. They detailed three characteristics that can give rise to vulnerabilities in the lists of words. In the first place, reductionist definitions, i.e., when words are reduced to traditional categories. This can reinforce harmful definitions or preconceptions of the categories. In the second place, imprecise definitions, i.e., the
indiscriminate use of pre-existing datasets and over-reliance on category labels from previous works, which can lead to errors. Besides, these stigmas can be enhanced with other attributes (such as gender or age), increasing the prejudice towards certain social groups. In the third place, lexical factors, i.e., the frequency and context of the selected words may influence the bias analysis.

Several recent efforts have focused on benchmark datasets consisting of pairs of contrastive sentences to evaluate bias in language models. As for methods relevant for word embeddings, these benchmarks are often accompanied by metrics that aggregate an NLP system's behavior on these pairs into measurements of harms. Blodgett et al. (2021) examine four such benchmarks and apply a method, originating from the social sciences, to inventory a range of pitfalls that threaten these benchmarks' validity as measurement models for stereotyping. They find that these benchmarks frequently lack clear articulations of what is being measured, and they highlight a range of ambiguities and unstated assumptions that affect how these benchmarks conceptualize and operationalize stereotyping. Névéol et al. (2022) propose how to overcome some of these challenges by taking a culturally aware standpoint and a curation methodology when designing such benchmarks.

Most previous work uses word or sentence lists developed for English, or direct translations from English that do not take into account structural differences between languages (Garg et al., 2018). For example, in Spanish almost all nouns and adjectives are morphologically marked with gender, but this is not the case in English. Figure 2 illustrates the differences in lexical bias measurements between translations of lists of words in English to Spanish over two different word embeddings, one in each language: the English embedding is described in Bolukbasi et al. (2016) and the Spanish one in Cañete et al. (2020). From the 16 words analyzed, in English, 8 are more associated to the "feminine" extreme of the bias space, while in Spanish only 5 of them are. The 3 words with different positions are "nurse", "care" and "wash". In particular, "nurse" in Spanish is morphologically marked with masculine gender in the word "enfermero", so it is not gender neutral. This figure illustrates two things. First, the fact that methodologies for bias detection developed for English are not directly applicable to other languages. Second, the list of words selected to analyze bias has a strong impact on the bias that is shown by the analysis.

3.3 On the importance of linguistic difference

As illustrated by the previous example, linguistic differences have a big impact on the results obtained by the methodology to assess bias. Representing language idiosyncrasies is a crucial goal in characterizing biases, first, because we want to make these technologies accessible to a wider range of actors, and second, because to model bias in a given context or a given culture you need to do it in the language of that culture.

Different approaches have been proposed to capture specific linguistic phenomena. A paradigmatic example of linguistic variation is languages with morphologically marked gender, which can get confused with semantic gender to some extent. Most of the proposals to model gender bias in languages with morphologically marked gender add some technical construct that captures the specific phenomena. That is the case of Zhou et al. (2019), who add an additional space to represent morphological gender, independent of the dimension where they model semantic gender. This added complexity supposes an added difficulty for people without technical skills.

However, it is not strictly necessary to add technical complexity to capture these linguistic complexities. A knowledgeable exploitation of word lists can also adequately model linguistic particularities. In the work presented here, we adapted the approach to bias assessment proposed by Bolukbasi et al. (2016), resorting to its implementation in the responsibly toolkit. The toolkit allows for a visual and interactive exploration of binary bias, as illustrated in Figure 2 and explained above.

Bolukbasi et al. (2016)'s approach was originally developed for English, and did not envisage morphologically marked gender or the different usage of pronouns in other languages. In order to apply it to Spanish, we propose that the following considerations are necessary.

First, the extremes of bias cannot be defined by pronouns alone, because pronouns do not occur as frequently or in the same functions in Spanish as in English. Therefore, the lists of words defining the extremes of the bias space need to be designed for the particularities of Spanish, not translated as is.

Second, with respect to the lists of words of
interest, these can help to understand the implications of using historical datasets to train models that will be used to predict data that is markedly different from the training data (Antoniak and Mimno, 2021).
Figure 3: An assessment of how bias affects gendered words in Spanish. It can be seen that "female nurse", "enfermera", is much more strongly associated to the feminine extreme of bias than "male nurse", "enfermero", is associated to the masculine extreme. In contrast, "woman", "mujer", is as symmetrically associated to feminine as "man", "hombre", is to masculine.

3.3.1 Assessing bias in word embeddings

Our core methodology to assess biases in word embeddings consists of three main parts (illustrated in Figure 2, described in the previous section). The three parts are the following.
Integrate information about diverse aspects of linguistic constructs and their contexts.

• provide context: which corpora, concrete contexts of occurrence (concordances), to get a more accurate idea of actual uses or meanings, even those that may have not been taken into account.

• provide information on statistical properties of words (mostly number of occurrences in the corpus, and relative frequency in different subcorpora), that may account for unsuspected behavior, like infrequent words being strongly associated to other words merely by chance occurrences.

• position with respect to other words in the embedding space, and most similar words.

More complex bias: two-way instead of one-way, intersectionality.

More complex representation of linguistic phenomena: word-based approaches are oversimplistic, and cannot deal with polysemy (the ambiguity or vagueness of words with respect to the meanings they may convey) or multiword expressions. That is why we need more context. Inspecting LLMs instead of word embeddings allows us to account for those aspects of words. This has the added advantage of being able to inspect LLMs.

4.2 The EDIA prototype for bias exploration

This section provides a detailed explanation of the EDIA prototype. This prototype incorporates the guiding principles described in section 4.1. EDIA is a visual interface framework for the analysis of bias in word embeddings and in LLMs that is currently available at https://huggingface.co/spaces/vialibre/edia. This prototype is hosted on Hugging Face so that it is easier to import pre-trained models and to offer our tool to the NLP community of practitioners around Hugging Face. The EDIA visual interface is formed by four tabs. These are: Data, Explore Words, Biases in Words, and Biases in Sentences. This framework is designed to allow the users to switch between tabs while performing the bias analysis, allowing them to use the exploration in one tab to feed back into their analysis in the other tabs.

In the following, we will describe each tab in detail.

4.2.1 Data tab

Huge datasets are used to train word embeddings and language models. So, it is relevant to explore the
Figure 5: The EDIA visual interface, with the Explore Words tab visible. (A) Displays the input panel for the list
of words of interest, and for the lists that represent each stereotype. The color next to each list (that represents a
stereotype) is the one with which the words are displayed. (B) Displays the visualization panel. Three configuration
options are available: word-related visualization, transparency, and font size. The list of words selected in the input
panel (A) is plotted on the plane; the closeness of two words indicates that they are used similarly in the corpus on
which the word embedding was trained. If the "visualize related words" option is selected, the related words will
appear in the same color but more transparently than the word to which they are related.
way the words are used in the datasets, because biases or relations between words that appear in the corpus may be learned by the embeddings or language models.

The main objectives of the Data tab are to explore the frequency of appearance of a word in the corpus as well as to explore the contexts of those occurrences. First, the word of interest is entered (Figure 4A). Then, a graphic is visualized (Figure 4B). This graphic depicts the frequency of appearance of each word in the corpus, while the orange line shows the frequency of the selected word in the corpus. Words with a higher frequency in the corpus (i.e., easier to find in the corpus) will have the orange line further to the right, whereas those words that hardly ever appear in the corpus will have the orange line further to the left.

In the default version of our tool, the objective is to explore the Word2Vec from Spanish Billion Word Corpus embedding (Cardellino, 2019). This embedding was trained over the Spanish Unannotated Corpora (Cañete, 2019), formed by 15 text collections. Figure 4C shows the frequency of appearance of the word of interest in each of the 15 collections. The percentages on the left of each collection are calculated with respect to the total frequency shown in Figure 4B. To get a better understanding of the word of interest in the corpus, the analysis of contexts (i.e., the sentences in the corpus that contain the word of interest) is usually required. In this tab, it is possible to retrieve some of the contexts corresponding to the word of interest. To do so, the maximum number of contexts must be selected using the slider (five by default) and the dataset collections from which the user would like to get the contexts must be selected (Figure 4D). Once the "search for context" button is pressed, a table showing the list of contexts appears (Figure 4E). Each time the button is pressed, a random sample of contexts is shown.
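The kind of exploration this tab supports can also be approximated with a few lines of Python. The sketch below counts the occurrences of a word of interest and retrieves a few concordances from a plain-text corpus; the file name and the word are placeholders, and the actual EDIA tool works over the corpus collections described above.

```python
# Minimal sketch: frequency and concordances for a word of interest
# in a plain-text corpus. The corpus path and the word are placeholders.
import re
import random

WORD = "enfermero"          # word of interest (placeholder)
CORPUS = "corpus.txt"       # one sentence per line (placeholder)

pattern = re.compile(rf"\b{re.escape(WORD)}\b", re.IGNORECASE)

matches = []
total_tokens = 0
with open(CORPUS, encoding="utf-8") as f:
    for sentence in f:
        total_tokens += len(sentence.split())
        if pattern.search(sentence):
            matches.append(sentence.strip())

print(f"'{WORD}' appears in {len(matches)} sentences "
      f"(corpus size: {total_tokens} tokens)")

# A random sample of contexts, as in the "search for context" button.
for context in random.sample(matches, k=min(5, len(matches))):
    print("-", context)
```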
4.2.2 Explore Words tab

Both word embeddings and language models, after being trained, construct a meaning for each word
Figure 6: The EDIA visual interface, with the "Biases in Words" tab visible. (A) Displays the input panel.
(B) Displays the visualization panel. In this figure, we show the graphic that appears after selecting the "Plot 4
Stereotypes!" option. The words of interest are plotted on a 2-dimensional graphic. Here, the x-axis represents the
Word Lists 1 and 2 stereotypes. Words of interest that are plotted more to the right are more related to Word List
1 stereotypes, and words of interest that are plotted more to the left are more related to Word List 2. Furthermore,
the y-axis in this graphic represents Word Lists 3 and 4. Words that are plotted higher are more related to Word
List 3, whereas words that are plotted lower are more related to Word List 4.
in the vocabulary. This tab aims to analyze the meaning constructed by the model for a set of selected words of interest. This tab enables the visualization of a list of words of interest and of lists of words that represent different stereotypes in a 2-dimensional space.

The Explore Words tab is formed by two main panels (Figures 5A and B). The input panel on the left (Figure 5A) is where the user enters the different lists of words. The rectangle next to each group of words indicates the color with which each list of words is going to be visualized. On the right (Figure 5B), the graphics are visualized. In this panel, three configuration options are available: word-related visualization, transparency, and font size. Once the selections are made and the "Visualize" button is pressed, the visualization of the group of words appears. If the "visualize related words" option is selected, the related words appear in the same color but more transparently than the word to which they are related. The closeness of words in the 2-dimensional chart indicates that they are used similarly in the corpus on which the word embedding was trained.
4.2.3 Biases in Words tab

The main objective of the Biases in Words tab is to analyze bias in a set of words of interest from a word embedding. Each type of bias to be analyzed is considered as having two possible extremes (e.g., for age the two extremes can be old and young). Each extreme is represented by a list of words.

The Biases in Words tab enables the study of up to two types of bias, allowing an intersectional analysis. This tab is formed by two panels: the input and the visualization panels. The input panel (Figure 6A) enables the entry of the list of words of interest as well as the lists of words that represent each extreme of bias (or stereotype) to be analyzed. The visualization panel (Figure 6B) is where the graphics are displayed. Two graphics are available: one that enables the analysis of the words of interest with respect to two stereotypes (the "Plot 2 stereotypes!" option) and another that enables the analysis of the list of words with respect to four stereotypes (the "Plot 4 stereotypes!" option).

When "Plot 2 stereotypes!" is selected, a bar plot is depicted with the list of words of interest on the y-axis and the stereotypes on the x-axis. The intensity and length of the bar indicate
Figure 7: The EDIA visual interface, with the "Biases in Sentences" tab visible. (A) Shows the input panel. Here,
the sentence to be evaluated is entered. The sentence must contain a blank, represented by a "*". The list of
words of interest and the unwanted words list can be entered in this panel. (B) Displays a list of sentences and
their corresponding probabilities, which indicate the likelihood of the sentences being completed with each word
of interest based on the language model. If no words of interest are entered, the tool will fill the blank in the
sentence with the five words that the language model considers most likely to occupy the space. In the case
that unwanted words were entered, these are not considered in the analysis.
towards which extreme of bias the words of interest are associated. When "Plot 4 stereotypes!" is selected (Figure 6B), each word of interest is plotted on the plane. The position of the word indicates which extreme of bias the word is more related to. For example, in Figure 6B the word "pelear" (which means to fight in Spanish) is more biased towards the man and old stereotypes, whereas the word "maquillar" (which means to put on makeup in Spanish) is biased towards the woman and young stereotypes.
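The scores behind such plots can be approximated by comparing each word of interest with the two lists of words that define the extremes of a bias type. The sketch below is a minimal illustration of that idea, using mean cosine similarity over a gensim embedding; the word lists and the embedding file are placeholders, and EDIA's exact scoring may differ.

```python
# Minimal sketch: score words of interest against two stereotype extremes,
# each defined by a list of words. Positive scores lean towards extreme 1,
# negative scores towards extreme 2. Lists and paths are placeholders.
import numpy as np
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format("embeddings.vec")  # placeholder

extreme_1 = ["mujer", "chica", "ella", "madre", "hija", "femenino"]
extreme_2 = ["hombre", "chico", "él", "padre", "hijo", "masculino"]
words_of_interest = ["pelear", "maquillar", "enfermero", "enfermera"]

def mean_similarity(word, word_list):
    # Average cosine similarity between a word and every word in the list.
    return np.mean([vectors.similarity(word, w) for w in word_list])

for word in words_of_interest:
    score = mean_similarity(word, extreme_1) - mean_similarity(word, extreme_2)
    print(f"{word:>10}: {score:+.3f}")
```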
4.2.4 Biases in Sentences tab

The Biases in Sentences tab aims to explore biases in LLMs, using BETO (Cañete et al., 2020) as the default language model.

The main idea in the Biases in Sentences tab is to select a sentence that contains a blank (in the prototype, the blank is represented with a *) and compare the probabilities obtained by the language model when filling the blank with different words. The user is able to select a list of words of interest and a list of unwanted words (Figure 7A). The words entered in the unwanted words list will be dismissed from the analysis. Also, articles, prepositions and conjunctions can be excluded from the analysis. If words of interest are entered by the user, the output panel (Figure 7B) shows a list of sentences with their corresponding probabilities on the right. These probabilities indicate the likelihood of the occurrence of the sentences completed with each word of interest according to the language model. When no list of words is entered, the tool will fill the blank in the initial sentence with the five words that the language model considers most likely to occupy the space.
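A comparison of this kind can be sketched with the transformers fill-mask pipeline by restricting the predictions to the words of interest. The example below is a minimal illustration; the BETO checkpoint name and the sentence are assumptions, and EDIA's handling of the blank and of unwanted words is richer than this.

```python
# Minimal sketch: compare the probability a masked language model assigns
# to different words of interest in a blanked sentence.
from transformers import pipeline

# Assumed Spanish BETO checkpoint; any masked LM on the Hub would work.
fill_mask = pipeline("fill-mask", model="dccuchile/bert-base-spanish-wwm-uncased")

sentence = "Las personas de ese barrio son muy [MASK]."  # illustrative sentence
words_of_interest = ["trabajadoras", "peligrosas", "amables"]

for prediction in fill_mask(sentence, targets=words_of_interest):
    print(f"{prediction['token_str']:>15}  score={prediction['score']:.4f}")
```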
5 User story in an NLP application

Up to this point we have motivated the need for bias assessment in language technologies, and in word embeddings and LLMs in particular; we have explained our differences in the framing of the solution with respect to existing tools; we have discussed the unnecessary technical barriers of existing approaches, which hinder the engagement of actual experts in the exploration process; and we have presented the software tool that we developed to allow social scientists and data scientists to collaboratively explore biases and define lists of words and lists of sentences that are useful to detect biases in a particular domain. In this section we describe a user story that presents a paradigmatic process of bias exploration and assessment.

We would like to note that this user story was originally developed to be situated in Argentina, the local context of this project. It was distilled from experiences with data scientists and experts in discrimination that are described in Annex A. However, in order to make understanding easier for non-Spanish-speaking readers, we adapted the case to work with English, and consequently localized the use case as if it had happened in the
United States.

The users. Marilina is a data scientist working on a project to develop an application that helps the public administration classify citizens' requests and route them to the most adequate department in the public administration office she works for. Tomás is a social worker within the non-discrimination office, and wants to assess the possible discriminatory behaviours of such software.

The context. Marilina addresses the project as a supervised text classification problem. To classify new texts from citizens, they are compared to documents that were manually classified in the past. New texts are assigned the same label as the document that is most similar. Calculating similarity is a key point in this process, and can be done in many ways: programming rules explicitly, via machine learning with manual feature engineering, or by deep learning, where a key component is word embeddings. Marilina observes that the latter approach has the fewest classification errors on the past data she separated for evaluation (the so-called test set). Moreover, deep learning seems to be the preferred solution these days; it is often presented as a breakthrough for many natural language processing tasks. So Marilina decides to pursue that option.
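The classification scheme described in this user story, assigning a new request the label of its most similar past document, can be sketched as follows. The snippet is a minimal illustration assuming documents are represented by averaged word vectors from a gensim embedding; the labelled examples are placeholders and the actual pipeline in the story is more elaborate.

```python
# Minimal sketch: label a new request with the label of the most similar
# past document, where documents are averaged word vectors. Data is fake.
import numpy as np
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format("embeddings.vec")  # placeholder

def embed(text):
    # Average the vectors of the words the embedding knows about.
    words = [w for w in text.lower().split() if w in vectors]
    return np.mean([vectors[w] for w in words], axis=0)

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

labelled_requests = [  # manually classified documents (placeholders)
    ("request for a small business grant for my bakery", "business_grants"),
    ("question about my residency permit renewal", "immigration"),
]

new_request = "I want to apply for a grant to open a hairdresser shop"
new_vector = embed(new_request)

best_label = max(labelled_requests,
                 key=lambda item: cosine(new_vector, embed(item[0])))[1]
print("Predicted department:", best_label)
```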
An important component of the deep learning approach she uses is word embeddings. Marilina decides to try a well-known word embedding, pre-trained on Wikipedia content. When she integrates it in the pipeline, there is a boost in the performance of the system: more texts are classified correctly in her test set.

Looking for bias. Marilina decides to look at the classification results beyond the figures of classification precision. Being a descendant of Latin American immigrants, she looks at documents related to this societal group. She finds that applications for small business grants presented by Latin American immigrants or citizens of Latin American descent are sometimes erroneously classified as immigration issues and routed to the wrong department. These errors imply a longer process to address these requests on average, and sometimes misclassified requests get lost. In some cases, this mishap makes the applicant drop the process.

Finding systematic errors. Intrigued by this behaviour of the automatic pipeline, she does more thorough research into how requests by immigrants are classified, in comparison with requests by non-immigrants. As she did for Latin American requests, she finds that documents presented by other immigrants have a higher error rate than the requests of non-immigrants. She suspects that other societal groups may suffer from higher error rates, but she focuses on Latin American immigrants because she has a better understanding of the idiosyncrasy of that group, and it can help her establish a basis for further inquiry. She finds some patterns in the misclassifications. In particular, she finds that some particular businesses, like hairdressers or bakeries, accumulate more errors than others.

Finding the component responsible for bias. She traces the detail of how such documents are processed by the pipeline and finds that they are considered most similar to other documents that are not related to professional activities, but to immigration. The word embedding is the pipeline component that determines similarities, so she looks into the embedding with the EDIA toolkit (https://github.com/fvialibre/edia). She defines a bias space with "Latin American" in one extreme and "North American" in the other, and checks the relative position of some professions with respect to those two extremes, as can be seen in Figure 8, on the left. This graph is generated using the button called "Find 2 stereotypes" in the tab. She finds that, as she suspected, some of the words related to the professional field are more strongly related to words related to Latin American than to words related to North American, that is, words like "hairdresser" and "bakery" are closer to Latin American. However, the words more strongly associated to North American do not correspond to her intuitions. She is at a loss as to how to proceed with this inspection beyond the anecdotal findings, and how to take action with respect to the findings. That is when she calls the non-discrimination office for help.

Assessing harm. The non-discrimination office appoints Tomás to the task of assessing the discriminatory behavior of the software. Briefed by Marilina about her findings, he finds that misclassifications do involve some harm to the affected people that is typified among the discriminatory
Figure 8: Different characterizations of the space of bias "Latin American" vs "North American", with different word lists created by a data scientist (left) and a social scientist (right), and the different effects of how the bias space is defined, as reflected in the positions of the words of interest (column on the left).