
Gender Bias in Artificial Intelligence: Non-Technical and Technical Approaches

ABSTRACT:

While organizations such as Amazon have used machine learning-based algorithms for
their hiring processes, diverse employees are not equitably hired due to biased
datasets. This paper dives into the issue and examines non-technical approaches,
including deploying more diverse and balanced data sets, building a more diverse
workforce, and increasing transparency in algorithms, as well as technical approaches:
one that takes inspiration from the classical GAN architecture, in which two neural
networks are created along with an adversarial network, and another that modifies the
captions in existing image datasets to balance gender representation by swapping
gendered terms and evaluating the resulting impact on the performance of image
captioning models. These approaches can be applied to real-world situations such as
hiring employees or approving loans.

● INTRODUCTION:

Gender bias and AI: we have all heard of these two terms before. Still, here is a brief
overview. Gender bias refers to the unfair and unequal treatment or perception of
individuals based on their gender, often resulting in discrimination against
individuals of a certain gender. It can be directed towards both men and women, but it is
most commonly associated with discrimination against women, particularly in areas
such as employment, education, and access to healthcare.
AI (Artificial Intelligence) refers to techniques that enable machines to mimic human
behaviour.

Why is this topic relevant?

Gender is an important topic in Natural Language Processing. It affects the data sets
that we train on and the models we produce. A classic example is the translation from
Turkish to English given by Google Translate. The Turkish sentence used a gender-neutral
pronoun, but it was translated as "he is a doctor" without recognising that it could
be he or she. This has since been fixed, with a tagline noting that translations are gender
specific and that a neutral pronoun could become he or she. However, it remains a relevant
example for establishing the topic we are addressing here.
AI has become increasingly important in today's world due to the vast amounts of data
that are generated by individuals, businesses, and organizations. AI systems can
analyze and make sense of this data, allowing companies to gain insights into customer
behavior, optimize their operations, and make more informed decisions. In addition, it
has the potential to revolutionize a wide range of industries, including healthcare,
finance, transportation, and education.

Gender bias in AI is a significant problem because it undermines the fairness and
accuracy of AI systems. Biased outcomes can perpetuate existing inequalities and
discrimination, especially for underrepresented groups like women and minorities. In
addition, biased AI systems can erode trust in the technology and its applications, which
can have negative consequences for the wider adoption and advancement of AI.

● Thesis Statement:

Gender bias in AI is a significant problem that undermines the fairness and accuracy of
AI systems, and addressing this issue requires a multifaceted approach.

● How it occurs:

For the purposes of this paper, the term bias refers to demographic disparities
in algorithmic systems that are objectionable for societal reasons.

Researchers found that if you ask an AI system "man is to computer programmer as
woman is to what?", the system may output the answer "woman is to homemaker".
How does the system arrive at this output?

AI systems store words using sets of numbers. So, let's say the word "man" is stored
as the two numbers (1,1). The AI system comes up with these numbers from statistics of
how the word "man" is used on the Internet. The specific process for computing these
numbers is quite complex, but they represent the typical usage of the word. In practice,
an AI might use hundreds or thousands of numbers to store a word.
So, the word "man" is plotted at the position (1,1). By looking at the statistics of how
the phrase "computer programmer" is used on the Internet, the AI will arrive at a
different pair of numbers, say (3,2), to represent that phrase. Similarly, by looking at
how the word "woman" is used, it will come up with another pair of numbers, say (2,3), to
represent the word "woman". When you ask the AI system to complete the analogy "man is to
computer programmer as woman is to what?", the system constructs a parallelogram: it
finds the word associated with the position (4,4), because it treats that as the answer to
the analogy. Another way to think about it is that you start from the word "man" and go
two steps to the right and one step up to reach "computer programmer"; to answer "woman is
to what?", you also go two steps to the right and one step up from "woman". Because these
numbers are derived from text on the Internet, the way the word "homemaker" is used causes
it to be placed at the position (4,4), which is why the AI system comes up with this
biased analogy.
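To make the parallelogram intuition concrete, here is a minimal sketch in Python (using the toy two-dimensional coordinates from the example above, not real embeddings; the word "doctor" is added only so there is something to choose between) of how an analogy is completed with word vectors:

import numpy as np

# Toy 2-D "embeddings" matching the coordinates used in the text.
vectors = {
    "man": np.array([1.0, 1.0]),
    "computer programmer": np.array([3.0, 2.0]),
    "woman": np.array([2.0, 3.0]),
    "homemaker": np.array([4.0, 4.0]),
    "doctor": np.array([3.5, 1.5]),
}

# Analogy: man is to computer programmer as woman is to ?
# Take the offset man -> computer programmer and apply it to woman.
target = vectors["computer programmer"] - vectors["man"] + vectors["woman"]

# Return the word closest to the target point, excluding the query words.
candidates = {w: v for w, v in vectors.items()
              if w not in ("man", "woman", "computer programmer")}
answer = min(candidates, key=lambda w: np.linalg.norm(candidates[w] - target))
print(answer)  # -> "homemaker", reproducing the biased analogy

With embeddings learned from Internet text, the word nearest to the target position turns out to be "homemaker", which is exactly the bias described above.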

● Examples of gender bias in AI:

AI systems are already making important decisions today, and will continue to do so in
the future as well. So, bias matters. Let us discuss some examples of general as well as
gender bias in AI.

- There's a company that was using AI for hiring, and found that their hiring tool
discriminated against women. This is clearly unfair, and so this company shut
down their tool.
- Second, there are also some facial recognition systems that seem to work more
accurately for light-skinned individuals than for dark-skinned individuals. If an AI
system is trained primarily on data of lighter-skinned individuals, then it will be more
accurate for that category of individuals. To the extent that these systems are used in,
for example, criminal investigations, this can create a very biased and unfair effect
for dark-skinned individuals. So, many face recognition teams today are working
hard to ensure that their systems do not exhibit this type of bias.
- In two photos depicting both women and men in swimming costumes, Microsoft’s
tool classified the picture showing two women as racy and gave it a 96% score.
The picture with the men was classified as non-racy with a score of 14%. In
addition to this, the photo of the women got eight views within one hour, and the
picture with the two men received 655 views, suggesting the photo of the women
in their costumes was either suppressed or shadowbanned.
- There have also been AI or statistical loan approval systems that wound up
discriminating against some minority ethnic groups, and quoted them a higher
interest rate. Banks have also been working to make sure to diminish or eliminate
this type of bias in their approval systems.

● FACTORS CONTRIBUTING TO BIAS & NON-TECHNICAL SOLUTIONS:

A. Lack of diversity in AI development teams:

As artificial intelligence is quickly getting more robust and gaining capabilities, people
are looking more closely at why it’s important to have diverse representation — both in
the data that is fed into these algorithms, and in the teams of people who work on them.

The AI field, which is overwhelmingly white and male, is at risk of replicating or
perpetuating historical biases and power imbalances. Examples include image
recognition services making offensive classifications of minorities, chatbots adopting
hate speech, and Amazon technology failing to recognize users with darker skin colors.
The biases of systems built by the AI industry can be largely attributed to the lack of
diversity within the field itself: with such a biased workforce, how can we expect our AI
to fare any better?

A report by the AI Now Institute at New York University (profiled in Forbes by Maria
Klawe) found that 80% of AI professors are men, and that only 15% of Facebook
researchers and 10% of Google researchers are women. The same research found
that less than 25% of PhDs in 2018 were awarded to women or minorities.
Even if developers have good intentions of serving everyone, their innate biases drive them to
design toward what they are most familiar with. As we write algorithms, our biases
inherently show up in the decisions we make about how to design the algorithm and about
what data sets are used and how, and then these biases can become reified in the technology
that we produce.

Here are some ways that diverse teams can help address gender bias in AI:

1. Diverse teams bring diverse perspectives: When AI development teams include
individuals from different backgrounds and experiences, they can bring a range
of perspectives to the table. This diversity can help identify and address potential
biases that might be overlooked by a homogeneous team.

2. Diverse teams can identify and correct biased datasets: Datasets that are used
to train AI algorithms can be biased, and this bias can perpetuate in the AI
systems. Diverse teams can help identify these biases and work to correct them,
ensuring that the AI algorithms are more inclusive and equitable.

3. Diverse teams can create AI systems that reflect diverse user needs: AI systems
are designed to serve humans, and if the development team is not diverse, there
is a risk that the AI system will not meet the needs of all users. By having a
diverse team, AI systems can be designed to reflect the needs of all users,
regardless of gender, race, ethnicity, or other factors.

4. Diverse teams can challenge assumptions: Biases can be perpetuated by
assumptions that are not necessarily grounded in fact. A diverse team can
challenge assumptions and ensure that the AI system is designed based on data
and evidence, rather than assumptions that might be biased.

It is important to ensure that all individuals, regardless of gender or other factors, have
equal access to AI systems and are not negatively impacted by any biases in these
systems.

B. Biases in data sets used to train AI systems:

Biases in data sets can arise due to a variety of factors, including historical patterns of
discrimination, unequal representation in the data, and algorithmic feedback loops. For
example, if a data set used to train a language model contains biased language or
historical stereotypes, the model may perpetuate those biases in its outputs.

Addressing biases in data sets requires careful examination of the data to identify
patterns of unequal representation and feedback loops. For example, if a data set
contains historical data that reflects past discrimination against certain groups, the
algorithm may perpetuate those biases and exclude qualified candidates from
underrepresented groups. To address these biases, it may be necessary to collect more
diverse and representative data or use algorithms to correct for biases in the data.

Incorporating ethical considerations into algorithm design and involving diverse
stakeholders in the development and deployment of AI systems can also help to ensure
that the resulting algorithms are fair, accurate, and effective for everyone. It is important
to note that biases in data sets are not always intentional or malicious. They can also
arise due to the limitations of the data collection process or the complexity of the real
world. However, it is still important to address these biases to ensure that AI systems
are developed and deployed in ways that are fair, inclusive, and effective for all users.
Some steps that can be taken to ensure diverse data sets are included in AI training:

1. Collect diverse data: The first step is to collect diverse data sets that represent
different populations. Thoughtfully add more data from under-represented
classes and expose your models to varied data points by gathering data from
different data sources. The data should be representative of the population being
studied and should not exclude any particular group. Once data has been
collected, it is important to ensure that it is not skewed towards any particular
group, but rather reflects the actual distribution of the population being studied.

2. Assess for bias: After the data has been collected and balanced, it's important to
assess the data for bias. This can be done by analyzing the data for patterns that
may indicate bias, such as over- or under-representation of certain groups or the
presence of stereotypes (a minimal sketch of such a check is shown after this list).
This step can be performed by a diverse team of individuals to ensure that different
perspectives are taken into account.

3. Address bias: If bias is identified in the data, steps should be taken to address it.
This can include removing biased data points, augmenting the data with
additional data from underrepresented groups, or using techniques like
adversarial training to mitigate the effects of bias.
4. Introduce regulations: Regulations can help build diversity and inclusivity into AI
systems from the grassroots level. Various governments have developed guidelines
to ensure diversity and mitigate AI bias that can deliver unfair outcomes.
5. Regularly update and evaluate the data: It's important to regularly update and
evaluate the data to ensure that it remains diverse and unbiased. This can be
done by monitoring the data sources and collecting new data from
underrepresented groups. By following these steps, developers can ensure that
the data sets used to train AI models are diverse and representative of the
population being studied.
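As a concrete illustration of step 2, here is a minimal sketch (the tabular dataset, its gender column, and the 30% threshold are hypothetical, purely for illustration) of checking group representation and outcome rates with pandas:

import pandas as pd

# Hypothetical training data: each row is one example.
df = pd.DataFrame({
    "gender": ["F", "M", "M", "M", "F", "M", "M", "F", "M", "M"],
    "hired":  [0,   1,   1,   0,   1,   1,   0,   1,   1,   1],
})

# 1. Representation: what share of the data comes from each group?
representation = df["gender"].value_counts(normalize=True)
print(representation)

# 2. Outcome rates: does the positive label occur at very different rates per group?
print(df.groupby("gender")["hired"].mean())

# Flag groups that make up less than 30% of the data (the threshold is an assumption).
underrepresented = representation[representation < 0.30]
print("Under-represented groups:", list(underrepresented.index))

Checks like this do not prove or disprove bias on their own, but they make skewed representation visible before a model is ever trained.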

C. Lack of transparency in AI algorithms

The lack of transparency in AI algorithms is a significant concern that can have
far-reaching implications for accountability, trust, and ethical considerations. When AI
systems are not transparent, it can be difficult to understand how they are making
decisions or predictions, which can lead to mistrust, misunderstandings, and potential
harm.

There are several reasons for the lack of transparency in AI algorithms. One is the
complexity of the algorithms themselves, which can make it difficult to understand how
they are working. Another is the proprietary nature of some algorithms, which can make
it difficult to access the underlying code and data used to train the algorithm. To address
the lack of transparency in AI algorithms, it is important to prioritize openness and
explainability in algorithm design and deployment. This includes making sure that the
algorithms are designed to be interpretable, so that they can be understood and audited
by humans. Additionally, it may be necessary to develop standards and best practices
for transparency in AI, such as open-sourcing algorithms or providing clear
documentation and explanations for how they work.

Addressing the lack of transparency in AI algorithms is crucial for ensuring that these
systems are developed and deployed in ways that are ethical, accountable, and
trustworthy. By prioritizing transparency and explainability in AI development, we can
help to build a more equitable and just future for all.

To increase transparency:


First, whenever feasible, data sets and/or models should be shared at the time of publication.
Even if the full model cannot be made public owing to intellectual property issues, developers
should at least provide application programming interface (API) access to the model for
researchers, using platforms such as Gradio. If images are from public data sets or
repositories, the images that were used should be clearly delineated. Images scraped
from the internet via web search should be shared with their attached labels; such data
cannot be considered publicly available if it is impossible to identify which images were
used or how they were labeled.
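As an illustration of that kind of API access, here is a minimal sketch of wrapping a model behind a Gradio interface (the predict function is a hypothetical placeholder, not taken from any cited work):

import gradio as gr

def predict(text):
    # Placeholder for a real model call (e.g. a loaded classifier or captioning model).
    # A dummy label distribution is returned here so the sketch runs on its own.
    return {"positive": 0.5, "negative": 0.5}

# Researchers can query this interface (and its auto-generated API endpoint)
# without having access to the underlying weights or training data.
demo = gr.Interface(fn=predict, inputs="text", outputs="label")

if __name__ == "__main__":
    demo.launch()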

Second, if data sets cannot be shared, there should be a clear description of important
data set characteristics.

Third, there should be a clear description of how data sets were used, that is, for
training, validation, testing, or additional external validation. Running models on external
data sets is an important step for demonstrating algorithmic robustness. Previous
studies have demonstrated that significant performance drops can occur when
algorithms trained exclusively at a single site are applied to an external site.
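A minimal sketch of that kind of check (with synthetic stand-ins for the internal and external data and a generic scikit-learn classifier, not any particular published model) might look like this:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-ins: "internal" data from the development site and "external"
# data from another site with a shifted feature distribution.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_ext, y_ext = make_classification(n_samples=500, n_features=10, shift=0.5, random_state=1)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Compare held-out internal performance with performance at the external "site".
print("internal test accuracy:", accuracy_score(y_test, model.predict(X_test)))
print("external site accuracy:", accuracy_score(y_ext, model.predict(X_ext)))
# A large drop on the external data would indicate limited algorithmic robustness.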

● Importance of ethical considerations in AI development:

Ethics in AI refers to the principles and values that guide the development and use of
artificial intelligence systems. It involves considering the potential social, cultural, and
ethical implications of AI and ensuring that these systems are developed and used in a
responsible and accountable manner.

Numerous researchers are taking the initiative to develop AI that would follow ethical
standards. Some ethical frameworks can minimize AI risks and ensure safe, fair, and
human-centered AI. Ethical considerations are thus crucial in AI development for a
variety of reasons, including avoiding the harm that can occur if development takes place
without them, and building trust: AI systems can only be successful if people
trust them. Ethical considerations can help build trust by ensuring that AI systems are
developed in a way that aligns with people's values and expectations.
● Technical solutions:

As mentioned in the previous sections, one current approach involves collecting more data,
which can be expensive and time-consuming. Thus, contemporary research has found
that most industry leaders are not taking this approach. A different approach is required
since the ultimate goal is to create an algorithm that can be used by financial institutions
and the government.

Machine learning consists of algorithms that are exposed to training data and then
improve their abilities through experience. Unfortunately, biases are often associated
with this process. Because a machine learning model is trained on data, any bias present
in that data is paralleled in the machine learning algorithm.

For the general process of machine learning, data are first inputted into the neural
network. Processing then takes place in the hidden layers through connections.
Patterns are found and weights are assigned to each pattern, depending on how
important that pattern seems. Finally, the hidden layers link to the output layer, where
the outputs are retrieved.
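For intuition, here is a minimal sketch of that forward pass for a tiny fully connected network (the layer sizes and random weights are arbitrary and purely illustrative):

import numpy as np

rng = np.random.default_rng(0)

# Toy input: 4 features for a single example.
x = rng.normal(size=4)

# One hidden layer (4 -> 3) and one output layer (3 -> 2), with random weights.
W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)
W2, b2 = rng.normal(size=(2, 3)), np.zeros(2)

hidden = np.maximum(0, W1 @ x + b1)  # connections into the hidden layer, with a ReLU non-linearity
output = W2 @ hidden + b2            # hidden layer linked to the output layer
print(output)
# During training, W1 and W2 are adjusted so that patterns that matter for the
# task receive larger weights.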

Natural Language Processing, or NLP, is a subfield of linguistics, computer science, and
artificial intelligence concerned with the interactions between computers and human
language. It focuses on how computers process and analyze large amounts of natural
language data. NLP uses word vectors, or geometrical representations of words, to
calculate word similarities.
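A common way to calculate such similarities is the cosine similarity between word vectors; a minimal sketch (the 3-dimensional vectors are made up for illustration, not real embeddings) is:

import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two word vectors: close to 1 means very similar usage.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up vectors for illustration only.
doctor = np.array([0.9, 0.1, 0.3])
nurse  = np.array([0.8, 0.2, 0.4])
banana = np.array([0.1, 0.9, 0.0])

print(cosine_similarity(doctor, nurse))   # relatively high
print(cosine_similarity(doctor, banana))  # relatively low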

I will describe two approaches that could be taken to combat the issue of bias in AI.

Method 1: Exposing and Correcting Gender Bias in Image Captioning Datasets and Models.

Image captioning refers to the process of generating a textual description of an image
using advanced computer vision techniques and natural language processing (NLP)
algorithms. It involves analyzing the visual content of an image, extracting relevant
features such as objects, scenes, and their relationships, and then generating a
human-like sentence that describes the image in a meaningful and accurate way.

The task of image captioning implicitly involves gender identification. However, due to
the gender bias in data, gender identification by an image captioning model suffers.
Also, the gender-activity bias, owing to the word-by-word prediction, influences other
words in the caption prediction, resulting in the well-known problem of label bias.

A technique was proposed by Shruti Bhargava to mitigate this bias. The approach
involved modifying the captions in existing image datasets to balance gender
representation by swapping gendered terms and evaluating the resulting impact on the
performance of image captioning models. Specifically, they used a "gender-swapping"
technique to replace gender-specific words such as "he" or "she" with their counterparts
of the opposite gender. After modifying the dataset, they trained image captioning
models on the balanced dataset and evaluated their performance in generating
gender-neutral captions. They found that this gender-balancing approach resulted in
more accurate and fair representations of gender in the image captioning models,
compared to models trained on biased datasets.
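A minimal sketch of such gender-swapping of captions (the word list is a small illustrative subset chosen here, not the authors' actual mapping) could look like this:

import re

# Small illustrative mapping of gendered terms to their counterparts.
# Note: "her" is ambiguous (him/his); it is crudely mapped to "him" here.
SWAP = {
    "he": "she", "she": "he",
    "him": "her", "her": "him", "his": "her",
    "man": "woman", "woman": "man",
    "men": "women", "women": "men",
    "boy": "girl", "girl": "boy",
}

def swap_gendered_terms(caption):
    # Replace each gendered token with its counterpart, leaving other words untouched.
    tokens = re.findall(r"\w+|\W+", caption.lower())
    return "".join(SWAP.get(tok, tok) for tok in tokens)

print(swap_gendered_terms("A man is riding his bike while a woman watches."))
# -> "a woman is riding her bike while a man watches."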

A more detailed analysis of the method is as follows:

The technique reduces these issues by splitting the task into two parts, gender-neutral
image captioning and gender classification, in order to reduce the context-gender coupling.
They trained a gender-neutral image captioning model by substituting all gendered words in
the captions with gender-neutral counterparts. A model trained this way does not exhibit
the language-model-based bias arising from gender and gives good quality captions.

They trained gender classifiers using the available bounding box and mask-based
annotations for the person in the image, which allowed them to discard the context
and focus on the person when predicting the gender. By substituting the predicted
genders into the gender-neutral captions, they obtained the final gendered predictions.
These predictions achieved similar performance to a model trained with gender, while at
the same time being devoid of gender bias. Their main result was that, on an
anti-stereotypical dataset, their model outperforms a popular image captioning model
that is trained with gender.

This model gives comparable results to a gendered model even when evaluated
against a dataset that possesses similar bias. For injecting gender into the
captions, gender classifiers were trained using cropped portions of the image that
contain only the person, which allowed them to remove the context and focus on the
person when predicting the gender. They trained bounding-box-based and body-mask-based
classifiers, which give much higher accuracy in gender prediction than an image captioning
model implicitly attempting to classify the gender from the full image. On substituting
the predicted genders into the gender-neutral captions, the final gendered captions were
obtained. The final predictions of this model achieved similar performance on the test
data as a model trained with gender, but at the same time did not possess any gender bias.
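A minimal sketch of this two-step pipeline (the gender classifier is a hypothetical placeholder and the word lists are small illustrative subsets) is:

import re

NEUTRAL = {"man": "person", "woman": "person", "he": "they", "she": "they",
           "his": "their", "her": "their", "boy": "child", "girl": "child"}
GENDERED = {"male":   {"person": "man",   "they": "he",  "their": "his", "child": "boy"},
            "female": {"person": "woman", "they": "she", "their": "her", "child": "girl"}}

def neutralize(caption):
    # Step 1: the captioning model is trained on gender-neutral captions like these.
    tokens = re.findall(r"\w+|\W+", caption.lower())
    return "".join(NEUTRAL.get(tok, tok) for tok in tokens)

def classify_gender(person_crop):
    # Hypothetical stand-in for a bounding-box / body-mask based gender classifier.
    return "female"

def engender(neutral_caption, gender):
    # Step 2: substitute the classifier's prediction back into the neutral caption.
    tokens = re.findall(r"\w+|\W+", neutral_caption)
    return "".join(GENDERED[gender].get(tok, tok) for tok in tokens)

neutral = neutralize("A man is cooking in the kitchen.")
print(neutral)                                   # "a person is cooking in the kitchen."
print(engender(neutral, classify_gender(None)))  # "a woman is cooking in the kitchen."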

This overall technique significantly outperforms the gender-trained model on an
anti-stereotypical dataset, demonstrating that removing bias helps to a large extent in
such scenarios.

● METHOD 2: The GAN Approach:


Generative Adversarial Networks, or GANs, are a specific type of machine learning model
(a neural network architecture) consisting of two sub-networks, a generator and a
discriminator, that compete with each other in a zero-sum game. The generator network
is responsible for generating new data samples, while the discriminator network is
responsible for distinguishing between the generated samples and real data samples.

The generator network takes in a random input, such as noise, and generates new data
samples that are similar to the real data. The discriminator network then tries to
distinguish between the generated data samples and the real data samples. The two
networks are trained together in an adversarial manner, with the generator network
trying to generate more realistic data samples to fool the discriminator network, and the
discriminator network trying to accurately distinguish between the generated and real
data.
The training process continues until the generator network is able to generate data
samples that the discriminator network can no longer distinguish from the real data.
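A minimal sketch of this adversarial training loop (in PyTorch, with arbitrary layer sizes and random stand-in "real" data, purely for illustration) is:

import torch
import torch.nn as nn

# Tiny generator and discriminator; the sizes are arbitrary for this sketch.
G = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))
D = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(1000):
    real = torch.randn(64, 8)    # stand-in for a batch of real data samples
    noise = torch.randn(64, 16)
    fake = G(noise)

    # Discriminator tries to label real samples as 1 and generated samples as 0.
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator tries to make the discriminator label its samples as real (1).
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()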

GANs have been used in attempts to combat gender bias in two main ways. The first is
detecting and quantifying gender bias in existing datasets or machine learning models:
by generating synthetic data that is similar to the real data but has a known gender
balance, researchers can compare the performance of machine learning models on the real
data versus the synthetic data to determine whether a gender bias is present. The second
is generating counterfactual examples: for example, GANs can be used to generate images
of women in traditionally male-dominated occupations, or men in traditionally
female-dominated occupations, in order to challenge gender stereotypes and biases in the
dataset. These counterfactual examples can be used to help train machine learning models
that are more robust to gender bias.

Several prior works provide the inspiration here. Zhang et al. used adversarial learning,
more specifically logistic regression (a statistical model that determines whether a
variable has an effect on the output), to eliminate the bias in results produced from data
containing gender disparities, and also aimed to reduce the bias in word embeddings.
Beutel et al. used adversarial learning to remove certain sensitive attributes so that the
machine learning algorithm is not exposed to them. Tonk used adversarial learning to
attempt to debias a linear binary classifier's ability to predict whether a person earned
a salary of more than $50,000, without basing those predictions on biased assessments
such as race and gender. Drawing on these works, Isabelle Mandis tested a machine learning
multi-class classifier that could be used to expand the scope of debiasing and created a
GAN algorithm that further decreases biases.

The proposed GAN algorithm was changed from three tiers to a two-layer neural
network, with many of the parameters altered. It was then applied to an NLP word vector
association task in an attempt to debias word associations, so as to demonstrate the
generalizability of the debiasing GAN algorithm to a different dataset and underlying
algorithm.

A multi-class classifier was used as a baseline metric against which to compare the
algorithm's performance. In addition, the GAN algorithm is used in conjunction with the
classifier so as to eliminate the bias in its results.
This GAN algorithm was expected to be able to decrease the biased results of both
machine learning tasks as it takes advantage of the zero-sum-game nature of
adversarial learning in order to improve the classification accuracy and word vector
associations while minimizing the ability to predict certain attributes such as race and
gender. When applied, this algorithm was meant to ensure that companies would not be
able to detect a candidate's race or gender when deciding whether or not to hire them.
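A minimal sketch of this idea, a classifier trained jointly with an adversary that tries to recover the sensitive attribute from the classifier's output (a generic adversarial-debiasing setup in PyTorch, not the author's exact architecture), is:

import torch
import torch.nn as nn

# Classifier predicts the task label (e.g. hire / don't hire) from non-sensitive features.
clf = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 1))
# Adversary tries to predict the sensitive attribute (e.g. gender) from the classifier's output.
adv = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))

opt_clf = torch.optim.Adam(clf.parameters(), lr=1e-3)
opt_adv = torch.optim.Adam(adv.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(1000):
    x = torch.randn(64, 20)                    # stand-in for feature vectors
    y = torch.randint(0, 2, (64, 1)).float()   # task labels
    s = torch.randint(0, 2, (64, 1)).float()   # sensitive attribute (e.g. gender)

    logits = clf(x)

    # The adversary learns to predict the sensitive attribute from the classifier's output.
    adv_loss = bce(adv(logits.detach()), s)
    opt_adv.zero_grad(); adv_loss.backward(); opt_adv.step()

    # The classifier learns to predict the label while fooling the adversary:
    # subtracting the adversary's loss pushes its output to carry no gender/race signal.
    clf_loss = bce(logits, y) - bce(adv(logits), s)
    opt_clf.zero_grad(); clf_loss.backward(); opt_clf.step()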

The layers of the discriminator included a fully connected layer with input size 784
and output size 256, a Leaky ReLU activation function, a fully connected layer with input
size 256 and output size 256, another Leaky ReLU activation function, and a fully connected
layer with input size 256 and output size 2. The layers of the generator network
included a fully connected layer with input size 1024, a Leaky ReLU activation function, a
fully connected layer with input size 1024, a Leaky ReLU activation function, a fully
connected layer with input size 784, and a hyperbolic tangent activation function to
restrict all classification outputs for the sensitive attributes to a range of [-1, 1].
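Read literally, that description corresponds to something like the following PyTorch sketch (the generator's per-layer input and output sizes are partly ambiguous in the description, so the dimensions chosen here, including the 100-dimensional noise input, are assumptions):

import torch.nn as nn

# Discriminator as described: 784 -> 256 -> 256 -> 2, with Leaky ReLU activations.
discriminator = nn.Sequential(
    nn.Linear(784, 256), nn.LeakyReLU(),
    nn.Linear(256, 256), nn.LeakyReLU(),
    nn.Linear(256, 2),
)

# Generator: two 1024-unit fully connected layers, then a 784-unit layer with a
# hyperbolic tangent, restricting outputs for the sensitive attributes to [-1, 1].
generator = nn.Sequential(
    nn.Linear(100, 1024), nn.LeakyReLU(),   # 100-dimensional noise input is an assumption
    nn.Linear(1024, 1024), nn.LeakyReLU(),
    nn.Linear(1024, 784), nn.Tanh(),
)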

It was found that, after the GAN algorithm was applied to the classifier, the results were
less biased. Initially, the p-% (a fairness measure that compares the rate of favorable
classifier outcomes for one group against the rate for a reference group) for minority
races compared to White individuals was as follows: 28% for Black, 63% for Asian, and 25%
for Native American individuals. The p-% for gender was 30%. After applying the algorithm,
the p-% was above the 80% threshold.
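A minimal sketch of computing such a p-% (using its common interpretation as the ratio of positive-outcome rates between a group and a reference group; the predictions and labels below are made up) is:

import numpy as np

def p_percent(predictions, groups, group, reference):
    # Ratio of positive-prediction rates: P(pred = 1 | group) / P(pred = 1 | reference) * 100.
    rate_group = predictions[groups == group].mean()
    rate_reference = predictions[groups == reference].mean()
    return 100 * rate_group / rate_reference

# Made-up binary hiring predictions and gender labels, purely for illustration.
preds = np.array([1, 0, 1, 1, 0, 1, 1, 0, 0, 1])
gender = np.array(["F", "F", "M", "M", "F", "M", "M", "F", "F", "M"])

score = p_percent(preds, gender, group="F", reference="M")
print(f"p-% for women vs men: {score:.0f}%")  # values below 80% suggest disparate impact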

Thus, in the future, this GAN algorithm can be used with resume data in order to
minimize the influence of sensitive attributes, such as race or gender, on who is hired.
● Conclusion:

While the concept of discrimination is mostly understood, one fiercely debated issue is
the concept of "fairness" in AI. This is likely to become a battleground between the
business world and policy makers, unless we are mindful and implement some of the
above-mentioned changes.

To fight this battle, many AI teams are subjecting their systems to greater transparency
and/or auditing processes, so that we can constantly check what types of bias, if any,
these AI systems are exhibiting, recognize the problem if it exists, and then take steps
to address it. By having more unique points of view as we build AI systems, there is hope
that all of us can create less biased applications. AI systems are making really important
decisions today, and so the bias, or potential for bias, is something we must pay
attention to and work to diminish.

We can start thinking about how we create more inclusive code and employ inclusive
coding practices. It really starts with people. So who codes matters. Are we creating
full-spectrum teams with diverse individuals who can check each other's blind spots?
On the technical side, how we code matters. Are we factoring in fairness as we're
developing systems? And finally, why we code matters. We've used tools of
computational creation to unlock immense wealth. We now have the opportunity to
unlock even greater equality if we make social change a priority and not an afterthought.

More technically advanced solutions can be deployed, including an approach that takes
inspiration from the classical GAN architecture, in which two neural networks are created
along with an adversarial network, and an approach that modifies the captions in existing
image datasets to balance gender representation by swapping gendered terms and evaluating
the resulting impact on the performance of image captioning models.

One thing that makes me optimistic about the topic is that we actually have better ideas
today for reducing bias in AI than for reducing bias in humans. So, while we should never
be satisfied until all AI bias is gone, and it will take quite a bit of work to get there,
I am also optimistic: if we can take AI systems that start off with a level of bias similar
to humans, because they learned from humans, and cut that bias down through technical
solutions or other means, then as a society we can hopefully make the decisions we take,
whether through humans or through AI, rapidly become more fair and less biased.
Sources:

https://www.techopedia.com/why-diversity-is-essential-for-quality-data-to-train-ai/2/34209

https://www.reuters.com/article/us-amazon-com-jobs-automation-insight-idUSKCN1MK08G

https://www.theguardian.com/technology/2019/apr/16/artificial-intelligence-lack-diversity-new-york-university-study

https://www.brookings.edu/research/algorithmic-bias-detection-and-mitigation-best-practices-and-policies-to-reduce-consumer-harms/

https://www.theguardian.com/technology/2023/feb/08/biased-ai-algorithms-racy-women-bodies

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9379852/

https://www.duo.uio.no/bitstream/handle/10852/88551/1/All_Chapters_Master-s_thesis-Gender_Bias_in_AI-ver--9.pdf

https://hbr.org/2020/10/to-build-less-biased-ai-hire-a-more-diverse-team

https://www.shaip.com/blog/diverse-ai-training-data-for-inclusivity-and-eliminating-bias/

https://www.xenonstack.com/blog/ethics-artificial-intelligence#:~:text=It%20provides%20equitable%20access%20and,governance%20and%20model%20management%20systems.

https://www.wizata.com/knowledge-base/the-importance-of-ethics-in-ai

https://www.reworked.co/information-management/why-ethical-ai-wont-catch-on-anytime-soon/

https://news.harvard.edu/gazette/story/2020/10/ethical-concerns-mount-as-ai-takes-bigger-decision-making-role/

https://news.mit.edu/2018/study-finds-gender-skin-type-bias-artificial-intelligence-systems-0212

https://www.npr.org/transcripts/929204946

https://www.coursera.org/learn/ai-for-everyone

https://www.youtube.com/watch?v=UG_X_7g63rY&list=PL0E_T5EdqhEfkDfuWkCe5TjwDfo6pr7Zt&index=6

https://www.youtube.com/watch?v=K32AAo6HuaU&t=7s


https://terra-docs.s3.us-east-2.amazonaws.com/IJHSR/Articles/volume3-issue6/2021_36_p17_Mandis.pdf

https://www.internationalwomensday.com/Missions/14458/Gender-and-AI-Addressing-bias-in-artificial-intelligence

https://ssir.org/articles/entry/when_good_algorithms_go_sexist_why_and_how_to_advance_ai_gender_equity

https://hbr.org/2019/11/4-ways-to-address-gender-bias-in-ai

https://aiforgood.itu.int/how-can-we-solve-the-problems-of-gender-bias-in-ai-experts-weigh-in/

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10023594/

https://arxiv.org/pdf/1912.00578.pdf
