
ADDIS ABABA UNIVERSITY
College of Natural and Computational Sciences
School of Information Science

Department of Information Science, MBA Program

Machine Learning Project: Measuring 15 years of effort fighting gender bias in
vocational training in the case of the Addis Ketema Industrial College district
and the need for data bias mitigation for further ML processing

Submitted to Michael M. (PhD)

Prepared by

Name            ID No
Eyob Negussie   GSE/0249/14

Dec. 30, 2022 G.C.

(I) Introduction

The gender discrimination problem has existed since the day society first formed professional
offices, factories, businesses, institutions, and other organizations. Despite strict regulations and laws,
gender-based discrimination can still be seen in almost all workplaces.

These historical biases slip into automated systems through the data. ML/AI systems trained
on such data pick up the implicit biases exercised over the years, which we might not detect at
first glance.

The literature I surveyed shows the prevalence of the problem, especially in widely
used language-translation systems. ML models, for example, automatically assign pronouns to
professions in a way that conforms to gender stereotypes: she/her pronouns to nurses and flight attendants,
and he/him pronouns to professions like doctors and pilots (Cho et al., 2021).

Such social biases have recently become a hot topic, to the point that industry leaders like IBM
maintain a dedicated code repository (AI Fairness 360) that encourages ML developers to learn and normalize
the use of bias detection and mitigation methods in their models. Still, more effort and work are needed to
prevent gender biases and make applications more inclusive and fairer for all users.

Another study, by Lambrecht and Tucker, focused on bias in an AI solution for job allocation
through what they called tailored job listings. The system can become biased by presenting different job
opportunities to different applicants based on the applicants' gender.

An interesting insight concerns the implicit nature of the bias: job applicants are not only
affected by gender stereotypes but also affirm them, i.e., men tend to apply for more technical and
ambitious jobs than women do (Tang et al., 2017), and applicants preserve the pattern by selecting from
these jobs instead of searching for non-stereotypical listings. As Donnelly and Stapleton stated, a gender-
biased system reinforces gender bias.

This perpetual cycle of bias has to be addressed not only from the technological side but also
through social efforts by government. Howcroft and Rubery discuss the effects of gender bias in labor
markets on disrupting social order and point out the need to tackle these biases from the outside in, i.e.,
fixing the issue in society before fixing the algorithm.

Human cognitive bias in the machine learning data pipeline


There are basically two kinds of algorithmic bias that need to be addressed in AI systems:
Pre-existing biases: these stem from biased social norms and practices and get introduced into
ML systems through the data.

Emergent biases: these arise when an ML system trained for a specific purpose is deployed
for a different goal.

Data bias mitigation


By understanding the type and characteristics of the dataset, the appropriate application of an ML
model can be gauged, and algorithm designers can use this information to proactively implement bias
mitigation methods. Understanding the context of model deployment and the data used for model
training helps prevent pre-existing (data) biases (Courtland, 2018; Baeza-Yates, 2020).
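
To make this concrete, below is a minimal sketch of one such proactive check: computing the female share of enrollment per sector before any model training. The column names (sector, gender) and the toy records are my assumptions for illustration, not the actual dataset schema.

    import pandas as pd

    # Hypothetical enrollment records; the real column names may differ.
    df = pd.DataFrame({
        "sector": ["electrical", "electrical", "nursing", "nursing", "vehicle"],
        "gender": ["M", "F", "F", "F", "M"],
    })

    # Female share per sector; values far from 0.5 flag a pre-existing
    # (data) bias worth mitigating before training.
    female_share = (
        df.assign(is_female=df["gender"].eq("F"))
          .groupby("sector")["is_female"]
          .mean()
    )
    print(female_share)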

The Addis Ketema TVET project


I focused on the younger generation, especially those enrolling in vocational institutes, as they
reflect our society. They can show how much progress we have made on the gender gap seen in many
parts of the industry.
I found the Google Colab environment a decent interface for executing Python code in the browser
without any setup on the PC, so I adopted it for this project.
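
For reference, loading the enrollment data into Colab looks roughly like the sketch below; the file name addis_ketema_tvet.csv is a hypothetical placeholder, and files.upload() simply opens Colab's browser upload widget.

    # Runs inside Google Colab only.
    from google.colab import files
    import pandas as pd

    files.upload()  # pick the CSV in the browser dialog
    df = pd.read_csv("addis_ketema_tvet.csv")
    df.head()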

Benchmarking and Interpretation


I used both a parametric modeling approach (logistic regression) and a nonparametric one (decision
tree) to gain insight from both ways of representing the data. First, let me present the loss and accuracy
for the baseline model, logistic regression, and for the presumably more predictive model, the decision tree.
The validation error is a good metric of how well the model represents the figures in those years.
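
A minimal scikit-learn sketch of how such loss and accuracy figures can be produced is given below; the synthetic X and y stand in for the actual encoded TVET records, and the hyperparameters are assumptions rather than the settings used in the project.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import log_loss, accuracy_score

    # Placeholder features and gender labels standing in for the TVET records.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 4))
    y = rng.integers(0, 2, size=200)

    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.2, random_state=42)

    for name, model in [("logistic regression", LogisticRegression(max_iter=1000)),
                        ("decision tree", DecisionTreeClassifier(max_depth=5))]:
        model.fit(X_train, y_train)
        print(name,
              "log loss:", round(log_loss(y_val, model.predict_proba(X_val)), 3),
              "accuracy:", round(accuracy_score(y_val, model.predict(X_val)), 3))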

Measures           Logistic regression                 Decision tree
                   1999-2003  2004-2008  2009-2014     1999-2003  2004-2008  2009-2014
Log loss/entropy   0.67       0.64       0.61          0.79       0.632      0.54
Accuracy           0.61       0.66       0.74          0.68       0.70       0.76

The data shows that the models performed fairly well in predicting the gender norms in the different
sectors for the most part; however, the log loss (cross-entropy) indicates that the data is not skewed
enough to show gender bias. Log loss penalizes a prediction by the negative log of the probability
assigned to the true class, so confident wrong predictions cost the most.
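
For clarity, the binary log loss reported in the table has the standard form

    \mathrm{LogLoss} = -\frac{1}{N}\sum_{i=1}^{N}\bigl[\, y_i \log \hat{p}_i + (1 - y_i)\log(1 - \hat{p}_i) \,\bigr]

where y_i is the true class label of sample i and \hat{p}_i is the predicted probability of the positive class.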

The table also indicates how the gender gap has narrowed, ever so slightly.

As a baseline model I used logistic regression to estimate the probability of gender bias, and on its
predictions I used the accuracy metric to see whether the sectors are male dominated or not.
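
As a sketch, the fitted baseline's predicted probabilities can be averaged to flag dominance; this reuses the placeholder X and y from the earlier snippet, and the choices that class 1 encodes "male" and that 0.5 is the cutoff are my assumptions, not the report's.

    # Continues in the same notebook as the earlier sketch (reuses X, y).
    from sklearn.linear_model import LogisticRegression

    baseline = LogisticRegression(max_iter=1000).fit(X, y)
    p_male = baseline.predict_proba(X)[:, 1]  # assumes class 1 = "male"

    # Under the assumed 0.5 cutoff, a mean predicted probability above 0.5
    # flags the records as coming from a male-dominated sector mix.
    print("mean P(male):", round(p_male.mean(), 3),
          "| male-dominated:", p_male.mean() > 0.5)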

The decreasing accuracy is a direct implication of female participation throughout the sectors.

[Figure: Female participation (%) across the sectors for 1999-2004, 2005-2009, and 2010-2014.]

CONCLUSION
As the figure above shows, from the year 1999 E.C. onward there has not been a concerning societal
gender-bias stigma across the vocational sectors at Addis Ketema TVET. Only the field of vehicle
service shows a very one-sided figure, and this is understandable considering that our country has
almost abandoned the area of automotive technological advancement.

That being said, the graph predicts more female participation in almost all areas of vocational
training. So I conclude that the data from Addis Ketema TVET is of satisfactory quality to be used for
further ML processing concerning gender-based implications.

Reference
Nuseir, M. T., Al Kurdi, B. H., Alshurideh, M. T., and Alzoubi, H. M. (2021). "Gender Discrimination at
Workplace: Do Artificial Intelligence (AI) and Machine Learning (ML) Have Opinions About It," in
Proceedings of the International Conference on Artificial Intelligence and Computer Vision (AICV2021),
Advances in Intelligent Systems and Computing, vol. 1377. Springer, Cham. doi: 10.1007/978-3-030-76346-6_28

Cho, W. I., Kim, J., Yang, J., and Kim, N. S. (2021). "Towards cross-lingual generalization of translation
gender bias," in Proceedings of the 2021 ACM Conference on Fairness, Accountability, and
Transparency (New York, NY), 449-457. doi: 10.1145/3442188.3445907
