
ENGLISH TO LUGANDA AI PROJECT

Authors:
1. KISEJJERE RASHID

Presented by:
KISEJJERE RASHID
MAKERERE UNIVERSITY
UGANDA
INTRODUCTION

❑ Machine translation is a field that has been, and still is, under active research, and researchers have so far come up with multiple machine translation approaches. These approaches mainly include Rule-Based Machine Translation (RBMT), Statistical Machine Translation (SMT), and Neural Machine Translation (NMT).
❑ The most recent and considerably more accurate approach is Neural Machine Translation.

LITERATURE REVIEW

❑ Translation is a crucial aspect of communication for individuals who speak different languages. With the advent of Artificial Intelligence (AI), translation has become more efficient and accurate, making it possible to communicate with individuals in other languages in real time. There are basically two major learning techniques that can be used:
❑ Supervised learning is a type of machine learning where the model is trained on a labeled dataset and makes predictions based on the input data and the labeled output (see the sketch after this list).
❑ Unsupervised learning is a type of machine learning where the model is not trained on labeled data but instead learns patterns from the input data alone.
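
A minimal Python sketch of the supervised setup for this project, with made-up English-Luganda sentence pairs standing in for the real parallel corpus:

```python
# Supervised learning sketch: each training example pairs an English
# sentence (the input) with its Luganda translation (the labeled output).
# The pairs below are illustrative placeholders, not the project data.
labeled_pairs = [
    ("good morning", "wasuze otya"),
    ("thank you", "weebale"),
    ("how are you?", "oli otya?"),
]

inputs = [en for en, lg in labeled_pairs]  # input data
labels = [lg for en, lg in labeled_pairs]  # labeled outputs the model learns to predict
```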
RESEARCH GAPS

❑ Limited Training Data: The quality of AI-powered translations is heavily dependent on the amount and quality of training data used to train the model. Further research is needed to explore methods for obtaining high-quality training data.
❑ Lack of Cultural Sensitivity: AI-powered translation systems can produce translations that are grammatically correct but lack the cultural sensitivity of human translations.
❑ Vulnerability to Errors: A machine learning system can only understand what it has been trained on, so inputs outside its training data can produce erroneous translations.

METHODOLOGY

❑ The proposed AI approach is to develop a neural machine translation (NMT) model. The NMT model will be trained on the English and Luganda parallel corpus dataset and will use this data to learn the relationship between the two languages. The AI process can be summarized as follows:
❑ Data Collection: Collect a large corpus of parallel text data in English and Luganda.
❑ Pre-processing: Pre-process the data to remove irrelevant information and standardize the text.
❑ Model Selection: Choose the neural machine translation model that is best suited for the problem.

METHODOLOGY (Cont'd)

❑ Model Training: Train the NMT model on the pre-processed data (see the sketch after this list).
❑ Model Evaluation: Evaluate the trained model on a held-out set of data to determine its performance.
❑ Deployment: Deploy the trained model for use in a real-world setting.
❑ Continuous Improvement: Continuously evaluate the performance of the model and make improvements as needed.
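
A minimal sketch of the Model Selection and Model Training steps, assuming a TensorFlow/Keras GRU encoder-decoder with Keras's built-in Attention layer. The vocabulary sizes, layer dimensions, and the commented-out training arrays are illustrative assumptions, not the project's actual configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_EN, VOCAB_LG = 10_000, 10_000  # assumed vocabulary sizes
EMBED_DIM, UNITS = 256, 512          # assumed layer dimensions

# Encoder: embeds the English tokens and summarizes them with a GRU.
enc_inputs = layers.Input(shape=(None,), name="english_tokens")
enc_emb = layers.Embedding(VOCAB_EN, EMBED_DIM, mask_zero=True)(enc_inputs)
enc_out, enc_state = layers.GRU(UNITS, return_sequences=True,
                                return_state=True)(enc_emb)

# Decoder: predicts the next Luganda token given the previous tokens
# (teacher forcing) and the encoder's final state.
dec_inputs = layers.Input(shape=(None,), name="luganda_tokens_in")
dec_emb = layers.Embedding(VOCAB_LG, EMBED_DIM, mask_zero=True)(dec_inputs)
dec_out = layers.GRU(UNITS, return_sequences=True)(dec_emb,
                                                   initial_state=enc_state)

# Attention over the encoder outputs, then a softmax over the Luganda vocabulary.
context = layers.Attention()([dec_out, enc_out])
logits = layers.Dense(VOCAB_LG)(layers.Concatenate()([dec_out, context]))

model = tf.keras.Model([enc_inputs, dec_inputs], logits)
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"])

# Training on the pre-processed parallel corpus (placeholder array names):
# model.fit([en_train, lg_in_train], lg_out_train,
#           validation_split=0.2, epochs=30)
```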

DATA PREPARATION

❑ Data preparation refers to the steps taken to turn raw data into improved data that can be used to train a machine learning model. The data preparation process for my model was as follows (a code sketch follows this list):
❑ Removal of any punctuation plus any unnecessary spaces. This is necessary to prevent the model from training on a large amount of unnecessary data.
❑ Converting the case of words in the dataset to lowercase.
❑ Vectorization of the dataset. Vectorization is the process of converting a given text into numerical indices.
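
A minimal sketch of these preparation steps, assuming TensorFlow's TextVectorization layer; the sample sentences and parameter values are illustrative only:

```python
import re
import string
import tensorflow as tf

def clean(sentence: str) -> str:
    """Lowercase, strip punctuation, and collapse unnecessary whitespace."""
    sentence = sentence.lower()
    sentence = sentence.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", sentence).strip()

corpus = [clean(s) for s in ["Hello, world!", "How   are you?"]]  # sample data

# Vectorization: map each cleaned sentence to a sequence of integer indices.
vectorizer = tf.keras.layers.TextVectorization(
    max_tokens=10_000, output_sequence_length=20)
vectorizer.adapt(corpus)
print(vectorizer(corpus))  # integer index tensors, one row per sentence
```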

Heat Map

❑ Drawing a heat map for a text dataset is almost impossible because the data is not numeric. A heat map's major aim is to show how the values in the data are correlated with each other, with highly correlated values coloured brightly. For the case of text data, this phenomenon can instead be represented with a word cloud. Below is a word cloud of my English dataset.
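
A minimal sketch of how such a word cloud can be generated, assuming the third-party wordcloud package; english_sentences is a placeholder for the English side of the corpus:

```python
import matplotlib.pyplot as plt
from wordcloud import WordCloud

english_sentences = ["the quick brown fox", "the lazy dog"]  # placeholder data

# More frequent words are drawn larger, playing the role of the
# brightly coloured cells in a numeric heat map.
cloud = WordCloud(width=800, height=400, background_color="white")
cloud.generate(" ".join(english_sentences))

plt.imshow(cloud, interpolation="bilinear")
plt.axis("off")
plt.show()
```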

RESULTS

❑ The training accuracy is 92% and the validation accuracy is 52%.
❑ It is clear that the model is overfitting the dataset, but its accuracy is still fairly good (the sketch below shows how the gap between the two can be visualized).
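
A minimal sketch of how this training/validation gap can be visualized, assuming history is the object returned by Keras model.fit:

```python
import matplotlib.pyplot as plt

# A widening gap between the two curves indicates overfitting.
plt.plot(history.history["accuracy"], label="training accuracy")
plt.plot(history.history["val_accuracy"], label="validation accuracy")
plt.xlabel("epoch")
plt.ylabel("accuracy")
plt.legend()
plt.show()
```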

Attention Plot (Cont'd)
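
A minimal sketch of how an attention plot can be drawn, assuming attention is a (target length x source length) weight matrix extracted from the model for one translated sentence; the tokens and random weights below are placeholders:

```python
import matplotlib.pyplot as plt
import numpy as np

en_tokens = ["how", "are", "you"]  # source sentence (placeholder)
lg_tokens = ["oli", "otya"]        # predicted translation (placeholder)
attention = np.random.rand(len(lg_tokens), len(en_tokens))  # stand-in weights

# Brighter cells mark the source words the model attends to
# when producing each target word.
fig, ax = plt.subplots()
ax.matshow(attention, cmap="viridis")
ax.set_xticks(range(len(en_tokens)), labels=en_tokens)
ax.set_yticks(range(len(lg_tokens)), labels=lg_tokens)
plt.show()
```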

REFERENCES

❑ M. Singh, R. Kumar and I. Chana, "Neural-Based Machine Translation System Outperforming Statistical Phrase-Based Machine Translation for Low-Resource Languages," 2019 Twelfth International Conference on Contemporary Computing (IC3), 2019, pp. 1-7, doi: 10.1109/IC3.2019.8844915.
❑ V. Bakarola and J. Nasriwala, "Attention based Neural Machine Translation with Sequence to Sequence Learning on Low Resourced Indic Languages," 2021 2nd International Conference on Advances in Computing, Communication, Embedded and Secure Systems (ACCESS), 2021, pp. 178-182, doi: 10.1109/ACCESS51619.2021.9563317.
❑ McCombes, S. (2022) How to Write a Strong Hypothesis | Steps & Examples, Scribbr. Available at: https://www.scribbr.com/methodology/hypothesis/#:~:text=about%20writing%20hypotheses-,What%20is%20a%20hypothesis%3F,aspects%20of%20your%20research%20question. (Accessed: 17 November 2022).
❑ Academy, E. (2022) How to Write a Research Hypothesis | Enago Academy, Enago Academy. Available at: https://www.enago.com/academy/how-to-develop-a-good-research-hypothesis/ (Accessed: 17 November 2022).
❑ What is project scope? (2022). Available at: https://www.techtarget.com/searchcio/definition/project-scope (Accessed: 17 November 2022).
❑ Machine translation - Wikipedia (2022). Available at: https://en.wikipedia.org/wiki/Machine_translation (Accessed: 17 November 2022).
❑ Cristina, S. (2022) The Attention Mechanism from Scratch, MachineLearningMastery.com. Available at: https://machinelearningmastery.com/the-attention-mechanism-from-scratch/#:~:text=The%20idea%20behind%20the%20attention,being%20attributed%20the%20highest%20weights. (Accessed: 17 November 2022).
❑ K. Chen et al., "A Neural Approach to Source Dependence Based Context Model for Statistical Machine Translation," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 26, no. 2, pp. 266-280, Feb. 2018, doi: 10.1109/TASLP.2017.2772846.

REFERENCES (Cont'd)

❑ How to Build Accountability into Your AI (2021). Retrieved 24 November 2022, from https://hbr.org/2021/08/how-to-build-accountability-into-your-ai
❑ Mukiibi, J., Hussein, A., Meyer, J., Katumba, A., and Nakatumba-Nabende, J. (2022). The Makerere Radio Speech Corpus: A Luganda Radio Corpus for Automatic Speech Recognition. Retrieved 24 November 2022, from https://zenodo.org/record/5855017

ACKNOWLEDGMENTS

❑ Special thanks to Mr. Galiwango Marvin and Dr. Rose Nakibuule for their dedicated and never-ending guidance throughout the research and implementation of this project.
