
ENGLISH TO LUGANDA AI PROJECT

Authors:
1. KISEJJERE RASHID

Presented by:
KISEJJERE RASHID
MAKERERE UNIVERSITY
UGANDA
INTRODUCTION

❑ Machine translation is a field that has been, and still is, under active research, and researchers have so far come up with multiple machine translation approaches. These approaches mainly include Rule-Based Machine Translation (RBMT), Statistical Machine Translation (SMT), and Neural Machine Translation (NMT).
❑ The most recent and considerably more accurate approach is Neural Machine Translation.

LITERATURE REVIEW

❑ Translation is a crucial aspect of communication for individuals who speak different languages. With the advent of Artificial Intelligence (AI), translation has become more efficient and accurate, making it possible to communicate with individuals in other languages in real time. There are basically two major learning techniques that can be used:
❑ Supervised learning is a type of machine learning where the model is trained on a labeled dataset and makes predictions based on the input data and the labeled output (see the sketch after this list).
❑ Unsupervised learning is a type of machine learning where the model is not trained on labeled data but instead learns patterns from the input data alone.
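
A minimal Python sketch of the supervised setup for this project, with made-up English-Luganda sentence pairs standing in for the real parallel corpus:

```python
# Supervised learning sketch: each training example pairs an English
# sentence (the input) with its Luganda translation (the labeled output).
# The pairs below are illustrative placeholders, not the project data.
labeled_pairs = [
    ("good morning", "wasuze otya"),
    ("thank you", "weebale"),
    ("how are you?", "oli otya?"),
]

inputs = [en for en, lg in labeled_pairs]  # input data
labels = [lg for en, lg in labeled_pairs]  # labeled outputs the model learns to predict
```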
RESEARCH GAPS

❑ Limited Training Data: The quality of AI-powered translations is heavily dependent on the amount and quality of training data used to train the model. Further research is needed to explore methods for obtaining high-quality training data.
❑ Lack of Cultural Sensitivity: AI-powered translation systems can produce translations that are grammatically correct but lack the cultural sensitivity of human translations.
❑ Vulnerability to Errors: A machine learning system can only understand what it has been trained on, so inputs outside its training data can produce erroneous translations.

METHODOLOGY

❑ The proposed AI approach is to develop a neural machine translation (NMT) model. The NMT model will be trained on the English and Luganda parallel corpus dataset and will use this data to learn the relationship between the two languages. The AI process can be summarized as follows:
❑ Data Collection: Collect a large corpus of parallel text data in English and Luganda.
❑ Pre-processing: Pre-process the data to remove irrelevant information and standardize the text.
❑ Model Selection: Choose the neural machine translation model that is best suited for the problem.

METHODOLOGY (Cont'd)

❑ Model Training: Train the NMT model on the pre-processed data (see the sketch after this list).
❑ Model Evaluation: Evaluate the trained model on a held-out set of data to determine its performance.
❑ Deployment: Deploy the trained model for use in a real-world setting.
❑ Continuous Improvement: Continuously evaluate the performance of the model and make improvements as needed.
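
A minimal sketch of the Model Selection and Model Training steps, assuming a TensorFlow/Keras GRU encoder-decoder with Keras's built-in Attention layer. The vocabulary sizes, layer dimensions, and the commented-out training arrays are illustrative assumptions, not the project's actual configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_EN, VOCAB_LG = 10_000, 10_000  # assumed vocabulary sizes
EMBED_DIM, UNITS = 256, 512          # assumed layer dimensions

# Encoder: embeds the English tokens and summarizes them with a GRU.
enc_inputs = layers.Input(shape=(None,), name="english_tokens")
enc_emb = layers.Embedding(VOCAB_EN, EMBED_DIM, mask_zero=True)(enc_inputs)
enc_out, enc_state = layers.GRU(UNITS, return_sequences=True,
                                return_state=True)(enc_emb)

# Decoder: predicts the next Luganda token given the previous tokens
# (teacher forcing) and the encoder's final state.
dec_inputs = layers.Input(shape=(None,), name="luganda_tokens_in")
dec_emb = layers.Embedding(VOCAB_LG, EMBED_DIM, mask_zero=True)(dec_inputs)
dec_out = layers.GRU(UNITS, return_sequences=True)(dec_emb,
                                                   initial_state=enc_state)

# Attention over the encoder outputs, then a softmax over the Luganda vocabulary.
context = layers.Attention()([dec_out, enc_out])
logits = layers.Dense(VOCAB_LG)(layers.Concatenate()([dec_out, context]))

model = tf.keras.Model([enc_inputs, dec_inputs], logits)
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"])

# Training on the pre-processed parallel corpus (placeholder array names):
# model.fit([en_train, lg_in_train], lg_out_train,
#           validation_split=0.2, epochs=30)
```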

DATA PREPARATION

❑ Data preparation refers to the steps taken to turn raw data into improved data that can be used to train a machine learning model. The data preparation process for my model was as follows (a code sketch follows this list):
❑ Removal of any punctuation plus any unnecessary spaces. This is necessary to prevent the model from training on a large amount of unnecessary data.
❑ Converting the case of words in the dataset to lowercase.
❑ Vectorization of the dataset. Vectorization is the process of converting a given text into numerical indices.
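
A minimal sketch of these preparation steps, assuming TensorFlow's TextVectorization layer; the sample sentences and parameter values are illustrative only:

```python
import re
import string
import tensorflow as tf

def clean(sentence: str) -> str:
    """Lowercase, strip punctuation, and collapse unnecessary whitespace."""
    sentence = sentence.lower()
    sentence = sentence.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", sentence).strip()

corpus = [clean(s) for s in ["Hello, world!", "How   are you?"]]  # sample data

# Vectorization: map each cleaned sentence to a sequence of integer indices.
vectorizer = tf.keras.layers.TextVectorization(
    max_tokens=10_000, output_sequence_length=20)
vectorizer.adapt(corpus)
print(vectorizer(corpus))  # integer index tensors, one row per sentence
```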

Heat Map

❑ Drawing a heat map for a text dataset is almost impossible because the data is not numeric. A heat map's major aim is to show how the values in the data are correlated with each other, with highly correlated values coloured brightly. For the case of text data, this phenomenon can instead be represented with a word cloud. Below is a word cloud of my English dataset.
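
A minimal sketch of how such a word cloud can be generated, assuming the third-party wordcloud package; english_sentences is a placeholder for the English side of the corpus:

```python
import matplotlib.pyplot as plt
from wordcloud import WordCloud

english_sentences = ["the quick brown fox", "the lazy dog"]  # placeholder data

# More frequent words are drawn larger, playing the role of the
# brightly coloured cells in a numeric heat map.
cloud = WordCloud(width=800, height=400, background_color="white")
cloud.generate(" ".join(english_sentences))

plt.imshow(cloud, interpolation="bilinear")
plt.axis("off")
plt.show()
```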

RESULTS

❑ The training accuracy is 92% and the validation accuracy is 52%.
❑ It is clear that the model is overfitting the dataset, but its accuracy is still fairly good (the sketch below shows how the gap between the two can be visualized).
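
A minimal sketch of how this training/validation gap can be visualized, assuming history is the object returned by Keras model.fit:

```python
import matplotlib.pyplot as plt

# A widening gap between the two curves indicates overfitting.
plt.plot(history.history["accuracy"], label="training accuracy")
plt.plot(history.history["val_accuracy"], label="validation accuracy")
plt.xlabel("epoch")
plt.ylabel("accuracy")
plt.legend()
plt.show()
```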

Attention Plot (Cont'd)
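
A minimal sketch of how an attention plot can be drawn, assuming attention is a (target length x source length) weight matrix extracted from the model for one translated sentence; the tokens and random weights below are placeholders:

```python
import matplotlib.pyplot as plt
import numpy as np

en_tokens = ["how", "are", "you"]  # source sentence (placeholder)
lg_tokens = ["oli", "otya"]        # predicted translation (placeholder)
attention = np.random.rand(len(lg_tokens), len(en_tokens))  # stand-in weights

# Brighter cells mark the source words the model attends to
# when producing each target word.
fig, ax = plt.subplots()
ax.matshow(attention, cmap="viridis")
ax.set_xticks(range(len(en_tokens)), labels=en_tokens)
ax.set_yticks(range(len(lg_tokens)), labels=lg_tokens)
plt.show()
```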

REFERENCES

❑ M. Singh, R. Kumar and I. Chana, "Neural-Based Machine Translation System Outperforming Statistical Phrase-Based Machine Translation for Low-Resource Languages," 2019 Twelfth International Conference on Contemporary Computing (IC3), 2019, pp. 1-7, doi: 10.1109/IC3.2019.8844915.
❑ V. Bakarola and J. Nasriwala, "Attention based Neural Machine Translation with Sequence to Sequence Learning on Low Resourced Indic Languages," 2021 2nd International Conference on Advances in Computing, Communication, Embedded and Secure Systems (ACCESS), 2021, pp. 178-182, doi: 10.1109/ACCESS51619.2021.9563317.
❑ McCombes, S. (2022) How to Write a Strong Hypothesis | Steps & Examples, Scribbr. Available at: https://www.scribbr.com/methodology/hypothesis/#:~:text=about%20writing%20hypotheses-,What%20is%20a%20hypothesis%3F,aspects%20of%20your%20research%20question. (Accessed: 17 November 2022).
❑ Academy, E. (2022) How to Write a Research Hypothesis | Enago Academy, Enago Academy. Available at: https://www.enago.com/academy/how-to-develop-a-good-research-hypothesis/ (Accessed: 17 November 2022).
❑ What is project scope? (2022). Available at: https://www.techtarget.com/searchcio/definition/project-scope (Accessed: 17 November 2022).
❑ Machine translation - Wikipedia (2022). Available at: https://en.wikipedia.org/wiki/Machine_translation (Accessed: 17 November 2022).
❑ Cristina, S. (2022) The Attention Mechanism from Scratch, MachineLearningMastery.com. Available at: https://machinelearningmastery.com/the-attention-mechanism-from-scratch/#:~:text=The%20idea%20behind%20the%20attention,being%20attributed%20the%20highest%20weights. (Accessed: 17 November 2022).
❑ K. Chen et al., "A Neural Approach to Source Dependence Based Context Model for Statistical Machine Translation," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 26, no. 2, pp. 266-280, Feb. 2018, doi: 10.1109/TASLP.2017.2772846.

REFERENCES (Cont'd)

❑ How to Build Accountability into Your AI (2021). Retrieved 24 November 2022, from https://hbr.org/2021/08/how-to-build-accountability-into-your-ai
❑ Mukiibi, J., Hussein, A., Meyer, J., Katumba, A., and Nakatumba-Nabende, J. (2022). The Makerere Radio Speech Corpus: A Luganda Radio Corpus for Automatic Speech Recognition. Retrieved 24 November 2022, from https://zenodo.org/record/5855017

ACKNOWLEDGMENTS

❑ Special thanks to Mr. Galiwango Marvin and Dr. Rose Nakibuule for their dedicated and never-ending guidance throughout the research and implementation of this project.
