
NATIONAL UNIVERSITY OF LESOTHO

Department of Mathematics and Computer Science


F: (int) 266 340601 P.O. Roma 180
Fax: (int) 266 340000 LESOTHO
Telex: 4303 LO UNITER Southern Africa

YEAR IV PROJECTS ACADEMIC YEAR 2020/21


B.Sc. Computer Science, B.Sc. Information Systems

Supervisor: Mofana Mphaka

1. Sesotho Digital Dictionary (Category: Computer Science)


Motivation:-

The written Sesotho language (of Lesotho) uses an orthography that was introduced by the early
French missionaries. Consequently, the orthography had many (French) accents to distinguish
different pronunciations and contexts. However, the use of accents has all but disappeared, which
has made many words ambiguous in both spelling and context. On top of that, there are now new
words in Sesotho whose pronunciations require syllables which have not yet been formalised by
the language gurus. Examples include “café” (some write, in Sesotho, “k’hefi”; others write
“kh’fi”) and a shack (“mok’huk’hu” or “mokh’ukh’u”).

The purpose of this project is to assume that a “stable” orthography exists and then to find digital
methods of storing Sesotho words for fast access and minimal storage. This is possible since
many Sesotho words can be analysed as a “stem” followed by a “suffix”. Different words from
the same stem differ, by and large, only in their suffixes. By studying the generic way in which
suffixes are generated, it may be possible to store only the stems or word bases and then generate
the corresponding words algorithmically.
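The stem-and-suffix idea can be sketched as follows. This is only a minimal illustration, not a proposed design: the names SUFFIX_CLASSES, STEMS, generate and contains are hypothetical, and the sample stems and suffixes are placeholders that have not been checked against any Sesotho dictionary.

```python
from typing import Dict, FrozenSet, List

# Hypothetical suffix classes: each class names the set of suffixes a stem may take.
# The entries are illustrative placeholders, not verified Sesotho morphology.
SUFFIX_CLASSES: Dict[str, FrozenSet[str]] = {
    "verb_basic": frozenset({"a", "ile", "ana"}),
}

# Each stem is stored once, together with the name of its suffix class.
STEMS: Dict[str, str] = {
    "rat": "verb_basic",
    "rek": "verb_basic",
}


def generate(stem: str) -> List[str]:
    """Rebuild every word of a stored stem from its suffix class."""
    return sorted(stem + suffix for suffix in SUFFIX_CLASSES[STEMS[stem]])


def contains(word: str) -> bool:
    """Check membership by trying every split of the word into stem + suffix."""
    for cut in range(1, len(word)):
        stem, suffix = word[:cut], word[cut:]
        suffix_class = STEMS.get(stem)
        if suffix_class is not None and suffix in SUFFIX_CLASSES[suffix_class]:
            return True
    return False


if __name__ == "__main__":
    print(generate("rat"))
    print(contains("rekile"))
```

Storage grows with the number of stems plus the number of suffix classes, rather than with the number of full words, and a lookup costs at most one table probe per possible split point.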

In this project, students are not required to collect Sesotho words to build a dictionary. They are
required to design and implement a “digital library”. This, therefore, presupposes that a written
language dictionary exists or that it could be built. For all words containing syllables that have
no formal spelling, like the syllable “k’h” or “kh’”, students will simply decide on their own
temporary standard. For example, they may choose “kkh” instead.
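A temporary standard of this kind could be applied mechanically before words are stored. The sketch below assumes the “kkh” choice mentioned above and treats both the typographic and the plain apostrophe as equivalent; the function name normalise is hypothetical.

```python
import re

# Matches either competing spelling of the syllable: k'h or kh'
# (with a typographic or a plain apostrophe).
_VARIANT_SYLLABLE = re.compile(r"k[’']h|kh[’']")


def normalise(word: str) -> str:
    """Rewrite both variant spellings to the assumed temporary standard 'kkh'."""
    return _VARIANT_SYLLABLE.sub("kkh", word.lower())


if __name__ == "__main__":
    # Both spellings of the same word collapse to one stored form.
    print(normalise("mok’huk’hu"))
    print(normalise("mokh’ukh’u"))
```

Normalising on input means the dictionary holds a single key per word, whichever spelling the user types.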

Suggested Methodology:-

1. First, one needs to find and study any literature containing at least a list of Sesotho words with
corresponding meanings, for the purpose of discovering their “stems/bases” and “suffixes”
wherever possible.

2. Once the work in 1 above has been completed, then an IT design and implementation of an
appropriate data dictionary will be undertaken.

Requirements:-

I need at least two (2) students who are interested in Natural Language Processing techniques in
Computing and who have a good command of the Sesotho language (mainly in the written form).
Proficiency in Data Structures and Algorithms is indispensable. Knowledge of Compiler
Construction and Design may be an added advantage.

For implementation, students are free to choose their programming environment: language, OS,
etc.

2. Sesotho Grammar Checker (Category: Computer Science – Computational Linguistics)


Motivation:-

As long ago as 1994, I had a student, Ms ’Masophia Lesaoana, who showed that the English
grammar is context free (strong LL(1)) and that a syntax parser could therefore be built for it.
Indeed, she built the first English grammar checker on campus. Her grammar checker outsmarted
the commercially provided grammar checkers of the time, e.g. MS Word and Novell WordPerfect.
Their syntax checking was pretty rudimentary!

Sesotho, like English, is phrase structured, and the two languages have nearly the same structure.
However, there are differences, some slight and some huge. It seems, then, that one cannot use
the English grammar checker for Sesotho just by replacing the English dictionary with the
Sesotho one, or can one?

The purpose of this project is to compare and contrast the phrase structure of the Sesotho
grammar and that of the English grammar (Lesaoana’s project) and to come up with a Sesotho
Grammar Checker either as an adaptation of the English grammar checker that was developed
by Lesaoana or as a new innovation altogether.

Suggested Methodology:-

1. The researcher will have to find a co-supervisor in linguistics (i.e. the department of language
and linguistics) so that he/she can learn the relevant grammar concepts for natural languages like
Sesotho and English. If co-supervision is not possible, then the researcher will have to learn
these basic concepts independently, for example from books.

2. Parallel to step 1 above, one will have to study the work of Lesaoana, familiarising oneself
with the concepts of formal language theory as applied to a natural language like English, with
a view to eventually building a syntax checker for it (i.e. a compiler of sorts).

3. Finally, as a result of 1 and 2 above, one can build a Sesotho parser. The assumption is that
a dictionary exists.
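To give a flavour of what an LL(1)-style syntax checker for a phrase-structured language looks like, here is a toy sketch. It is not Lesaoana's grammar or checker: the three-rule grammar, the six-word English lexicon and all names (LEXICON, Parser, is_grammatical) are assumptions made purely for illustration, and Python is used here only for brevity.

```python
from typing import List, Optional

# Toy grammar (assumed for illustration only):
#   S  -> NP VP
#   NP -> Det N
#   VP -> V NP | V
# Toy dictionary mapping each word to its part of speech.
LEXICON = {"the": "Det", "a": "Det", "dog": "N", "cat": "N",
           "sees": "V", "sleeps": "V"}


class ParseError(Exception):
    """Raised when the input does not fit the grammar."""


class Parser:
    def __init__(self, words: List[str]) -> None:
        for word in words:
            if word not in LEXICON:
                raise ParseError(f"unknown word: {word}")
        self.tags = [LEXICON[word] for word in words]
        self.pos = 0

    def peek(self) -> Optional[str]:
        """Return the next part-of-speech tag (one token of lookahead)."""
        return self.tags[self.pos] if self.pos < len(self.tags) else None

    def expect(self, tag: str) -> None:
        if self.peek() != tag:
            raise ParseError(f"expected {tag}, found {self.peek()}")
        self.pos += 1

    def sentence(self) -> None:      # S -> NP VP
        self.noun_phrase()
        self.verb_phrase()
        if self.peek() is not None:
            raise ParseError("unexpected words after the sentence")

    def noun_phrase(self) -> None:   # NP -> Det N
        self.expect("Det")
        self.expect("N")

    def verb_phrase(self) -> None:   # VP -> V NP | V
        self.expect("V")
        if self.peek() == "Det":     # the lookahead decides between the two rules
            self.noun_phrase()


def is_grammatical(text: str) -> bool:
    try:
        Parser(text.lower().split()).sentence()
        return True
    except ParseError:
        return False


if __name__ == "__main__":
    print(is_grammatical("the dog sees a cat"))
    print(is_grammatical("dog the sleeps"))
```

Each grammar rule becomes one method, so adapting the checker to another language amounts to changing the rules and the dictionary, which is exactly the question this project asks about Sesotho.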

Requirements:-

This project will need 2 students with sound knowledge of formal language theory or compiler
construction and design and who are interested in applying these skills to a natural language
processing problem – computational linguistics.

A preferred language of choice for implementation is (Visual) C++.

3. C++ Student Marker (Category: Computer Science)


Motivation:-

Teaching elementary programming course(s) to large classes, such as our CS2431 and CS2432,
can be an absolute nightmare when it comes to marking the source programs, as the process is
very labour intensive due to the bulkiness of the program source code. In most cases, one may
choose the easier option: mark program correctness by running it on some chosen benchmark
data. However, there is a need to look at the source code and mark such things as programming
style, choice of meaningful identifiers, and best practices in the use of program control
constructs, such as the choice of looping constructs favoured by seasoned programmers in the
field. It would be nice to have a digital tool for this purpose, good enough to be acceptable to
“most” human markers (again in the field). Building this tool is tantamount to writing a C++
parser whose generated code shall be the score given to the parsed C++ source program.

Last academic year, three students took up the project. Due to unforeseen reasons beyond their
control, the students could not finish the project. However, they produced a reasonably functional
C++ parser in Python (using the resources of PyParsing tools). This year’s task will be a follow-up
project: to revisit last year’s results, polish them (where necessary) and code the marking
heuristics into the parser.

Suggested Methodology:-

1. First, one needs to familiarise oneself with the previous year’s work and the Python
programming language (and its parsing toolkit).

2. Then a recursive descent parser will be developed from 1 above, with built-in heuristics
for awarding scores and formal C++ syntax checking.
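As an indication of what a marking heuristic might look like, the sketch below scores the choice of identifiers in a C++ source string. It is a token-level approximation using regular expressions, not the recursive descent parser itself and not last year's code; the rule (names of three or more letters, or the conventional loop counters i, j, k, count as meaningful), the mark scale and the names DECLARATION and identifier_score are all assumptions.

```python
import re
from typing import List

# Rough pattern for simple declarations such as "int total" or "double rate".
# A real marker would take declarations from the parser, not from a regex.
DECLARATION = re.compile(r"\b(?:int|long|float|double|char|bool)\s+([A-Za-z_]\w*)")

# Short names that are conventionally acceptable as loop counters (assumed rule).
LOOP_COUNTERS = {"i", "j", "k"}


def declared_names(source: str) -> List[str]:
    """Collect declared identifiers, ignoring the entry point 'main'."""
    return [name for name in DECLARATION.findall(source) if name != "main"]


def identifier_score(source: str, max_marks: int = 10) -> int:
    """Award marks in proportion to the share of meaningful identifiers."""
    names = declared_names(source)
    if not names:
        return max_marks  # nothing declared, so nothing to penalise
    meaningful = sum(1 for name in names
                     if len(name) >= 3 or name in LOOP_COUNTERS)
    return round(max_marks * meaningful / len(names))


if __name__ == "__main__":
    sample = """
    int main() {
        int total = 0;
        int x = 5;
        for (int i = 0; i < x; ++i) total += i;
        return total;
    }
    """
    print(declared_names(sample))
    print(identifier_score(sample))
```

In the full tool, each heuristic of this kind would be attached to the relevant grammar rule so that scores are accumulated while the program is being parsed.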

Requirements:-

I need at least two (2) students who are good at C++ syntax, C++ programming, and Data
Structures and Algorithms. Knowledge of the Python language will be an added advantage.

NB: For a student wishing to go into Compiler Construction and Design, this could be an
eye-opener!
