
TRANSLATION OF FREQUENTED WORDS OF

TELUGU MEDIUM TEXT BOOKS TO ENGLISH


The Project report submitted in partial fulfillment of the requirement for
the award of the Degree of

MASTER OF TECHNOLOGY
in
COMPUTER SCIENCE AND TECHNOLOGY with COMPUTER NETWORKS

Submitted by

G.MOHANA SOWJANYA

(Reg.No:320206418003)

Under the guidance of

Professor. D. LALITHA BHASKARI

Department of CS&SE

DEPARTMENT OF COMPUTER SCIENCE AND SYSTEMS ENGINEERING

ANDHRA UNIVERSITY COLLEGE OF ENGINEERING (A)

ANDHRA UNIVERSITY, VISAKHAPATNAM-530003

MAY 2022


ABSTRACT

Many students study in Telugu medium. They face considerable difficulty when they
move on to higher education in English medium. Although they are capable, they struggle
to compete with English-medium students because, until then, they have read everything
in Telugu and do not know the English terminology. If they learn the English terminology
while still studying in Telugu medium, pursuing education in English medium becomes
easier. To assist them, this project implements a frequency-based approach that gives the
English meaning of the most frequent important Telugu words gathered from
Telugu-medium textbooks. I hope it will be of some help to Telugu-medium students.
Existing System

Telugu-medium students read everything in Telugu and do not know the English
terminology. When they have to pursue education in English medium, most of them use
dictionaries to search for the Telugu meanings of English words, and some of them take
the help of lecturers. However, it takes time to learn the terminology this way and to get
good marks in examinations.

Proposed System

It is difficult for Telugu-medium students to learn and memorize English terminology all
at once. If they learn the English terminology while still studying in Telugu medium, it
will be easier for them when they have to pursue education in English medium. To assist
them in learning the English terminology for Telugu words, this project translates the
most frequently occurring Telugu words of Telugu-medium textbooks to English.
Hardware Requirements

Operating System: Windows or Linux systems
Device Processor: Intel Core i3 or above
Installed RAM: 8GB
Disk Space: 32GB or more
Memory Space: 512MB

Software Requirements

Programming Language: Python version 3.6


Software Editor: PyCharm Editor

Introduction
Many students study in Telugu medium. When they go on to further education in English
medium, they face problems with the language. Even though they are capable, they
struggle to compete with English-medium students because, until then, they have read
everything in Telugu and do not know the English terminology. Although they use
dictionaries and take help from lecturers, it is difficult for Telugu-medium students to
learn and memorize English terminology all at once, and as a result they are unable to
score good marks in examinations. If they learn the English terminology while still
studying in Telugu medium, it will be easier for them when they have to pursue
education in English medium. To assist them in learning the English terminology for
Telugu words, this project translates the most frequently occurring Telugu words of
Telugu-medium textbooks to English using a frequency-based approach.
Techniques

This work involves two techniques. They help reduce the document to a short set of
words that conveys the main meaning of the text. They are as follows:
A) K-Means Clustering
B) Frequency based approach

A) K-Means clustering
The K-Means algorithm finds groups of similar data by forming clusters. It is an
iterative algorithm that repeats two steps: cluster assignment and centroid movement.
Its goal is to divide M observations into K clusters such that each observation belongs
to the cluster with the nearest mean.
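
The two steps can be illustrated with a minimal sketch. The report does not state which features are clustered, so the 2D points, the function name k_means, and the choice of the first K points as starting centroids are all assumptions made only for illustration.

```python
def k_means(points, k, iterations=10):
    """Minimal K-Means sketch on tuples of numbers (illustrative only)."""
    # Assumption: use the first k points as the initial centroids.
    centroids = [points[i] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Step 1, cluster assignment: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Step 2, move centroid: each centroid becomes the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, clusters


# Made-up sample data with two visibly separate groups.
sample_points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 9.0)]
sample_centroids, sample_clusters = k_means(sample_points, k=2)
print(sample_centroids)
```

A fixed iteration count keeps the sketch short; a fuller version would stop as soon as the assignments no longer change.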

B) Frequency based approach


1) Frequency of keywords
In this step the frequency of each word is calculated. The words with the highest
frequency are called keywords. Based on the keywords, a score is awarded to each
sentence: the sentence gets a score of 0.1 for each keyword it contains.
2) Filtering of stop words
In any document there are words that carry little meaning on their own, such as ఆ, ��, ఉం��, ల�,
క���, �ా, ��ౖ, etc., which are used very frequently in the Telugu language. These words are
not useful for what users are searching for when executing queries.
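
The scoring rule can be sketched as follows. This is only an illustration under stated assumptions: the function name score_sentences, the top_n parameter, and the sample sentences are hypothetical, and every occurrence of a keyword is assumed to add 0.1 to the sentence score.

```python
from collections import Counter


def score_sentences(sentences, stop_words, top_n=2, weight=0.1):
    """Score each sentence with `weight` per keyword occurrence (illustrative)."""
    # Split on whitespace and drop the stop words before counting.
    tokens = [[w for w in s.split() if w not in stop_words] for s in sentences]
    counts = Counter(w for sentence in tokens for w in sentence)
    # Assumption: the top_n most frequent words are treated as the keywords.
    keywords = {w for w, _ in counts.most_common(top_n)}
    return [round(weight * sum(1 for w in sentence if w in keywords), 2)
            for sentence in tokens]


# Made-up sample sentences; "ఈ" stands in for a stop-word list.
sample_sentences = ["ఆహారం శక్తి ఇస్తుంది", "ఈ ఆహారం మంచిది", "శక్తి అవసరం"]
print(score_sentences(sample_sentences, stop_words={"ఈ"}))
```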
Frequency based approach

Algorithm Steps

• Read Telugu text from the user and extract sentences.


• Develop the tokens from the sentences.
• After tokenization remove the stop words.
• Count the frequency of each word in the document.
• Remove the low-frequency words from the document and keep the words with the
highest frequency count.
• Translate those frequent words to English.

Read Telugu text from the user and extract sentences:


To process the data, the text first has to be divided into sentences so that the necessary
operations can be performed and the text classified as important or unimportant. The
sentence_tokenize module of indic_nlp_library is used for this purpose. Splitting is done
at the full stop, which marks the end of a sentence.
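
As a rough illustration of this step, the sketch below splits text at full stops using only the Python standard library. It does not call indic_nlp_library, so it approximates the behaviour described above rather than reproducing the project's code. The function name split_sentences and the sample text are hypothetical, and treating question marks and exclamation marks as sentence ends is an added assumption.

```python
import re


def split_sentences(text):
    """Split text at sentence-ending punctuation (stdlib-only approximation)."""
    # Assumption: '?' and '!' also end a sentence, in addition to the full stop.
    parts = re.split(r"[.?!]+", text)
    # Drop empty fragments and surrounding whitespace.
    return [part.strip() for part in parts if part.strip()]


# Made-up sample text with two sentences.
print(split_sentences("మనం ఆహారం తింటాం. శక్తి అవసరం."))
```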

Develop the tokens from the sentences:


Each sentence is split into tokens, i.e. words. This process is called tokenization.
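
A minimal tokenization sketch is given below, assuming that a word is any unbroken run of characters from the Unicode Telugu block. The name tokenize and the sample sentence are hypothetical. The character range is used instead of the regex class \w because \w does not match Telugu vowel signs and would cut words apart.

```python
import re

# Unicode Telugu block: letters, vowel signs, virama and Telugu digits.
TELUGU_WORD = re.compile(r"[\u0C00-\u0C7F]+")


def tokenize(sentence):
    """Return the Telugu words of a sentence, skipping spaces and punctuation."""
    return TELUGU_WORD.findall(sentence)


# Made-up sample sentence.
print(tokenize("ఆహారం, శక్తి."))
```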

Remove stop words:


After tokenization the document is cleaned by removing punctuation such as full stops
and commas, together with the stop words. Some of the stop words in Telugu are:
ఆ, ��, ఉం��, ఉ��న్ �, ఇ��, ల���, �ా�, మ��య�, ఈ, ��సం, ��, ల�, క���, �ా, ��ౖ, మ�త్ర��,గ���ం�
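
The cleaning step might look like the sketch below. The stop-word set shown is a small illustrative sample, not the project's full list, and the names STOP_WORDS, PUNCTUATION and remove_stop_words are hypothetical.

```python
# Small illustrative sample of Telugu stop words (not the project's full list).
STOP_WORDS = {"ఆ", "ఈ", "మరియు", "కూడా", "లో"}
PUNCTUATION = ".,;:!?\"'()"


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Strip punctuation from each token and drop stop words and empty tokens."""
    cleaned = []
    for token in tokens:
        word = token.strip(PUNCTUATION)
        if word and word not in stop_words:
            cleaned.append(word)
    return cleaned


# Made-up sample tokens.
print(remove_stop_words(["ఈ", "ఆహారం", "మరియు", "శక్తి", "."]))
```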

Count the frequency of each word in the document:


After removing the stop words, the frequency of each remaining word is calculated.
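
Counting can be done with Counter from the Python standard library, as in this short sketch. The function name word_frequencies and the sample tokens are hypothetical.

```python
from collections import Counter


def word_frequencies(tokens):
    """Map each distinct token to the number of times it occurs."""
    return Counter(tokens)


# Made-up sample tokens, already cleaned of stop words.
sample_counts = word_frequencies(["ఆహారం", "శక్తి", "ఆహారం"])
print(sample_counts.most_common())
```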

Extract words with maximum frequency count:


Words with low frequency are excluded from the output. Words with high frequency
are extracted.
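
The report does not state the cut-off between low and high frequency, so the sketch below assumes a simple minimum count. The function name frequent_words and the parameter min_count are hypothetical.

```python
def frequent_words(frequencies, min_count=2):
    """Keep words occurring at least min_count times, most frequent first."""
    kept = [(word, count) for word, count in frequencies.items() if count >= min_count]
    # The sort is stable, so words with equal counts keep their original order.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in kept]


# Made-up frequency table.
print(frequent_words({"శక్తి": 2, "ఆహారం": 3, "పని": 1}))
```

Taking a fixed number of top words instead of a minimum count would be an equally reasonable reading of this step.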

Translate frequent words to English:


After extracting the high-frequency words, translate them to English.
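
The report does not name the translation tool it uses, so the sketch below substitutes a small hand-made glossary lookup. The GLOSSARY entries and the function name translate_words are hypothetical and serve only to show the shape of this step.

```python
# Hypothetical glossary; a real system would use a dictionary or translation service.
GLOSSARY = {"ఆహారం": "food", "శక్తి": "energy", "వివిధ": "different"}


def translate_words(words, glossary=GLOSSARY):
    """Pair each Telugu word with its English meaning, or None if unknown."""
    return [(word, glossary.get(word)) for word in words]


# Made-up sample words; the second is not in the glossary.
for telugu, english in translate_words(["శక్తి", "పని"]):
    print(telugu, english)
```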
Flowchart:

Text File

Extract Sentences

Develop tokens

Remove stop words

Frequency Count

Remove low frequency terms

Translate from Telugu to English
RESULT

INPUT:

మనం ��ాయ్�, ప��హ� ర, ఇ��్ల , చ�ా�, పప�ప్ల� �దల�ౖన ��ధ ర�ాల ఆ�ర ప���ా్థలను గ���ం�
చ��్చం��ం. ���న్ ర�ాల ఆ�ర ప���ా్థలను వం�ే �����లను ��ర�్చక���న్ం. నూ��, సుగంధ
ద్ర�ాయ్ల� కలపటంవల్ల ఆ�రం ర���ా, �న����� �ల��ా ఎల� తయ�ర��ేసుక�ంట�మ��
�షయ�లను గ���ం� క��� �ెల�సుక���న్ం.
బ��ర�గ�ర� ���య్ర�్థల�� జట�్ట�ా ఏరప్డం��. ప్ర���� మనం �ే�� ��ధ ర�ాల పనుల జ����ను
�ాయం��.మనం ��� ఆ�ర ప���ా్థల జ���� క��� �ాయం��.��ం��ంట�� �� ల్చం��.�ర�
తయ�ర��ే�ిన �����కను ప్రద��్శంచం��. ప్ర���� మనం �ే�� పనులక�, �సు���� ఆ��ా��� మధయ్
ఏ����� సంబంధం ఉం��?� �త�్రల��, ఉ�ా��య్య�ల�� చ��్చంచం��. ప్ర���� మనం రకర�ాల
పనుల� �ేయ����� �ావల�ిన శ��త్ మనం ��� ఆ�రంనుం� ల�సుత్ం��.�ద్ర�� �� సమయంల�
క��� మనక� శ��త్ అవసరమ�? ఎందుక�?"మనం ���స
్ర త్ ునన్ప�ప్డ� క��� �ా్వస���య రకత్ పస
్ర రణ
మన శ��రంల� జర�గ�త��� ఉంట��.అందు�� ���ం్ర �ే సమయంల� క��� మనక� శ��త్ అవసరం"
అ� �ెబ�త�ంట�ర�.���� �వ� అం��క���త్ ా�ా?�ద్ర�� ��టప�ప్డ� మన ప్ర� ఒక్క���� తనకంట�
బ��ా ఇష్ట ��న ఆ�రం: శ��రంల� ఇం�ా ఏ ఏ పనుల� జర�గ����.� ��ట�ప�సత్ కంల�
�ాయం��.ఏ�ో ఒకట� ఉంట�ం��.

OUTPUT:

The most frequent words are


['ఆ�ర', 'ప్ర����', 'శ��త్', '��ధ']

ఆ�ర food
ప్ర���� Everyday
శ��త్ energy
��ధ Different
