
RV College of Engineering®

(Autonomous Institution Affiliated to VTU, Belagavi)

REAL TIME LANGUAGE TRANSLATION

Experiential Learning Report

Submitted by

Hanisha - 1RV22CS

Shreya Chakote - 1RV22CS189

Vanshika Khandelwal - 1RV22CS224

COMPUTER SCIENCE & ENGINEERING


Contents
1. Introduction

2. Objectives

3. Methodology

4. Patents and papers

5. Implementations

8. Conclusion
INTRODUCTION

Real-time language translation glasses are a transformative solution poised to revolutionize communication on a global scale. These cutting-edge spectacles employ state-of-the-art technology to instantaneously interpret spoken language into the wearer's native tongue, offering seamless communication across linguistic boundaries.

Imagine traveling to foreign lands and effortlessly conversing with locals, conducting business negotiations with ease, or connecting with people from different cultures without the constraints of language barriers. These glasses redefine the way we communicate, fostering greater understanding and cooperation among people worldwide.
PATENT - US 2015/0120276 A1
LINK: https://patents.google.com/patent/US20150120276A1/en

METHODOLOGY AND COMPONENTS:

Hardware Components:
Frame: Holds all the components together.
Pair of Glasses: Lenses held by the frame, potentially translucent.
Input Unit: Allows user interaction, including buttons for taking
photos and selecting languages.
Camera Unit: Captures images of text that need translation.
Projection Device: Rotatably coupled to the frame, used to
project translated text.

Software Components:

Camera Control Module: Activates and controls the camera unit to capture images of text.
Word Identification Module: Identifies the words shown in the
captured image.
Translation Module: Translates the identified words from the
initial language into the target language.
Display Control Unit: Controls the projection device to display
the translated words onto a surface.

User Interaction: Users interact with the intelligent glasses
through the input unit, which includes buttons for taking photos
and selecting target languages. When the user presses the photo
button, the camera unit captures an image of the text needing
translation.
METHODOLOGY:

Image Processing and Translation: The captured image is processed by the camera control module and word identification module to identify the text. The identified text is then translated using the translation module from the initial language to the target language selected by the user.

Projection of Translated Text: Once translated, the display control unit directs the projection device to display the translated text onto a surface. This could include projecting onto the lenses of the glasses themselves or onto another surface, depending on the design and user preferences.

Hardware and Software Integration: The hardware and software components work together seamlessly to provide the translation functionality in real time, allowing users to understand foreign text instantaneously.
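The patent's flow can be summarised as a short pipeline. The sketch below is illustrative only (the patent publishes no code); the function names simply mirror the four software modules listed above, and their bodies are stubs standing in for real camera, OCR, translation, and projector drivers.

# Illustrative pipeline mirroring the patent's four software modules.
# All bodies are stubs; a real device would call its camera driver,
# OCR engine, translation service, and projector here.

def capture_image() -> bytes:
    """Camera Control Module: capture an image of the text (stub)."""
    return b"raw-image-bytes"

def identify_words(image: bytes) -> str:
    """Word Identification Module: recognise the words in the image (stub)."""
    return "hello world"

def translate(text: str, source: str, target: str) -> str:
    """Translation Module: translate text from source to target language (stub)."""
    return f"[{target}] {text}"

def project(text: str) -> None:
    """Display Control Unit: project the translated words onto a surface (stub)."""
    print(text)

if __name__ == "__main__":
    image = capture_image()
    words = identify_words(image)
    project(translate(words, source="en", target="fr"))
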
PAPER

LINK: https://www.ripublication.com/ijaer18/ijaerv13n9_90.pdf

METHODOLOGY AND COMPONENTS:

➔ The proposed system consists of four main modules:
● data acquisition
● pre-processing
● feature extraction
● sign recognition

➔ Two main approaches for sign language recognition are described:

1. Glove-based approaches
Glove-based approaches involve wearing a sensor glove, which simplifies the segmentation process.

2. Vision-based approaches
Vision-based approaches use image processing algorithms to detect and track hand signs and facial expressions. They are preferred due to their ease of use and absence of additional hardware requirements, but they may have accuracy issues that need to be addressed.

➔ Principal Component Analysis (PCA) is mentioned as the main feature extraction technique used in the proposed system.
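As a concrete illustration of this PCA step (not code from the paper), the NumPy sketch below projects flattened grayscale gesture images onto their top principal components; the random array and the 64x64 image size are assumptions standing in for real captured frames.

import numpy as np

# Minimal PCA feature extraction over flattened gesture images.
# X is assumed to be an (n_samples, n_pixels) array of grayscale frames;
# random data stands in for real captures.
rng = np.random.default_rng(0)
X = rng.random((100, 64 * 64))           # 100 gesture images, 64x64 pixels each

mean = X.mean(axis=0)                    # mean gesture image
Xc = X - mean                            # centre the data
# Principal components via SVD of the centred data (rows of Vt)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 20                                   # keep the top-k principal components
components = Vt[:k]                      # (k, n_pixels) projection basis

features = Xc @ components.T             # (n_samples, k) feature vectors
print(features.shape)                    # -> (100, 20)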

➔ ALGORITHM USED FOR SIGN RECOGNITION


● Linear Discriminant Analysis (LDA) algorithm is used
for sign recognition, which involves dimensionality
reduction and comparing gestures using Euclidean
distance.
● Training and recognition phases are described,
where gestures are represented as vectors,
normalized, projected onto gesture space, and
compared for recognition.
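A toy sketch of these two phases (again, not the paper's code), assuming scikit-learn is available: training gestures are flattened to vectors and projected with LDA, and a new gesture is matched to the nearest training sample by Euclidean distance in the reduced space.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Toy training set: 60 flattened gesture vectors for 6 signs (random data
# stands in for real gestures).
rng = np.random.default_rng(1)
X_train = rng.random((60, 400))          # 60 training gestures, 400 features each
y_train = np.repeat(np.arange(6), 10)    # 6 signs, 10 samples per sign

lda = LinearDiscriminantAnalysis(n_components=5)
train_proj = lda.fit_transform(X_train, y_train)   # project into gesture space

def recognise(gesture: np.ndarray) -> int:
    """Project a new gesture and return the label of the nearest training
    sample by Euclidean distance in the reduced space."""
    proj = lda.transform(gesture.reshape(1, -1))
    distances = np.linalg.norm(train_proj - proj, axis=1)
    return int(y_train[np.argmin(distances)])

print(recognise(rng.random(400)))        # predicted sign label for a new gesture
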
OUR APPROACH AND IMPROVEMENTS

● Instantly translating spoken language, sign language, and written text into your desired language.
● Recognition technology accurately interpreting sign
language gestures, providing real-time translation for
both deaf and hearing individuals.
● Utilizing Optical Character Recognition (OCR)
technology to translate written text, such as signs,
menus, and documents, enhancing accessibility in
diverse environments.
● Advanced speech recognition capabilities capture
spoken words with precision, enabling accurate
translation into the target language in real-time.
● User-friendly design with touch-sensitive controls and
voice commands for easy operation, making it
accessible to users of all technical backgrounds.
● Integrating with Wi-Fi or cellular networks for access
to online translation services, expanding language
options and improving translation accuracy.

HARDWARE COMPONENTS
● Frame, pair of glasses
● Raspberry Pi 4 Model B
● Spoken input: USB or I2S microphone
● Camera unit: To detect text and sign language
● Memory unit: language database
● Speakers to output the translated audio

SOFTWARE COMPONENTS

● Operating system installed on the Raspberry Pi, such as Raspberry Pi OS or Ubuntu MATE
● Speech Recognition Software
● Language Translation Software
● Audio Processing Software: for noise reduction and echo cancellation
● Connectivity Software: such as Wi-Fi or
Bluetooth
● Gesture Recognition Software
● Gesture Database

WORKING

1. Initialization:
Upon startup, the Raspberry Pi boots up and
initializes all necessary hardware components,
including the camera module, microphone,
speaker/headphones, and display module.

2. Input Acquisition:
The camera module captures real-time video
feed of the user's hand gestures. The
microphone captures spoken language input
from the user.

3. Gesture Recognition:
The captured video frames are processed using image processing algorithms implemented on the Raspberry Pi. Image segmentation techniques are applied to isolate the hand region from the background. Feature extraction algorithms, such as contour detection and keypoint extraction, are used to identify relevant hand gestures.
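A minimal OpenCV sketch of the segmentation and contour-detection steps above; the HSV skin-colour range and camera index are assumed, untuned values, not figures from this project.

import cv2

# Capture one frame from the camera module's video feed.
cap = cv2.VideoCapture(0)
ret, frame = cap.read()
cap.release()

if ret:
    # Segment the hand region with a rough skin-colour mask in HSV space.
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, (0, 30, 60), (20, 150, 255))
    # Contour detection to isolate the largest skin-coloured blob (the hand).
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if contours:
        hand = max(contours, key=cv2.contourArea)
        x, y, w, h = cv2.boundingRect(hand)
        print("hand region:", x, y, w, h)   # crop frame[y:y+h, x:x+w] for recognition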

4. Speech Recognition:
The captured spoken language input from the microphone is processed using speech recognition software libraries or services. The speech recognition algorithms convert the spoken language into text format, which serves as the input for the translation process.
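A brief sketch of this step, assuming the SpeechRecognition package (with PyAudio for microphone access) is installed; recognize_google() sends the audio to Google's free web recogniser and therefore needs an internet connection.

import speech_recognition as sr

# Capture speech from the microphone and convert it to text.
recognizer = sr.Recognizer()
with sr.Microphone() as source:
    recognizer.adjust_for_ambient_noise(source)   # basic noise handling
    audio = recognizer.listen(source)

try:
    text = recognizer.recognize_google(audio)     # spoken language -> text
    print("Recognised:", text)
except sr.UnknownValueError:
    print("Speech was not understood")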

5. Language Translation:
The recognized sign language gestures and the transcribed spoken language text are input into the translation system. Language translation software, such as Google Translate API or Microsoft Translator, translates the spoken language text into the desired target language. For sign language translation, a database of sign language gestures mapped to corresponding spoken language translations is used to translate recognized gestures into text format.
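A small sketch of this step. The deep-translator package is used here as one convenient wrapper around Google Translate (an assumption; the report only names the Google Translate API and Microsoft Translator), and gesture_db is a hypothetical in-memory stand-in for the sign-language gesture database.

from deep_translator import GoogleTranslator   # assumed wrapper around Google Translate

# Hypothetical in-memory stand-in for the gesture-to-text database.
gesture_db = {"gesture_hello": "hello", "gesture_thanks": "thank you"}

def translate_text(text: str, target: str = "hi") -> str:
    """Translate recognised text into the target language via the online service."""
    return GoogleTranslator(source="auto", target=target).translate(text)

print(translate_text("where is the railway station?"))   # spoken-language path
print(translate_text(gesture_db["gesture_hello"]))        # sign-language path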

6. Output Generation:
The translated text, both from the spoken
language and sign language inputs, is
displayed on the wearable display module.
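Since the hardware list also includes speakers, the translated text can additionally be spoken aloud. A minimal sketch with gTTS follows; playback through the mpg123 command is an assumption, and any audio player available on the Pi would do.

import os
from gtts import gTTS   # Google Text-to-Speech

translated = "नमस्ते, आप कैसे हैं?"                # example translated text (Hindi)
gTTS(text=translated, lang="hi").save("translated.mp3")
os.system("mpg123 translated.mp3")              # play through the Raspberry Pi's speakers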

7. User Interaction:
The user interacts with the system through an app, adjusting modes and receiving real-time feedback for a tailored translation experience.

8. Feedback and Optimization:
The system provides feedback to the user in the form of visual and auditory cues, confirming successful translation and providing assistance in case of errors or misunderstandings.

IMPLEMENTATION

[Screenshots: camera capture of the input image, and the extracted text and audio output]

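A minimal sketch of the implemented image-to-text-to-speech pipeline, using the libraries named in the conclusion (OpenCV, pytesseract, gTTS). It assumes the Tesseract OCR engine is installed on the Raspberry Pi; the camera index, language code, and output file name are assumptions.

import cv2
import pytesseract
from gtts import gTTS

# 1. Capture an image of the text with the camera.
cap = cv2.VideoCapture(0)
ret, frame = cap.read()
cap.release()

if ret:
    # 2. Basic preprocessing: grayscale + Otsu threshold to help the OCR.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # 3. Extract the text from the image with Tesseract OCR.
    text = pytesseract.image_to_string(binary).strip()
    print("Extracted text:", text)

    # 4. Convert the extracted text to speech for playback through the speakers.
    if text:
        gTTS(text=text, lang="en").save("output.mp3")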

CONCLUSION

In conclusion, the implementation of the Image to Text and Text to Speech translator represents a significant step forward in bridging the gap between visual content and accessibility for individuals with visual impairments. Through the integration of OpenCV, pytesseract, and gTTS libraries, we have developed a robust system capable of extracting text from images and converting it into speech in a seamless manner.

While the current implementation demonstrates functional capabilities, there is scope for further refinement and expansion. Future iterations could focus on enhancing OCR accuracy through advanced preprocessing techniques, optimizing speech synthesis for improved naturalness and clarity, and exploring additional features such as multilingual support and compatibility with diverse image formats.

Overall, the Image to Text and Text to Speech
translator project underscores the potential of
technology to empower individuals with
disabilities, improve accessibility across
diverse contexts, and contribute to the creation
of more inclusive digital environments. With
continued innovation and refinement, such
systems have the capacity to make a
meaningful difference in the lives of users
worldwide.
