
ASSISTED VISION

Team Members:
Khushang Thakkar - 20BCE2014
Patel Tirth Ashvinbhai - 20BCE2022
Ritu Raj - 20BCE2938
Keshav Varshney - 20BCE0285
Avineesh Sathyakumar - 20BCT0043

Report submitted for the


Final Project Review of

Course Code: CSE3013


Artificial Intelligence

Slot: B2+TB2

Professor:
Sharmila Banu K
Object detection with voice output
What’s new?
Seamless voice output and object detection running in parallel. Object detection itself is common, but speaking out what is being detected, as it is detected, is new.

Problems faced:
1. When we first implemented voice output on object detection, the video stream began to lag while an object's name was being read out.
We inferred that the system paused to speak the object and only then resumed object detection.

Fixing:
To fix this, we moved the speech output into a separate function running on its own thread, so that both functionalities can work in parallel.
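A minimal sketch of this idea (the names here are ours, not from the repo): the detection loop hands each label to an announcer whose speech call is non-blocking, so detection never waits on audio. `speakAsync` stands in for any asynchronous TTS call, such as `speechSynthesis.speak` in the browser.

```javascript
// Hypothetical sketch: decouple detection from speech so the video loop
// never blocks. `speakAsync(text, onEnd)` must return immediately and
// invoke `onEnd` when the utterance finishes.
function makeAnnouncer(speakAsync) {
  let busy = false;
  return function announce(label) {
    if (busy) return false;        // still speaking: skip, keep detecting
    busy = true;
    speakAsync(label, () => { busy = false; }); // reopen the gate when done
    return true;
  };
}
```

In the detection loop, `announce(label)` is called every frame; it either starts speech or returns immediately, so the video stream stays smooth either way.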

2. As expected, the model detected multiple objects at once. The problem was that when
multiple objects were detected, the code continuously read them out alternately and over
each other, so the output was not audible.

Fixing:
Instead of iterating through all the detected objects, we chose the first object in the list,
so only one object at a time was detected and spoken.
There was a minor bug here where the website sometimes encountered null values; we
added a null check to handle this.
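That selection logic can be sketched as follows (our naming, not necessarily the repo's): take the first detection if the list exists and is non-empty, otherwise return null so nothing is spoken.

```javascript
// Hypothetical sketch: pick a single label to speak per frame.
// Guards against null/undefined detection lists and empty results.
function pickLabel(detections) {
  if (!detections || detections.length === 0) return null; // null check
  const first = detections[0];                             // first object only
  return first && first.class ? first.class : null;
}
```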
Tic-Tac-Toe
What’s new?
The system lets the user play as usual by clicking on the boxes. It speaks out whose turn
it is, whether a slot is already occupied, and who won (or whether it is a tie).
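The spoken feedback above can be sketched as a small message builder; the field names and wording here are our assumptions for illustration, not the repo's actual code.

```javascript
// Hypothetical sketch: builds the message the game speaks after each click.
function statusMessage(state) {
  if (state.winner === "tie") return "It is a tie";
  if (state.winner) return "Player " + state.winner + " wins";
  if (state.occupied) return "That slot is already occupied";
  return "Player " + state.turn + "'s turn";
}
```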
We also implemented a voice recognition module that recognizes specific commands for
playing the game. Saying a number from 1 to 9 selects the corresponding box, and positions
such as "top" or "bottom right" can also be spoken to choose a box.
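One way to map those spoken commands to a board index (0 to 8, row by row) is a small parser. The exact phrase list and indexing below are assumptions for illustration:

```javascript
// Hypothetical sketch: maps a recognized command to a board index (0-8),
// or -1 if the command is not understood. Boxes are numbered row by row.
function commandToBox(command) {
  const text = command.trim().toLowerCase();
  const num = parseInt(text, 10);
  if (num >= 1 && num <= 9) return num - 1;   // "1".."9"
  const positions = {
    "top left": 0, "top": 1, "top right": 2,
    "left": 3, "center": 4, "right": 5,
    "bottom left": 6, "bottom": 7, "bottom right": 8,
  };
  return text in positions ? positions[text] : -1;
}
```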

Problems faced:
1. We tried to trigger text-to-speech right when the website loads, and many attempts
were made. It turned out that browsers enforce an autoplay policy: audio is not allowed to
play before the user has interacted with the page.

Fixing:
We added a button to start the game, which also starts the voice recognition; after that, the
system works smoothly.
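In outline (our sketch, assuming a button element and a `startGame` function): speech and recognition are only started inside the click handler, which satisfies the browser's user-gesture requirement, and a flag ignores repeat clicks.

```javascript
// Hypothetical sketch: browsers only allow audio after a user gesture,
// so TTS and voice recognition are started from the Start button's handler.
function bindStartButton(button, startGame) {
  let started = false;
  button.addEventListener("click", () => {
    if (started) return; // ignore repeat clicks
    started = true;
    startGame();         // safe to start speech / recognition here
  });
}
```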

2. Voice recognition registered one command multiple times. In the initial version, the
recognizer produced more than one result for a single spoken command; for example, if
"top" was said, the system registered it two or more times. Because of this, the messages the
system read out piled up, and it kept speaking without stopping.

[Screenshot: console.log output of the voice recognition results]

As the screenshot shows (for example, the count 2 before "mustard 3", or the 3 before "3"),
the system processed those inputs more than once.
Fixing:
The text-to-speech API has a function to cancel speech that is already in progress. We call
it whenever the system is found to be speaking, and then speak the new message. We also
found a parameter called queue, which takes a Boolean value; when set to false, the new
message does not wait for the current one to finish being spoken.
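That cancel-then-speak fix can be sketched against the browser's `speechSynthesis` interface. Here the synth object and utterance factory are injected so the logic is testable; in the page one would pass `window.speechSynthesis` and `t => new SpeechSynthesisUtterance(t)`.

```javascript
// Hedged sketch: interrupt any in-progress speech before speaking the
// new message, so messages never queue up behind each other.
function makeSpeaker(synth, makeUtterance) {
  return function say(text) {
    if (synth.speaking) synth.cancel(); // drop the current utterance
    synth.speak(makeUtterance(text));
  };
}
```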

Link:
https://github.com/Thakkar-Khushang/assisted-vision

Our Work:
https://drive.google.com/drive/folders/1StzBd2hBOZ1DBNwmzOuxTfHZtyIJZ3Ve?usp=sharing
