Digital Signals Processing
I. INTRODUCTION

A simple Python voice-recognition program. The project consists of three phases. The first is collecting voice data of the words "YES" and "NO" from different people, in order to analyze this data and compare it with the data the computer has been trained on. The second is a basic "Who am I?" guessing game played through the terminal: the program holds 12 pictures, the user picks one, and the program then asks yes-or-no questions in order to guess the picture. The third and final phase connects phases one and two into a complete "Guess Who Am I" program. Simply put, the program asks the user a question through computer-generated speech (TTS) and records the answer for analysis, it being a YES or a NO.

The goals are methods for analyzing the user's voice pattern in order to determine whether it is a yes or a no, and optimizing the recognition to be fairly accurate in each case by adding more test data with a variety of spectral cases.

III. DATA

We used two parts in phase one's data. Part one: training voice samples, which the program feeds to itself as training data and compares against part two: the test voice samples.

For phase II we pulled a few easily characterized, cartoonish pictures straight from Google for the user to use.

For phase III we added a simple program that asks the user to say yes and no into the mic five times each and adds the recordings to the training data for better recognition next time!

Hence we compare the user's input voice with the calculated averages: for example, if the user's zero-crossing rate is high, it very likely means the answer is a YES, and vice versa. Note that splitting the signal into pieces and calculating the zero crossings per piece gives more accurate results. Finally, we compare the cosine spatial distance between the bulk of the calculated data and the input data in order to determine their similarity.

Making the "Guess Who Am I" game was fairly easy: it just reduces the characteristics array after each input until only one entry is left.

Connecting both phases was just as easy; it was only a matter of code organization.

We asserted the program on a simple collection of test data after training, which gave a 100% accuracy rate. However, it is hard to confirm the accuracy rate with such data.

After merging into phase 3 we encountered a shortfall: the program had been fed US-voice training data, so running it on our own voices almost never gave a right answer.

So we rolled up our sleeves and made our own data with a simple program that records 10 samples each of yes and no and uses them as training data. We also added subtitles in the console for better observation, and we can see that... IT WORKS!

VII. DEVELOPMENT

For better recognition there needs to be an even more accurate approach, for example calculating a more precise ZCR by chipping the signal into small pieces before the calculation.
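The frame-wise ZCR described above (chipping the signal into small pieces before counting zero crossings) could be sketched as follows; the frame count and this NumPy-based implementation are assumptions for illustration, not the project's actual code:

```python
import numpy as np

def zero_crossing_rate(signal, n_frames=10):
    """Frame-wise zero-crossing rate: split the signal into
    n_frames pieces and count sign changes within each piece."""
    frames = np.array_split(np.asarray(signal, dtype=float), n_frames)
    rates = []
    for frame in frames:
        signs = np.sign(frame)
        signs[signs == 0] = 1  # treat exact zeros as positive
        crossings = np.count_nonzero(np.diff(signs))
        rates.append(crossings / max(len(frame) - 1, 1))
    return rates
```

Per the heuristic in the text, a high average rate across the frames would lean the decision toward "YES".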
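The comparison against the calculated per-class averages via cosine distance, mentioned earlier, might look like this minimal sketch (the project may well use `scipy.spatial.distance.cosine`; this plain-NumPy version and the function names are assumptions):

```python
import numpy as np

def cosine_distance(u, v):
    """Cosine distance between two feature vectors: 1 - (u.v)/(|u||v|)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def classify(features, yes_avg, no_avg):
    """Label the input by whichever class average it is closer to."""
    if cosine_distance(features, yes_avg) < cosine_distance(features, no_avg):
        return "YES"
    return "NO"
```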
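The candidate-reduction idea behind the guessing game described above can be illustrated as follows; the character names and traits here are purely hypothetical:

```python
# Hypothetical character data; names and traits are illustrative only.
characters = [
    {"name": "Ana",  "glasses": True,  "hat": False},
    {"name": "Bob",  "glasses": True,  "hat": True},
    {"name": "Carl", "glasses": False, "hat": False},
]

def reduce_candidates(candidates, trait, answer_is_yes):
    """Keep only the characters consistent with the yes/no answer."""
    return [c for c in candidates if c[trait] == answer_is_yes]

# "Does your character wear glasses?" -> YES
remaining = reduce_candidates(characters, "glasses", True)
# "Does your character wear a hat?" -> NO
remaining = reduce_candidates(remaining, "hat", False)
```

After each answer the array shrinks, and the game ends once a single entry is left.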