Workshop: Speech cognitive system

Contact: Tian Jing
Email: tianjing@nus.edu.sg

Objective

In this workshop, we will need to perform the following tasks:

Build an HMM-based speech recognition system
Build a speech recognition system using the SpeechRecognition package

Installation guideline

Required packages:

numpy=1.15.1
matplotlib=2.2.3
scipy=1.1.0
scikit-learn=0.19.1
hmmlearn=0.2.1
notebook=5.7.6
python_speech_features
SpeechRecognition
pocketsphinx

Create a new virtual environment or install additional packages in your own environment:

Open Anaconda Prompt
conda create -n cogspeech python=3.6 numpy=1.15.1 matplotlib=2.2.3 scipy=1.1.0 scikit-learn=0.19.1 hmmlearn=0.2.1 notebook=5.7.6
conda activate cogspeech
pip install python_speech_features SpeechRecognition --upgrade pocketsphinx
Browse to the folder that contains the workshop files, then type 'jupyter notebook'

Submission guideline

Once you finish your workshop, rename your .ipynb file to your name, and submit your .ipynb file into LumiNUS.

Exercise 1: HMM-based speech recognition

In [1]:

# Import packages
import os
import numpy as np
from scipy.io import wavfile

# For HMM model
from hmmlearn import hmm
from python_speech_features import mfcc

In [2]:

# Class to handle all HMM related processing
class HMMTrainer(object):
    def __init__(self, model_name='GaussianHMM', n_components=4, cov_type='diag', n_iter=1000):
        self.model_name = model_name
        self.n_components = n_components
        self.cov_type = cov_type
        self.n_iter = n_iter
        self.models = []

        # WRITE YOUR OWN CODE
        self.model = hmm.GaussianHMM(n_components=self.n_components, covariance_type=self.cov_type, n_iter=self.n_iter)

    # X is a 2D numpy array where each row is 13D
    def train(self, X):
        np.seterr(all='ignore')
        self.models.append(self.model.fit(X))

    # Run the model on input data
    def get_score(self, input_data):
        return self.model.score(input_data)
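
As a quick sanity check of the HMMTrainer class, the following minimal sketch fits a model on random 13-D vectors (purely illustrative stand-ins for MFCC frames) and scores another random sequence; it assumes the two cells above have been run.

# Sanity-check sketch: fit HMMTrainer on random 13-D "frames" and score a new sequence
dummy_features = np.random.randn(200, 13)   # 200 frames of 13-D features
trainer = HMMTrainer(n_components=4)
trainer.train(dummy_features)
print('Log-likelihood of a random sequence:', trainer.get_score(np.random.randn(50, 13)))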

In [3]:

# Build an HMM model

input_folder = "data"

hmm_models = []
# Parse the input directory
for dirname in os.listdir(input_folder):
    # Get the name of the subfolder
    subfolder = os.path.join(input_folder, dirname)
    if not os.path.isdir(subfolder):
        continue

    # Extract the label
    label = subfolder[subfolder.rfind('\\') + 1:]

    # Initialize variables
    X = np.array([])
    y_words = []

    # Iterate through the audio files (leaving 1 file for testing in each class)
    for filename in [x for x in os.listdir(subfolder) if x.endswith('.wav')][:-1]:
        # Read the input file
        filepath = os.path.join(subfolder, filename)
        print('Process the file', filepath)
        sampling_freq, audio = wavfile.read(filepath)

        # Extract MFCC features
        # WRITE YOUR OWN CODE
        mfcc_features = mfcc(audio, sampling_freq)

        # Append to the variable X
        if len(X) == 0:
            X = mfcc_features
        else:
            X = np.append(X, mfcc_features, axis=0)

        # Append the label
        y_words.append(label)

    # Train and save HMM model
    hmm_trainer = HMMTrainer()
    hmm_trainer.train(X)
    hmm_models.append((hmm_trainer, label))
    hmm_trainer = None

Process the file data\apple\apple01.wav
Process the file data\apple\apple02.wav
Process the file data\apple\apple03.wav
Process the file data\apple\apple04.wav
Process the file data\apple\apple05.wav
Process the file data\apple\apple06.wav
Process the file data\apple\apple07.wav
Process the file data\apple\apple08.wav
Process the file data\apple\apple09.wav
Process the file data\apple\apple10.wav
Process the file data\apple\apple11.wav
Process the file data\apple\apple12.wav
Process the file data\apple\apple13.wav
Process the file data\apple\apple14.wav
Process the file data\banana\banana01.wav
Process the file data\banana\banana02.wav
Process the file data\banana\banana03.wav
Process the file data\banana\banana04.wav
Process the file data\banana\banana05.wav
Process the file data\banana\banana06.wav
Process the file data\banana\banana07.wav
Process the file data\banana\banana08.wav
Process the file data\banana\banana09.wav
Process the file data\banana\banana10.wav
Process the file data\banana\banana11.wav
Process the file data\banana\banana12.wav
Process the file data\banana\banana13.wav
Process the file data\banana\banana14.wav
Process the file data\kiwi\kiwi01.wav
Process the file data\kiwi\kiwi02.wav
Process the file data\kiwi\kiwi03.wav
Process the file data\kiwi\kiwi04.wav
Process the file data\kiwi\kiwi05.wav
Process the file data\kiwi\kiwi06.wav
Process the file data\kiwi\kiwi07.wav
Process the file data\kiwi\kiwi08.wav
Process the file data\kiwi\kiwi09.wav
Process the file data\kiwi\kiwi10.wav
Process the file data\kiwi\kiwi11.wav
Process the file data\kiwi\kiwi12.wav
Process the file data\kiwi\kiwi13.wav
Process the file data\kiwi\kiwi14.wav
Process the file data\lime\lime01.wav
Process the file data\lime\lime02.wav
Process the file data\lime\lime03.wav
Process the file data\lime\lime04.wav
Process the file data\lime\lime05.wav
Process the file data\lime\lime06.wav
Process the file data\lime\lime07.wav
Process the file data\lime\lime08.wav
Process the file data\lime\lime09.wav
Process the file data\lime\lime10.wav
Process the file data\lime\lime11.wav
Process the file data\lime\lime12.wav
Process the file data\lime\lime13.wav
Process the file data\lime\lime14.wav
Process the file data\orange\orange01.wav
Process the file data\orange\orange02.wav
Process the file data\orange\orange03.wav
Process the file data\orange\orange04.wav
Process the file data\orange\orange05.wav
Process the file data\orange\orange06.wav
Process the file data\orange\orange07.wav
Process the file data\orange\orange08.wav
Process the file data\orange\orange09.wav
Process the file data\orange\orange10.wav
Process the file data\orange\orange11.wav
Process the file data\orange\orange12.wav
Process the file data\orange\orange13.wav
Process the file data\orange\orange14.wav
Process the file data\peach\peach01.wav
Process the file data\peach\peach02.wav
Process the file data\peach\peach03.wav
Process the file data\peach\peach04.wav
Process the file data\peach\peach05.wav
Process the file data\peach\peach06.wav
Process the file data\peach\peach07.wav
Process the file data\peach\peach08.wav
Process the file data\peach\peach09.wav
Process the file data\peach\peach10.wav
Process the file data\peach\peach11.wav
Process the file data\peach\peach12.wav
Process the file data\peach\peach13.wav
Process the file data\peach\peach14.wav
Process the file data\pineapple\pineapple01.wav
Process the file data\pineapple\pineapple02.wav
Process the file data\pineapple\pineapple03.wav
Process the file data\pineapple\pineapple04.wav
Process the file data\pineapple\pineapple05.wav
Process the file data\pineapple\pineapple06.wav
Process the file data\pineapple\pineapple07.wav
Process the file data\pineapple\pineapple08.wav
Process the file data\pineapple\pineapple09.wav
Process the file data\pineapple\pineapple10.wav
Process the file data\pineapple\pineapple11.wav
Process the file data\pineapple\pineapple12.wav
Process the file data\pineapple\pineapple13.wav
Process the file data\pineapple\pineapple14.wav
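
For reference, mfcc() from python_speech_features returns a 2-D array with one row per analysis frame and 13 cepstral coefficients per row by default, which is why HMMTrainer describes X as rows of 13-D vectors. A minimal sketch to inspect the features of a single file (the path is just an example taken from the data folder above):

# Inspect the MFCC feature matrix of one training file (example path, adjust as needed)
sample_path = os.path.join('data', 'apple', 'apple01.wav')
sampling_freq, audio = wavfile.read(sample_path)
features = mfcc(audio, sampling_freq)
print('Sampling frequency:', sampling_freq)
print('MFCC feature matrix shape:', features.shape)  # (number of frames, 13) by default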

In [4]:

# Step 1: Select test audio file
test_file_name = 'data\\pineapple\\pineapple15.wav'
sampling_freq, audio = wavfile.read(test_file_name)

# Step 2: Extract MFCC features
mfcc_features = mfcc(audio, sampling_freq)

max_score = None
output_label = None

# Step 3: Iterate through all HMM models and pick the one with the highest score
for item in hmm_models:
    hmm_model, label = item
    score = hmm_model.get_score(mfcc_features)
    if max_score is None:
        max_score = score
        output_label = label
    if score > max_score:
        max_score = score
        output_label = label

# Print the output
print('Process file: ', test_file_name, ", Predicted: ", output_label)

Process file:  data\pineapple\pineapple15.wav , Predicted:  pineapple

Q1. Complete the following code to perform HMM-based audio recognition.

Tasks

Apply the pre-trained HMM model to perform audio recognition
Evaluate the audio recognition performance by changing the HMM model configuration (see the sketch below)
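
One possible way to approach the evaluation task, assuming the cells above have been run, is to score the held-out test file of each class (the last .wav file that the training loop skips) and compute the accuracy; the sketch below is only an illustration, and the model configuration can then be varied by rebuilding the models with, for example, HMMTrainer(n_components=8).

# Evaluation sketch: classify the held-out test file of each class and report accuracy
correct = 0
total = 0
for dirname in os.listdir(input_folder):
    subfolder = os.path.join(input_folder, dirname)
    if not os.path.isdir(subfolder):
        continue
    test_file = [x for x in os.listdir(subfolder) if x.endswith('.wav')][-1]
    sampling_freq, audio = wavfile.read(os.path.join(subfolder, test_file))
    features = mfcc(audio, sampling_freq)
    # Pick the label of the model with the highest log-likelihood
    predicted = max(hmm_models, key=lambda item: item[0].get_score(features))[1]
    correct += int(predicted == dirname)  # the label equals the folder name on Windows paths
    total += 1
print('Held-out accuracy:', correct / total)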

Reference

HMMLearn, https://hmmlearn.readthedocs.io/en/latest/tutorial.html
Python_speech_features, https://python-speech-features.readthedocs.io/en/latest/

In [5]:

# Provide your answer here

# Step 1: Select test audio file

# Step 2: Extract MFCC features

# Step 3: Iterate through all HMM models and pick the one with the highest score

# Print the output


Exercise 2: Speech Recognition


Speech recognition engine/API support: https://pypi.org/project/SpeechRecognition/

In [6]:

# For speech recognition using pre-trained models


import speech_recognition as sr

In [7]:

# Load data

AUDIO_FILE = "english.wav"

# use the audio file as the audio source


r = sr.Recognizer()
with sr.AudioFile(AUDIO_FILE) as source:
    audio = r.record(source)  # read the entire audio file
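
The SpeechRecognition package can also capture live audio through a Microphone source instead of a wav file; the minimal sketch below assumes PyAudio is installed and a microphone is available, and is not required for this workshop.

# Optional: capture audio from the default microphone instead of a file (requires PyAudio)
with sr.Microphone() as source:
    print("Say something...")
    audio = r.listen(source)  # record until a pause is detected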

Q2. Complete the code to perform speech recognition with the Sphinx option.

Tasks

Perform speech recognition


(optional) Apply https://www.text2speech.org/ to generate a speech wav file and test your speech recognition system's performance

In [8]:

# recognize speech using Sphinx


print("Speech recognition result: " + r.recognize_sphinx(audio))

Speech recognition result: one two three
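
In practice, it helps to wrap the recognizer call in error handling: recognize_sphinx raises sr.UnknownValueError when the audio cannot be understood and sr.RequestError when the Sphinx engine itself fails. A minimal sketch, reusing the recognizer r and the audio loaded above:

# Robust variant of the Sphinx call with the exceptions defined by SpeechRecognition
try:
    print("Speech recognition result: " + r.recognize_sphinx(audio))
except sr.UnknownValueError:
    print("Sphinx could not understand the audio")
except sr.RequestError as e:
    print("Sphinx error: {0}".format(e))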

Once you finish your workshop, rename your .ipynb file to your name, and submit your .ipynb file
into LumiNUS.

Have a nice day!
