Workshop: Speech cognitive system

Contact: Tian Jing
Email: tianjing@nus.edu.sg

Objective

In this workshop, we will need to perform the following tasks:

Build an HMM-based speech recognition system
Build a speech recognition system using the SpeechRecognition package

Installation guideline

Required packages:

numpy=1.15.1
matplotlib=2.2.3
scipy=1.1.0
scikit-learn=0.19.1
hmmlearn=0.2.1
notebook=5.7.6
python_speech_features
SpeechRecognition
pocketsphinx

Create a new virtual environment or install additional packages in your own environment:

Open Anaconda Prompt
conda create -n cogspeech python=3.6 numpy=1.15.1 matplotlib=2.2.3 scipy=1.1.0 scikit-learn=0.19.1 hmmlearn=0.2.1 notebook=5.7.6
conda activate cogspeech
pip install python_speech_features SpeechRecognition --upgrade pocketsphinx
Browse to the folder that contains the workshop files, then type 'jupyter notebook'

Submission guideline

Once you finish your workshop, rename your .ipynb file to your name, and submit your .ipynb file into LumiNUS.

Exercise 1: HMM-based speech recognition

In [1]:

# Import packages
import os
import numpy as np
from scipy.io import wavfile

# For HMM model
from hmmlearn import hmm
from python_speech_features import mfcc

In [2]:

# Class to handle all HMM related processing
class HMMTrainer(object):
    def __init__(self, model_name='GaussianHMM', n_components=4, cov_type='diag', n_iter=1000):
        self.model_name = model_name
        self.n_components = n_components
        self.cov_type = cov_type
        self.n_iter = n_iter
        self.models = []

        # WRITE YOUR OWN CODE
        self.model = hmm.GaussianHMM(n_components=self.n_components, covariance_type=self.cov_type, n_iter=self.n_iter)

    # X is a 2D numpy array where each row is 13D
    def train(self, X):
        np.seterr(all='ignore')
        self.models.append(self.model.fit(X))

    # Run the model on input data
    def get_score(self, input_data):
        return self.model.score(input_data)
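
As a quick sanity check of the HMMTrainer class, the following minimal sketch fits a model on random 13-D vectors (purely illustrative stand-ins for MFCC frames) and scores another random sequence; it assumes the two cells above have been run.

# Sanity-check sketch: fit HMMTrainer on random 13-D "frames" and score a new sequence
dummy_features = np.random.randn(200, 13)   # 200 frames of 13-D features
trainer = HMMTrainer(n_components=4)
trainer.train(dummy_features)
print('Log-likelihood of a random sequence:', trainer.get_score(np.random.randn(50, 13)))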

In [3]:

# Build an HMM model

input_folder = "data"

hmm_models = []
# Parse the input directory
for dirname in os.listdir(input_folder):
    # Get the name of the subfolder
    subfolder = os.path.join(input_folder, dirname)
    if not os.path.isdir(subfolder):
        continue

    # Extract the label
    label = subfolder[subfolder.rfind('\\') + 1:]

    # Initialize variables
    X = np.array([])
    y_words = []

    # Iterate through the audio files (leaving 1 file for testing in each class)
    for filename in [x for x in os.listdir(subfolder) if x.endswith('.wav')][:-1]:
        # Read the input file
        filepath = os.path.join(subfolder, filename)
        print('Process the file', filepath)
        sampling_freq, audio = wavfile.read(filepath)

        # Extract MFCC features
        # WRITE YOUR OWN CODE
        mfcc_features = mfcc(audio, sampling_freq)

        # Append to the variable X
        if len(X) == 0:
            X = mfcc_features
        else:
            X = np.append(X, mfcc_features, axis=0)

        # Append the label
        y_words.append(label)

    # Train and save HMM model
    hmm_trainer = HMMTrainer()
    hmm_trainer.train(X)
    hmm_models.append((hmm_trainer, label))
    hmm_trainer = None

Process the file data\apple\apple01.wav
Process the file data\apple\apple02.wav
Process the file data\apple\apple03.wav
Process the file data\apple\apple04.wav
Process the file data\apple\apple05.wav
Process the file data\apple\apple06.wav
Process the file data\apple\apple07.wav
Process the file data\apple\apple08.wav
Process the file data\apple\apple09.wav
Process the file data\apple\apple10.wav
Process the file data\apple\apple11.wav
Process the file data\apple\apple12.wav
Process the file data\apple\apple13.wav
Process the file data\apple\apple14.wav
Process the file data\banana\banana01.wav
Process the file data\banana\banana02.wav
Process the file data\banana\banana03.wav
Process the file data\banana\banana04.wav
Process the file data\banana\banana05.wav
Process the file data\banana\banana06.wav
Process the file data\banana\banana07.wav
Process the file data\banana\banana08.wav
Process the file data\banana\banana09.wav
Process the file data\banana\banana10.wav
Process the file data\banana\banana11.wav
Process the file data\banana\banana12.wav
Process the file data\banana\banana13.wav
Process the file data\banana\banana14.wav
Process the file data\kiwi\kiwi01.wav
Process the file data\kiwi\kiwi02.wav
Process the file data\kiwi\kiwi03.wav
Process the file data\kiwi\kiwi04.wav
Process the file data\kiwi\kiwi05.wav
Process the file data\kiwi\kiwi06.wav
Process the file data\kiwi\kiwi07.wav
Process the file data\kiwi\kiwi08.wav
Process the file data\kiwi\kiwi09.wav
Process the file data\kiwi\kiwi10.wav
Process the file data\kiwi\kiwi11.wav
Process the file data\kiwi\kiwi12.wav
Process the file data\kiwi\kiwi13.wav
Process the file data\kiwi\kiwi14.wav
Process the file data\lime\lime01.wav
Process the file data\lime\lime02.wav
Process the file data\lime\lime03.wav
Process the file data\lime\lime04.wav
Process the file data\lime\lime05.wav
Process the file data\lime\lime06.wav
Process the file data\lime\lime07.wav
Process the file data\lime\lime08.wav
Process the file data\lime\lime09.wav
Process the file data\lime\lime10.wav
Process the file data\lime\lime11.wav
Process the file data\lime\lime12.wav
Process the file data\lime\lime13.wav
Process the file data\lime\lime14.wav
Process the file data\orange\orange01.wav
Process the file data\orange\orange02.wav
Process the file data\orange\orange03.wav
Process the file data\orange\orange04.wav
Process the file data\orange\orange05.wav
Process the file data\orange\orange06.wav
Process the file data\orange\orange07.wav
Process the file data\orange\orange08.wav
Process the file data\orange\orange09.wav
Process the file data\orange\orange10.wav
Process the file data\orange\orange11.wav
Process the file data\orange\orange12.wav
Process the file data\orange\orange13.wav
Process the file data\orange\orange14.wav
Process the file data\peach\peach01.wav
Process the file data\peach\peach02.wav
Process the file data\peach\peach03.wav
Process the file data\peach\peach04.wav
Process the file data\peach\peach05.wav
Process the file data\peach\peach06.wav
Process the file data\peach\peach07.wav
Process the file data\peach\peach08.wav
Process the file data\peach\peach09.wav
Process the file data\peach\peach10.wav
Process the file data\peach\peach11.wav
Process the file data\peach\peach12.wav
Process the file data\peach\peach13.wav
Process the file data\peach\peach14.wav
Process the file data\pineapple\pineapple01.wav
Process the file data\pineapple\pineapple02.wav
Process the file data\pineapple\pineapple03.wav
Process the file data\pineapple\pineapple04.wav
Process the file data\pineapple\pineapple05.wav
Process the file data\pineapple\pineapple06.wav
Process the file data\pineapple\pineapple07.wav
Process the file data\pineapple\pineapple08.wav
Process the file data\pineapple\pineapple09.wav
Process the file data\pineapple\pineapple10.wav
Process the file data\pineapple\pineapple11.wav
Process the file data\pineapple\pineapple12.wav
Process the file data\pineapple\pineapple13.wav
Process the file data\pineapple\pineapple14.wav
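
For reference, mfcc() from python_speech_features returns a 2-D array with one row per analysis frame and 13 cepstral coefficients per row by default, which is why HMMTrainer describes X as rows of 13-D vectors. A minimal sketch to inspect the features of a single file (the path is just an example taken from the data folder above):

# Inspect the MFCC feature matrix of one training file (example path, adjust as needed)
sample_path = os.path.join('data', 'apple', 'apple01.wav')
sampling_freq, audio = wavfile.read(sample_path)
features = mfcc(audio, sampling_freq)
print('Sampling frequency:', sampling_freq)
print('MFCC feature matrix shape:', features.shape)  # (number of frames, 13) by default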

In [4]:

# Step 1: Select test audio file
test_file_name = 'data\\pineapple\\pineapple15.wav'
sampling_freq, audio = wavfile.read(test_file_name)

# Step 2: Extract MFCC features
mfcc_features = mfcc(audio, sampling_freq)

max_score = None
output_label = None

# Step 3: Iterate through all HMM models and pick the one with the highest score
for item in hmm_models:
    hmm_model, label = item
    score = hmm_model.get_score(mfcc_features)
    if max_score is None:
        max_score = score
        output_label = label
    if score > max_score:
        max_score = score
        output_label = label

# Print the output
print('Process file: ', test_file_name, ", Predicted: ", output_label)

Process file:  data\pineapple\pineapple15.wav , Predicted:  pineapple

Q1. Complete the following code to perform HMM-based audio recognition.

Tasks

Apply the pre-trained HMM model to perform audio recognition
Evaluate the audio recognition performance by changing the HMM model configuration (see the sketch below)
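
One possible way to approach the evaluation task, assuming the cells above have been run, is to score the held-out test file of each class (the last .wav file that the training loop skips) and compute the accuracy; the sketch below is only an illustration, and the model configuration can then be varied by rebuilding the models with, for example, HMMTrainer(n_components=8).

# Evaluation sketch: classify the held-out test file of each class and report accuracy
correct = 0
total = 0
for dirname in os.listdir(input_folder):
    subfolder = os.path.join(input_folder, dirname)
    if not os.path.isdir(subfolder):
        continue
    test_file = [x for x in os.listdir(subfolder) if x.endswith('.wav')][-1]
    sampling_freq, audio = wavfile.read(os.path.join(subfolder, test_file))
    features = mfcc(audio, sampling_freq)
    # Pick the label of the model with the highest log-likelihood
    predicted = max(hmm_models, key=lambda item: item[0].get_score(features))[1]
    correct += int(predicted == dirname)  # the label equals the folder name on Windows paths
    total += 1
print('Held-out accuracy:', correct / total)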

Reference

HMMLearn, https://hmmlearn.readthedocs.io/en/latest/tutorial.html
Python_speech_features, https://python-speech-features.readthedocs.io/en/latest/

In [5]:

# Provide your answer here

# Step 1: Select test audio file

# Step 2: Extract MFCC features

# Step 3: Iterate through all HMM models and pick the one with the highest score

# Print the output


Exercise 2: Speech Recognition


Speech recognition engine/API support: https://pypi.org/project/SpeechRecognition/

In [6]:

# For speech recognition using pre-trained models


import speech_recognition as sr

In [7]:

# Load data

AUDIO_FILE = "english.wav"

# use the audio file as the audio source


r = sr.Recognizer()
with sr.AudioFile(AUDIO_FILE) as source:
    audio = r.record(source)  # read the entire audio file
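
The SpeechRecognition package can also capture live audio through a Microphone source instead of a wav file; the minimal sketch below assumes PyAudio is installed and a microphone is available, and is not required for this workshop.

# Optional: capture audio from the default microphone instead of a file (requires PyAudio)
with sr.Microphone() as source:
    print("Say something...")
    audio = r.listen(source)  # record until a pause is detected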

Q2. Complete the code to perform speech recognition with the Sphinx option.

Tasks

Perform speech recognition


(optional) Apply https://www.text2speech.org/ to generate a speech wav file and test your speech recognition system's performance

In [8]:

# recognize speech using Sphinx


print("Speech recognition result: " + r.recognize_sphinx(audio))

Speech recognition result: one two three
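
In practice, it helps to wrap the recognizer call in error handling: recognize_sphinx raises sr.UnknownValueError when the audio cannot be understood and sr.RequestError when the Sphinx engine itself fails. A minimal sketch, reusing the recognizer r and the audio loaded above:

# Robust variant of the Sphinx call with the exceptions defined by SpeechRecognition
try:
    print("Speech recognition result: " + r.recognize_sphinx(audio))
except sr.UnknownValueError:
    print("Sphinx could not understand the audio")
except sr.RequestError as e:
    print("Sphinx error: {0}".format(e))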

Once you finish your workshop, rename your .ipynb file to your name, and submit your .ipynb file
into LumiNUS.

Have a nice day!
