Input Audio
Preprocessing
Output: After preprocessing the audio, building the model, and training it on the
input data with the help of a few Python modules such as librosa, scipy, and keras
(which provides the CNN layers), we predict the text that corresponds to the
audio input.
Important Python libraries used:
• Librosa and Scipy: Used for processing audio signals.
• Numpy: Used for working with arrays and matrices.
• Matplotlib: Matplotlib is a plotting library for the Python programming language and its
numerical mathematics extension NumPy.
• Random: Python's standard module for generating pseudo-random numbers. Here it is used
to pick an audio sample at random when testing the model's predictions.
• Keras: Keras is a powerful and easy-to-use free open source Python library for
developing and evaluating deep learning models. It wraps the efficient numerical
computation libraries Theano and TensorFlow and allows you to define and train
neural network models in just a few lines of code.
Stage-wise results and related discussion:
• Import all the libraries mentioned on the previous slide.
• Data exploration and visualization help us understand both the data and the pre-processing
steps better. Here is a plot of the audio signal in the time domain.
• Sampling and resampling of signal:
• The sampling rate of the signal is 16,000 Hz. We re-sample it to 8,000 Hz, since most
speech-related frequencies lie below 4,000 Hz and an 8,000 Hz sampling rate is enough to
capture them. The code below is used for this:
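A minimal sketch of the resampling step, assuming the recording is already loaded as a 1-D NumPy array. It uses scipy.signal.resample_poly; the original code may have used librosa instead. The synthetic tone and the helper name resample_to_8k are illustrative assumptions, not part of the original project.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

ORIG_SR = 16000    # sampling rate of the recordings, as stated above
TARGET_SR = 8000   # sampling rate after re-sampling


def resample_to_8k(samples, orig_sr=ORIG_SR, target_sr=TARGET_SR):
    """Re-sample a 1-D signal with polyphase filtering (hypothetical helper)."""
    g = gcd(orig_sr, target_sr)
    # up/down factors are reduced to 1 and 2 for 16 kHz -> 8 kHz
    return resample_poly(samples, target_sr // g, orig_sr // g)


# Assumption: a synthetic one-second 440 Hz tone stands in for a real recording.
t = np.arange(ORIG_SR) / ORIG_SR
samples = np.sin(2 * np.pi * 440 * t)

resampled = resample_to_8k(samples)
print(len(samples), len(resampled))
```

Halving the sampling rate halves the number of samples per second, which reduces the size of the model input.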
• The next step is to define the labels and look at the distribution of the recording durations, which is
shown in the screenshot below:
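As a rough sketch of how the duration distribution can be computed: the duration of a recording is its number of samples divided by the sampling rate. The three synthetic recordings and the bin layout below are assumptions for illustration only.

```python
import numpy as np

SAMPLE_RATE = 16000  # sampling rate of the recordings, as stated above

# Assumption: zero-filled arrays of different lengths stand in for real recordings.
recordings = [np.zeros(16000), np.zeros(12000), np.zeros(8000)]

# Duration in seconds = number of samples / sampling rate.
durations = [len(samples) / SAMPLE_RATE for samples in recordings]

# Count recordings per duration bin (four bins between 0 and 1 second).
counts, bin_edges = np.histogram(durations, bins=4, range=(0.0, 1.0))
print(durations)
print(counts, bin_edges)
```

The same durations list can be passed to Matplotlib's hist function to draw the distribution.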
• Preprocessing the audio waves:
• In the data exploration part earlier, we saw that a few recordings are shorter than 1
second and that the sampling rate is too high. So, let us read the audio waves of the defined labels and
use the preprocessing steps below to deal with this.
• Here are the two steps we'll follow: 1. Resampling and 2. Removing commands shorter than 1
second. Below is the code for it.
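The two preprocessing steps can be sketched as follows. This is not the original code: it works on in-memory (samples, label) pairs rather than audio files, uses scipy.signal.resample_poly for resampling, and trims every kept clip to exactly one second so that all inputs have the same length. The function name preprocess and the random test signals are assumptions.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

ORIG_SR = 16000
TARGET_SR = 8000


def preprocess(recordings, orig_sr=ORIG_SR, target_sr=TARGET_SR):
    """Resample each (samples, label) pair and drop clips shorter than 1 second."""
    g = gcd(orig_sr, target_sr)
    all_wave, all_label = [], []
    for samples, label in recordings:
        # Step 1: resampling from orig_sr to target_sr
        resampled = resample_poly(samples, target_sr // g, orig_sr // g)
        # Step 2: keep only clips that last at least one second
        if len(resampled) >= target_sr:
            # Assumption: trim to exactly one second for a fixed input length
            all_wave.append(resampled[:target_sr])
            all_label.append(label)
    return np.array(all_wave), all_label


# Assumption: random noise stands in for real recordings of the commands.
recordings = [
    (np.random.randn(16000), "yes"),  # 1.0 s at 16 kHz, kept
    (np.random.randn(12000), "no"),   # 0.75 s at 16 kHz, removed
]
all_wave, all_label = preprocess(recordings)
print(all_wave.shape, all_label)
```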
• Next, we converted the output labels to integer codes, since the CNN requires input and
output variables to be numbers before we can use them to fit and evaluate a model. The code is shown below:
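A small sketch of integer encoding using NumPy only. scikit-learn's LabelEncoder is a common alternative and produces the same kind of mapping; the label list here is a made-up example.

```python
import numpy as np

# Assumption: a short, made-up list of command labels.
all_label = ["yes", "no", "up", "no", "yes"]

# np.unique returns the sorted class names and, with return_inverse=True,
# the integer index of each label within that sorted list.
classes, y_int = np.unique(all_label, return_inverse=True)

print(list(classes))  # class names in sorted order
print(list(y_int))    # one integer code per recording
```

Keeping the classes array makes it possible to map a predicted integer back to its text label later.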
• Now, we converted the integer-encoded labels to one-hot vectors, since this is a multi-class classification problem:
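One way to sketch the one-hot conversion with plain NumPy is shown below. Keras offers keras.utils.to_categorical for the same purpose; the integer codes used here are illustrative.

```python
import numpy as np

# Assumption: integer codes for five recordings across three classes.
y_int = np.array([2, 0, 1, 0, 2])
num_classes = 3

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing with y_int yields one one-hot row per recording.
y_onehot = np.eye(num_classes)[y_int]

print(y_onehot.shape)
print(y_onehot)
```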
• Reshape the 2D array to 3D, since the input to Conv1D must be a 3D array. The code for it is:
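A minimal sketch of the reshape, assuming five one-second clips at 8,000 Hz. Conv1D expects input of shape (samples, steps, channels), so a trailing channel axis of size 1 is added.

```python
import numpy as np

# Assumption: five zero-filled clips of 8000 samples stand in for real audio.
all_wave = np.zeros((5, 8000))

# (recordings, time steps) -> (recordings, time steps, channels)
x = all_wave.reshape(-1, 8000, 1)

print(all_wave.shape, "->", x.shape)
```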
• Split into train and test set
• Next, we will train the model on 80% of the data and test on the remaining 20%:
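The 80/20 split can be sketched with a shuffled index, as below. This is an illustrative NumPy version; scikit-learn's train_test_split is a common alternative and can also stratify by label. The helper name split_80_20, the seed, and the toy data are assumptions.

```python
import numpy as np


def split_80_20(x, y, test_fraction=0.2, seed=0):
    """Shuffle the samples and hold out test_fraction of them (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))
    n_test = int(round(len(x) * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return x[train_idx], x[test_idx], y[train_idx], y[test_idx]


# Assumption: ten toy samples with two alternating classes.
x = np.arange(10, dtype=float).reshape(10, 1, 1)
y = np.eye(2)[np.arange(10) % 2]

x_train, x_test, y_train, y_test = split_80_20(x, y)
print(x_train.shape, x_test.shape)
```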
• Now, with the help of the random module, we pick an audio sample at random and, by
calling the model's predict function, get the final output text.
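The prediction step can be sketched as follows. To keep the example self-contained, a dummy function stands in for the trained model's predict method; dummy_predict, predict_label, and the class names are all assumptions for illustration, not the original code.

```python
import random

import numpy as np

# Assumption: class names in the order used for integer encoding.
classes = np.array(["no", "up", "yes"])


def predict_label(predict_fn, audio, classes):
    """Return the class with the highest predicted probability for one clip."""
    probs = predict_fn(audio.reshape(1, -1, 1))  # shape (1, num_classes)
    return classes[int(np.argmax(probs[0]))]


def dummy_predict(batch):
    # Stand-in for a trained model's predict: always favours class index 2.
    return np.array([[0.1, 0.2, 0.7]] * len(batch))


# Assumption: four zero-filled clips stand in for the held-out test set.
x_test = np.zeros((4, 8000, 1))

rng = random.Random(0)
index = rng.randint(0, len(x_test) - 1)  # pick a random test clip
predicted = predict_label(dummy_predict, x_test[index], classes)
print("Text:", predicted)
```

With a trained Keras model, model.predict would be passed in place of dummy_predict.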