
Title page
Introduction to Audio Analysis
A MATLAB Approach
First Edition
Theodoros Giannakopoulos and Aggelos Pikrakis

Table of Contents
Cover image
Title page
Copyright
Preface
Acknowledgments
List of Tables
List of Figures
Part 1: Basic Concepts, Representations and Feature Extraction
1: Introduction
1.1 The MATLAB Audio Analysis Library
1.2 Outline of Chapters

1.3 A Note on Exercises

2: Getting Familiar with Audio Signals


2.1 Sampling
2.2 Playback
2.3 Mono and Stereo Audio Signals
2.4 Reading and Writing Audio Files
2.5 Reading Audio Files in Blocks
2.6 Recording Audio Data
2.7 Short-term Audio Processing
2.8 Exercises
3: Signal Transforms and Filtering Essentials
3.1 The Discrete Fourier Transform
3.2 The Short-Time Fourier Transform
3.3 Aliasing in More Detail
3.4 The Discrete Cosine Transform
3.5 The Discrete-Time Wavelet Transform
3.6 Digital Filtering ...

Copyright
Academic Press is an imprint of Elsevier
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, UK
225 Wyman Street, Waltham, MA 02451, USA
525 B Street, Suite 1800, San Diego, CA 92101-4495, USA
First edition 2014
Copyright © 2014 Elsevier Ltd. All rights reserved.

MATLAB® is a registered trademark of The MathWorks, Inc.


For MATLAB and Simulink product information, please contact:
The MathWorks, Inc.
3 Apple Hill Drive
Natick, MA, 01760-2098 USA

Fax: 508-647-7001
E-mail: info@mathworks.com
Web: mathworks.com
No part ...

Preface
This book attempts to provide a gentle introduction to the field of audio analysis using the
MATLAB programming environment as the vehicle of presentation. Audio analysis is a
multidisciplinary field, which requires the reader to be familiar with concepts from
diverse research disciplines, including digital signal processing and machine learning. As a
result, it is a great challenge to write a book that can provide sufficient coverage of the
important concepts in the field of audio analysis and, at the same time, be accessible to
readers who do not necessarily possess the required scientific background.
Our main goal has been to provide a standalone introduction, involving a balanced
presentation of theoretical descriptions and reproducible ...

Acknowledgments
This book has improved thanks to the support of a number of colleagues, students, and
friends, who have provided generous feedback and constructive comments during the
writing process. Above all, T. Giannakopoulos would like to thank his wife, Maria, and
his daughter, Eleni, for always being cheerful and supportive. A. Pikrakis would like to
thank his family for their patience and generous support and dedicates this book to all the
teachers who have shaped his life.

List of Tables
Table 1.1 Difficulty Levels of the Exercises 7
Table 2.1 Execution Times for Different Loading Techniques 20
Table 2.2 Sound Recording Using the Data Acquisition Toolbox 20
Table 4.1 Class Descriptions for the Multi-Class Task of Movie Segments 69
Table 5.1 Classification Tasks and Files 131
Table 5.2 Row-Wise Normalized Confusion Matrix for the 8-Class Audio Segment Classification Task 144
Table 5.3 Row-Wise Normalized Confusion Matrix for the Speech vs Music Binary Classification Task 144
Table 5.4 Row-Wise Normalized Confusion Matrix for the 3-Class Musical Genre Classification Task 146
Table 5.5 Row-Wise Normalized Confusion Matrix for the Speech vs Non-Speech Classification Task 147
Table A.1 List of All Functions Included in the MATLAB Audio ...

List of Figures
Figure 2.1 A synthetic audio signal. 12
Figure 2.2 A stereo audio signal. 14
Figure 2.3 Short-term processing of an audio signal. 26
Figure 3.1 Plots of the magnitude of the spectrum of a signal consisting of three frequencies at 200, 500, and 1200 Hz. 38
Figure 3.2 A synthetic signal consisting of three frequencies is corrupted by additive noise. 40
Figure 3.3 The spectrogram of a speech signal. 41
Figure 3.4 Spectrograms of a synthetic, frequency-modulated signal for three short-term frame lengths. 42
Figure 3.5 Spectrum representations of (a) an analog signal, (b) a sampled version when the sampling frequency exceeds the Nyquist rate, and (c) a sampled version with insufficient sampling frequency. In the last case, the shifted versions ...

Part 1: Basic Concepts, Representations and Feature Extraction
Outline
Introduction
Getting Familiar with Audio Signals
Signal Transforms and Filtering Essentials
Audio Features

1: Introduction
Abstract
This chapter has an introductory purpose. A chapter outline is provided, along with
general notes on the book’s exercises and the companion software. Before we proceed, it
is important to note that, although in this book the term audio does not exclude the
speech signal, we are not focusing on traditional speech-related problems that have been
studied by the research community for decades, e.g., speech recognition and coding.
Keywords
Audio analysis
MATLAB
During recent years we have witnessed the increasing availability of audio content via
numerous distribution channels both for commercial and non-profit purposes. The
resulting wealth of data has inevitably highlighted the need for systems that are capable
of analyzing ...

2: Getting Familiar with Audio Signals
Abstract
The purpose of this chapter is to provide basic knowledge and techniques related to the
creation, representation, playback, recording and storing of audio signals using
MATLAB. In addition, short-term audio analysis is introduced here.
Keywords
MATLAB
Playback
Recording
Audio signals
Short-term

WAVE file
MP3
Nyquist
Sample rate
Short-term processing
The goal of this chapter is to provide the basic knowledge and techniques that are
essential to create, play back, load, and store the audio signals that will be analyzed.
Therefore, we focus on certain practical skills that you will need to develop in order to
prepare the audio signals on which the audio analysis methods of later chapters will be ...

3: Signal Transforms and Filtering Essentials
Abstract
This chapter presents methods for representing audio signals in the
frequency domain. Special emphasis is placed on the Discrete
Fourier Transform because much of the material in later chapters assumes that the
reader is familiar with this particular transform. Furthermore, the chapter
presents the fundamentals of digital filters in MATLAB, so that, by the end of the
chapter, the reader will be able to create and experiment with basic filter types and
understand how they can affect the performance of various audio analysis stages.
Keywords
Discrete Fourier Transform
DFT
Fast Fourier Transform
FFT
Short-Time Fourier Transform

4: Audio Features
Abstract
This chapter presents a wide range of audio features. Apart from the
theoretical background of these features and the respective MATLAB code, their
discrimination ability is also demonstrated for particular audio types.
Keywords
Short-Term feature extraction
Mid-Term windowing
Energy
Zero crossing rate
Entropy of energy
Spectral centroid

Spectral spread
Spectral entropy
Spectral flux
Spectral rolloff
Mel-Frequency cepstrum coefficients
MFCCs
Chroma vector
Harmonic ratio
Fundamental frequency
Feature extraction is an important audio analysis stage. In general, feature extraction is
an essential processing step in pattern recognition and machine learning tasks. The goal is
to extract a set of features from ...

Part 2: Audio Content Characterization
Outline
Audio Classification
Audio Segmentation
Audio Alignment and Temporal Modeling

5: Audio Classification
Abstract
This chapter describes the task of classifying unknown audio segments of “homogeneous
content” into a set of predefined audio classes. In particular, theoretical background is
provided regarding popular classification methods, including Support Vector Machines,
Decision Trees and the k-Nearest-Neighbor method. The reader is also introduced to
generic performance measures and validation methods for the estimation of the
performance of a classifier. The chapter concludes with the presentation of performance
measurements for a set of typical audio classification tasks.
Keywords
Audio segments
Posterior probability ...

6: Audio Segmentation
Abstract
This chapter focuses on a vital stage of audio analysis, the audio segmentation stage,
which splits an uninterrupted audio signal into segments of homogeneous
content. The chapter describes two general categories of audio segmentation methods: those that
employ supervised knowledge and those that are unsupervised or semi-supervised. In this
context, certain specific segmentation tasks are presented, e.g., silence
removal and speaker diarization.
Keywords
Audio segmentation
Fixed-window segmentation
Probability smoothing
Silence removal

Signal change detection


Speaker diarization
Clustering
Unsupervised learning
Semi-supervised learning
Segmentation is a processing stage that is of vital importance ...

7: Audio Alignment and Temporal Modeling
Abstract
This chapter focuses on audio analysis methods that take into account the temporal
evolution of the audio phenomena. This is done by preserving the short-term nature of
the feature sequences, in order to either create methods that align two feature sequences
or build temporal audio representations using Hidden Markov Models.
Keywords
Sequence alignment
Template matching
Dynamic time warping
DTW
Cost grid

Sakoe-Chiba
Itakura
Smith-Waterman
Hidden Markov Model
HMM
Baum-Welch
Viterbi algorithm
Trellis diagram
Mixture of Gaussians
This chapter presents several methods that take into account the temporal evolution of
the audio phenomena. In other words, we are no longer interested in computing ...

Part 3: Other Issues
Outline
Music Information Retrieval
Appendix A: The MATLAB Audio Analysis Library
Appendix B: Audio-related Libraries and Software
Appendix C: Audio Datasets

8: Music Information Retrieval
Abstract
This chapter provides descriptions and implementations of some basic Music Information
Retrieval tasks, so that the reader can gain a deeper understanding of the field. In
particular, we focus on the tasks of music thumbnailing, meter/tempo induction and
music content visualization.
Keywords
Music thumbnailing
Audio thumbnailing
Meter extraction
Tempo extraction
Self-similarity matrix

Content visualization
Dimensionality reduction
Self organizing maps
SOMs
Fisher linear discriminant
FLD
LDA
Principal component analysis

PCA
During recent years, the wide distribution of music content via numerous channels has
raised the need for the development of computational tools for the analysis,
summarization, classification, ...

Appendix A: The MATLAB Audio Analysis Library
Abstract
This appendix gives a short description of the most important MATLAB functions
implemented in the Audio Analysis Library, which serves as a companion to this book.
Keywords
MATLAB audio analysis library
Companion material
MATLAB examples
MATLAB code
This book is accompanied by a MATLAB software library, to assist with the
reproducibility of the methods presented in the book and as a toolbox for readers wishing
to embark on their own projects. Each function in the library contains a description of its
functionality. In this appendix, we present a complete list of the m-files and their

Appendix B: Audio-Related Libraries and Software
Abstract
This appendix presents several third-party audio analysis libraries and methodologies,
covering various programming languages (including MATLAB-based software).
Furthermore, non-audio libraries and packages from the fields of pattern recognition,
signal processing, etc., are presented. Although our primary focus is on MATLAB-based
code, we also provide a flavour of Python and C/C++ resources.
Keywords
Software libraries
Software packages
MATLAB
Python
C++
Pattern recognition
Audio analysis
Data mining
Signal processing
In this appendix we present a number of audio analysis libraries and methodologies for
MATLAB and other programming languages. In addition, we present related (non-audio)
...

Appendix C: Audio Datasets
Abstract
This appendix provides a list of datasets that are available on the Web and can be used
as training and evaluation data for several audio analysis tasks.
Keywords
Audio datasets
Benchmarking
Several datasets and benchmarks that focus on audio analysis tasks are available on the
Web. The diversity of the datasets is high with respect to size, level of annotation, and
addressed audio analysis tasks. For example, there are datasets for general audio event
classification and segmentation; musical genre classification; speech emotion recognition;
speech vs music discrimination; speaker diarization; speaker identification, etc. In
addition, these datasets may or may not contain other non-audio media types ...


Index
A
abs ( ) MATLAB function, 35
Activation function, 120
Agglomerative algorithms, 175
Aliasing, 10, 43
ALSAaudio, 244
Analoginput function, 21
aplay command-line utility, 12
Application programming interface (API), 17
-ar <value> flag, 17
Au file format, 16
aubio, 245
Audio and speech, 242
Audio event detection, 4
audioRecorderOnline, 24
Audio spotting, 186
Audio tracking, 186
audiorecorder function, 22
audiorecorder object, 23
audioRecorderTimerCallback ( ), 24
Audio-related libraries and software, 241
MATLAB libraries, 241
Python, 243
AuditoryToolbox, Version 2, 242
auread ( ), 16
Autocorrelation function, 93–94
Automatic music transcription, 213
B
Bartlett window, 26

Baum-Welch algorithm, 198
