Introduction To Audio Analysis
Title page
Introduction to Audio Analysis
A MATLAB Approach
First Edition
Theodoros Giannakopoulos and Aggelos Pikrakis
Table of Contents
Cover image
Title page
Copyright
Preface
Acknowledgments
List of Tables
List of figures
Part 1: Basic Concepts, Representations and Feature Extraction
1: Introduction
1.1 The MATLAB Audio Analysis Library
1.2 Outline of Chapters
Copyright
Academic Press is an imprint of Elsevier
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, UK
225 Wyman Street, Waltham, MA 02451, USA
525 B Street, Suite 1800, San Diego, CA 92101-4495, USA
First edition 2014
Copyright © 2014 Elsevier Ltd. All rights reserved.
Fax: 508-647-7001
E-mail: info@mathworks.com
Web: mathworks.com
No part ...
Preface
This book attempts to provide a gentle introduction to the field of audio analysis using the
MATLAB programming environment as the vehicle of presentation. Audio analysis is a
multidisciplinary field, which requires the reader to be familiar with concepts from
diverse research disciplines, including digital signal processing and machine learning. As a
result, it is a great challenge to write a book that can provide sufficient coverage of the
important concepts in the field of audio analysis and, at the same time, be accessible to
readers who do not necessarily possess the required scientific background.
Our main goal has been to provide a standalone introduction, involving a balanced
presentation of theoretical descriptions and reproducible ...
Acknowledgments
This book has improved thanks to the support of a number of colleagues, students, and
friends, who provided generous feedback and constructive comments during the
writing process. Above all, T. Giannakopoulos would like to thank his wife, Maria, and
his daughter, Eleni, for always being cheerful and supportive. A. Pikrakis would like to
thank his family for their patience and generous support and dedicates this book to all the
teachers who have shaped his life.
List of Tables
Table 1.1 Difficulty Levels of the Exercises
Table 2.1 Execution Times for Different Loading Techniques
Table 2.2 Sound Recording Using the Data Acquisition Toolbox
Table 4.1 Class Descriptions for the Multi-Class Task of Movie Segments
Table 5.1 Classification Tasks and Files
Table 5.2 Row-Wise Normalized Confusion Matrix for the 8-Class Audio Segment Classification Task
Table 5.3 Row-Wise Normalized Confusion Matrix for the Speech vs Music Binary Classification Task
Table 5.4 Row-Wise Normalized Confusion Matrix for the 3-Class Musical Genre Classification Task
Table 5.5 Row-Wise Normalized Confusion Matrix for the Speech vs Non-Speech Classification Task
Table A.1 List of All Functions Included in the MATLAB Audio ...
List of Figures
Figure 2.1 A synthetic audio signal.
Figure 2.2 A STEREO audio signal.
Figure 2.3 Short-term processing of an audio signal.
Figure 3.1 Plots of the magnitude of the spectrum of a signal consisting of three frequencies at 200, 500, and 1200 Hz.
Figure 3.2 A synthetic signal consisting of three frequencies is corrupted by additive noise.
Figure 3.3 The spectrogram of a speech signal.
Figure 3.4 Spectrograms of a synthetic, frequency-modulated signal for three short-term frame lengths.
Figure 3.5 Spectrum representations of (a) an analog signal, (b) a sampled version when the sampling frequency exceeds the Nyquist rate, and (c) a sampled version with insufficient sampling frequency. In the last case, the shifted versions ...
Part 1: Basic Concepts, Representations and Feature Extraction
Outline
Introduction
Getting Familiar with Audio Signals
Signal Transforms and Filtering Essentials
Audio Features
1: Introduction
Abstract
This chapter has an introductory purpose. A chapter outline is provided, along with
general notes on the book’s exercises and the companion software. Before we proceed, it
is important to note that, although in this book the term audio does not exclude the
speech signal, we are not focusing on traditional speech-related problems that have been
studied by the research community for decades, e.g., speech recognition and coding.
Keywords
Audio analysis
MATLAB
During recent years we have witnessed the increasing availability of audio content via
numerous distribution channels both for commercial and non-profit purposes. The
resulting wealth of data has inevitably highlighted the need for systems that are capable
of analyzing ...
2: Getting Familiar with Audio Signals
WAVE file
MP3
Nyquist
Sample rate
Short-term processing
The goal of this chapter is to provide the basic knowledge and techniques that are
essential to create, play back, load, and store the audio signals that will be analyzed.
Therefore, we focus on certain practical skills that you will need to develop in order to
prepare the audio signals on which the audio analysis methods of later chapters will be ...
3: Signal Transforms and Filtering Essentials
4: Audio Features
Abstract
This chapter presents a wide range of audio features. In addition to the
theoretical background of these features and the respective MATLAB code, their
discrimination ability is demonstrated for particular audio types.
Keywords
Short-Term feature extraction
Mid-Term windowing
Energy
Zero crossing rate
Entropy of energy
Spectral centroid
Spectral spread
Spectral entropy
Spectral flux
Spectral rolloff
Mel-Frequency cepstrum coefficients
MFCCs
Chroma vector
Harmonic ratio
Fundamental frequency
Feature extraction is an important audio analysis stage. In general, feature extraction is
an essential processing step in pattern recognition and machine learning tasks. The goal is
to extract a set of features from ...
Part 2: Audio Content Characterization
Outline
Audio Classification
Audio Segmentation
Audio Alignment and Temporal Modeling
5: Audio Classification
Abstract
This chapter describes the task of classifying unknown audio segments of “homogeneous
content” to a set of predefined audio classes. In particular, theoretical background is
provided regarding popular classification methods, including Support Vector Machines,
Decision Trees, and the k-Nearest-Neighbor method. The reader is also introduced to
generic performance measures and validation methods for the estimation of the
performance of a classifier. The chapter concludes with the presentation of performance
measurements for a set of typical audio classification tasks.
Keywords
Audio segments
Posterior probability ...
6: Audio Segmentation
Abstract
This chapter focuses on a vital stage of audio analysis, the audio segmentation stage,
which splits an uninterrupted audio signal into segments of homogeneous
content. The chapter describes two general categories of audio segmentation methods: those that
employ supervised knowledge and those that are unsupervised or semi-supervised. In this
context, specific segmentation tasks are also presented, e.g., silence
removal and speaker diarization.
Keywords
Audio segmentation
Fixed-window segmentation
Probability smoothing
Silence removal
7: Audio Alignment and Temporal Modeling
Sakoe-Chiba
Itakura
Smith-Waterman
Hidden Markov Model
HMM
Baum-Welch
Viterbi algorithm
Trellis diagram
Mixture of Gaussians
This chapter presents several methods that take into account the temporal evolution of
the audio phenomena. In other words, we are no longer interested in computing ...
Part 3: Other Issues
Outline
Music Information Retrieval
Appendix A: The MATLAB Audio Analysis Library
Appendix B: Audio-related Libraries and Software
Appendix C: Audio Datasets
8: Music Information Retrieval
Content visualization
Dimensionality reduction
Self-organizing maps
SOMs
Fisher linear discriminant
FLD
LDA
Principal component analysis
PCA
During recent years, the wide distribution of music content via numerous channels has
raised the need for the development of computational tools for the analysis,
summarization, classification, ...
Appendix A: The MATLAB Audio Analysis Library
Appendix B: Audio-Related Libraries and Software
Appendix C: Audio Datasets
Abstract
This appendix provides a list of datasets that are available on the Web and can be used
as training and evaluation data for several audio analysis tasks.
Keywords
Audio datasets
Benchmarking
Several datasets and benchmarks that focus on audio analysis tasks are available on the
Web. The datasets vary widely in size, level of annotation, and the
audio analysis tasks they address. For example, there are datasets for general audio event
classification and segmentation; musical genre classification; speech emotion recognition;
speech vs music discrimination; speaker diarization; speaker identification, etc. In
addition, these datasets may or may not contain other non-audio media types ...
Bibliography
1. Proakis John G, Manolakis Dimitris K. Digital Signal Processing. fourth ed. Pearson
Education; 2009.
2. Kientzle Tim. A Programmer’s Guide to Sound with CD-ROM. Addison-Wesley
Longman Publishing Co. Inc.; 1997.
3. Brandenburg Karlheinz. MP3 and AAC explained. Audio Engineering Society
Conference: 17th International Conference: High-Quality Audio Coding. 1999.
4. Painter Ted, Spanias Andreas. Perceptual coding of digital audio. Proceedings of the
IEEE. 2000;88(4):451-515.
5. Theodoridis S, Koutroumbas K. Pattern Recognition. fourth ed. Academic Press,
Inc.; 2008.
6. Frigo M, Johnson SG. FFTW: an adaptive software architecture for the FFT. Proceedings
of the International Conference on Acoustics, Speech, and Signal Processing.
1998:1381-1384. ...
Index
A
abs ( ) MATLAB function, 35
Activation function, 120
Agglomerative algorithms, 175
Aliasing, 10, 43
ALSAaudio, 244
Analoginput function, 21
aplay command-line utility, 12
Application programming interface (API), 17
-ar <value> flag, 17
Au file format, 16
aubio, 245
Audio and speech, 242
Audio event detection, 4
audioRecorderOnline, 24
Audio spotting, 186
Audio tracking, 186
audiorecorder function, 22
audiorecorder object, 23
audioRecorderTimerCallback ( ), 24
Audio-related libraries and software, 241
  MATLAB libraries, 241
  Python, 243
AuditoryToolbox, Version 2, 242
auread ( ), 16
Autocorrelation function, 93–94
Automatic music transcription, 213
B
Bartlett window, 26