/  4
 
LIBXTRACT: A LIGHTWEIGHT LIBRARY FOR AUDIO FEATUREEXTRACTION
 Jamie Bullock 
UCE Birmingham ConservatoireMusic Technology
ABSTRACT
The libxtract library consists of a collection of over fortyfunctions that can be used for the extraction of low levelaudio features. In this paper I will describe the develop-ment and usage of the library as well as the rationale forits design. Its use in the composition and performance of music involving live electronics also will be discussed. Anumber of use case scenarios will be presented, includingthe use of individual features and the use of the library tocreate a ’feature vector’, which may be used in conjunc-tion with a classification algorithm, to extract higher levelfeatures.
1. INTRODUCTION1.1. Audio features
Anaudiofeatureisanyqualitativelyorquantitativelymea-surable aspect of a sound. If we describe a sound as ’quiteloud’, we have used our ears to perform an audio fea-ture extraction process. Musical audio features includemelodicshape, rhythm, textureandtimbre. However, mostsoftware-based approaches tend to focus on numericallyquantifiable features. For example, the MPEG-7 standarddefinesseventeenlow-leveldescriptors(LLDs)thatcanbedivided into six categories: ’basic’, ’basic spectral’, ’sig-nal parameters’, ’temporal timbral’, ’spectral timbral’ and’spectral basis representations’[1]. The Cuidado projectextends the MPEG-7 standard by providing 72 audio fea-tures, whichitusesforcontent-basedindexingandretrieval[2].
1.2. libxtract
libxtract is a cross-platform, free and open-source soft-ware library that provides a set of feature extraction func-tionsforextractingLLDs. Theeventualaimistoprovideasuperset of the MPEG-7 and Cuidado audio features. Thelibrary is written in ANSI C and licensed under the GNUGPL so that it can easily be incorporated into any programthat supports linkage to shared libraries.
2. EXISTING TECHNOLOGY
It is beyond the scope of this paper to conduct an exhaus-tive study of existing technology and compare the resultswith libxtract. However, a brief survey of the library’scontext will be given.OneprojectrelatedtolibxtractistheAubiolibrary
1
byPaul Brossier. Aubio is designed for audio labelling, andincludes excellent pitch, beat and onset detectors[5]. libx-tract doesn’t currently include any onset detection, thismakesAubiocomplimentarytolibxtractwithminimaldu-plication of functionality.libxtract has much in common with the jAudio project[9], which seeks to provide a system that ’meets the needsof MIR researchers’ by providing ’a new framework forfeature extraction designed to eliminate the duplication of effortincalculatingfeaturesfromanaudiosignal’. jAudioprovides many useful functions such as the automatic res-olution of dependencies between features, an API whichmakes it easy to add new features, multidimensional fea-ture support, and XML file output. Its implementation inJava makes it cross-platform, and suitable for embeddingin other Java applications, such as the promising jMIRsoftware. However, libxtract was written out of a need toperform real-time feature extraction on live instrumentalsources, and jAudio is not designed for this task. libx-tract, being written in C, also has the advantage that it canbe incorporated into programs written in a variety of lan-guages, notjustJava. Examplesofthisaregiveninsection5.libxtract also provides functionality that is similar toaspects of the CLAM project
2
. According to Amatri-ain, CLAM is ’a framework that aims at offering exten-sible, generic and efficient design and implementation so-lutions for developing Audio and Music applications aswell as for doing more complex research related with thefield’[8]. As noted by McKay, ’the [CLAM] system wasnot intended for extracting features for classification prob-lems’, it is a large and complex piece of software withmany dependencies, making it a poor choice if only fea-ture extraction functionality is required.Other similar projects include Marsyas
3
and Maate
4
.The library components of these programs all include fea-tureextractionfunctionality, buttheyallprovideotherfunc-tions for tasks such as file and audio i/o, annotation or ses-sion handling. libxtract provides only feature extractionfunctions on the basis that any additional functionality can
1
http://aubio.piem.org
2
http://clam.iua.upf.edu
3
http://marsyas.sf.net
4
http://www.cmis.csiro.au/maaate
 
Figure 1
. A typical libxtract feature cascadebe provided by the calling application or another library.
3. LIBRARY DESIGN
The central idea behind libxtract is that the feature extrac-tion functions should be modularised so they can be com-bined arbitrarily. Central to this approach is the idea of acascaded extraction hierarchy. A simple example of this isshown in Figure 1. This approach serves a dual purpose:it avoids the duplication of ’subfeatures’, making compu-tation more efficient, and if the calling application allows,it enables a certain degree of experimentation. For exam-ple the user can easily create novel features by makingunconventional extraction hierarchies.libxtract seeks to provide a simple API for developers.This is achieved by using an array of function pointersas the primary means of calling extraction functions. Aconsequence of this is that all feature extraction functionshave the same prototype for their arguments. The array of function pointers can be indexed using an enumeration of descriptively-named constants. A typical libxtract call inthe DSP loop of an application will look like this:
xtract[XTRACT_FUNCTION_NAME](input_vector, blocksize, argv,output_vector);
Thisdesignmakeslibxtractparticularlysuitableforusein modular patching environments such as Pure Data andMax/MSP, because it alleviates the need for the programmakinguseoflibxtracttoprovidemappingsbetweensym-bolic ’xtractor’ names and callback function names.libxtractdividesfeaturesintoscalarfeatures, whichgivethe result as a single value, vector features, which give theresult as an array, and delta features, which have sometemporal element in their calculation process. To makethe process of incorporating the wide variety of features(with their slightly varying argument requirements) eas-ier, each extraction function has its own function descrip-tor. The purpose of the function descriptor is to provideuseful ’self documentation’ about the feature extractionfunction in question. This enables a calling application toeasilydeterminetheexpectedformatofthedatapointedto
Feature name Description
Mean Mean of a vectorKurtosis Kurtosis of a vectorSpectral Mean Mean of a spectrumSpectral Kurtosis Kurtosis of a spectrumSpectral Centroid Centroid of a spectrumIrregularity (2 types) Irregularity of a spectrumTristimulus (3 types) Tristimulus of a spectrumSmoothness Smoothness of a spectrumSpread Spread of a spectrumZero Crossing Rate Zero crossing rate of a vectorLoudness Loudness of a signalInharmonicity Inharmonicity of a spectrumF0 Fundamental frequency of a signalAutocorrelation Autocorrelation vector of a signalBark Coefficients Bark coefficients from a spectrumPeak Spectrum Spectral peaks from a spectrumMFCC MFCC from a spectrum
Table 1
. Some of the features provided by the libraryby
*data
and
*argv
, and what the ’donor’ functions forany sub-features (passed in as part of the argument vector)would be.The library has been written on the assumption that acontiguous block of data will be written to the input ar-ray of the feature extraction functions. Some functionsassume the data represents a block of time-domain audiodata, others use a special spectral data format, and othersmake no assumption about what the datarepresents. Someof the functions may therefore be suitable for analysingnon-audio data.CertainfeatureextractionfunctionsrequirethattheFFTof an audio block be taken. The inclusion of FFT process-ing is provided as a compile time option because it entailsa dependency on the FFTW library. Signal windowingand zero padding are provided as helper functions.
4. LIST OF FEATURES
It is beyond the scope of this paper to list all the featuresprovided by the library, but some of the most useful onesare listed in table 1. If the feature is ’of a spectrum’ itdenotes that the input data will follow the format of libx-tract’s spectral data types.
5. PROGRAMS USING THE LIBRARY
Despite the fact that libxtract is a relatively recent library,it has already been incorporated into a number of usefulprograms.
 
5.1. Vamp libxtract plugin
The Vamp analysis plugin API was devised by Chris Can-nam for the Sonic Visualiser
5
software. Sonic Visualiseris an application for visualising features extracted fromaudio files, it was developed at Queen Mary Universityof London, and has applications in musicology, signal-processing research and performance practice[4]. SonicVisualiser acts as a Vamp plugin host, with vamp pluginssupplying analysis data to it.libxtract is used to provide analysis functions for SonicVisualiser using the vamp-libxtract-plugin. The vamp-libxtract-plugin acts as a wrapper for the libxtract library,making nearly the entire set of libxtract features avail-able to any vamp host. This is done by providing onlythe minimum set of feature combinations, the implicationof this being that the facility to experiment with differentcascaded features is lost.
5.2. PD and Max/MSP externals
The libxtract library comes with a Pure Data (PD) exter-nal, which acts as a wrapper to the library’s API. This is anideal use case because it enables feature extraction func-tions to be cascaded arbitrarily, and for non-validated datato be passed in as arguments through the [xtract˜] object’sright inlet. The disadvantage of this approach is that itrequires the user to learn how the library works, and tounderstand in a limited way what each function does.AMax/MSPexternalisavailable, whichprovidesfunc-tionality that is analogous to the PD external.
5.3. SC3 ugen
There also exists a Supercollider libxtract wrapper by DanStowell. This is implemented as a number of SuperCol-lider UGens, which are object-oriented multiply instan-tiable DSP units
6
. The primary focus of Stowell’s libx-tract work is a libxtract-based MFCC UGen, but severalother libxtract-based Ugens are under development.
6. EXTRACTING HIGHER LEVEL FEATURES
The primary focus of the libxtract library is the extrac-tion of relatively low-level features. However, one themain reasons for its development was that it could serve asa basis for extracting more semantically meaningful fea-tures. These could include psychoacoustic features suchas roughness, sharpness and loudness[10], some of whichareincludedinthelibrary; instrumentclassificationoutputs[3];orarbitraryuser-defineddescriptorssuchas’Danceability’[7].It is possible to extract these ’high level’ features byusing libxtract to extract a feature vector, which can beconstructed using the results from a range of extractionfunctions. This vector can then be submitted to a map-ping that entails a further reduction in dimensionality of 
5
http://www.sonicvisualiser.org
6
http://mcld.co.uk/supercollider
the data. Possible algorithms for the dimension reduc-tion task include Neural Networks, k-NN and Multidi-mensional Gauss[11]. Figure 2 shows a dimension re-duction implementation in Pure Data using the [xtract˜]libxtract wrapper, and the [ann mlp] Fast Artificial NeuralNetwork 
7
(FANN) library wrapper by Davide Morelli
8
.An extended version of this system has recently beenused in one of the author’s own compositions for Pianoand Live electronics. For this particular piece, a PD patchwas created to detect whether a specific chord was beingplayed, and to add a ’resonance’ effect to the Piano ac-cordingly. For the detection aspect of the patch, a selec-tion of audio features, represented as floating point val-ues, are ’packed’ into a list using the PD [pack] object.This data is used to train the neural network (a multi-layerperceptron), by successively presenting it with input lists,followed by the corresponding expected output. Once thenetwork has been trained (giving the minimum possibleerror), it can operate in ’run’ mode, whereby it shouldgive appropriate output when presented with new data thatshows similarity to the training data. With a close-mic’dstudio recording in a dry acoustic, an average detectionaccuracy of 92% was achieved. This dropped to around70% in a concert environment. An exploration of theseresults is beyond the scope of this paper.Another possible use case for the library is as a sourcefor continuous mapping to an output feature space. Witha continuous mapping, the classifier gives as output, a lo-cation on a low-dimensional map rather than giving a dis-crete classification ’decision’. This has been implementedin one of the author’s recent works for Flute and live elec-tronics, whereby the flautist can control the classifier’soutput by modifying their instrumental timbre. Semanticdescriptors were used to tag specific map locations, andproximity to these locations was measured and used as asource of realtime control data. The system was used tomeasure the ’breathiness’ and ’shrillness’ of the flute tim-bre. Further work could involve the recognition of timbralgestures in this resultant data stream.
7. EFFICIENCY AND REALTIME USE
The library in its current form makes no guarantee abouthow long it will take for a function to execute. For anygiven blocksize, thereis no defined behaviour determiningwhat will happen if the function does not return in theduration of the block.Most of the extraction functions have an algorithmicefficiencyofO(n)orbetter, meaningthatcomputationtimeis usually proportional to the audio blocksize used. How-ever, because of the way in which the library has been de-signed (flexibility of feature combination has been givenpriority), certain features end up being computed com-paratively inefficiently. For example if only the Kurtosisfeature was required in the system shown in figure 1, thefunctions xtract mean(), xtract variance(),
7
http://leenissen.dk/fann/
8
http://www.davidemorelli.it

Share & Embed

More from this user

Add a Comment

Characters: ...