
Categories: AASP-TC-News
Date: Apr 20, 2015
Title: (Machine) Learning the sound of Violins

The sound of violins has always intrigued musicians, composers and… scientists. From the beginning, the methodologies for measuring the phenomena that concur to the production of sound have been steadily refined: modern scanning vibrometers, for instance, allow us to accurately reconstruct in a few minutes the modal shapes of the whole instrument, an operation that just a few years ago could take several days. Generally speaking, the breakthrough of digital computing has given a tremendous boost to several directions of research on violin sound. In this contribution we investigate the use of machine learning for analyzing the timbre of violin sounds.

(Machine) Learning the Sound of Violins


F. Antonacci, M. Zanoni
Image and Sound Processing Group, Politecnico di Milano

Lutherie is rightly considered among the most prominent arts related to the working of wood. For this reason, the traditional violin craftsmanship of Cremona was inscribed in 2012 on the Representative List of the Intangible Cultural Heritage of Humanity.

The sound of violins has always intrigued musicians, composers and… scientists. A rich and comprehensive tutorial, covering the most relevant results on violin sound achieved up to recent years, can be found in [1], where all the phenomena that concur to the complex mechanism of sound formation are analysed.

From the beginning, measurement methodologies have been refined: modern scanning vibrometers, for instance, allow us to accurately reconstruct in a few minutes the modal shapes of the whole instrument, an operation that just a few years ago could take several days. Generally speaking, the breakthrough of digital computing has given a tremendous boost to several directions of research on violin sound. Among these, it is worth mentioning the use of machine learning techniques for the analysis of the timbre of the violin sound.

The timbre of the violin has been studied for a long time. However, the characterization of the relationship between timbre and the underlying physical phenomena is still an open issue. Many approaches use Low-Level Features (LLFs) directly extracted from the sound.

In [2] Kaminiarz and Łukasik investigated the use of MPEG-7 spectral and harmonic descriptors to distinguish timbral differences among contemporary concert violin sounds, and to examine their predictive power for expert rankings of violin quality. In particular, they focused on the MPEG-7 Audio Spectrum Basis (ASB), which consists of the basis functions used to project the spectrum of the sound onto a lower-dimensional subspace. Results show that the ASB is useful for identifying the violins that exhibit a distinct timbre. On the other hand, there is neither a strong correlation between timbral quality and the ASB, nor can a prediction of the ranking be achieved through the ASB.
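
As a rough illustration of the idea of a spectral basis projection (a sketch only, not the MPEG-7 ASB extractor; the file name and parameters below are hypothetical), one can take the SVD of a log-magnitude spectrogram and project each frame onto the leading basis functions to obtain a compact timbral signature:

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import stft

# Load a recording (file name is a placeholder) and normalize it.
rate, x = wavfile.read("violin_take.wav")
if x.ndim > 1:                       # mix down to mono if needed
    x = x.mean(axis=1)
x = x.astype(float) / np.max(np.abs(x))

# Log-magnitude spectrogram: one column per analysis frame.
_, _, Z = stft(x, fs=rate, nperseg=2048)
S = np.log10(np.abs(Z) + 1e-10)

# SVD of the mean-removed spectrogram gives spectral basis functions;
# projecting each frame onto the first k of them yields a low-dimensional
# representation of the sound.
U, s, Vt = np.linalg.svd(S - S.mean(axis=1, keepdims=True), full_matrices=False)
k = 8
basis = U[:, :k]                     # frequency-domain basis functions
signature = basis.T @ S              # k-dimensional representation of each frame
print(basis.shape, signature.shape)
```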

In [3], Łukasik makes use of Long-Term Cepstral Coefficients to identify individual violins.
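
For illustration, a long-term cepstral profile can be approximated by averaging frame-wise real cepstra over a whole recording (a sketch under simplifying assumptions, not necessarily the exact feature definition used in [3]):

```python
import numpy as np

def long_term_cepstrum(x, frame=2048, hop=1024, n_coeffs=30):
    """Average the real cepstrum of windowed frames over a whole recording."""
    window = np.hanning(frame)
    ceps = []
    for start in range(0, len(x) - frame, hop):
        spec = np.abs(np.fft.rfft(x[start:start + frame] * window)) + 1e-10
        ceps.append(np.fft.irfft(np.log(spec))[:n_coeffs])  # real cepstrum of the frame
    return np.mean(ceps, axis=0)      # one vector summarizing the recording

# Toy usage with a placeholder signal; in practice each violin recording is
# summarized by one such vector, which is then fed to a classifier.
x = np.random.default_rng(0).standard_normal(44100)
print(long_term_cepstrum(x).shape)    # (30,)
```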

More recently, the Image and Sound Processing Group (ISPG) at Politecnico di Milano has undertaken a new step in this research area. In Cremona, ISPG has opened a new laboratory in the renowned Museum of Violins, which hosts several of the most valuable masterpieces of the historical and modern Cremonese violin-making tradition.


Low-level features such as the ASB and cepstral coefficients are characterized by a low level of abstraction, i.e. it is difficult to find a direct relationship with terms coming from the natural language of violin makers, such as warmth and brightness. In order to fill this gap, ISPG has focused on Semantic Timbral Descriptors, which fall into the broader category of High-Level Features (HLFs). Though semantic timbral descriptors are subjective, there is a strong connection between sound description, sound perception and physics. The link is in our brain, which is able to process stimuli from the auditory system to formulate a description of the sound. Understanding this link would mean filling the semantic gap between low-level and high-level features. In the past few years, the Music Information Retrieval community has worked a great deal in this direction.

The first step that ISPG undertook in this direction was to gather and manage knowledge about the terminology used by violin makers to describe the acoustics, the timbral characteristics, the building process, and the materials used for the production of violins. This knowledge was acquired through a series of interviews with violin makers in the city of Cremona, which hosts more than 150 luthier workshops.

In order to represent and manage the gathered knowledge, ISPG has adopted an ontology-based approach [4]. Ontologies have become popular in the semantic web community and represent the knowledge of a certain domain in a machine-readable way. The components of an ontology are classes, which represent categories or concepts; individuals, which are instances of a class; and properties, which define the relations between two classes. Classes and properties can be organized in hierarchies, with some constraints over their use.
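
As a minimal sketch of how classes, individuals and properties can be encoded (the namespace and the terms below are hypothetical and are not taken from the actual Violin Ontology of [4]), one could use a standard RDF/OWL toolkit such as rdflib:

```python
from rdflib import Graph, Namespace
from rdflib.namespace import RDF, RDFS, OWL

VIO = Namespace("http://example.org/violin#")   # hypothetical namespace
g = Graph()
g.bind("vio", VIO)

# Classes (categories/concepts), organized in a hierarchy.
g.add((VIO.Material, RDF.type, OWL.Class))
g.add((VIO.Wood, RDF.type, OWL.Class))
g.add((VIO.Wood, RDFS.subClassOf, VIO.Material))
g.add((VIO.Instrument, RDF.type, OWL.Class))

# A property relating two classes: an Instrument is made of a Material.
g.add((VIO.madeOf, RDF.type, OWL.ObjectProperty))
g.add((VIO.madeOf, RDFS.domain, VIO.Instrument))
g.add((VIO.madeOf, RDFS.range, VIO.Material))

# Individuals (instances of classes) and a relation between them.
g.add((VIO.spruce, RDF.type, VIO.Wood))
g.add((VIO.violin1715, RDF.type, VIO.Instrument))
g.add((VIO.violin1715, VIO.madeOf, VIO.spruce))

print(g.serialize(format="turtle"))
```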

Fig. 1 shows the organization of the ontology that describes the instrument making.

Fig. 1 Top-level ontology of the violin-making process

Each class within the high-level ontology shown in Fig. 1 can be further expanded.

For the sake of clarity, Fig. 2 shows a further level of expansion of the class “Materials”.


Fig. 2 Expansion of the class "Materials" within the violin-making ontology

Once the terminology has been organized through the ontology, the goal is to learn the set of Low-Level Features whose combination allows us to model the high-level ones. This stage of feature selection is commonly accomplished manually. In order to achieve more accurate results, ISPG and researchers from Queen Mary University of London adopted a Deep Belief Network (DBN) based approach [5].

DBNs aim at reproducing the way the human brain processes information in a hierarchical fashion to address decision problems by decomposing them into simpler sub-problems. They are based on an inherently layered representation of the information, which enables the inference of features that describe the input data (learned features) at various levels of abstraction, in an unsupervised fashion. Each layer of the DBN consists of a Restricted Boltzmann Machine (RBM).

Fig. 3 shows the structure of a generic DBN composed of three RBMs.

Fig. 3 Structure of a three-layer Deep Belief Network
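
A minimal sketch of the underlying idea (a toy NumPy implementation under simplifying assumptions, not the network used in [5]): each RBM is trained with one-step contrastive divergence, and the layers are stacked greedily so that the hidden activations of one layer become the input of the next.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Minimal binary Restricted Boltzmann Machine trained with CD-1."""
    def __init__(self, n_visible, n_hidden, lr=0.05):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible biases
        self.b_h = np.zeros(n_hidden)    # hidden biases
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def train_step(self, v0):
        # Positive phase, then one step of Gibbs sampling (CD-1).
        h0 = self.hidden_probs(v0)
        h_sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = self.visible_probs(h_sample)
        h1 = self.hidden_probs(v1)
        # Update parameters with the difference of data and model correlations.
        self.W += self.lr * (v0.T @ h0 - v1.T @ h1) / len(v0)
        self.b_v += self.lr * (v0 - v1).mean(axis=0)
        self.b_h += self.lr * (h0 - h1).mean(axis=0)

# Greedy layer-wise pre-training of a three-layer DBN on toy binary data.
X = (rng.random((256, 64)) > 0.5).astype(float)   # placeholder "low-level features"
layer_sizes = [64, 32, 16, 8]
layers, data = [], X
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    rbm = RBM(n_in, n_out)
    for _ in range(50):
        rbm.train_step(data)
    data = rbm.hidden_probs(data)     # output of this layer feeds the next
    layers.append(rbm)

print("learned feature shape:", data.shape)   # (256, 8) high-level representation
```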

Machine learning regression allows us to adopt a dimensional representation of the semantic descriptors, which expresses the degree of intensity of each descriptor.


Each semantic descriptor is modeled following a classic scheme of a training-based technique.


Fig. 4 shows the workflow of the method. The low-level characterization of each recording is provided through the extraction of the set of learned LLFs. Semantic descriptors are modeled through a set of generative models (regressors) that are trained on the high-dimensional learned feature space computed on a training dataset of recordings.

Fig. 4 Training and testing of Deep Belief Networks for regression of High-Level features

Since the relationship between LLFs and HLFs is not clear, in order to discover the most appropriate method a set of regression functions is used: linear regression (LR), ridge regression (RR), polynomial regression (PR), support vector regression (SVR), gradient boosting regression (GBR), and AdaBoost regression (ABR).
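
A minimal sketch of such a comparison (not the authors' pipeline) using scikit-learn, where each regression family named above is scored by cross-validated R² on hypothetical learned features X and descriptor intensities y:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVR
from sklearn.ensemble import GradientBoostingRegressor, AdaBoostRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 8))                 # placeholder learned features (e.g. DBN output)
y = X @ rng.standard_normal(8) + 0.1 * rng.standard_normal(200)  # placeholder descriptor intensities

models = {
    "LR":  LinearRegression(),
    "RR":  Ridge(alpha=1.0),
    "PR":  make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
    "SVR": SVR(kernel="rbf", C=1.0),
    "GBR": GradientBoostingRegressor(),
    "ABR": AdaBoostRegressor(),
}
for name, model in models.items():
    r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: mean R^2 = {r2:.3f}")
```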

Six bipolar semantic descriptors coming from the ontology have been used: “Dark” vs. “Bright”; “Deep” vs. “Not Deep”; “Hard” vs. “Soft”; “Not Full” vs. “Full”; “Sweet” vs. “Harsh”; “Not Warm” vs. “Warm”.

Fig. 5 shows the performance of the proposed regression approach, for each bipolar semantic descriptor and for each term, in terms of the R² ∈ (-∞, 1] index, a standard metric that measures the accuracy of a regression model.
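
For reference, R² compares the residual error of the model with that of a trivial predictor that always outputs the mean of the observed values; a short sketch of its computation:

```python
import numpy as np

def r2_score(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)           # residual sum of squares
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot                      # 1 is perfect; values can be negative
```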


Fig. 5 Accuracy of the regression models for six descriptors of the violin sound

As an example, Fig. 6 and Fig. 7 show the High-Level Feature description of a historical and a modern violin, respectively, obtained through annotations by four violin makers and through prediction with the described approach. It is possible to observe the good match between the annotations and the predictions, especially for the modern instrument.

Fig. 6 Timbral description of a historical violin obtained through annotation (dashed red line) and through prediction (blue line)

Fig. 7 Timbral description of a modern violin obtained through annotation (dashed red line) and through prediction (blue line)

ISPG is currently working on developing machine-learning-based methodologies that are not limited to timbral analysis, but also address the problem of predicting features of the timbre and of the acoustic response from vibrometric measurements [6,7].

REFERENCES

[1] J. Woodhouse, “The acoustics of the violin: a review”, Rep. Prog. Phys., vol. 77, IOP Publishing, 2014, pp. 1-42.

[2] A. Kaminiarz and E. Łukasik, “MPEG-7 Audio Spectrum Basis as a Signature of Violin Sound”, in Proc. of the 15th European Signal Processing Conference (EUSIPCO 2007), pp. 1541-1545.

[3] E. Łukasik, “Long Term Cepstral Coefficients for violin identification”, in Proc. of the 128th AES Convention, 2010, pp. 1-10.

[4] M. Zanoni, F. Setragno, A. Sarti, “The Violin Ontology”, in Proc. of the 9th Conference on Interdisciplinary Musicology (CIM’14).

[5] M. Zanoni, F. Setragno, F. Antonacci, A. Sarti, G. Fazekas, M. Sandler, “Training-based Semantic Descriptors modeling for violin quality sound characterization”, in Proc. of the 138th AES Convention, 2015.

[6] R. Corradi, A. Liberatore, S. Miccoli, “A Multidisciplinary Approach to the Characterization of Bowed String Instruments: The Musical Acoustics Lab in Cremona”, in Proc. of the 22nd International Congress on Sound and Vibration, pp. 1-8.

[7] A. Canclini, L. Mucci, F. Antonacci, A. Sarti, S. Tubaro, “Estimation of the radiation pattern of a violin during the performance using plenacoustic methods”, accepted for publication at the 138th AES Convention, 2015.

