
Advances in Speech and Language

Technologies for Iberian Languages


Third International Conference
IberSPEECH 2016 Lisbon Portugal
November 23 25 2016 Proceedings 1st
Edition Alberto Abad
Alberto Abad · Alfonso Ortega
António Teixeira · Carmen García Mateo
Carlos D. Martínez Hinarejos · Fernando Perdigão
Fernando Batista · Nuno Mamede (Eds.)

LNAI 10077

Advances in Speech
and Language Technologies
for Iberian Languages
Third International Conference, IberSPEECH 2016
Lisbon, Portugal, November 23–25, 2016
Proceedings

Lecture Notes in Artificial Intelligence 10077

Subseries of Lecture Notes in Computer Science

LNAI Series Editors


Randy Goebel
University of Alberta, Edmonton, Canada
Yuzuru Tanaka
Hokkaido University, Sapporo, Japan
Wolfgang Wahlster
DFKI and Saarland University, Saarbrücken, Germany

LNAI Founding Series Editor


Joerg Siekmann
DFKI and Saarland University, Saarbrücken, Germany
More information about this series at http://www.springer.com/series/1244
Editors

Alberto Abad
INESC-ID/IST, Universidade de Lisboa
Lisbon, Portugal

Alfonso Ortega
I3A/University of Zaragoza
Zaragoza, Spain

António Teixeira
DETI/IEETA, University of Aveiro
Aveiro, Portugal

Carmen García Mateo
AtlantTIC Research Center, Universidad de Vigo
Vigo, Spain

Carlos D. Martínez Hinarejos
Universitat Politècnica de València
Valencia, Spain

Fernando Perdigão
University of Coimbra
Coimbra, Portugal

Fernando Batista
INESC-ID/ISCTE-IUL
Lisbon, Portugal

Nuno Mamede
INESC-ID/IST, Universidade de Lisboa
Lisbon, Portugal

ISSN 0302-9743 ISSN 1611-3349 (electronic)


Lecture Notes in Artificial Intelligence
ISBN 978-3-319-49168-4 ISBN 978-3-319-49169-1 (eBook)
DOI 10.1007/978-3-319-49169-1

Library of Congress Control Number: 2016956496

LNCS Sublibrary: SL7 – Artificial Intelligence

© Springer International Publishing AG 2016


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the
material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now
known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are
believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors
give a warranty, express or implied, with respect to the material contained herein or for any errors or
omissions that may have been made.

Printed on acid-free paper

This Springer imprint is published by Springer Nature


The registered company is Springer International Publishing AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface

The Spanish Thematic Network on Speech Technology (RTTH) and the International
Speech Communication Association (ISCA) Special Interest Group on Iberian Languages
(SIG-IL) are pleased to present a selection of papers from the IberSPEECH
2016 conference (IX Jornadas en Tecnologías del Habla and the V Iberian SLTech
Workshop), which was held in Lisbon, Portugal, on November 23–25, 2016.
The IberSPEECH series of conferences has become one of the most relevant
scientific events for the community working in the field of speech and language processing
of Iberian languages. This is demonstrated by the increased interest and success
of previous editions, starting in Vigo 2010 with FALA and continuing in Madrid 2012
and Las Palmas de Gran Canaria 2014 with the new denomination: IberSPEECH. Prior
to these past joint editions of the Jornadas en Tecnologías del Habla and the Iberian
SLTech Workshop, separate events were held. The former have been organized
by the RTTH (http://www.rthabla.es) since 2000. This network was created in 1999
and currently includes over 200 researchers and 30 research groups in speech technology
all over Spain. The latter (Iberian SLTech) have been organized by the SIG-IL
(http://www.il-sig.org/) of the ISCA since 2009. Overall, these events have attracted the
attention of many researchers over the years, mainly from Spain, Portugal, and from
other Iberian-speaking countries in Latin America, but also from several other research
groups from all around the world. These previous efforts have contributed to the
consolidation of a strong and active community addressing the problems of speech and
language automatic processing of Iberian languages.
The IberSPEECH 2016 conference, the third of its kind under this name, brought
together the IX Jornadas en Tecnologías del Habla and the V Iberian SLTech
Workshop events, and it was co-organized by INESC-ID Lisboa, the RTTH, and the
ISCA SIG-IL. Maintaining the identity of previous editions, this new event represented
not only a step forward for the support of researchers in Iberian languages, but also a
new challenge for the community, since it was the first edition to be held outside Spain.
In order to promote interaction and discussion among all the members of the community,
the Organizing Committee planned a three-day event with a wide variety of
activities, such as technical paper presentations, keynote lectures, evaluation challenges,
presentations of demos and research projects, and recent PhD theses.
This Lecture Notes in Artificial Intelligence volume contains only a selection of the
full regular articles presented during the IberSPEECH 2016 conference. To ensure the
quality of all the contributions, each submitted article was reviewed by three members
of the Scientific Review Committee, who provided feedback to improve the final article
version, in addition to acceptance/rejection recommendations for the Technical Program
Committee. Authors of the accepted papers had time to address the comments
before submitting the camera-ready versions. We had 48 regular full papers submitted
and only 27 of them were selected for publication in this volume. This selection was
based on the scores and comments provided by our Scientific Review Committee,
which includes over 66 researchers from different institutions mainly from Spain,
Portugal, and Latin America, but also from France, Germany, Hungary, Italy, Norway,
Sweden, UK, and the USA, to whom we also would like to express our deepest
gratitude. The selected articles in this volume are organized into four different topics:
– Speech Production, Analysis, Coding and Synthesis
– Automatic Speech Recognition
– Paralinguistic Speaker Trait Characterization
– Speech and Language Technologies in Different Application Fields
The Organizing Committee of IberSPEECH is extremely proud of the contents of
this volume. We trust that we have achieved and delivered the scientific quality
standards that the researchers in the field of speech and language technologies for
Iberian languages value.
As mentioned previously, besides the excellent research articles included in this
volume, the conference included a wide variety of scientific activities that are worth
mentioning in more detail, particularly, the ALBAYZIN Technology Competitive
Evaluations, the Special Session contributions, and the keynote lectures.
The ALBAYZIN Technology Competitive Evaluations have been organized
alongside the conference since 2006, promoting the fair and transparent comparison of
technology in different fields related to speech and language technology. In this edition
we had two different challenge evaluations: Speaker Diarization and Search on Speech.
The organization of each one of these evaluations requires the preparation of development
and test data, which is released along with a clear set of rules to the participants,
and gathering and comparing results from participants. This organization was
carried out by different groups of researchers with the support of the ALBAYZIN
Committee. Although article contributions and results from the evaluations are not
included in this volume, we would like to express our gratitude to the organizers and
also to the participants of these evaluation challenges.
In addition to full articles, a subset of which are represented here, and to system
description papers of the ALBAYZIN participants, Special Session papers were also
included in the conference program. These were intended to describe either progress in
current or recent research and development projects, demonstration systems, or PhD
Thesis extended abstracts to compete in the PhD Award.
Moreover, we had the pleasure of hosting three extraordinary keynote speakers:
Prof. Elmar Nöth (University of Erlangen-Nuremberg, Germany), Dr. Bhuvana
Ramabhadran (IBM T.J. Watson Research Center, USA) and Prof. Steve Renals
(University of Edinburgh, UK). We would like to acknowledge their extremely valuable
participation that helped to elevate the quality of the overall event.
Finally, we would like to thank all those whose effort made this conference possible,
including the members of the Organizing Committee, the Local Organizing Committee,
the ALBAYZIN Committee, the Scientific Review Committee, the authors, the
conference attendees, the supporting institutions, and so many people who gave their
best to achieve a successful conference. We would also like to thank Springer for the
possibility of publishing another volume of selected papers from an IberSPEECH
conference.

November 2016 Alberto Abad


Alfonso Ortega
António Teixeira
Carmen García Mateo
Carlos D. Martínez Hinarejos
Fernando Perdigão
Fernando Batista
Nuno Mamede
Organization

General Chair
Alberto Abad INESC-ID/IST, University of Lisbon, Portugal
Alfonso Ortega Universidad de Zaragoza, Spain
António Teixeira Universidade de Aveiro, Portugal

Technical Program Chair


Carmen García Mateo Universidad de Vigo, Spain
Fernando Perdigão Universidade de Coimbra, Portugal
C.D. Martínez-Hinarejos Universitat Politècnica de València, Spain

Publication Chair
Nuno Mamede INESC-ID/IST, University of Lisbon, Portugal
Fernando Batista INESC-ID/ISCTE-IUL, Portugal

Special Session and Awards Chair


Juan Luis Navarro Mesa Universidad de Las Palmas de Gran Canaria, Spain
Doroteo Torre Toledano Universidad Autónoma de Madrid, Spain
Xavier Anguera ELSA Corp., USA

Plenary Talks Chair


Isabel Trancoso INESC-ID/IST, University of Lisbon, Portugal

Evaluations Chair
Luis J. Rodríguez Fuentes Universidad del País Vasco, Spain
Rubén San-Segundo Universidad Politécnica de Madrid, Spain

Publicity and Sponsorship Chair


Helena Moniz INESC-ID/FL, University of Lisbon, Portugal

Albayzin Committee
Luis J. Rodríguez Fuentes Universidad del País Vasco, Spain
Rubén San-Segundo Universidad Politécnica de Madrid, Spain
Alberto Abad INESC-ID/IST, University of Lisbon, Portugal


Alfonso Ortega Universidad de Zaragoza, Spain
António Teixeira Universidade de Aveiro, Portugal

Local Organizing Committee


Alberto Abad INESC-ID/University of Lisbon, Portugal
Fernando Batista INESC-ID/ISCTE-IUL, Portugal
Nuno Mamede INESC-ID/IST, University of Lisbon, Portugal
David Martins de Matos INESC-ID/IST, University of Lisbon, Portugal
Helena Moniz INESC-ID/FL, University of Lisbon, Portugal
Rubén Solera-Ureña INESC-ID, Portugal
Isabel Trancoso INESC-ID/IST, University of Lisbon, Portugal

Scientific Review Committee


Alberto Abad INESC-ID/IST, University of Lisbon, Portugal
Plinio Barbosa University of Campinas, Brazil
Jorge Baptista INESC-ID/University of Algarve, Portugal
Fernando Batista INESC-ID/ISCTE-IUL, Portugal
José Miguel Benedí Ruiz Universitat Politècnica de València, Spain
Carmen Benitez Universidad de Granada, Spain
Antonio Bonafonte Universitat Politecnica de Catalunya, Spain
German Bordel University of the Basque Country UPV/EHU, Spain
Alessio Brutti FBK, Italy
Paula Carvalho INESC-ID/Universidade Europeia, Portugal
Diamantino Caseiro Google Inc., USA
Maria Jose Castro-Bleda Universitat Politecnica de Valencia, Spain
Ricardo Cordoba Grupo de Tecnologia del Habla, Spain
Conceicao Cunha IPS Munich, Germany
Carme De-La-Mota Universitat Autònoma de Barcelona, Spain
Laura Docío Fernández University of Vigo, Spain
Daniel Erro University of the Basque Country UPV/EHU, Spain
David Escudero University of Valladolid, Spain
Ruben Fernandez Universidad Politécnica de Madrid, Spain
Javier Ferreiros GTH, Universidad Politécnica de Madrid, Spain
Julian Fierrez Universidad Autonoma de Madrid, Spain
Ascension Gallardo Universidad Carlos III de Madrid, Spain
Carmen Garcia Mateo Universidade de Vigo, Spain
Juan I. Godino Llorente Universidad Politécnica de Madrid, Spain
Javier Hernando Universitat Politecnica de Catalunya, Spain
Lluís Felip Hurtado Oliver Universitat Politecnica de Valencia, Spain
Irina Illina LORIA, France
Eduardo Lleida University of Zaragoza, Spain
José David Lopes KTH, Sweden
Paula López Otero Universidade de Vigo, Spain
Ranniery Maia Toshiba Research Europe Ltd., UK


C.D. Martínez Hinarejos Universitat Politècnica de València, Spain
Helena Moniz INESC-ID/FL, University of Lisbon, Portugal
Nicolas Morales Mombiela Nuance Communications GmbH, Germany
Asuncion Moreno Universitat Politècnica de Catalunya, Spain
Antonio Moreno-Sandoval Universidad Autonoma Madrid, Spain
Climent Nadeu Universitat Politècnica de Catalunya, Spain
Juan L. Navarro-Mesa Universidad de Las Palmas de Gran Canaria, Spain
Eva Navas Cordón University of the Basque Country, Spain
Géza Németh University of Technology and Economics, Hungary
Nelson Neto Universidade Federal do Pará, Brazil
Alfonso Ortega University of Zaragoza, Spain
Thomas Pellegrini Université de Toulouse; IRIT, France
Carmen Peláez-Moreno University Carlos III Madrid, Spain
Mikel Penagarikano University of the Basque Country, Spain
Fernando Perdigao IT/Universidade de Coimbra, Portugal
Ferran Pla DSIC, Universitat Politècnica de València, Spain
José L. Pérez-Córdoba University of Granada, Spain
Paulo Quaresma Universidade de Evora, Portugal
Andreia Rauber University of Tübingen, Germany
Luis J. Rodriguez-Fuentes University of the Basque Country UPV/EHU, Spain
Eduardo Rodriguez Banga University of Vigo, Spain
José A.R. Fonollosa Universitat Politècnica de Catalunya, Spain
Rubén San-Segundo Universidad Politécnica de Madrid, Spain
Emilio Sanchis Universidad Politecnica Valencia, Spain
Diana Santos University of Oslo, Norway
Encarna Segarra DSIC, Universidad Politécnica de Valencia, Spain
Alberto Simões ESEIG, Instituto Politécnico do Porto, Portugal
Rubén Solera-Ureña INESC-ID, Portugal
Joan Andreu Sanchez Universitat Politècnica de València, Spain
António Teixeira University of Aveiro, Portugal
Javier Tejedor GEINTRA/Universidad de Alcala, Spain
Doroteo Toledano Universidad Autónoma de Madrid, Spain
Pedro Torres-Carrasquillo MIT Lincoln Laboratory, USA
Isabel Trancoso INESC-ID/IST, University of Lisbon, Portugal
Amparo Varona University of the Basque Country, Spain
Aline Villavicencio Universidade Federal do Rio Grande do Sul, Brazil
Organizing Institutions

INESC-ID Lisboa
Spanish Thematic Network on Speech Technology (RTTH)
ISCA Special Interest Group on Iberian Languages (SIG-IL)

Support and Partner Institutions

Instituto Superior Técnico, University of Lisbon


Ministerio de Economía y Competitividad, Gobierno de España
FCT, Fundação para a Ciência e a Tecnologia
Cámara Municipal de Lisboa
TAP Portugal
Contents

Speech Production, Analysis, Coding and Synthesis

Study of the Effect of Reducing Training Data in Speech Synthesis Adaptation Based on Frequency Warping
Agustin Alonso, Daniel Erro, Eva Navas, and Inma Hernaez

A Dynamic FEC for Improved Robustness of CELP-Based Codec
Nadir Benamirouche, Bachir Boudraa, Angel M. Gomez, José L. Pérez-Córdoba, and Iván López-Espejo

Objective Comparison of Four GMM-Based Methods for PMA-to-Speech Conversion
Daniel Erro, Inma Hernaez, Luis Serrano, Ibon Saratxaga, and Eva Navas

Adding Singing Capabilities to Unit Selection TTS Through HNM-Based Conversion
Marc Freixes, Joan Claudi Socoró, and Francesc Alías

A Novel Error Mitigation Scheme Based on Replacement Vectors and FEC Codes for Speech Recovery in Loss-Prone Channels
Domingo López-Oller, Angel M. Gomez, and José L. Pérez-Córdoba

Language-Independent Acoustic Cloning of HTS Voices: An Objective Evaluation
Carmen Magariños, Daniel Erro, Paula Lopez-Otero, and Eduardo R. Banga

Prosodic Break Prediction with RNNs
Santiago Pascual and Antonio Bonafonte

Surgery of Speech Synthesis Models to Overcome the Scarcity of Training Data
Arnaud Pierard, D. Erro, I. Hernaez, E. Navas, and Thierry Dutoit

Automatic Speech Recognition

An Analysis of Deep Neural Networks in Broad Phonetic Classes for Noisy Speech Recognition
F. de-la-Calle-Silos, A. Gallardo-Antolín, and C. Peláez-Moreno

Automatic Speech Recognition with Deep Neural Networks for Impaired Speech
Cristina España-Bonet and José A.R. Fonollosa

Detection of Publicity Mentions in Broadcast Radio: Preliminary Results
María Pilar Fernández-Gallego, Álvaro Mesa-Castellanos, Alicia Lozano-Díez, and Doroteo T. Toledano

Deep Neural Network-Based Noise Estimation for Robust ASR in Dual-Microphone Smartphones
Iván López-Espejo, Antonio M. Peinado, Angel M. Gomez, and Juan M. Martín-Doñas

Better Phoneme Recognisers Lead to Better Phoneme Posteriorgrams for Search on Speech? An Experimental Analysis
Paula Lopez-Otero, Laura Docio-Fernandez, and Carmen Garcia-Mateo

Crowdsourced Video Subtitling with Adaptation Based on User-Corrected Lattices
João Miranda, Ramón F. Astudillo, Ângela Costa, André Silva, Hugo Silva, João Graça, and Bhiksha Raj

Paralinguistic Speaker Trait Characterization

Acoustic Analysis of Anomalous Use of Prosodic Features in a Corpus of People with Intellectual Disability
Mario Corrales-Astorgano, David Escudero-Mancebo, and César González-Ferreras

Detecting Psychological Distress in Adults Through Transcriptions of Clinical Interviews
Joana Correia, Isabel Trancoso, and Bhiksha Raj

Automatic Annotation of Disfluent Speech in Children’s Reading Tasks
Jorge Proença, Dirce Celorico, Carla Lopes, Sara Candeias, and Fernando Perdigão

Automatic Detection of Hyperarticulated Speech
Eugénio Ribeiro, Fernando Batista, Isabel Trancoso, Ricardo Ribeiro, and David Martins de Matos

Acoustic-Prosodic Automatic Personality Trait Assessment for Adults and Children
Rubén Solera-Ureña, Helena Moniz, Fernando Batista, Ramón F. Astudillo, Joana Campos, Ana Paiva, and Isabel Trancoso

Speech and Language Technologies in Different Application Fields

Evaluating Different Non-native Pronunciation Scoring Metrics with the Japanese Speakers of the SAMPLE Corpus
Vandria Álvarez Álvarez, David Escudero Mancebo, César González Ferreras, and Valentín Cardeñoso Payo

Global Analysis of Entrainment in Dialogues
Vera Cabarrão, Isabel Trancoso, Ana Isabel Mata, Helena Moniz, and Fernando Batista

A Train-on-Target Strategy for Multilingual Spoken Language Understanding
Fernando García-Granada, Encarna Segarra, Carlos Millán, Emilio Sanchis, and Lluís-F. Hurtado

Collaborator Effort Optimisation in Multimodal Crowdsourcing for Transcribing Historical Manuscripts
Emilio Granell and Carlos-D. Martínez-Hinarejos

Assessing User Expertise in Spoken Dialog System Interactions
Eugénio Ribeiro, Fernando Batista, Isabel Trancoso, José Lopes, Ricardo Ribeiro, and David Martins de Matos

Automatic Speech Feature Learning for Continuous Prediction of Customer Satisfaction in Contact Center Phone Calls
Carlos Segura, Daniel Balcells, Martí Umbert, Javier Arias, and Jordi Luque

Reversible Speech De-identification Using Parametric Transformations and Watermarking
Aitor Valdivielso, Daniel Erro, and Inma Hernaez

Bottleneck Based Front-End for Diarization Systems
Ignacio Viñals, Jesús Villalba, Alfonso Ortega, Antonio Miguel, and Eduardo Lleida

Author Index


Speech Production, Analysis, Coding
and Synthesis
Study of the Effect of Reducing Training Data
in Speech Synthesis Adaptation Based
on Frequency Warping

Agustin Alonso¹ (✉), Daniel Erro¹,², Eva Navas¹, and Inma Hernaez¹

¹ AHOLAB, University of the Basque Country (UPV/EHU), Bilbao, Spain
{agustin,derro,eva,inma}@aholab.ehu.es
² Basque Foundation for Science (IKERBASQUE), Bilbao, Spain

Abstract. Speaker adaptation techniques use a small amount of data
to modify Hidden Markov Model (HMM) based speech synthesis systems
to mimic a target voice. These techniques can be used to provide personalized
systems to people who suffer from a speech impairment, allowing
them to communicate in a more natural way. Although the adaptation
techniques do not require a large quantity of data, the recording process can
be tedious if the user has speaking problems. An important factor in improving
the acceptance of these systems is being able to obtain acceptable
results with a minimal amount of recordings. In this work we explore
how the performance of an adaptation method based on Frequency Warping,
which uses only vocalic segments, varies with the amount of available
training data.

Keywords: Speech adaptation · Statistical speech synthesis · Frequency warping · Dysarthric voice

1 Introduction
The voice is the principal means that humans have of communicating with each
other. Sadly, in some cases people totally or partially lose the ability to speak
due to an illness or an accident. In these cases speech technologies can provide a
partial solution; in particular, Text-to-Speech (TTS) systems [1] can be used by
these people as an alternative communication tool. The rise in the use of smartphones,
together with the emergence of statistical parametric TTS [2], which
requires few resources, has made it possible for anybody to carry his or her own
TTS system in a pocket. However, conventional TTS systems produce a generic
voice without personality, so an important desired characteristic is the ability to provide
personalized voices to the users. There are several initiatives to achieve this
goal using voice adaptation [3]: with some recordings of a human target voice it
is possible to change the acoustic properties of the synthetic voice.
Although conventional adaptation methods are valid when the user knows
that he or she will suffer some impairment (e.g. a planned surgery) and he or
she can make the necessary recordings, they are not an option if the user’s voice
already exhibits severe symptoms of a given pathology. In this case two different
approaches exist. One of them consists in using the recordings made by a
“donor”. In other words, the user will obtain a TTS system with the acoustic
properties of another person [4]. This solution is the most reasonable when the
user has completely lost the ability to speak. The other approach applies
when the user is able to produce partially intelligible speech in which only some
sounds are correctly pronounced, i.e. dysarthric voice. In this case it is possible
to extract characteristics from the healthy fragments and try to extrapolate this
information to adapt the whole voice [5,6].

© Springer International Publishing AG 2016
A. Abad et al. (Eds.): IberSPEECH 2016, LNAI 10077, pp. 3–13, 2016.
DOI: 10.1007/978-3-319-49169-1_1

When the second approach is followed two aspects must be evaluated. First,
the usable segments have to be selected, e.g. in dysarthric voices the vowels
are usually the best pronounced sounds. Then, enough usable data should be
collected. On the one hand, if there are few data the adapted voice will not
adequately mimic the target one. On the other hand, it is not practical to ask
people who have trouble speaking for too many recordings. If we want to improve
the system’s acceptance and make it more accessible, the process of obtaining
the adaptation information should be as friendly as possible. So, an important
factor to determine is how many recordings are necessary to collect the data for
the adaptation process. To answer this question, in this paper we
analyse the performance of the method presented in [7]. This method has
already been shown to obtain good results in terms of quality and similarity to the
target speaker when it works with sufficient data using only vocalic segments.
We gradually reduce the data used to train the transforms and study how the
performance of the system is affected.
Although the final goal of our work is to provide personalized synthesizers to
people with dysarthric voice, in these first steps of the research we used healthy
voices. In this way we can obtain an indicator of the performance in the most
favourable conditions. Moreover, the comparison with state-of-the-art adaptation
methods is only possible with healthy voices.
The rest of the paper is structured as follows. Sect. 2 gives a brief description of
statistical parametric synthesis and introduces some adaptation techniques.
Section 3 summarizes our previous work and the method we used in the experiments.
In Sect. 4 the experiments carried out are presented. Finally, in Sect. 5
the conclusions of this work are summarised.

2 Statistical Parametric Synthesis and Adaptation

In an HMM-based synthesizer, a set of HMMs is trained using a database containing
audio utterances from one (or more) speaker(s) and their transcriptions.
Text is converted into labels that describe it phonetically, prosodically and linguistically,
and audio is parameterised into acoustic features using a vocoder. State-of-the-art
vocoders [8,9] consider three different acoustic streams: the logarithm of the fundamental
frequency (log-f0), a Mel-Cepstral (MCEP) or linear-prediction-related
representation of the spectral envelope, and the degree of harmonicity of different
spectral bands. The context-dependent HMMs learn the correspondence between
the labels and the acoustic parameters, together with their first and second derivatives
over time. Since the transition probabilities between states do not correctly model
the actual durations, a variant of the HMM called the Hidden Semi-Markov
Model (HSMM) [10] is used. In this model the state durations are characterized
by explicit Gaussian distributions.
In the synthesis phase any arbitrary text is translated into labels in the same
way as the transcriptions from the training dataset. Given these labels and the
model, the system generates the most likely sequence of acoustic parameters [11]
and feeds the vocoder, which builds the synthetic waveform.
The main idea behind adaptation is to change the mean vectors
and covariance matrices of the set of HSMMs using a new small database from
another speaker, so as to obtain a new voice which sounds similar to the target. If the
database used to train the initial model contains utterances from more than one
speaker, the result is called an average model [12]. This average model is suitable for
adaptation because it does not convey peculiarities of any particular speaker,
so it is easier to adapt.
Usual adaptation techniques, such as Constrained Maximum Likelihood Linear
Regression (CMLLR) [13] or Constrained Structural Maximum a Posteriori
Linear Regression (CSMAPLR) [14], use unconstrained linear transforms to
project the acoustic space of the source onto the acoustic space of the target.
These kinds of transforms are capable of capturing the source-target correspondence,
but they are not directly interpretable. There is another kind of transform,
namely Frequency Warping (FW) plus Amplitude Scaling (AS), which has
a direct physical interpretation [15] but is not as flexible as the free linear one.
FW maps the frequency axis of a source spectrum onto that of a target spectrum,
without eliminating any spectral detail from the source. AS compensates for
the amplitude differences between the warped-source spectrum and the target
spectrum. FW+AS transforms were originally used in the voice conversion field
[16–19], where they obtained satisfactory conversion scores without degrading
the quality of the speech significantly. It has also been proved that, in the cepstral
domain, FW is equivalent to a multiplicative matrix [20] and AS can be implemented
as an additive term. This means that FW+AS can be seen as a linear
transform in the cepstral domain, so it can be applied to adapt the HSMM state
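distributions as:
$$\hat{\mu} = \hat{A}\mu + \hat{b}, \qquad \hat{\Sigma} = \hat{A}\,\Sigma\,\hat{A}^{\top} \tag{1}$$
where µ and Σ are the mean vector and covariance matrix of the HSMM state,
µ̂ and Σ̂ are their transformed counterparts, and
$$\hat{A} = \begin{bmatrix} A & 0 & 0 \\ 0 & A & 0 \\ 0 & 0 & A \end{bmatrix}, \qquad \hat{b} = \begin{bmatrix} b \\ 0 \\ 0 \end{bmatrix} \tag{2}$$

Here A represents the FW and b the AS, while Â and b̂ are used to modify the
static and the dynamic features simultaneously.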
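
As an illustration, the transform of a single state can be sketched in a few lines of NumPy. This is only a sketch under stated assumptions: it takes the usual form of a linear transform of a Gaussian (mean Âµ + b̂, covariance ÂΣÂᵀ), it assumes that the feature vector stacks static, delta and delta-delta parts of equal size d, and the function name `adapt_state` is hypothetical.

```python
import numpy as np

def adapt_state(mu, sigma, A, b):
    """Apply an FW+AS transform to one HSMM state distribution.

    mu:    (3d,) mean vector ordered as [static, delta, delta-delta]
    sigma: (3d, 3d) covariance matrix
    A:     (d, d) frequency-warping matrix
    b:     (d,) amplitude-scaling vector
    """
    d = A.shape[0]
    # Block-diagonal matrix: the same A for the static and dynamic parts.
    A_hat = np.kron(np.eye(3), A)
    # The bias is added to the static part only.
    b_hat = np.concatenate([b, np.zeros(2 * d)])
    mu_hat = A_hat @ mu + b_hat
    sigma_hat = A_hat @ sigma @ A_hat.T
    return mu_hat, sigma_hat
```

Building Â with a Kronecker product keeps the three blocks identical by construction, which mirrors the block structure of (2).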

3 Description of the Adaptation Method


The FW+AS-based adaptation methods require a set of paired source-target
vectors as input. In our case, the vectors from the target speaker are obtained by
selecting the vowels from the uttered sentences where: (i) all the frames are
voiced, which avoids artifacts related to f0 misdetection; (ii) the duration is
greater than 55 ms, which ensures that there are enough samples for an accurate
spectral analysis and also that there is a sufficiently wide stable zone free of
co-articulation. From the vowels that fulfil both requirements we only take the central
frame. The corresponding source frames are selected from the original voice
model: first we generate labels from the text of the adaptation utterance; using
these labels, we select the p.d.f. of the central state of each vowel and extract
the static part of the mean vector, which is used as source frame.
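
The selection of target frames can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Segment` type, the vowel set and the helper name `central_frame` are hypothetical, and the duration is approximated as the number of frames times a 5 ms frame shift (the analysis step reported in Sect. 4.1).

```python
from dataclasses import dataclass
from typing import List, Optional

FRAME_SHIFT_MS = 5.0    # analysis step, taken from the experimental setup
MIN_DURATION_MS = 55.0  # minimum vowel duration required by condition (ii)
VOWELS = {"a", "e", "i", "o", "u"}  # assumed label set for the five vowels

@dataclass
class Segment:
    phone: str  # phone label from the automatic segmentation
    start: int  # index of the first frame
    end: int    # index one past the last frame

def central_frame(seg: Segment, voiced: List[bool]) -> Optional[int]:
    """Return the central frame index of a usable vowel, or None."""
    if seg.phone not in VOWELS:
        return None
    n_frames = seg.end - seg.start
    if n_frames * FRAME_SHIFT_MS <= MIN_DURATION_MS:
        return None  # condition (ii): the vowel is too short
    if not all(voiced[seg.start:seg.end]):
        return None  # condition (i): at least one frame is unvoiced
    return seg.start + n_frames // 2
```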
The method we proposed in [7] is an evolution of the method explained in
[17], which is based on the Dynamic Frequency Warping (DFW) [21] procedure.
DFW calculates the FW function that should be applied to a set of (N + 1)-point
log-amplitude semi-spectra, {Xt}, to make them maximally close to their
paired counterparts, {Yt}. It is based on an accumulative cost function D(i, j)
that indicates the log-spectral distortion obtained when the ith bin from the
source is mapped onto the jth bin from the target following the optimal path
from (0, 0) to (i, j). The mathematical expression of D(i, j) can be described as
follows:
$$D(i,j) = \min \left\{ \begin{array}{l} D(i-1,j) + d(i,j) \\ D(i-1,j-1) + w \cdot d(i,j) \\ D(i,j-1) + d(i,j) \end{array} \right\} \tag{3}$$
where i, j = 0...N, w is an empirical control parameter to manage the penalty of
the horizontal and vertical paths, and d(i, j) is the distance between the source’s
ith bin and the target’s jth bin. In our case d(i, j) is calculated simultaneously from
all the paired source-target vectors using the following equation:
$$d(i,j) = \sum_{t=1}^{T} \left(X_t[i] - Y_t[j]\right)^2 + \sum_{t=1}^{T} \alpha_t \left(X'_t[i] - Y'_t[j]\right)^2 \tag{4}$$

where T is the total number of training paired vectors, X′t and Y′t are the derivatives
over frequency of Xt and Yt respectively, and αt is an empirical factor that
stands for the relative weight of the derivatives of the spectra. The frequency
warping path P is defined as a sequence of points,

P = {(0, 0), (i1 , j1 ), (i2 , j2 ), . . . (N, N )} (5)

where the presence of the point (i, j) indicates that the ith bin from the source
should be mapped onto the jth bin from the target. The points of P are computed
backward from (N, N) to (0, 0) using the recursive formula expressed in (3).
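
The DFW procedure can be sketched as a small dynamic-programming routine. The code below is an illustrative sketch, not the authors' implementation. It assumes that the local distance adds the squared differences of the spectra and of their frequency derivatives, it approximates the derivatives with finite differences (`np.gradient`), and it uses a single weight `alpha` for all vectors instead of one weight per vector. The function names are hypothetical.

```python
import numpy as np

def local_distance(X, Y, alpha=1.0):
    """Local distance d(i, j) for all bin pairs; X and Y are (T, N+1) arrays."""
    dX = np.gradient(X, axis=1)  # finite-difference derivative over frequency
    dY = np.gradient(Y, axis=1)
    diff = X[:, :, None] - Y[:, None, :]    # shape (T, N+1, N+1)
    ddiff = dX[:, :, None] - dY[:, None, :]
    return (diff ** 2).sum(axis=0) + alpha * (ddiff ** 2).sum(axis=0)

def dfw_path(d, w=1.0):
    """Accumulate the cost of Eq. (3) and backtrack from (N, N) to (0, 0)."""
    n = d.shape[0]
    D = np.full((n, n), np.inf)
    # Predecessor codes: 0 -> (i-1, j), 1 -> (i-1, j-1), 2 -> (i, j-1)
    step = np.zeros((n, n), dtype=int)
    D[0, 0] = d[0, 0]
    for i in range(n):
        for j in range(n):
            if i == 0 and j == 0:
                continue
            cands = [
                D[i - 1, j] + d[i, j] if i > 0 else np.inf,
                D[i - 1, j - 1] + w * d[i, j] if i > 0 and j > 0 else np.inf,
                D[i, j - 1] + d[i, j] if j > 0 else np.inf,
            ]
            step[i, j] = int(np.argmin(cands))
            D[i, j] = cands[step[i, j]]
    path, (i, j) = [], (n - 1, n - 1)
    while (i, j) != (0, 0):
        path.append((i, j))
        di, dj = [(1, 0), (1, 1), (0, 1)][step[i, j]]
        i, j = i - di, j - dj
    path.append((0, 0))
    return path[::-1]
```

Storing the chosen predecessor of every cell makes the backward pass a simple table lookup, at the cost of one extra integer matrix.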
Since the DFW works in the log-spectral domain, the adaptation data must
be converted from pth-order MCEP into (N + 1)-point discrete log-amplitude
semispectra. Using a traditional MCEP definition, this can be done by multiplying
the MCEP vectors by a matrix S defined as follows:

$$S[n,i] = \cos\left(i \cdot \mathrm{mel}(\pi n / N)\right), \qquad 0 \le n \le N,\; 0 \le i \le p \tag{6}$$

In a similar way, the MCEP representation of a discrete log-amplitude spectrum
of (N + 1) points can be calculated through the technique known as regularized
discrete cepstrum [22], i.e. multiplying the semispectra in vector form by
$$C = \left(S^{\top} S + \lambda R\right)^{-1} S^{\top} \tag{7}$$

where S is given by (6), R is a regularization matrix that imposes smoothing
constraints on the cepstral envelope,
$$R = 8\pi^2 \cdot \mathrm{diag}\left\{0, 1^2, 2^2, \ldots, p^2\right\} \tag{8}$$

and λ is an empirical constant typically equal to $2 \cdot 10^{-4}$ [22]. Since the 0th
cepstral coefficient, related to energy, and the 1st cepstral coefficient, mainly
related to glottal spectrum, are not relevant in terms of FW [17], they are set to
zero before the multiplication by S. With the training vectors translated from
the MCEP domain into the log-semispectra domain using S, the optimal warping path
P is obtained via DFW. The matrix which implements the frequency warping
operation can be defined as follows:
$$W[j,i] = \frac{m_{i,j}}{\sum_{k=1}^{N} m_{i,k}} \tag{9}$$

where $m_{i,j} = 1$ when (i, j) ∈ P and $m_{i,j} = 0$ otherwise. Once W has been
determined, the matrix A which implements the same operation in the MCEP
domain is calculated as:
$$A = C \cdot W \cdot S \tag{10}$$
In our work we calculated a unique FW matrix A for the whole training dataset,
regardless of the vowel the vectors belong to. Using its block-replicated version Â
(2), this warping curve can be applied to all the distributions of the model without
altering their phonetic content.
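
The chain from warping path to cepstral-domain matrix can be sketched as follows. This is a sketch under stated assumptions: the text does not specify the `mel` function, so the code assumes the all-pass approximation commonly used for mel-cepstra with a warping factor of 0.42 for 16 kHz speech; the normalisation in `build_W` sums over all bins 0..N; and all function names are hypothetical.

```python
import numpy as np

def mel(omega, alpha=0.42):
    """All-pass approximation of the mel scale (alpha = 0.42 is an assumption)."""
    return omega + 2.0 * np.arctan(alpha * np.sin(omega) / (1.0 - alpha * np.cos(omega)))

def build_S(N, p):
    """MCEP -> log-semispectrum matrix of Eq. (6), shape (N+1, p+1)."""
    n = np.arange(N + 1)[:, None]
    i = np.arange(p + 1)[None, :]
    return np.cos(i * mel(np.pi * n / N))

def build_C(S, lam=2e-4):
    """Regularized discrete cepstrum matrix of Eq. (7), shape (p+1, N+1)."""
    p = S.shape[1] - 1
    R = 8.0 * np.pi ** 2 * np.diag(np.arange(p + 1) ** 2.0)  # Eq. (8)
    return np.linalg.solve(S.T @ S + lam * R, S.T)

def build_W(path, N):
    """Warping matrix of Eq. (9) from a path of (i, j) points."""
    m = np.zeros((N + 1, N + 1))
    for i, j in path:
        m[i, j] = 1.0
    # W[j, i] = m[i, j] / sum_k m[i, k]
    return (m / m.sum(axis=1, keepdims=True)).T

def fw_matrix(path, N, p):
    """Cepstral-domain FW matrix A = C W S of Eq. (10), shape (p+1, p+1)."""
    S = build_S(N, p)
    return build_C(S) @ build_W(path, N) @ S
```

Solving the linear system with `np.linalg.solve` avoids forming the explicit inverse in (7). With λ = 0 the product C·S reduces to the identity, which is a convenient sanity check for the two matrices.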
The additive MCEP term that implements the AS operation is calculated as
the average difference between the target and source-warped MCEP vectors:
$$b = \frac{1}{T} \sum_{t=1}^{T} \left(y_t - A \cdot x_t\right) \tag{11}$$

where xt and yt are the tth pair of source and target vectors respectively, and
A is given by (10). Unlike the FW matrix, we do not assume a single bias vector
for all the distributions. We compute one b vector per vowel and then we apply
a different AS vector to each distribution according to three cases:
– Distributions that correspond to only one vowel are modified by the specific
AS vector of that vowel.
– When a distribution is shared by more than one vowel, we use a weighted
average of the AS vectors of these vowels. The weights are the proportion of
times the current distribution belonged to each vowel when the original voice
model was trained.
– The remaining distributions are modified using the average of the AS vectors
of all vowels.
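
The three cases above can be sketched as follows. This is an illustrative sketch only: the `occupancy` dictionary (how many times a distribution belonged to each vowel when the source model was trained) and both function names are hypothetical.

```python
import numpy as np

def as_vectors(x, y, vowels, A):
    """Per-vowel AS vector: mean of (y_t - A x_t) over the pairs of each vowel.

    x, y:   (T, p+1) arrays of paired source and target MCEP vectors
    vowels: length-T sequence with the vowel label of each pair
    """
    residual = y - x @ A.T
    labels = np.asarray(vowels)
    return {v: residual[labels == v].mean(axis=0) for v in sorted(set(vowels))}

def as_for_distribution(b_by_vowel, occupancy):
    """Choose the AS vector for one distribution.

    occupancy maps vowel -> number of times the distribution belonged to that
    vowel during training; it is empty for the remaining distributions.
    """
    if not occupancy:
        # Third case: plain average of the AS vectors of all vowels.
        return np.mean(list(b_by_vowel.values()), axis=0)
    total = sum(occupancy.values())
    # First and second cases: a single vowel simply receives weight 1.
    return sum(n / total * b_by_vowel[v] for v, n in occupancy.items())
```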
Not only the spectral information but also the fundamental frequency is a
very important acoustic feature that defines the identity of the speaker. For this
reason we also perform a single mean normalization as follows. First, we take the
log-f0 values at the central frame of the previously selected vowels. From these
values, we calculate the target average log-f0. Then, as in the spectral
case, a source average log-f0 is calculated from the static parts of the means
of the p.d.f.s in the central state of the vowels. Finally, the adaptation is carried
out by applying a linear transform similar to (1) to all the distributions of the log-f0
model, where A is set to 1 and b is equal to the difference between the source and
target average log-f0.
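
The log-f0 normalization amounts to a constant shift, as the sketch below illustrates. It assumes that the bias is taken as target average minus source average, so that the shifted source average coincides with the target average, and that only the static means are shifted; both function names are hypothetical.

```python
from statistics import fmean

def logf0_bias(target_logf0, source_logf0):
    """Shift that moves the source average log-f0 onto the target average."""
    return fmean(target_logf0) - fmean(source_logf0)

def adapt_logf0_means(static_means, bias):
    """Transform with A = 1: every static log-f0 mean is shifted by the bias."""
    return [m + bias for m in static_means]
```

Because the shift is applied in the logarithmic domain, it corresponds to multiplying f0 by a constant factor, which preserves the relative shape of the intonation contours of the source voice.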
The experiments carried out in [7] showed that the method explained above
performs well in terms of quality and similarity when it is compared
with a state-of-the-art adaptation algorithm. However, in our first experiments
we used as training data all the available vocalic segments from an adaptation
dataset of 100 utterances per speaker. Collecting this amount
of recordings can be troublesome and tedious, so being able to reduce the number of
required vocalic segments, and therefore the number of recordings, is an important
task. Our final goal is to be able to provide a personalized synthesizer to
people with dysarthric voice, so an important factor to ensure the acceptance
of this system is making it easy and friendly to use.

4 Experiments
4.1 Experimental Setup
To determine the minimal amount of adaptation data required by our method
for an acceptable performance, we used recordings made through a voice-banking
dedicated web-portal [23]. This is the most suitable kind of data for our experi-
ments because we plan to include adaptation functionalities in the web-page in
order to provide speaking aids to impaired people. We collected 100 phoneti-
cally balanced utterances from 6 non-professional Spanish speakers (3 male and
3 female). Because all of them were recorded using the speaker’s own equipment
over a variable number of sessions in a non-controlled environment the record-
ings exhibit medium/low quality. Therefore, the signals were normalized in order
to maximize dynamic range, and were passed through a Wiener filter to reduce
noise. Phonetic segmentation was carried out automatically using HTK [24].
Table 1 shows the number of vowels available when the segments are selected
according to the method described in Sect. 3. As expected, this number is different for
each speaker.

Table 1. Number of available vowels per speaker.

      /a/   /e/   /i/   /o/   /u/   Total
M1    256   154   111   153    66     740
M2    451   239   182   249    63    1184
M3    261   185   160   163    72     842
F1    317   201   169   186    74    1530
F2    453   364   244   334   135    1123
F3    283   139   206   147    68     843

In order to avoid an overly long perceptual test, we first limited the number of
different adaptations to compare. According to informal perceptual tests carried
out in the laboratory, we decided to use three adaptations with a varying amount
of training data in the test: the first one using all available vowels, the second
one using 10 % of the samples of each vowel and finally the extreme case of using
only one sample per vowel. In the second and third cases the samples used were
randomly selected.
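
The two reduced conditions can be reproduced with a short sampling routine. This is a sketch only: the text does not state how the 10 % count was rounded, so rounding up is an assumption here, and the function name and the fixed seed are hypothetical.

```python
import math
import random

def subsample_per_vowel(samples_by_vowel, fraction=None, seed=0):
    """Randomly keep a fraction of each vowel's samples.

    With fraction=None a single sample per vowel is kept (the extreme case).
    """
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    subset = {}
    for vowel, samples in samples_by_vowel.items():
        if fraction is None:
            k = 1
        else:
            k = max(1, math.ceil(fraction * len(samples)))
        subset[vowel] = rng.sample(samples, k)
    return subset
```

For example, with the 256 /a/ samples and 66 /u/ samples of speaker M1 in Table 1, `fraction=0.1` keeps 26 and 7 samples respectively under this rounding rule.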
As source we used a standard HTS [2] speaker-dependent voice trained with
1962 utterances from a female speaker in Spanish. The use of an average voice
model was discarded because we do not have enough training data from different
speakers with the quality necessary to obtain a good average model. The audio
signals for the target and the source voices were sampled at 16 kHz and parameterised
using Ahocoder [9], a high-quality harmonic-plus-noise-based vocoder. We
extracted 39th-order MCEP coefficients, log-f0 and Maximum Voiced Frequency
(MVF) every 5 ms.
As a baseline for comparison we used a state-of-the-art HTS-based speaker-adaptive
method that uses CMLLR+MAP [3]. For this method we used all the
information from the utterances instead of only vowels. Although the baseline
method adapts not only the spectral envelope but all of the acoustic features
(log-f0 , MVF and duration), we restricted the comparison to the MCEP model
adaptation. In synthesis, we used for each adaptation method the corresponding
MCEP model. For the log-f0 model, we used the proposed average correction
explained in Sect. 3 in all the methods. The duration and MVF models used were
those of the source voice.

4.2 Perceptual Test and Results


For the evaluation, 10 short sentences were synthesised for each adapted speaker
and method, i.e. 240 sentences. A total of 11 evaluators took part in the evalu-
ation. Two sentences per speaker and method were randomly selected from the
whole evaluation set and presented to each evaluator, so each evaluator rated
48 sentences. For every synthetic sentence, a recording of the original voice was
also given as reference. The evaluators were asked to rate both the quality of the
adapted voice (regardless of the quality of the reference) and the similarity to
the original target, both using a 1–5 scale. The mean opinion scores are shown
in Figs. 1 and 2.
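
The size of the test follows directly from the design, and the scores can be summarised as sketched below. The sketch assumes four methods (the three FW adaptations plus the baseline) and a normal-approximation interval; the text does not state how the 95 % confidence intervals were computed, so the interval formula is an assumption.

```python
from math import sqrt
from statistics import mean, stdev

N_SPEAKERS, N_METHODS, N_SENTENCES = 6, 4, 10
total_sentences = N_SPEAKERS * N_METHODS * N_SENTENCES  # 240 synthetic sentences
per_evaluator = 2 * N_SPEAKERS * N_METHODS              # 48 ratings per evaluator

def mos_with_ci(scores, z=1.96):
    """Mean opinion score and half-width of a normal-approximation 95 % interval."""
    m = mean(scores)
    half_width = z * stdev(scores) / sqrt(len(scores))
    return m, half_width
```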

Fig. 1. Mean opinion scores for quality and 95 % confidence intervals.

Fig. 2. Mean opinion scores for similarity and 95 % confidence intervals.

As we can observe, overall the experiments confirm that the proposed method
is better in terms of quality, whereas it is not as good as the baseline method
(which exploits the whole signal content) in terms of similarity to the target.
In this regard, we have to admit that the gap has increased with respect to our
previous experiment [7]. Among the possible reasons for this, the most likely
is the replacement of M3 and F3 target voices in the experiments by other
speakers. The recordings used previously presented worse recording conditions
(noise, reverberation and other artifacts). We decided to use recordings from new
users of the voice bank and, therefore, the scores are different. In any case, this
test shows more clearly the limitations of the suggested approach. We can also
observe a difference in terms of similarity score between male and female target
voices. We believe this is due to the specific characteristics of the source voice.
To overcome this limitation, an appropriate source voice could be selected from
a voice bank.

As for the behaviour of the method under training data reduction, the overall
performance degradation is small when the amount of data is reduced by a factor
of 10. However, for a single randomly selected sample per vowel the differences
are significant. There are two alternative ways of interpreting this. First, it could
be an indicator that a single vowel is not enough to train the system. Second,
the problem could be an “unfortunate” random selection, i.e. the selection of a
strongly co-articulated vowel. To clarify this point, we trained the system using
sustained vowels recorded in the laboratory from several speakers instead of
fragments of continuous speech. We found that for some of these voices the
performance of the method dropped substantially. More specifically, we observed
that under these non-spontaneous articulation conditions some of the resulting
FW curves were too irregular and produced annoying artifacts during synthesis.
This leads us to the conclusion that FW-based adaptation must be reinforced
by means of a proper source speaker selection algorithm. Moreover, we are
currently exploring alternative approaches based on speaker interpolation, assuming the
availability of multiple donors.

5 Conclusions and Future Work

In this paper we have studied the effect of reducing the amount of training data
in an adaptation method that uses only vocalic segments, which had already been
shown to obtain good results when it has enough samples. We found that if
we push the system to the limit case of having only one sample per vowel, the
performance in terms of quality and similarity deteriorates. However, the method
tolerates the lack of training data quite well, and we obtain similar results when
we use all the available data and when we use only 10 % of it. With this
knowledge we are now capable of designing a more specific corpus to be recorded
by people with speech impairments. For these people, being asked to record
too many sentences may be annoying. If we can reduce the amount of necessary
recordings, these systems could be more easily accepted. The next step is building
this specific corpus and using the recordings made by people with dysarthric
voices for adaptation. Future work will also tackle the automatic detection
of useful vocalic segments from pathological voice recordings. Furthermore, we
will address the main limitation of this method, namely the need for correctly
articulated vowels. Preliminary experiments in this direction have shown that
for severe dysarthria it is convenient to repair the vocalic triangle of the recorded
utterances before adaptation to avoid unnatural warping trajectories.

Acknowledgments. This work has been partially supported by MINECO/FEDER,
UE (SpeechTech4All project, TEC2012-38939-C03-03 and RESTORE project,
TEC2015-67163-C2-1-R), and the Basque Government (ELKAROLA project,
KK-2015/00098).

References
1. Taylor, P.: Text-to-Speech Synthesis. Cambridge University Press, Cambridge
(2009)
2. Zen, H., Tokuda, K., Black, A.: Statistical parametric speech synthesis. Speech
Commun. 51(11), 1039–1064 (2009)
3. Yamagishi, J., Nose, T., Zen, H., Ling, Z.H., Toda, T., Tokuda, K., King, S.,
Renals, S.: Robust speaker-adaptive HMM-based text-to-speech synthesis. IEEE
Trans. Audio Speech Lang. Process. 17(6), 1208–1230 (2009)
4. Yamagishi, J., Veaux, C., King, S., Renals, S.: Speech synthesis technologies for
individuals with vocal disabilities: voice banking and reconstruction. Acoust. Sci.
Technol. 33(1), 1–5 (2012)
5. Creer, S., Cunningham, S., Green, P., Yamagishi, J.: Building personalised syn-
thetic voices for individuals with severe speech impairment. Comput. Speech Lang.
27(6), 1178–1193 (2013)
6. Lanchantin, P., Veaux, C., Gales, M.J.F., King, S., Yamagishi, J.: Reconstructing
voices within the multiple-average-voice-model framework. In: Proceedings of the
16th Annual Conference of the International Speech Communication Association
(Interspeech), Dresden, Germany, pp. 2232–2236 (2015)
7. Alonso, A., Erro, D., Navas, E., Hernaez, I.: Speaker adaptation using only vocalic
segments via frequency warping. In: Proceedings of the 16th Annual Conference
of the International Speech Communication Association (Interspeech), Dresden,
Germany, pp. 2764–2768 (2015)
8. Kawahara, H., Masuda-Katsusue, I., de Cheveigne, A.: Restructuring speech repre-
sentations using a pitch-adaptive time-frequency smoothing and an instantaneous-
frequency-based F0 extraction: possible role of a repetitive structure in sounds.
Speech Commun. 27, 187–207 (1999)
9. Erro, D., Sainz, I., Navas, E., Hernaez, I.: Harmonics plus noise model based
vocoder for statistical parametric speech synthesis. IEEE J. Sel. Top. Signal
Process. 8(2), 184–194 (2014)
10. Zen, H., Tokuda, K., Masuko, T., Kobayashi, T., Kitamura, T.: A hidden semi-
Markov model-based speech synthesis system. IEICE Trans. Inf. Syst. E90-D(5),
825–834 (2007)
11. Tokuda, K., Yoshimura, T., Masuko, T., Kobayashi, T., Kitamura, T.: Speech
parameter generation algorithms for HMM-based speech synthesis, vol. 30, pp.
1315–1318 (2000)
12. Yamagishi, J.: A training method of average voice model for HMM-based speech
synthesis using MLLR. IEICE Trans. Inf. Syst. 86(8), 1956–1963 (2003)
13. Gales, M.J.F.: Maximum likelihood linear transformations for HMM-based speech
recognition. Comput. Speech Lang. 12(2), 75–98 (1998)
14. Yamagishi, J., Kobayashi, T., Nakano, Y., Ogata, K., Isogai, J.: Analysis of
speaker adaptation algorithms for HMM-based speech synthesis and a constrained
SMAPLR adaptation algorithm. IEEE Trans. Audio Speech Lang. Process. 19(1),
66–83 (2009)
15. Erro, D., Alonso, A., Serrano, L., Navas, E., Hernaez, I.: Interpretable parametric
voice conversion functions based on Gaussian mixture models and constrained
transformations. Comput. Speech Lang. 30, 3–15 (2015)
16. Erro, D., Moreno, A., Bonafonte, A.: Voice conversion based on weighted frequency
warping. IEEE Trans. Audio Speech Lang. Process. 18(5), 922–931 (2010)

17. Zorila, T.C., Erro, D., Hernaez, I.: Improving the quality of standard GMM-based
voice conversion systems by considering physically motivated linear transforma-
tions. Commun. Comput. Inf. Sci. 328, 30–39 (2012)
18. Godoy, E., Rosec, O., Chonavel, T.: Voice conversion using dynamic frequency
warping with amplitude scaling, for parallel or nonparallel corpora. IEEE Trans.
Audio Speech Lang. Process. 20(4), 1313–1323 (2012)
19. Erro, D., Navas, E., Hernaez, I.: Parametric voice conversion based on bilinear fre-
quency warping plus amplitude scaling. IEEE Trans. Audio Speech Lang. Process.
21(3), 556–566 (2013)
20. Pitz, M., Ney, H.: Vocal tract normalization equals linear transformation in cepstral
space. IEEE Trans. Speech Audio Process. 13, 930–944 (2005)
21. Valbret, H., Moulines, E., Tubach, J.P.: Voice transformation using PSOLA tech-
nique. Speech Commun. 11(2–3), 175–187 (1992)
22. Cappé, O., Laroche, J., Moulines, E.: Regularized estimation of cepstrum envelope
from discrete frequency points. In: IEEE ASSP Workshop on Applications of Signal
Processing to Audio and Acoustics, pp. 213–216 (1995)
23. Erro, D., Hernáez, I., Navas, E., Alonso, A., Arzelus, H., Jauk, I., Hy, N.Q., Mag-
ariños, C., Pérez-Ramón, R., Sulı́r, M., Tian, X., Wang, X., Ye, J.: ZureTTS: online
platform for obtaining personalized synthetic voices. In: Proceedings of eNTER-
FACE 2014 (2014)
24. Young, S., Kershaw, D., Odell, J., Ollason, D., Valtchev, V., Woodland, P., et al.:
The HTK Book, version 3.4 (2006)
A Dynamic FEC for Improved Robustness
of CELP-Based Codec

Nadir Benamirouche¹, Bachir Boudraa², Angel M. Gomez³,
José L. Pérez-Córdoba³(B), and Iván López-Espejo³

¹ Laboratoire de Génie Electrique, Faculté de Technologie,
Université de Bejaia, 06000 Bejaia, Algeria
benam nadir@yahoo.fr
² Faculty of Electronics and Computer Science,
University of S.T.H.B, Algiers, Algeria
b.boudraa@yahoo.fr
³ Department of Signal Theory, Networking and Communications,
University of Granada, 18071 Granada, Spain
{amgg,jlpc,iloes}@ugr.es

Abstract. The strong interframe dependency present in Code Excited
Linear Prediction (CELP) codecs renders the decoder very vulnerable
when the Adaptive Codebook (ACB) is desynchronized. Hence, errors
affect not only the concealed frame but also all the subsequent frames.
In this paper, we have developed a Forward Error Correction (FEC)-based
technique which relies on an energy constraint to determine the frame
onsets for which FEC information will be sent. The
extra information contains an optimized FEC pulse excitation which
models the contribution of the ACB to offer a resynchronization procedure
at the decoder. In fact, under the energy constraint the number of
Fixed Codebook (FCB) pulses can be reduced in order to be exploited
by the FEC intervention. In return, the error propagation is considerably
reduced with no overload of added pulses. Furthermore, the proposed
method greatly improves the CELP-based codec robustness to packet
losses with no increase in coder storage capacity.

Keywords: Speech coding · VoIP · Forward error correction · Lossy
packet networks · Error propagation · ACB resynchronization

1 Introduction
Media streaming services such as Voice over Internet Protocol (VoIP) are an
emerging technology which has become a key driver in the evolution of voice
communications. Unfortunately, the quality of service (QoS) of VoIP does not
yet provide toll-quality voice equivalent to that offered by the traditional public
switched telephone network [1]. Indeed, a critical issue for media streaming
applications such as VoIP is their vulnerability to end-to-end performance
[2,3]. Some packets may be delayed or lost due to network congestion. Hence,
the missing packets have to be regenerated at the decoder side using packet

© Springer International Publishing AG 2016
A. Abad et al. (Eds.): IberSPEECH 2016, LNAI 10077, pp. 14–23, 2016.
DOI: 10.1007/978-3-319-49169-1 2

loss concealment techniques. In Code Excited Linear Prediction (CELP)-based
codecs [4] a long-term predictor (LTP) is used to encode the excitation signal
through its past samples. Since such speech parameters are not efficiently estimated
by the concealment approach, it is reported that this mismatch in the
obtained excitation introduces error propagation through the properly received
frames [6]. A wide variety of error concealment techniques have been proposed
as solutions for the above problems in order to mitigate the effect of lost frames,
especially in the context of voice over packet networks [1–7]. These alternative
solutions are based on considering some side information sent as extra bit-rate.
In the same direction, recent approaches focus on limiting the inter-frame
dependencies using forward error correction (FEC) [8], where additional information
is used to reinitialize the decoder in the presence of packet loss. This redundancy
provides an alternative representation of the previous excitation samples
to prevent error propagation [7].
In this paper, we propose a decoder resynchronization method based on
dynamically adding FEC information to improve the robustness of CELP-based
codecs. As a solution to the error propagation problem, the proposed method
sends side information replacing the Adaptive Codebook (ACB) contribution
with a set of pulses acting as a memoryless codebook. The developed FEC-based
technique relies on an energy ratio constraint to determine the voiced subframes (frame
onsets) for which side information will be sent. Through the proposed
method, a higher reduction in bit-rate is achieved when the extra information is
sent for only the first two subframes (SF 0 and SF 1) of a voiced frame, since the
pitch component of the subsequent subframes (SF 2 and SF 3) can be estimated
using the previous two subframes. The proposed method is a subframe-based
technique which uses the Least Square Error (LSE) criterion over the synthesized
speech domain to optimize the FEC pulses. Hence, at the decoder side,
when the previous frame is erased, this set of pulses is added to the Fixed Codebook
(FCB) vector to form the total excitation. The synthesized speech signal
is finally obtained as the LP filter response to this excitation. With this approach,
the ACB resynchronization pulse search does not significantly increase the
overall complexity of the CELP-based codec.
This paper is organized as follows. In Sect. 2, we present the ACB resyn-
chronization approach using reduced-FCB pulse compensation and the applied
criterion for its optimization. Subsequently, in Sect. 3, we describe the exper-
imental framework applied to simulate lossy packet channels, the used speech
database and objective quality measure to assess the quality of the proposed
method. In Sect. 4, we discuss the effectiveness of the proposed method, where
the obtained results under the FEC method intervention are shown. Finally,
conclusions of this work are summarized in Sect. 5.

2 ACB Resynchronization Approach Using Reduced-FCB Pulse Compensation

Under the CELP model, a segment of synthesized speech for each subframe
is obtained by filtering an error signal (1), by means of a short-term linear
some attention. He was brought up in the Church of England, and his earliest
recollection was of saying the Lord’s Prayer when his dinner was delayed.
[396] He planned at least one church and contributed to the erection of others,
gave freely to Bible Societies, and liberally to the support of the clergy. He
attended church with normal regularity, taking his prayer book to the
services and joining in the responses and prayers of the congregation. No
human being ever heard him utter a word of profanity. During the period of
his social ostracism by the intolerant partisans of Philadelphia, he passed
many evenings with Dr. Rush in conversation on religion.[397] ‘I am a
Christian,’ he once said, ‘in the only sense in which Jesus wished any one to
be—sincerely attached to his doctrines in preference to all others.’ On one
occasion when a man of distinction expressed his disbelief in the truths of
the Bible, he said, ‘Then, sir, you have studied it to little purpose.’[398]
While the New England pulpits were ringing with denunciations of this
‘infidel’ and old ladies, unable to detect the false witness of the partisan
clergy, were solemnly hiding their Bibles to prevent their confiscation by the
‘atheist’ in the President’s House, he was spending his nights in the
codification of the ‘Morals of Jesus,’ and through the remainder of his life he
was to read from this every night before retiring.[399] In his last days he
spent much time reading the Greek dramatists and the Bible, dwelling in
conversation on the superiority of the moral system of Christ over all others.
In his dying hour, after taking leave of his family, he was heard to murmur,
‘Lord, now lettest Thy servant depart in peace.’[400]
The reason for the myth created against him is not far to seek. Just as the
landed aristocracy of Virginia pursued him with increasing venom because
of his land reforms, the clergy hated him for forcing the separation of
Church and State. When he made the fight for this reform, it was a crime not
to baptize a child into the Episcopal Church; a crime to bring a Quaker into
the colony; and, according to the law, a heretic could be burned. If the latter
law was not observed, that compelling all to pay tithes regardless of their
religious affiliations and opinions was rigidly enforced. This outraged
Jefferson’s love of liberty. The Presbyterians, Baptists, and Methodists, who
were making inroads on the membership of the Established Church, were
prosecuted, and their ministers were declared disturbers of the peace and
thrown into jail like common felons. Patrick Henry and his followers fought
Jefferson’s plan for a disestablishment—but he won.[401] The ‘atheist’ law,
which was never forgiven by the ministers of Virginia and Connecticut, was
simple and brief:
No man shall be compelled to frequent or support any religious worship,
place or ministry whatsoever, nor shall be enforced, restrained, molested or
burdened in his mind or goods, nor shall otherwise suffer on account of his
religious opinions or belief; but all men shall be free to profess, and by
argument to maintain, their opinions in matters of religion, and the same
shall in no wise diminish, enlarge, or affect their civil capacities.
Here we have the secret of the animus of the clergy of the time—but there
were other reasons. In his ‘Notes on Virginia’ he did not please the orthodox,
and Dr. Mason, a fashionable political minister of New York City, exposed
him in the pulpit, holding him up to scorn as a ‘profane philosopher’ and an
‘infidel.’ Discussing the theory that the marine shells found on the high
mountains were proof of the universal deluge, Jefferson had rejected it.
‘Aha,’ cried Mason, ‘he derides the Mosaic account’; he ‘sneers at the
Scriptures’ and with ‘malignant sarcasm.’ When Jefferson, referring to the
tillers of the soil, wrote that they were ‘the chosen people of God if ever He
had a chosen people,’ and referred to Christ as ‘good if ever man was,’ the
minister charged him with ‘profane babbling.’[402]
His view of creation is set forth in a letter discussing a work by
Whitehurst. He believed that a Supreme Being created the earth and its
inhabitants; that if He created both, He could have created both at once, or
created the earth and waited ages for it to form itself before He created
man; but he believed that it was created in a state of fluidity and not in its
present solid form. This was his infidelity. He probably did not believe that
Jonah was swallowed by the whale—and that was enough to damn him. But
if he was not a Christian, the pulpits are teeming with atheists to-day.[403]

VI

We have seen that Hamilton had no faith in the Constitution, but did
yeoman service for its ratification; we have the charge that Jefferson was
hostile to both; and the truth is that he was hostile to neither and favorable to
both. The evidence is overwhelming.
When the new form of government was under consideration, he proposed
‘to make the States one in everything connected with foreign nations, and
several as to everything purely domestic,’ and to separate the executive,
legislative, and judicial branches.[404] He was bitterly hostile to any plan
based on the monarchical idea, and advised its friends ‘to read the fable of
the frogs who solicited Jupiter for a King.’[405] When the Convention met,
he wrote Adams that it was ‘really an assembly of demigods,’ but regretted
that they began their deliberations ‘by so abominable a precedent as that of
tying up the tongues of the members.’[406] His first impressions of the
completed document were unfavorable. In a letter to Adams he complained
of the reëligibility of the President.[407] To another correspondent he
complained that the proposed system would merge the States into one
without protecting the people with a bill of rights.[408]
Writing to Madison, he went more into detail, balancing the good against
the bad. He liked the separation of the departments, endorsed the lodging of
the power of initiating money bills with the representatives of the people,
and was ‘captivated with the compromise of the opposite claims of the great
and little States’; but he insisted that a bill of rights ‘is what the people are
entitled to against every government on earth, general or particular, and what
no just government should refuse or rest in inference.’ Professing himself
‘no friend to a very energetic government’ as ‘always oppressive,’ he added
that should the people approve the Constitution in all its parts he should
‘concur in it cheerfully in hopes that they will amend it whenever they think
it works wrong.’[409]
Little more than a month later he had become an ardent friend of
ratification. ‘I wish with all my soul,’ he wrote, ‘that the nine first
Conventions may accept the Constitution, because this may secure to us the
good it contains, which I think great and important. But I equally wish that
the four latest Conventions, whichever they may be, may refuse to accede to
it till a declaration of rights is annexed.’[410]
When the Massachusetts Convention accepted with ‘perpetual
instructions to her Delegates to endeavor to secure reforms,’ he was
delighted,[411] and the same day he wrote another correspondent of his
pleasure at the progress made toward ratification. ‘Indeed I have presumed
that it would gain on the public mind as I confess it has on my own.’[412]
When South Carolina acted, he wrote E. Rutledge his congratulations. ‘Our
government wanted bracing,’ he said. ‘Still we must take care not to run
from one extreme to another; not to brace too high.’[413] When the requisite
nine States had ratified, he wrote Madison in a spirit of rejoicing. ‘It is a
good canvas on which some strokes only want retouching. What these are I
think are sufficiently manifested by the general voice from North to South
which calls for a bill of rights.’[414]
After the ratification, he wrote Madison in praise of ‘The Federalist,’
describing it as ‘the best commentary on government ever written,’ and
admitting that it had ‘rectified’ him on many points.[415] In the same vein he
wrote to Washington, expressing the hope that a bill of rights would be
speedily added.[416] In the spring of 1789 he wrote another that the
Constitution was ‘unquestionably the wisest ever yet presented to men.’[417]
And after the Bill of Rights had been added, he wrote to Lafayette that ‘the
opposition to the Constitution has almost totally disappeared’ and that ‘the
amendments proposed by Congress have brought over almost all’ of the
objectors.[418]
Years afterward, when he wrote his ‘Autobiography,’ he reviewed his
reactions on the document: ‘I received a copy early in November,’ he wrote,
‘and read and contemplated its provisions, with great satisfaction.... The
absence of express declarations, ensuring freedom of religious worship,
freedom of the press, freedom of the person under the uninterrupted
protection of the Habeas Corpus & trial by jury in civil as well as in criminal
cases excited my jealousy; and the reëligibility of the President for life I
quite disapproved. I expressed freely in letters to my friends, and most
particularly to Mr. Madison and General Washington, my approbations and
objections.’[419] His recollections were true to the facts as conclusively
shown in the correspondence to which reference has been made.
He was no more opposed to the Constitution and its ratification than he
was an atheist.

VII

This brings us to Jefferson the creator and leader of a party, and his
methods of management. Here he was without a peer in the mastery of men.
He intuitively knew men, and when bent upon it could usually bend them to
his will. He was a psychologist and could easily probe the minds and hearts
of those he met. In his understanding of mass psychology, he had no equal.
When a measure was passed or a policy adopted in Philadelphia, he knew
the reactions in the woods of Georgia without waiting for letters and papers.
This rare insight into the mass mind made him a brilliantly successful
propagandist. In every community he had his correspondents with whom he
communicated with reasonable regularity, doing more in this way to mould
and direct the policies of his party than could have been done in any other
way. Seldom has there lived a more tireless and voluminous letter-writer.
With all the powerful elements arrayed against him, he appreciated the
importance of the press as did few others. ‘I desired you in my last to send
me the newspapers, notwithstanding the expense,’ he wrote a friend from
Paris.[420] Believing that the people, in possession of the facts, would reach
reasonable conclusions, he considered newspapers a necessary engine of
democracy. ‘If left to me,’ he once wrote, ‘to decide whether we should have
a government without newspapers, or newspapers without a government, I
should not hesitate for a moment to prefer the latter.’[421] There is not a
scintilla of evidence to confute his stout contention that he never wrote for
the papers anonymously, but the evidence piles mountain high to prove that
he constantly inspired the tone of the party press.
In his personal contacts he was captivating—a master of diplomacy and
tact, born of his intuitive knowledge of men. Perhaps no better illustration of
his cleverness in analyzing men can be found than in his letter to Madison on
De Moustier, a newly appointed French Minister to the United States. ‘De
M. is remarkably communicative. With adroitness he may be pumped of
anything. His openness is from character, not affectation. An intimacy with
him may, on this account, be politically valuable.’[422]
In his leadership we find more of leading than of driving. He had a genius
for gently and imperceptibly insinuating his own views into the minds of
others and leaving them with the impression that they had conceived the
ideas and convinced Jefferson. To Madison this was a source of keen delight.
[423] Jefferson was the original ‘Easy Boss.’ His tact was proverbial. He
never sought to overshadow or overawe. Inferior men were not embarrassed
or depressed in his presence. He was amazingly thoughtful and considerate.
In a company he instinctively went to the assistance of the neglected. Thus at
a dinner party, a guest, long absent from the country, and unknown to the
diners, was left out of the conversation and ignored. In a momentary silence,
Jefferson turned to him. ‘To you, Mr. C., we are indebted for this benefit—’
he said, ‘no one deserves more the gratitude of his country.’ The other guests
were all attention. ‘Yes, sir, the upland rice which you sent from Algiers, and
which thus far succeeds, will, when generally adopted by the planters, prove
an inestimable blessing to our Southern States.’ After that the neglected
guest became the lion of the dinner.[424] Thoughtfulness in small things—
this entered not a little into Jefferson’s hold on his followers.
It was at the dinner table that he planned many of his battles. He did not
care for the stormy and contentious atmosphere of a caucus. He was not an
orator. In the Continental Congress he was disgusted by the ‘rage for
debate.’[425] Later he was to find his lot in the Cabinet intolerable because
he and Hamilton were constantly pitted against each other ‘like cocks in a
pit.’ He was not afraid of a fight, but the futility of angry controversy
repelled him. It was this which made him a delightful dinner host—all
controversial subjects that might offend were taboo. If his position were
warmly controverted, he changed the subject tactfully. It was never the
opposition that interested him, but the reason for it; and with rare subtlety he
would seek to obliterate the prejudice, if it were prejudice, or to remove the
misunderstanding if it were ignorance of facts. Thus he won many victories
through a seeming retreat.[426]
Unescapable quarrels and separations were minor tragedies to him. He
long sought to get along with Hamilton. He advised his daughters to be
tolerant of disagreeable people and acted on his own advice. Fiske has
explained him in a sentence: ‘He was in no wise lacking in moral courage,
but his sympathies were so broad and tender that he could not breathe freely
in an atmosphere of strife.’[427] Thus considerate of his foes, he never hurt
the sensibilities of his friends through offensive methods. He liked to gather
his lieutenants about him at the table and ‘talk it out’—each man free to give
his views. Here he ironed out differences, dominating by the superiority of
his intellect and fascinating personality while appearing singularly free from
domination.
In his power of self-control Jefferson had another advantage over his
leading political opponents. There was something uncanny in his capacity to
simulate ignorance of the hate that often encompassed him. To the most
virulent of his foes he was the pink of courtesy. He mastered others by
mastering himself. And because he was master of himself, he had another
advantage—he kept his judgment clear as to the capacity and character of his
opponents. One may search in vain through the letters of Hamilton for
expressions other than those of contemptuous belittlement of his political
foes. Jefferson never made that mistake. He conceded Hamilton’s ability and
admired it. Visitors at Monticello, manifesting surprise at finding busts by
Ceracchi of Hamilton and Jefferson, facing each other across the hall,
elicited the smiling comment—‘opposite in death as in life.’ There never
would have been a bust of Jefferson at ‘The Grange.’ Through the long years
of estrangement with Adams, Jefferson kept the way clear for the restoration
of their old relations. Writing Madison of Adams’s faults, he emphasized his
virtues and lovable qualities. When the bitter battles of their administrations
were in the past and a mutual friend wrote that the old man at Quincy had
said, ‘I always loved Jefferson and always shall,’ he said, ‘That is enough for
me,’ and set to work to revive the old friendship. Thus the time came when
in reply to Jefferson’s congratulations on the election of John Quincy Adams
in 1824, Adams wrote: ‘I call him our John because when you were at the
Cul de Sac at Paris, he appeared to me to be almost as much your boy as
mine.’[428] This capacity for keeping his judgment clear of the benumbing
fumes of prejudice concerning the qualities of his enemies was one of the
strong points of his leadership.
This does not mean that in practical politics Jefferson was a ‘Miss Nancy’
or a ‘Sister Sue.’ This first consummate practical politician of the Republic
did not consider it practical to underestimate the foe, nor to dissipate his
energy and cloud his judgment by mere prejudices and hates. He was not an
idealist in his methods, and this has given his enemies a peg on which to
hang the charge that he was dishonest. He was an opportunist, to be sure; he
never refused the half loaf he could get because of the whole loaf he could
not have. He trimmed his sails at times to save his craft—and this was
wisdom. He compromised at the call of necessity. He was hard-headed and
looked clear-eyed at the realities about him. He was cunning, for without
cunning he could not have overcome a foe so powerfully entrenched. He was
as elusive as a shadow, and this has been called cowardice—but it was
difficult to trap him in consequence. His antipathy to the frontal attack has
often been referred to with contempt, but, leading a large but unorganized
army against one of tremendous power, he preferred the methods of
Washington in the field—which was to avoid the frontal attack with his
ragged Continentals against the trained and disciplined army. Because of
these conditions he was given to mining. When apparently quiescent, he was
probably sowing discord among his foes—his part concealed. This was
hateful to the Federalists—just as the tactics of Frederick were hateful to the
exasperated superior forces against him.
Jefferson was the most resourceful politician of his time. For every
problem he had a solution. He teemed with ideas. These were his shock
troops. If he seemed motionless, it was because by a nod or look he had put
his forces on the march. Like the wiser of the modern bosses, he knew the
virtue of silence. When in doubt, he said nothing. When certain of his
course, he said nothing—to his foes. It was impossible to smoke him out
when he preferred to stay in. In the midst of abuse he was serene. And he
was a stickler for party regularity.[429] He appreciated the possibilities of
organization and discipline. When money was needed for party purposes, his
friends would receive a note: ‘I have put you down for so much.’ When the
party paper languished, he circulated subscription lists among his neighbors,
and instructed his friends to imitate his example. He was never too big for
the small essential things, and he was a master of detail—very rarely true of
men of large views. His energy was dynamic and he was tireless. He never
rested on his arms or went into winter quarters. His fight was endless. The
real secret of his triumph, however, is found in the reason given by one of
his biographers: ‘He enjoyed a political vision penetrating deeper down into
the inevitable movement of popular government, and farther forward into the
future of free institutions than was possessed by any other man in public life
in his day.’

VIII

No American of his time had such versatility or such diversified interests.
He was asked to frame the Declaration of Independence because of his
reputation as a writer. Adams has told the story: ‘He brought with him a
reputation for literary science and a happy talent for composition. Writings
of his were handed about[430] remarkable for their peculiar felicity of
expression.’ It was the ‘Summary View’ which elicited the admiration of
Edmund Burke. A more ambitious effort, his ‘Notes on Virginia’ was
written during the fatal illness of his wife, and while he was confined to the
house two or three weeks by a riding accident.[431] It was a valuable
contribution to the natural, social, economic, and political history of the
State, with a number of eloquent passages and fascinating pages.
He had an artistic temperament, loved music, and at the beginning of his
career we find him busy planning his garden at Monticello, and practicing
three hours a day on his loved violin, under the instructions of an Italian
musician. His hospitality to the Hessian prisoners is partly explained by a
mutual love of music. Returning from an absence to find ‘Shadwell,’ his
early home, in ashes, he inquired anxiously about his books. ‘Oh, my young
master,’ exclaimed the distressed slave, ‘they were all burnt, but we saved
your fiddle.’[432]
Loving art in all its forms, he was fond of the company of artists. It was
he who arranged in Paris for Houdon to go to America to make the statue of
Washington.[433] He entertained Trumbull in the French capital,
accompanying him to Versailles to see the King’s art collection, and urged
him to remain in Paris and study.[434] He was delighted with architectural
beauty and lingered about the masterpieces. From Nismes, he wrote
enthusiastically to a woman friend: ‘Here I am, Madame, gazing whole
hours at the Maison Quarree, like a lover at his mistress. This is the second
time I have been in love since I left Paris. The first was with a Diana at the
Chateau de Laye-Epinaye in Beaujolais, a delicious morsel of sculpture, by
M. A. Slodtz. This you will say was in rule, to fall in love with a female
beauty; but with a house. No, Madame, it is not without a precedent in my
own history. While in Paris I was violently smitten with the Hotel de
Salm.’[435] When the Capitol at Richmond was in contemplation, he urged
the construction of the most beautiful edifice possible as a model to be
emulated in other buildings; drew some plans himself; examined those of
Hallet, was captivated with those of Thornton, and urged their acceptance.
‘Simple, noble, beautiful,’ he wrote home.[436]
And yet, so many-sided was this man, that he was a utilitarian and
scientist as well as artist. In Europe he was thought a philosopher, and
Humboldt came to America to pass many hours under his roof. A perusal of
his letters discloses the intensity and range of his interests. He was entranced
with clocks, and we find him writing David Rittenhouse reminding him of ‘a
kind promise of making me an accurate clock,’[437] and later to Madison of
a watch he had made for himself and inquiring if his friend wished one.[438]
He summoned a Swiss clock-maker to Monticello who died on the mountain
and is buried in the enclosure with his patron. He put the noted Buffon to
rout in Paris on points in natural history.[439] Admiring the red men, he spent
years collecting their vocabularies.[440] When in Paris he heard that an
Arabic translation of Livy had been found in Sicily, and importuned the
chargés des affaires of Naples to make inquiries, and was much excited to
hear that such a translation had been found ‘and will restore to us seventeen
of the lost books.’[441] In the midst of the political diversions and social
distractions of Paris he found time to write at length on the ‘latest
discoveries in astrology.’[442] As early as the summer of 1785, when Pilatre
de Rozière made his fatal attempt to cross the English Channel in a balloon,
we find him eagerly discussing the possibilities of the aeronautical science.
[443] A newly invented lamp pleased him and he sent one to a friend from
Paris.[444] The use of steam in the operation of grist mills interested him and
he found time to witness the test.[445] Even the absorbing drama of the
French Revolution in its early stages did not lessen his interest in Paine’s
iron bridge, and he attended its exhibition,[446] and finding the inventor
hesitating between ‘the catenary and portions of a circle,’ he sent to Italy for
a scientific work by the Abbe Mascheroni.[447] Fascinated by inventions, he
was, himself, the inventor of a plough.

IX

Interested as he was in art and inventions, his heart was with the country
life and the farmer’s lot. He was never happier than when, in the early
morning, mounted on one of his beloved horses, he rode over his broad acres
at Monticello, observing with a perennial zest the budding of the trees in
spring, the unfolding of the flowers, the ripening of the harvest. Wherever he
was, throughout his life, he longed for the house he had made on the hill, the
broad fields, the family circle and the servitors and slaves. There he was lord
of the domain. If he employed Italian gardeners, they conformed to his ideas.
If he had a supervisor, it was he himself who determined what should be
planted and where—where the orchards should be, what trees should be set
and their location; and even the vines and shrubs, the nuts and seeds, the
roots and bulbs claimed his personal attention. Even his hogs were named,
and when one was to be killed, he designated it by name.[448] There, too, he
lived in an atmosphere of affection. There he had taken his bride, a woman
of exquisite beauty, grace, and loveliness; there his children had been born,
and there, all too soon, their mother died. He was passionately devoted to her
and there was no successor. To the daughters who were left he became both
a father and a mother, resulting in an intimacy seldom found between father
and daughters. In Paris he would not permit even his trusted servant to do
their shopping, reserving that duty for himself. Always patient, never harsh,
and ever sympathetic, he was the ideal parent.[449]
Though he did not remarry, he was fond of the society of women and they
of his. The few letters to women that have been preserved are masterpieces
of their kind, sprightly, playful, sometimes beautiful. His relations with the
women of the Adams family are shown in a note to John Adams’s married
daughter, written from Paris: ‘Mr. Jefferson has the honor to present his
compliments to Mrs. Smith and to send her the two pair of corsets she
desired. He wishes they may be suitable, as Mrs. Smith omitted to send her
measure. Times are altered since Mademoiselle de Samson had the honor of
knowing her; should they be too small, however, she will be so good as to
lay them by a while. There are ebbs as well as flows in this world. When the
Mountain refused to go to Mahomet, he went to the Mountain.’[450] In Paris
he formed a few cherished friendships with women, notably with Mrs.
Cosway, Italian wife of an English painter, a woman of charm, beauty, and
intellect, with whom he corresponded. One of his letters, the dialogue
between the Head and the Heart on her departure for England, is unique and
sparkling.[451] He appreciated the exquisite Mrs. Bingham whom he met in
Paris, and his chiding letters to her after her return to America must have
pleased that artificial lady immensely.[452] He was a friend of the Comtesse
De Tesse whose mind he admired,[453] and of Madame De Corney whose
beauty attracted him. ‘The Bois de Boulogne invited you earnestly to retire
to its umbrage from the heats of the sad season,’ he wrote her gallantly. ‘I
was through it to-day as I am every day. Every tree charged me with this
invitation.’[454]
Such was Thomas Jefferson who took upon himself the organization of
the forces of democracy, when its enemies were in the saddle, booted and
spurred, and with a well-disciplined and powerful army at their back. None
but an extraordinary character could have dared hope for victory, and he was
that, and more. Democrat and aristocrat, and sometimes autocrat;
philosopher and politician; sentimentalist and utilitarian; artist, naturalist,
and scientist; thinker, dreamer, and doer; inventor and scholar; writer and
statesman, he enthralled his followers and fascinated while infuriating his
foes.

CHAPTER VI

THE SOCIAL BACKGROUND

‘If New York wanted any revenge for the removal,’ wrote Mrs. Adams to
her daughter soon after reaching Philadelphia, ‘the citizens might be
glutted if they could come here, where every article has been almost
doubled in price, and where it is not possible for Congress and the
appendages to be half as well accommodated for a long time.’[455]
Reconciliation for the removal was not complete several months later when
Oliver Wolcott wrote his father complaining that ‘the manners of the people
are more reserved than in New York.’[456] Even so he had ‘seen nothing to
tempt [him] to idolatry,’ after having seen ‘many of their principal men,’ and
he had no apprehensions of ‘self-humiliating sensations’ after a closer
acquaintance.[457] It was not with unrestrained enthusiasm that the officials
took up their residence in the greater city, with its population of more than
60,000. ‘The Philadelphians,’ according to the indignant comment of
Jeremiah Smith, ‘are from the highest to the lowest, from the parson in his
black gown to the fille de joie or girl of pleasure, a set of beggars. You
cannot turn around without paying a dollar.’[458]
To the visitor entering by coach on Front Street and rumbling up to the
City Tavern the prospect did not seem so black as to those who received
their first impressions from the water-front. These beheld ‘nothing ... but
confused heaps of wooden store houses, crowded upon each other’—and,
behind the wharves, Water Street, narrow, shut in by the old bank of the
river, dirty, filthy, stinking.[459] Could he have looked down upon the city
from some convenient hill, he would have found something to revive his
drooping spirits in the compactness of the town and the substantial character
of the houses. The principal streets of the period were Front, Second, Third,
and Fourth, and beyond Sixth there were scarcely any habitations. No one
thought of building on Arch or Chestnut Streets west of Tenth, where the
land was thickly dotted with frogponds.[460] Practically all of business and
fashion was to be found east of Fourth Street, and the visitor or official
sojourner could congratulate himself on the ease with which he could get
about from place to place. An English tourist, observing that with the
exception of Broad and High Streets the thoroughfares were not more than
fifty feet in width, found them suggestive of ‘many of the smaller streets of
London except that the foot pavement on either side is of brick instead of
stone.’[461]
If the filth of the odorous water-front, the narrowness of the streets, and
the frogponds on the outskirts, so audible in the night, were depressing, the
houses, attractive, and in many instances architecturally pretentious, hinted
of comfort and solidity if not of opulence. The fact that almost all were
constructed of brick was not lost upon the travelers.[462] In the more
congested districts these houses had a shop on the first floor.
The streets, with their red-brick foot pavements and rows of trees, making
them fragrant after summer rains, and drearily murmurous in the winter
winds, were paved with pebbles in the middle,[463] with a gutter made of
brick or wood, and lined with strong posts to protect the area of the
pedestrians.[464] The trees, mostly buttonwood, willow, and Lombardy
poplars, had been brought over from Europe some years before by William
Hamilton.[465] At frequent intervals town pumps offered refreshment to the
thirsty, or, in the night, an accommodating hanging-post for the inebriate
staggering home from one of the popular taverns.[466] Not without its charm
was a walk through the streets of Philadelphia in the days when Hamilton
and Jefferson were exchanging shots, with the poplars and willows to shut
off the sun, the pumps to minister to the comfort, and with most of the
houses offering to the view a garden filled with old-fashioned flowers—
lilacs, roses, pinks, and tulips, morning-glories and snowballs with gourd
vines climbing over the porches. In the case of the more imposing mansions
there were more elaborate gardens with rare flowers and shrubbery, but in
many of these wealth claimed its privilege and shut off the view from the
common folk who could only catch the fragrance.[467]
The visitor on public business bent found all the governmental centers
close together. If interested in the debates at Congress Hall, erected for the
purpose at Sixth and Chestnut Streets next to the State House, the smallest
child could direct him. If a person of no special importance, he could find his
way into the commodious gallery of the House, and, looking down upon the
chamber, a hundred by sixty feet, with its three semi-circular rows of seats
facing the Speaker’s rostrum—‘a kind of pulpit near the center’[468]—could
find Ames busy at his circular writing-desk, Madison on his feet or
Sedgwick in conference with a lobbyist. If fortunate he might be admitted to
the space on the floor beneath the gallery. But it was not so easy to penetrate
to the more sacred precincts of the Senate on the floor above where the self-
constituted guardians of the covenant and the rights of property held
themselves aloof from the gaze of the vulgar. Perhaps, if he really prized the
privilege, he might look down from some point of vantage on the State
House Garden where the statesmen were wont to take the air and compose
their thoughts.
Did he have business with Jefferson? It was only a little way to the three-
story brick residence at High and Eighth Streets which had been taken over
for the purposes of the State Department. With Hamilton? It was but a few
steps to the old Pemberton mansion near Chestnut and Third, with its well-
cultivated garden in the rear where the indefatigable human dynamo worked
far into the night.[469] With the President? It was but a short distance from
Jefferson’s office to the Morris house.
At the time Washington moved in, the Morris house was one of the most
distinguished in the city, a dignified and impressive brick mansion, with two
large lamps in front, and with ample gardens to proclaim it the abode of a
personage of consequence. It was under its roof that Washington had lived as
the guest of Morris while presiding over the Constitutional Convention. It
was not without difficulties and annoyances that the house was taken over.
The banker was lustily praised by his friends for his sacrifice in abandoning
his home, but it appears to have been a sacrifice similar to that of managing
the finances of the Revolution. One writer questioned whether ‘giving up a
house of moderate dimensions for 700 pounds a year can be deemed a great
sacrifice ... when ... the President was accommodated in this city [New York]
with a much more elegant house at 400 pounds per annum.’[470] Even
Washington, who was Morris’s intimate friend, was distressed at the
difficulty in persuading him to fix the rental, and wrote Lear that he could
not understand the Senator. He would be willing to pay as much as he paid in
New York, and even more if there was not clear extortion. The owner finally
fixed the rental at three thousand dollars a year.[471] Thus Washington
moved in, and there the Presidents lived until the capital was moved to
Washington.
There, if properly presented, the visitor might call to receive the rather
cold, stately bow of Washington or even drink a cup of tea with Mrs.
Washington. In the case of a levee, he was sure to be welcome. But if his
social status did not suffice to justify the crossing of the threshold, he might,
if he were patient, see the great man as he drove forth in his ornately
decorated coach; or, better still, see him emerge on foot with his secretaries,
Lear and Jackson, one on either side, with cocked hats on their heads, the
aides a little in the rear. If he had the temerity to follow at a respectable
distance, he would have been surprised, perhaps, to find that the President
did not converse with his secretaries while on his walks.[472]

II

It was not joy unconfined to be interned in any of the hotels or taverns of
Philadelphia at any time while it was the capital. In the journals of tourists
who sojourned there we encounter no enthusiastic encomiums, even for
O’Eller’s, which owes something of its glamour in perspective to the fact
that the Assembly dances were held in its ballroom. It was infinitely better,
at any rate, than the Sign of the Sorrel Horse on Second Street, which comes
down to us as a ‘bad one.’[473] The City Tavern, scene of numerous political
demonstrations, concededly one of the best, would have been better rid of
vermin that infested the beds.[474] The London Tavern, which had its days as
the ‘principal hotel,’ was ‘deficient in comfort’ even at its best,[475] and the
Indian Queen distinguished itself as the scene of a doleful robbery when
some of Ames’s colleagues lost their linen, and thirty thousand dollars in
securities, and he escaped only because his name on his trunk assured the
‘partial rogues’ that ‘nothing was to be got by taking it away.’[476] In 1794,
the Golden Lion or the Yellow Cat at Eighth and Filbert Streets was a
favorite because of its well-drawn beer and porter; and the visitor, pushing
through the smoke-laden air to drink malt liquor from a pewter mug, would,
likely as not, find Governor Mifflin or General Knox of the Cabinet enjoying
their mugs along with the mechanics and clerks.[477] But it was not
necessary to sleep in the beds of the Yellow Cat to quaff its liquors, and after
a brief experience with the taverns the tourist would be likely to follow the
example of Thomas Twining and seek more comfortable and sanitary
quarters in some of the numerous rooming-houses that catered particularly to
members of Congress. The choicest of these resented the idea that they were
other than the private houses of gentlemen accommodating political
personages—this particularly true in the case of Francis, the Frenchman, at
whose house on Fourth Street, Vice-President Adams had a room.[478] In
these private rooming and boarding-houses, in which the majority of the
celebrities lived, an abundant table, clean agreeable rooms, and the
congenial companionship of colleagues made an appeal. At Francis’s the
head of the table was reserved for Adams, and all the ceremonial forms were
scrupulously observed, although he frequently had his meals served in his
rooms. It was not until he had escaped from the Indian Queen and found
lodgings ‘at the house of Mrs. Sage’ that Ames began ‘to feel settled and at
home.’[479] This hiving had its comedies, sometimes its scandals, and
occasionally its romances, as on the day Senator Aaron Burr took James
Madison to call upon the winsome daughter of his landlady, and history was
made in the candlelit parlor of the boarding-house.
Quiet and home-like, at least, these boarding-houses of our early
statesmen, and if they had no bars, they were in close proximity to many that
were of good repute. The members of the Legislature sometimes were
known to discuss important measures at Geisse’s Tavern over the mugs,[480]
were wont, on adjournment, to linger at Mr. O’Eller’s for his incomparable
punch,[481] and to celebrate the ending of a session with an evening of
conviviality at ‘Mr. Burns tavern on Tenth Street.’[482] Gentlemen riding
along the banks of the Schuylkill could seldom resist the impulse to
dismount at the tavern of Metz—for these drinking-houses were kindly
placed among a people intolerant of puritanism.[483]
Going forth into the streets to mingle with the common people was a
revelation to the polished tourist from the old lands. Here they found nothing
of the humility of the lowly to which they were accustomed. The mechanics
and common laborers took the theory of equality seriously. One traveler
found ‘the lower sort of people’ lacking in good manners[484] and observed
that a well-dressed stranger, asking a polite question, was almost certain of
an impudent answer.[485] These were the men who were to man the societies
fashioned after those of the Parisian radicals, to rally passionately to the
support of the French Revolution, and to supply Jefferson with his shock
troops—and sometimes shocking troops—in his fight for the
democratization of the Republic.
These, too, in their desperate striving for equality were moved to
imitations of the spendthrift practices of the rich. Even the servants and the
negroes gave elaborate balls which Liancourt found ‘destitute of the
charming simplicity of the fêtes of our peasants.’[486] The women appeared
in dresses beyond their means; the laborer and his lady rode in coaches to the
dance, where an elaborate supper was served, with liquid refreshments.
Sundays found the public-houses of the environs packed with the men of the
factories and shop, borne thither, with their families, in chairs. There was
much drinking and spending with gambling on the fights arranged for their
delectation.[487] At Harrowgate Gardens, two miles out on the New York
road, and Gray’s Gardens on the Schuylkill, they flocked to drink tea or
liquor, to dance, promenade, or flirt, and on summer nights the young men of
all stations were lured to them by the promise of romance. Even the grave
and reverend statesmen could not, in all cases, resist the call. Gay and
wicked some must have thought the scene—with the painted women of the
town a bit brazen in their fishing for men. ‘We have Eves in plenty, of all
nations, tongues and colors,’ wrote Oliver Wolcott to his wife from Gray’s
Gardens where he had taken refuge from the yellow fever, ‘but do not be
jealous—I have not seen one yet whom I have thought pretty’—leaving her
to imagine the possibilities should one such appear.[488] And yet, pleasure-
loving as the population was, the nights were reasonably quiet. About the
time the city assumed the dignity of a capital, there was little to disturb the
tranquillity of the night after ten o’clock beyond the voice of the watchman,
or the footsteps of some night-hawk wending his way by the light of the
street-lamps ‘placed like those in London.’[489] But five years later, a visitor
who recalled that in 1794 it was unusual to meet any one at night, or to hear
any noises after eleven o’clock, found that the nocturnal annoyances
continued far later into the night.[490]
It was by day, however, that the city made its best impression. The
luxury-loving people, the wealth and extravagance of the social leaders
insisting upon London and Parisian styles, the commercial traditions of the
community gave to its shopping district an elegance found nowhere else in
America. The houses of the importers and wholesalers, some maintaining
their own ships, were found, for the most part, on Front and Water Streets.
When in the spring and autumn the ships came in, and the great boxes of
English dry-goods were stretched along the pavement of Front between Arch
and Walnut Streets to be opened, it was a thrilling event to the
Philadelphians. Fluttering about them were the retail merchants—for most of
these in the days of the city’s political preëminence were women—
exclaiming ecstatically over the contents. Soon the goods were transferred to
the shops, which even a Frenchman found ‘remarkable for their
neatness’[491]—due, no doubt, to the sex of the proprietors. What more
fascinating than to stand before the great show windows—something new—
at Mrs. Whiteside’s fancy dress-goods shop, with exquisite cloths and
dresses hung full length and festooned to best advantage after the manner of
Bond Street, London. Did it add anything to the appeal to know that the
proprietress had come from London? Alas, no doubt. Thither the ladies from
the mansions drove in their carriages to make their purchases, and thence,
perhaps, for something more, to the South Second Street store of the smiling
Mrs. Holland, and then on, perchance, to Mrs. Jane Taylor’s at the Sign of
the Golden Lamb.[492] And then, having ministered to the materialistic
yearnings of vanity, as like as not milady directs the coachman to stop at
Bell’s British Book Shop on Third Street, near Pearl, lest the lord and
master, in placing his order with his London agent, overlooked something
she would not miss.
An easy, patrician life for some of these Philadelphians, but not for all.
The workman receiving a dollar a day and board, and with the smallest
houses on the outskirts renting for three hundred dollars a year, found it far
from a frolic to make both ends meet. The middle-class employees of the
stores and industries, paying from eight to twelve dollars a week for board,
without wine, candles, or fire, could have found little to interest them in Mrs.
Whiteside’s show windows, for, while the clerks were courteous and the
merchant polite, the cost of her goods was far in excess of that on Bond
Street.[493] But it is not with these of the more humble order that we are
concerned just now. It is quite possible that the curious Jefferson, who had a
habit of prying into the living conditions of ‘people of no importance,’ may
have wondered how these lived, but the social environment of the majority
of the statesmen was far removed from the common people. It is with the
world of fashion that we are concerned.

III

No society in America could have been less in harmony with the spirit of
democracy, for nowhere was class consciousness and caste pride more
pronounced. ‘Those who constitute the fashionable world are at best a mere
oligarchy, composed of a few natives and as many foreigners,’ wrote Otis to
his wife.[494] ‘I might have believed myself in an English town,’ said
Viscount de Chateaubriand.[495] An Englishman noted that ‘amongst the
upper circles ... pride, haughtiness, and ostentation are conspicuous; and it
seems that nothing could make them happier than that an order of nobility
should be established, by which they might be exalted above their fellow
citizens, as much as they are in their own conceit.’[496] A French nobleman
could not escape the observation that ‘the English influence prevails in the
first circles and prevails with great intolerance.’[497] And Otis, who liked the
tone himself, was much impressed with the discovery that ‘the women after
presentations to the court of George III or Louis XVI transplanted into
Philadelphia society the manners of the English aristocracy and the fashions
of Paris.’[498] During the days of the British occupation, the cream of society
had reveled with the British officers, and many of these had resumed their
places in the society of the republican capital without abandoning their
former views. This English tone was to be felt by Jefferson a little later when
his sympathy with the French Revolution was to enter into his policies. From
the beginning these pro-English aristocrats were to draw political lines in
social intercourse, and in time Otis was to record that ‘Democratic
gentlemen and their families, no matter how high their social qualifications,
were rigidly ostracised by the best society.’[499] Along with this went a
rather vulgar deification of the dollar, and, strangely enough, a lack of polite
hospitality to the stranger. ‘What is justly called society,’ wrote Liancourt
whose ideas had been fashioned at Versailles, ‘does not exist in this city. The
vanity of wealth is common enough.’ The picture he paints is not a pretty
one. It shows a flamboyant rich man flauntingly displaying ‘his splendid
furniture, his fine English glass, and exquisite china,’ to the stranger invited
to come to ‘one ceremonious dinner,’ and then dismissing him for another
who had not ‘seen the magnificence of the house, nor tasted the old
Madeira.’ This, we are told, was the routine for all who came from Europe
—‘philosophers, priests, literati, princes, dentists, wits and idiots.’ But alas,
‘the next day the lionized stranger is not known in the street except he be
wealthy.’[500] However much they may have fallen short in manners, they
yielded nothing to Versailles in dress. This ‘elegance of dress’ astonished
Chateaubriand, and Liancourt was amazed at ‘the profusion and luxury’ in
‘the dresses of their wives and daughters.’ At balls, ‘the variety and richness
of the dresses did not suffer in comparison with Europe.’ The brilliant note
was assiduously sought in costumes, and there was much copying of the
subjects of Gainsborough and Sir Joshua Reynolds. One foreigner noting the
‘immense expense on their toilet and head dress’ thought it ‘too affected to
be pleasing.’[501] But by common consent these grand dames and belles
were beautiful, with their sparkling eyes, graceful forms, and the brilliancy
of their complexions.
If this aristocracy was neglectful of the stranger who had no golden key
to its interest, it was not because of a dearth of entertaining. Here there was a
hectic activity—dinners, dances, breakfasts, teas, parties enough to satisfy
the most insatiate passion for such excitement. Throughout the season the
great houses were ablaze with light, and if, as Mrs. Adams complained, there
was much the same company in all, it was congenial company, and the
intimacy of the contact allowed a familiarity that sometimes verged on the
risqué. In less than a month after her arrival, Mrs. Adams was appalled at
‘the invitations to tea and cards in the European style,’[502] and was
complaining that she ‘should spend a very dissipated winter if [she] were to
accept one half the invitations, particularly to the routs or teas and cards with
even Saturday night ... not excepted.’[503] A little later Aaron Burr was
being swamped with ‘many attentions and civilities—many invitations to
dine, etc.’[504] If Burr declined, as he wrote his wife, the handsome young
Otis, who loved the company of women, was not so coy. ‘I have dined once
with Cuttling at Mrs. Grattan’s,’ he wrote home, ‘once at Yznardi’s
[Spaniard who spent much time in Philadelphia] in great stile; and yesterday
in the country with Jonathan Williams [nephew of Franklin]. I am engaged
for next Christmas with Mrs. Powell, but with nobody for the Christmas
after next.’[505]
At these functions—heavy drinking—flirting—risqué talk. Even a
German was shocked to find that at public dinners each person would often
consume six bottles of Madeira.[506] Only Burr was hard to satisfy. ‘I
despair of getting genuine Trent wine in this city,’ he wrote Theodosia.
‘There never was a bottle of real unadulterated Trent imported here for sale.
Mr. Jefferson, who had some for his own use, has left town.’[507] But if there
was no Trent, Madeira flowed in streams, beer and ale, punch and whiskey
and champagne could be had for the asking, and there was asking enough,
even at parties and dinners. Even Hamilton, who drank with moderation,
