SPRINGER BRIEFS IN ELECTRICAL AND
COMPUTER ENGINEERING · SPEECH TECHNOLOGY

Moses Effiong Ekpenyong Editor

Human Language
Technologies for
Under-Resourced
African Languages
Design, Challenges,
and Prospects

SpringerBriefs in Electrical and Computer
Engineering

Speech Technology

Series editor
Amy Neustein, Fort Lee, NJ, USA
Editor’s Note

The authors of this series have been hand-selected. They comprise some of the most
outstanding scientists, drawn from academia and private industry, whose research is
marked by its novelty, applicability, and practicality in providing broad-based speech
solutions. The SpringerBriefs in Speech Technology series provides the latest findings in
speech technology gleaned from comprehensive literature reviews and empirical
investigations that are performed in both laboratory and real-life settings. Some of the
topics covered in this series include the presentation of real-life commercial deployment
of spoken dialog systems, contemporary methods of speech parameterization, developments
in information security for automated speech, forensic speaker recognition, use of
sophisticated speech analytics in call centers, and an exploration of new methods of soft
computing for improving human-computer interaction. Those in academia, the private
sector, the self-service industry, law enforcement, and government intelligence are among
the principal audience for this series, which is designed to serve as an essential
reference guide for speech developers, system designers, speech engineers, linguists and
others. In particular, a major audience of readers will consist of researchers and
technical experts in the automated call center industry, where speech processing is a key
component of the functioning of customer care contact centers.

Amy Neustein, Ph.D., serves as Editor-in-Chief of the International Journal of Speech
Technology (Springer). She edited the recently published book “Advances in Speech
Recognition: Mobile Environments, Call Centers and Clinics” (Springer 2010), and serves
as guest columnist on speech processing for Womensenews. Dr. Neustein is Founder and CEO
of Linguistic Technology Systems, a NJ-based think tank for intelligent design of
advanced natural-language-based emotion-detection software to improve human response in
monitoring recorded conversations of terror suspects and helpline calls. Dr. Neustein’s
work appears in the peer-reviewed literature and in industry and mass media publications.
Her academic books, which cover a range of political, social and legal topics, have been
cited in the Chronicle of Higher Education, and have won her a pro Humanitate Literary
Award. She serves on the visiting faculty of the National Judicial College and as a
plenary speaker at conferences in artificial intelligence and computing. Dr. Neustein is
a member of MIR (machine intelligence research) Labs, which does advanced work in
computer technology to assist underdeveloped countries in improving their ability to cope
with famine, disease/illness, and political and social affliction. She is a founding
member of the New York City Speech Processing Consortium, a newly formed group of
NY-based companies, publishing houses, and researchers dedicated to advancing speech
technology research and development.

More information about this series at http://www.springer.com/series/10043


Moses Effiong Ekpenyong
Editor

Human Language
Technologies for
Under-Resourced African
Languages
Design, Challenges, and Prospects
Editor
Moses Effiong Ekpenyong
Department of Computer Science
University of Uyo
Uyo, Akwa Ibom State, Nigeria

ISSN 2191-8112  ISSN 2191-8120 (electronic)
SpringerBriefs in Electrical and Computer Engineering
ISSN 2191-737X  ISSN 2191-7388 (electronic)
SpringerBriefs in Speech Technology
ISBN 978-3-319-69958-5  ISBN 978-3-319-69960-8 (eBook)
https://doi.org/10.1007/978-3-319-69960-8

Library of Congress Control Number: 2017960943

© The Author(s) 2018


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology
now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the
editors give a warranty, express or implied, with respect to the material contained herein or for any errors
or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims
in published maps and institutional affiliations.

Printed on acid-free paper

This Springer imprint is published by Springer Nature


The registered company is Springer International Publishing AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Foreword

Speech technology has become almost taken for granted in everyday life. Fundamental
techniques of automatic speech recognition (ASR), text-to-speech synthesis (TTS), and
speaker recognition and verification have become available as components of commercial
interactive search and consumer service agents, which are implemented as smart
loudspeakers and supported by complex databases processed with machine learning
techniques. Older application fields of public announcements in travel hubs and dictation
software and reading applications for the visually impaired are becoming more widespread.
Increasingly, speech is understood not only as spoken language but as a multimodal
complex of parallel synchronised data streams of audio-visual information from spoken
language itself together with body movement: facial mimicry, manual gesture, and posture.
To a large extent, the world of speech technology applications is still focused on
the languages of regions with major research and development resources, such as
the European Union, North America, India, China, and Japan, but the proceedings
of international conference series such as the Language Resources and Evaluation
Conferences (LREC), InterSpeech, the IEEE International Conference on Acoustics,
Speech and Signal Processing (ICASSP), the Language and Technology Conference
(LTC), and Oriental COCOSDA have shown a rapid increase in papers on the less
resourced languages of Asia, Africa, and the Americas. Even so, these developments
are largely driven by technologies developed in the wealthier nations on the basis of
their standard, ‘commercially interesting’ languages.
A central part of the speech system development process is the creation of language and
speech resources for the languages concerned. This central part is to a large extent also
the most complex part. First, the process splits into many different subprocesses: the
pre-recording process of data resource design, the scenario-specific recording processes,
and the post-recording procedures of annotation, analysis with machine learning
procedures, evaluation, archiving, and dissemination. Second, for each of the
subprocesses, the appropriate tool resources need to be developed. A key point to
remember is that data type and quality are essentially functions of the tools used in the
acquisition and development process, just as knowledge, in general, is a function of the
empirical and formal methods of discovery which are employed.
In the context of speech technologies for the languages of developing nations, there are
a number of goals which are currently being addressed in different parts of the global
speech technology community. One set of goals may be regarded as a spin-off from the
speech technology goals themselves: the use of speech technology tool resources for
harvesting data for language documentation, language maintenance by local societies, and
indeed in some cases the revival of moribund languages. The fulfilment of these goals
meets with many obstacles: traditional scepticism towards new technologies; the desire
for prosperity and the conviction that local minority languages do not provide a viable
way forward in this direction; problems with providing a reliable infrastructure for the
archiving and dissemination of speech technology resources; and lack of financial support
for personnel, equipment, and institutional or commercial status for system developers.
The system development goals for the languages of developing nations also differ in
complex ways from system development goals for the more highly resourced languages: many
languages for which systems have been developed belong to the Indo-European family of
languages which have spread through Europe, North and South America, and South Asia, and
which differ in many details but also share many features of sound patterns, rhythms and
melodies, and word and sentence structure. There are many other language families,
containing many other types of language. For the Sino-Tibetan languages, the models
developed in the extensive speech technology research being pursued in China are a strong
foundation, in the sense that the typology of morphologically isolating tone languages is
being dealt with by technologies for Mandarin and Cantonese. But the Niger-Congo
languages of Africa have so far only a few isolated speech technology developments, most
prominently in North Africa for Arabic and in South Africa for applications to the
official languages of the region. In East Africa, developments for Swahili, a non-tonal
language, are only partially generalisable to the tonal languages of West and Central
Africa. The most active centres for tonal Niger-Congo languages are located in Nigeria,
partly in Ibadan, in the African Language Technology Initiative, mainly centred on
Yoruba, and in Uyo, mainly centred on Ibibio, and conducted by the authors of this
volume.
Overriding issues which need to be overcome in the development of speech technologies for
the languages of developing countries can be listed in the CESAF criteria. The resources
and technologies need to be
• Comprehensive with respect to the application domain
• Effective in terms of human and computing resources
• State-of-the-art, not only intellectually, but also in terms of big data processing,
machine learning, and artificial intelligence, not simply the latest internet-dependent
software and hardware
• Affordable in the sense of being compatible with older computing facilities
• Fair with essential involvement of local communities, developers, universities
and other research facilities, and companies

The authors of this volume have set themselves the goal of fulfilling the above criteria
in order to face the challenge of integrating local languages, in this case in the region
of South East Nigeria, into the digital community. The individual chapters address issues
in system development and its relation to the digital economy, and to digital services
for health and education. These commendable endeavours are already showing fruit and will
certainly develop into valuable contributions to overcoming the digital divide. The
contributions will provide effective data and tool resource models, not only for Nigeria
but also for areas where other Niger-Congo languages are spoken. On this basis,
integration of West African speech technology development into the international speech
technology community is now well on its way.

Dafydd Gibbon
Emeritus Professor of English and General Linguistics at Bielefeld University
Preface

Human Language Technology (HLT), also known as Language Technology (LT), is a growing
interdisciplinary field that closely connects related sub-disciplines of Linguistics,
Psychology, Philosophy, Computer Science, Engineering, Mathematics and Statistics. HLT is
naturally induced by Artificial Intelligence (AI) – a term colloquially applied when a
machine mimics cognitive functions that humans associate with the human mind, such as
learning and problem solving.
A Strengths-Weaknesses-Opportunities-Threats (SWOT) analysis of HLT indicates that, over
the years, HLT research has experienced a great wave of optimism, with the provision of
intelligent tools and methodologies for mining big data via the World Wide Web (WWW), and
with algorithms and applications dedicated to human speech and communication improving at
a remarkable pace. Further, the quest for comprehensive solutions to problems reveals
that only an interdisciplinary or agent-based approach with agile methodology is
acceptable. HLT also provides far more employment opportunities than are available in
traditional academic research, because of its industrial applications. Yet, despite the
huge progress in the field, efforts to extend HLT beyond its current frontiers remain
slow, mainly because related disciplines, industries and stakeholders fail to unite
cohesively or communicate effectively with each other for a common purpose, which poses
serious threats to funding justification.
The areas of HLT addressed in this book include speech synthesis, speaker recognition,
knowledge representation and spoken language processing. Each chapter addresses an area
of HLT. Chapter 1 documents the development of an adaptive synthesis front end for
African tone language systems using the hidden Markov model (HMM) technique. The
template-based front end, though heavily supervised, is currently being refined to ensure
seamless replicability to a multitude of languages as well as code (re-use) flexibility.
In spite of the numerous benefits of speech synthesis, its application is yet to flourish
in the African domain. Hence, this chapter encourages the development of speech resources
to improve the technological status of African tone languages.


Chapter 2 offers an in-depth assessment of speaker variability in speaker recognition
systems. It exploits Machine Learning (ML) of relevant acoustic features under
degraded/sub-optimal conditions, to demonstrate its feasibility for an under-resourced
African tone language. Inspired by the success of voice recognition software such as Siri
on mobile platforms, companies are eager to place speech interfaces everywhere, and
within the next couple of years, voice interfaces will be much more pervasive and
powerful. Hence, this chapter provides a useful methodology to assess the suitability of
speech features for voice/speaker recognition systems.
Chapter 3 proposes an Ontology-Driven application, ODapp, with a dynamic framework for
spatial context analysis and efficient knowledge representation of multilingual Speech
Language Therapy (SLT). SLT is invaluable for the treatment of speech and language
disorders, but a precise estimate of the number of persons living with such conditions is
difficult to obtain, let alone across several language domains. This study is therefore
apt, as it represents a pioneering initiative within the Nigerian domain. The present
research is work in progress and is expected to contribute to improving the poor
healthcare services currently experienced in Nigeria, as well as to satisfy relevant
Sustainable Development Goals (SDGs).
A Spoken Computer-Aided Language Learning (CALL) system is developed in Chap. 4 to
demonstrate the application of spoken language processing, a subfield of Human Computer
Interaction (HCI). HCI is increasingly concerned with creating interactive products that
are easy, flexible and simple for all. However, owing to its multidisciplinary nature and
the different value systems of interface users from various backgrounds and experiences,
it is highly challenging for designers to create applications which are usable and
affordable to such a heterogeneous set of users. An interactive framework driven by
speech technology is exploited to enable language learning, as well as protect, preserve
and revitalize under-resourced languages.
Finally, this book is designed for research students and staff, language experts, as well
as those behind the scenes who wish to explore the growing field of HLTs.
Contents

1 Adaptive Template-Based Front End for Tone Language Speech Synthesis
Moses Effiong Ekpenyong
2 Intra-Speaker Variability Assessment for Speaker Recognition
in Degraded Conditions: A Case of African Tone Languages
Moses Effiong Ekpenyong, Udoinyang G. Inyang, Mercy E. Edoho,
and Eno-Abasi E. Urua
3 Towards Ontology-Driven Application for Multilingual
Speech Language Therapy
Patience U. Usip and Moses Effiong Ekpenyong
4 Ibibio Spoken-CALL System
Moses Effiong Ekpenyong, EmemObong O. Udoh,
and Nseobong P. Uto

Index
About the Author

Moses Effiong Ekpenyong, Ph.D. is a Senior Lecturer in the Department of Computer Science
and current Deputy Director of the Centre for Research and Development (CERAD),
University of Uyo. He has been involved in research projects and has published widely in
his area of specialty, Speech and Wireless Communications Technology, with over 100
research publications to his credit. He is a beneficiary of several awards/scholarships
and research funding from notable organizations/institutions, such as FGN/Science and
Technology Education Post-Basic (STEP-B)-World Bank; Tertiary Education Trust Fund
(TETFund), Nigeria; and Outside Echo, UK. He is a reviewer for national and international
journals, and belongs to a number of professional bodies, including the Nigeria Computer
Society (NCS), Nigerian Mathematical Society (NMS), Institute of Electrical and
Electronics Engineers (IEEE), International Speech Communication Association (ISCA), and
West African Linguistic Society (WALS).

Chapter 1
Adaptive Template-Based Front End for Tone
Language Speech Synthesis

Moses Effiong Ekpenyong

1.1 Introduction

Text-to-speech (TTS) synthesis has transformed dramatically over the last couple of
decades, such that most current TTS systems apply data-driven techniques instead of
rule-based techniques, which were among the earliest, language-specific techniques and
struggled to synthesize unrestricted-domain texts. A TTS system is basically composed of
two parts: the front end or high-level synthesis (Natural Language Processing (NLP)
phase) and the back end or low-level synthesis (Digital Signal Processing (DSP) phase).
The front end is responsible for gathering input in various forms from the user, and
processing it to conform to a specification the back end can use. It is often regarded as
the interface between the user and the back end. The back end, also referred to as the
waveform generator, is responsible for the conversion of the linguistic representation
into sounds. In some systems, this part includes the computation of the target prosody
(pitch contour, phoneme durations), which is then imposed on the output speech.
Figure 1.1 summarizes the functional blocks of these two phases.
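The division of labour between the two phases can be illustrated with a minimal Python
sketch. Everything in it is a toy assumption made for illustration: the names
LinguisticSpec, front_end, back_end and synthesize are hypothetical, the "pronunciation"
rule is one symbol per letter, and the "waveform" is a list of zeros. It shows only that
the front end produces a specification which the back end consumes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LinguisticSpec:
    """Hypothetical intermediate representation handed from front end to back end."""
    phones: List[str]
    prosodic_labels: List[str]


def front_end(text: str) -> LinguisticSpec:
    """NLP phase (toy): pre-process text, then predict pronunciation and prosody."""
    words = text.lower().split()
    # Toy letter-to-sound rule: one "phone" per alphabetic character.
    phones = [ch for word in words for ch in word if ch.isalpha()]
    # Toy prosody prediction: every phone receives the same neutral label.
    prosodic_labels = ["neutral"] * len(phones)
    return LinguisticSpec(phones=phones, prosodic_labels=prosodic_labels)


def back_end(spec: LinguisticSpec, frames_per_phone: int = 5) -> List[float]:
    """DSP phase (toy): emit a placeholder signal with a fixed number of frames per phone."""
    return [0.0] * (len(spec.phones) * frames_per_phone)


def synthesize(text: str) -> List[float]:
    """Run the two phases in sequence."""
    return back_end(front_end(text))
```

Because the two functions communicate only through LinguisticSpec, either side could in
principle be replaced without touching the other, which is the property that lets a back
end accept different front ends.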
The application of Hidden Markov Models (HMMs) in speech synthesis is still gaining
prominence. In conventional statistical parametric speech synthesis, a distinct HMM for
each context combination is typically employed to represent the probability densities of
the speech parameters of input texts. Speech parameters are then generated from these
models so as to maximize the output probabilities, and the speech waveform is finally
reconstructed from them. Parametric speech synthesis techniques rely on full-context
acoustic models generated by language front ends (Aylett et al. 2014), which are
responsible for analyzing the linguistic and phonetic structure of the language.
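The training and generation steps described above are often written in the following
form in the statistical parametric synthesis literature. The notation is not taken from
this chapter and is introduced here only as an assumption for illustration: O denotes the
speech parameters of the training data, W their context labels, \lambda the HMM
parameters, w the context labels that the front end produces for the input text, and o
the speech parameters to be generated.

```latex
\begin{align}
  \hat{\lambda} &= \arg\max_{\lambda} \; p(O \mid W, \lambda)
      && \text{(training)} \\
  \hat{o} &= \arg\max_{o} \; p(o \mid w, \hat{\lambda})
      && \text{(parameter generation)}
\end{align}
```

The waveform is then reconstructed from \hat{o} by a vocoder or comparable signal
processing stage.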

M.E. Ekpenyong (*)
Department of Computer Science, University of Uyo, Uyo, Akwa Ibom State, Nigeria
e-mail: mosesekpenyong@gmail.com; mosesekpenyong@uniuyo.edu.ng

© The Author(s) 2018
M.E. Ekpenyong (ed.), Human Language Technologies for Under-Resourced
African Languages, SpringerBriefs in Electrical and Computer Engineering,
https://doi.org/10.1007/978-3-319-69960-8_1

[Figure: Text → Text pre-processing → Text annotations → Prosody and pronunciation
prediction (NLP phase) → Prosodic labels, phone labels → Waveform synthesis (DSP phase)
→ Speech]

Fig. 1.1 Functional blocks of a TTS system

HMMs have been used to construct a speech synthesis system for Ibibio – a Lower Cross
tone language spoken in the southeast coastal region of Nigeria, West Africa. Given the
resource-limited situations of most African languages, an investigation of what could be
achieved in the absence of expensive resources was carried out in Ekpenyong et al.
(2014). The HMM-based method is known to offer good performance even with small corpora,
and is capable of directly learning the relationship between acoustics and whatever
linguistic features are available, thus potentially mitigating the absence of detailed
intermediate layers of linguistic representation. Further, with the use of
context-dependent questions and state parameter sharing, the problems of unseen contexts
and data sparseness are effectively addressed.
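How context-dependent questions and parameter sharing cope with unseen contexts can be
illustrated with a small Python sketch. It is a deliberate simplification: the two
questions, the context fields and the function names are hypothetical, and the questions
are applied in a fixed order, whereas real systems learn a decision tree from data and
use far larger question sets.

```python
from typing import Callable, Dict, List, Tuple

# A context is a flat feature dictionary (hypothetical fields for illustration).
Context = Dict[str, str]
Question = Tuple[str, Callable[[Context], bool]]

VOWELS = {"a", "e", "i", "o", "u"}

# Hypothetical yes/no questions about the phonetic and tonal context.
QUESTIONS: List[Question] = [
    ("is_high_tone", lambda c: c["tone"] == "H"),
    ("right_is_vowel", lambda c: c["right"] in VOWELS),
]


def leaf_id(context: Context) -> str:
    """Answer every question in order; contexts with identical answers share a leaf."""
    return "".join("Y" if test(context) else "N" for _, test in QUESTIONS)


def tie_states(contexts: List[Context]) -> Dict[str, List[Context]]:
    """Group contexts by leaf, so each group would share one set of state parameters."""
    leaves: Dict[str, List[Context]] = {}
    for context in contexts:
        leaves.setdefault(leaf_id(context), []).append(context)
    return leaves
```

A context that never occurred in the training data still answers the same questions, so
it lands in a leaf whose parameters were estimated from similar, observed contexts.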
The HMM-based Speech Synthesis System (HTS) is a dominant parametric synthesis system,
pioneered by Tokuda’s working group at Nagoya Institute of Technology (NITECH), Japan.
The system has matured over the last two decades to produce robust synthesis results
based on parametric modeling and can use a variety of front ends to generate full context
models for speech synthesis and training. It is a back end system and depends on
third-party systems for the initial stage of processing. Whereas significant progress has
been made in developing the back end of parametric speech synthesis, not much work has
addressed the generation of clear-cut contexts and their effects on the back end of a
parametric system.

1.2 Speech Synthesis: The Problem

Most current voice services employ concatenation techniques with units at least the size
of words or phrases. These techniques, which provide near-natural speech quality, are
limited by low flexibility because of their dependence on pre-recorded speech items.
Therefore, companies are turning towards text-to-speech (TTS) synthesis as a solution.
Since TTS systems accept unconstrained input, they provide high flexibility, but the gain
in flexibility is at the expense of speech quality. Applications involving high exposure
rates for highly motivated groups of users, such as the visually impaired, may make do
with relatively poor speech synthesis quality, since the users learn to understand the
speech through frequent exposure (Duffy and Pisoni 1992). If the context of use does not
allow the provider of the service to make assumptions about exposure rate and motivation,
and hardly any learning may be assumed to occur across successive occasions, the speech
quality should be much higher, in fact so high that understanding is almost immediate and
takes hardly any effort. The latter context of use occurs with applications which are
intended for the general audience, such as public information services. Thus, before
launching such a publicly available service, it is necessary to assess whether the
performance of current TTS systems in terms of intelligibility and acceptability is
sufficient for applications where users may not be assumed to have any prior exposure to
synthetic speech.
In the early 1990s, it was thought that speech synthesis was a solved problem (Cole et
al. 1995). However, it has since been realised that there are many open issues in speech
synthesis that cover a very wide problem area. First, in text pre-processing, numerals,
abbreviations, acronyms and punctuation differ greatly from one language to another.
Second, in phonetisation, orthographic systems differ widely; proper name pronunciations,
especially foreign name pronunciations, are sometimes anomalous; the out-of-vocabulary
problem is difficult to solve; and text inconsistency in written but non-standardised
languages adds to the difficulty. Third, text parsing for prosody (duration, pitch and
intensity) is a major problem, in both intonation and tone languages. For instance,
written texts do not commonly mark emphasis or the prosodic expression of emotions.
Fourth, at the acoustic level, the discontinuities and contextual effects in waveform
concatenation methods are the most problematic, and the application of signal processing
methods to modify concatenative speech is in its infancy. Fifth, there are problems with
both technological and sociolinguistic consequences: the higher fundamental frequency
(F0) in female and child speech makes it more difficult to estimate the spectral envelope
(Klatt 1987), which makes speech synthesis with female and child voices difficult. Sixth,
the evaluation and assessment of synthesised speech from technical and
application-oriented perspectives is another area which has hardly been explored (Gibbon
et al. 2000). Further, since speech quality is multidimensional, with diverse application
domains, the evaluation methods should be carefully chosen to achieve the desired
results. These problems have triggered state-of-the-art approaches to ensure robust
designs and highly intelligible voices.
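The first of these issues, text pre-processing, can be made concrete with a minimal
Python sketch of token-level normalization. The lookup tables are hypothetical and
English-only, and numerals are read digit by digit purely for simplicity. A real front
end would need language-specific abbreviation lists and number grammars, which is exactly
why this step transfers poorly between languages.

```python
import re

# Hypothetical, English-only lookup tables for illustration.
ABBREVIATIONS = {"dr.": "doctor", "st.": "street", "etc.": "et cetera"}
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]


def expand_token(token: str) -> str:
    """Expand one whitespace-delimited token into speakable words."""
    lowered = token.lower()
    if lowered in ABBREVIATIONS:
        return ABBREVIATIONS[lowered]
    if re.fullmatch(r"[0-9]+", token):
        # Digit-by-digit reading; real systems need a full number grammar.
        return " ".join(DIGITS[int(d)] for d in token)
    # Drop punctuation, keeping word characters, apostrophes and hyphens.
    return re.sub(r"[^\w'-]", "", lowered)


def normalize(text: str) -> str:
    """Normalize a sentence token by token, discarding tokens that become empty."""
    expanded = (expand_token(token) for token in text.split())
    return " ".join(word for word in expanded if word)
```

For example, normalize("Dr. Udoh lives at 42 Aka St.") yields
"doctor udoh lives at four two aka street".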
The Local Language Speech Technology Initiative (LLSTI) project (Tucker and Shalonova
2005) provided the initial procedure for adapting an existing TTS system (i.e., the
Festival Speech Synthesis System) to a Nigerian tone language, Ibibio (New Benue Congo,
Nigeria), the official language of Akwa Ibom State, spoken in the Southeast coastal
region of Nigeria. The prototype development uncovered a wide range of previously
unconsidered problems. The adaptation method may be usable to some extent when languages
are prosodically and phonemically similar, but severe problems arise when languages are
typologically very dissimilar. For example, intonation languages, for which TTS systems
have been typically developed, pose very different problems from tone languages, e.g.,
Ibibio. Even systems for Chinese, also a tone language, cannot be generalised in this way
because the East Asian languages in general have phonemic tone, whereas African languages
have a broad spectrum of morphological tone functionalities in addition to phonemic tone.
For instance, in Ibibio, the subcategory of proximal/distal (temporally near or far)
tense is marked by Low-High/High-Low (LH/HL) tones on the
tense morphemes (e.g. used in the context (proximal/distal
tense) ‘I (will go)/went to the market tomorrow/yesterday’.
Pitch therefore has a hard, mandatory semantic function in tone languages rather than the
soft pragmatic function it has in intonation languages (Gibbon et al. 2006; Ekpenyong and
Udoh 2014). Hence, if tone were orthographically marked, tone-morpheme combinations could
be used to capture this problem in unit selection and in text pre-processing, with a
resulting explosion of the corpus size. However, the problem is compounded for Ibibio by
the lack of orthographic tone marking, making morphological tone assignment effectively
an Artificial Intelligence (AI)-complete problem (Gibbon et al. 2006; Ekpenyong et al.
2014), and requiring extensive background world knowledge, i.e., heuristic guessing
algorithms for morphological tone assignment. Again, the positional dependence of tone
values on the terraced tone patterning generated by automatic and non-automatic downstep
in many African languages determines a further combinatorial explosion of pitch
patterning (Gibbon 2001). In a number of African languages, the number of inflected word
forms is far larger than for languages like English or Chinese due to the agglutinative
inflectional morphology and complex subject-verb-object person concord. This adds further
complexity to morphological tone assignment and produces problems of text corpus
sparseness in such languages, a specific case of the sparse data problem, prevalent in
corpus-based approaches (Saruladha et al. 2010).
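The scale of the combinatorial explosion can be illustrated with a short Python sketch
that enumerates every candidate tone pattern for an untone-marked word. The tone
inventories used here ("H", "L" and a downstepped high written "!H") are assumptions
chosen for illustration, not a description of the Ibibio tone system, and the function
name is hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple


def candidate_tone_patterns(
    n_syllables: int, tones: Sequence[str] = ("H", "L")
) -> List[Tuple[str, ...]]:
    """List every possible tone sequence for a word with n_syllables syllables."""
    return list(product(tones, repeat=n_syllables))


# With t tones and n syllables there are t ** n candidates to choose between.
two_tone = candidate_tone_patterns(3)
three_tone = candidate_tone_patterns(3, tones=("H", "L", "!H"))
```

A three-syllable word already has 8 candidates with two tones and 27 with three, and the
counts multiply across the words of a sentence. Without orthographic tone marks, a front
end has to select one candidate using morphological and contextual knowledge.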

1.3 TTS System Front End

1.3.1 Rule-Based (RB) Vs. Data-Driven (DD)

Speech and language processing systems may be classified as rule-based or data-driven:
Rule-based systems utilize prior linguistic knowledge created manually by language
experts – linguists, phoneticians, lexicographers, etc. The degree of linguistic
knowledge embedded into the system depends on the desired functionalities. RB systems are
language dependent, mostly refined towards certain linguistic phenomena that represent a
compromise between an economy principle and the necessity to sufficiently discriminate
between communicative elements, and they suffer the following drawbacks:
(i) difficulty in porting the system across language domains and platforms;
(ii) acquisition of the knowledge base or development of the language dependent
representations is highly labour-intensive;
(iii) linguistic models are not so interesting for language and speech processing,
which limits their reproduction;
(iv) there are no underlying and unifying linguistic theories to sufficiently and
precisely represent all levels of linguistic phenomena – phonetic, morpho-syntactic,
syntactic, discourse, etc.
One classical example of RB systems is the domain-specific expert system that
utilizes rules to make deductions or decisions. Here knowledge is represented as a
set of rules and data as a set of facts. A rule engine is then exploited to compare each
rule (in the knowledge base) with the facts.
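
The rule-engine idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the facts and rules below (for example `is_nasal` and `bears_tone`) are hypothetical placeholders and are not taken from any system discussed in this chapter.

```python
# Hypothetical facts and rules for illustration; none come from the chapter.
def forward_chain(facts, rules):
    """Repeatedly fire every rule whose conditions are all among the known facts."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conclusion not in known and set(conditions) <= known:
                known.add(conclusion)   # the rule fires and adds a new fact
                changed = True
    return known

# Each rule is (tuple of required facts, fact to conclude).
RULES = [
    (("is_vowel",), "is_syllabic"),
    (("is_syllabic",), "bears_tone"),
    (("is_nasal", "precedes_consonant"), "is_syllabic"),
]

derived = forward_chain({"is_nasal", "precedes_consonant"}, RULES)
```

The engine simply compares each rule with the current fact set until no rule adds anything new, which is the comparison step the paragraph describes.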
Data-driven (empirical) System A data-driven system automatically derives or learns
the linguistic units from exemplars, thus avoiding the need to acquire prior
linguistic knowledge. Today, speech and language data (audio, video and text) for
well-resourced languages are widely available on the internet. These data can be
inexpensively harvested to rapidly create data-driven systems that are portable
across language domains and platforms. DD methods require a speech database, ideally
with labels; where labels are not available, they can be generated automatically.
DD systems are desirable because they require less human expert knowledge and adapt
flexibly when deployed. They are also attractive because efficient solutions have
evolved for the storage, indexing, retrieval and processing of the available data.
As the amount of available data increases, more representative exemplars are
generated and more accurate results are produced. The role of a DD system is then to
efficiently store, index, retrieve and process the relevant units of speech or text.
A typical example of a DD system is a translation system with access to a huge
number of translations between two languages. The system matches the input (source
language) with the translations in its database, retrieves appropriate units in the
target language and possibly combines them to generate a translation. As more
translations become available, the degree of ambiguity and data sparseness
decreases, and more accurate translations and equivalents are produced. In
constructing DD systems, HMMs have come into wide use (Tokuda et al. 2000). HMMs
have successfully been applied to modeling the sequence of speech spectra in speech
processing systems, although many of the advances in speech synthesis have been
borrowed from the field of speech recognition. The field of machine learning has
increasingly exploited data-driven approaches where large databases act as implicit
knowledge sources, rather than explicit rules manually written by experts. Machine
learning techniques are preferred in situations where engineering approaches such as
hand-crafted models cannot cope with the problem complexity, and are usually
classified into the following broad categories:
Predictive or supervised Here the goal is to learn a generic rule that maps inputs, i,
to outputs, o, given a labeled set of input-output pairs,
6 M.E. Ekpenyong

$T = \{(i_k, o_k)\}_{k=1}^{N}$, where T is the training set, and N is the number of
training exemplars. A supervised learning algorithm then analyzes the training data and produces
an inferred function, which can be used to map new examples. An optimal scenario
will permit the algorithm to correctly detect unseen labels or instances, and this
requires the learning algorithm to generalize from the training data to unseen cases
in a “reasonable” way.
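
As a minimal sketch of this input-output mapping, the following uses a 1-nearest-neighbour rule, chosen here only because it is short. The training pairs (mean pitch in Hz and duration in ms mapped to a tone label) are invented for illustration and are not measurements from the Ibibio corpus.

```python
def train(pairs):
    """'Training' for 1-nearest-neighbour simply stores the labeled pairs T."""
    return list(pairs)

def predict(model, x):
    """Map a new input to the output of the closest stored training input."""
    _, nearest_output = min(
        model,
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], x)),
    )
    return nearest_output

# Hypothetical toy data: (mean pitch in Hz, duration in ms) -> tone label.
T = [
    ((220.0, 90.0), "H"),
    ((120.0, 95.0), "L"),
    ((210.0, 80.0), "H"),
    ((130.0, 100.0), "L"),
]
model = train(T)
label = predict(model, (200.0, 85.0))   # an unseen input
```

The inferred function here is `predict`; generalization to unseen cases comes entirely from the distance measure, which is itself an assumption of this sketch.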
Descriptive or unsupervised Unsupervised learning can be a goal in itself. In this
case, only the inputs, $T = \{i_k\}_{k=1}^{N}$, are given, and the goal is to discover
hidden or “interesting” patterns in the data. This approach is sometimes called
knowledge discovery, and its problem is far less well defined.
Reinforcement learning In this approach, the system learns from the consequences of
its actions by interacting with a dynamic environment in which it must achieve a certain goal.
Semi-supervised Between supervised and unsupervised learning lies semi-supervised
learning, where a system accepts an incomplete training signal (a training set with
some, often many, of the target outputs missing). Transduction is a
special case of this principle where the entire set of problem instances is known at
learning time, except that part of the targets is missing.
Template-based (TB) System Mid-way between RB and DD systems is the TB system.
TB systems provide a family of techniques that have advanced the field of speech
technology. The underlying idea of TB systems is the collection of prototypical
reference patterns (or templates), representing a dictionary of candidate inputs.
New or unknown data are then matched against the existing reference templates to
select the best matching pattern category. The principle of the template-based
approach is adopted in the proposed front end development. Here, the reference
patterns are formatted inputs, sufficient to generate the required back end files. The
proposed template-based front end, as presented in Fig. 1.2, utilizes two (input)
template classes (T1 and T2) to generate the language’s (full context-dependent feature)
model. T1 (the language-specific class) contains three sub-templates that define the
language-specific properties. These include the text utterance template (a complete
list of all the recorded sentences in text form); the phone inventory template
(a complete list of all the phonemes of the intended language and their phonological
properties (see Fig. 1.4)); and the syllable rule template (a set of rules specifying the
syllable structure of the intended language). The design is adaptable to other (tone)
languages and is currently being refined to improve its robustness.
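
The general matching step of a template-based system (compare an unknown item with stored reference patterns and return the best category) might be sketched as follows. The similarity measure (`difflib.SequenceMatcher` from the Python standard library) and the two categories with their patterns are assumptions made for this example; they are not the templates T1 and T2 of the proposed front end.

```python
from difflib import SequenceMatcher

def best_template(item, templates):
    """Return (category, score) of the reference pattern most similar to `item`."""
    scored = (
        (category, SequenceMatcher(None, item, pattern).ratio())
        for category, patterns in templates.items()
        for pattern in patterns
    )
    return max(scored, key=lambda pair: pair[1])

# Hypothetical reference patterns: syllable skeletons grouped by category.
TEMPLATES = {
    "open": ["V", "CV", "CVV"],
    "closed": ["CVC", "CVVC", "CGVC"],
}

category, score = best_template("CVC", TEMPLATES)
```

An exact match scores 1.0; an unseen item such as `"CCVC"` would still be assigned to whichever category holds its closest reference pattern.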
In Fig. 1.2, natural language processing (NLP) is used to extract the utterance
heterogeneous relation graphs (HRGs) from T1. From the graphs, useful patterns
are mined to build syllable HRGs, whose patterns are subsequently mapped (onto tone
bearing units) to produce close copies called tone HRGs. Features of T1 are
then modeled after T2 – the context feature template (see Fig. 1.6) – by coding the
various attributes in T1 following the HMM pattern, to produce a full context model.
The HTS question model is finally built from attributes of the phone inventory, and
is also coded following the HMM pattern.

[Figure 1.2: flow diagram. T1 (language-specific: text utterance, phone inventory,
syllable rule) is passed through NLP to give the utterance HRG; patterns are
extracted to give the syllable HRG and mapped to give the tone HRG. These are
represented/modeled and, together with T2 (generic structure: context feature),
coded with the HMM pattern to generate the full context feature model and the HTS
question model, which are coupled to back end processing.]

Fig. 1.2 Proposed template-based front end framework

1.3.2 The Ibibio Language

Ibibio is a Lower Cross tone language of the New Benue-Congo language family,
spoken in the Southeast coastal region of Nigeria; its native speakers are
predominantly found in Akwa Ibom State. Ibibio speakers represent the fourth largest
language group in Nigeria, numbering about four million. The speech
characteristics of Ibibio concern the rules governing the production of sounds in the
language, and are discussed in the following subsections.

1.3.2.1 Phonological Structure

The phonological structure of a language permits the language to act as an intrinsic
system – useful to organize its speech sounds. The phonological system mainly
involves the segmental (speech sounds: vowels and consonants) and auto-segmental
(syllable, tone, foot and mora) structures of the language.

Table 1.1 Ibibio vowel system


Vowel
Front Central Back
Vowel height High
Mid
Low

Table 1.2 Ibibio consonant system


Place of articulation
Labial Alveolar Palatal Velar Glottal Uvular
Manner of Stops Voiceless
articulation Voiced
Non Nasal
stops
Fricative
Tap
Trill
Approximant

Vowel and consonant system The Ibibio vowel and consonant systems are pre-
sented in Tables 1.1 and 1.2, respectively.
The long and short vowels are similar in their tone levels, but differ in that the
long vowels are merely lengthened versions of their short counterparts,
and they do not bear the rising and falling tones.
Syllable Structure One of the most important aspects of phonology is the structural
representation of sound patterns above the levels of phonemes (Ekpenyong and
Udoh 2014). The Ibibio speech system is built around its syllable structure consist-
ing of a single onset-vowel or nasal prefix and a rhyme consonant (Urua 2000). The
syllable structure of Ibibio is distributed as follows (Ekpenyong and Udoh 2014):
(i) n̄(N) syllable structure: This is a syllabic nasal, and is homorganic with
consonants such as ‘wing’, ‘book’, ‘society’;
(ii) V syllable structure: This can be observed where a vowel occurs as a prefix in
a word as in ‘naval’;
(iii) CV syllable structure: This can be seen in words like ‘stand’, ‘come’;
(iv) CVC syllable structure: This structure can be seen in words like ‘throw’,
‘cook’.
(v) CGV syllable structure: This is a consonant-glide vowel structure such as
‘sit’.
(vi) CGVC syllable structure: This can be observed in words like
‘split’.
(vii) C V syllable structure: This is seen in the only consonant structure cluster in
Ibibio as in ‘play’, ‘stop’.

(viii) CVVC syllable structure: This is seen in sequences like ‘scratch’.
(ix) CVV structure: This can be seen in words like ‘lie down’.
(x) CVC structure: This can be seen in words like ‘fry’.
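
A syllable rule template of this kind could be checked mechanically by reducing each syllable to its consonant/vowel skeleton. The sketch below is illustrative only: the vowel, glide and nasal sets are assumptions based on the SAMPA symbols visible in the sample utterances, each symbol is treated as one segment, and item (vii) is omitted because its structure label is unclear in the list above.

```python
# Assumed symbol classes (illustrative, not the chapter's full phone inventory).
VOWELS = {"a", "e", "i", "o", "u", "O", "I", "V", "U"}
GLIDES = {"w", "j"}
NASALS = {"m", "n", "N"}

# Skeletons named in the list above, leaving out the unclear item (vii).
LISTED = {"N", "V", "CV", "CVC", "CGV", "CGVC", "CVVC", "CVV"}

def skeleton(phones):
    """Map a list of phone symbols to a skeleton string such as 'CVC' or 'CGV'."""
    if len(phones) == 1 and phones[0] in NASALS:
        return "N"                      # a lone nasal is a syllabic nasal
    out = []
    for position, phone in enumerate(phones):
        if phone in VOWELS:
            out.append("V")
        elif phone in GLIDES and position > 0 and out[-1] == "C":
            out.append("G")             # glide directly after a consonant
        else:
            out.append("C")
    return "".join(out)

def is_listed(phones):
    """True if the syllable's skeleton is one of the listed structures."""
    return skeleton(phones) in LISTED
```

For instance, `skeleton(["k", "i", "N"])` gives `"CVC"`, so the first syllable of kiNsidi would be accepted by this check.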
Tone Structure Ibibio has two level tones, the High (H) and Low (L) tones. Other
variants include the downstepped-High tone (!H), and the Low-High or rising (LH) and
High-Low or falling (HL) sequences. The following symbols are used to indicate Ibibio
tones: H [ ´ ] – High, L [ ` ] – Low, D [!] – Downstepped, LH [ ˇ ] – Low-High, and
HL [ ˆ ] – High-Low. Urua (2001) divided the Ibibio tone levels into phonemic and
phonetic tones. She maintains that the phonemic tones are the High, Low, and
Downstepped-High tones, while the surface phonetic tones are High, Low and
Downstepped-High, in addition to the contour tones, Low-High and High-Low.
Examples of these tones include: High tone – ‘hand’; Low tone –
‘one’; Downstepped-High tone – ‘please’; Rising tone – ‘forget’; Falling
tone – ‘house’.

1.3.3 Utterance Database

The utterance database is language-specific, and difficult to build for low-resourced
languages. The main steps for building the utterance database for a new TTS system
include (c.f. Morais and Violaro 2005):
Sentences design – The set of sentences to be recorded should cover, as far as
possible, all prosodic and acoustic features or details that are likely to appear
during the speech synthesis process. Put differently, the sentences must be
phonetically and prosodically balanced. To achieve this, greedy algorithms for
selecting phonetically and prosodically balanced sentences from a huge corpus
may be helpful. In practice, a minimum of about 2 h of recorded speech may be
sufficient to build a reasonable DD speech synthesizer.
Speaker selection – The selection of a speaker is subjective and should suit
the intended application. Normally, a professional native speaker with good
reading skills and a stable (smooth and fluent) voice is preferred, to ensure that
the voice survives the digital processing at the back end.
Utterance recording – Professional acoustic conditions are necessary during the
recording stage. A stable voice quality is required throughout the recording
sessions, which implies that the speaker must keep good control of his/her voice.
The signal should be recorded with at least 16-bit resolution and a sampling rate
of at least 16 kHz, preferably in a professional studio.
Labeling – The recorded sentences should be phonetically segmented. This segmen-
tation is normally achieved using tools from automatic speech recognition
(Huang et al. 2001). However, some form of manual checking needs to be done.
Moreover, prosodic labeling (such as pausing, prosodic phrases, and accents or
tones) is important to build a rich HMM model.
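
The greedy selection mentioned under sentences design can be sketched as a set-cover loop: at each step, pick the sentence that adds the most units not yet covered. In this sketch, character bigrams within each word stand in for diphones, which is a simplifying assumption; a real selection would use the phone inventory and prosodic labels. The function names and the three-sentence corpus are illustrative.

```python
def greedy_select(sentences, units_of, max_sentences):
    """Pick sentences that each add the largest number of not-yet-covered units."""
    covered, chosen = set(), []
    remaining = list(sentences)
    while remaining and len(chosen) < max_sentences:
        best = max(remaining, key=lambda s: len(units_of(s) - covered))
        gain = units_of(best) - covered
        if not gain:
            break                       # no remaining sentence adds anything new
        chosen.append(best)
        covered |= gain
        remaining.remove(best)
    return chosen, covered

def char_bigrams(sentence):
    """Stand-in for diphone extraction: character bigrams within each word."""
    return {w[i:i + 2] for w in sentence.split() for i in range(len(w) - 1)}

corpus = ["eteidVN kiNsidi", "bONakam kuukpamba", "ete ete"]
chosen, covered = greedy_select(corpus, char_bigrams, max_sentences=2)
```

With this toy corpus the repetitive sentence `"ete ete"` is never chosen, because it contributes only two distinct units.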

1.3.3.1 Ibibio TTS Utterance Processing

Initial processing of the input (text) utterance for an Ibibio TTS front end was
accomplished using a text-to-segments framework implemented with Speect – a
multilingual speech synthesis system (Louw 2008). Speect offers application
programming interfaces, as well as an environment for research and development of TTS
systems and voices. From this implementation we obtained heterogeneous relation
graphs (HRGs) describing the linguistic structures of the language. The HRG was
developed for use in speech synthesis systems, e.g., in Festival (Black, Taylor and
Caley, 1996–1999), and its design addresses the specific needs of such systems. The
HRG is useful because of the specific formalism it provides for preserving linguistic
information in speech synthesis systems, which sets it apart from other
formalisms used in speech and language processing. The linguistic data processed
in a synthesis system are heterogeneous: rather than dealing with
syntax or phonology independently, synthesizers can be involved in text analysis,
syntactic analysis, morphology, phonology, phonetics, prosody, articulatory control
and acoustics. Hence, to encourage a rich and robust structure that is useful
beyond text/speech processing, it is essential for a synthesis system to store
representations of these different types of linguistic information in a single
formalism.
Input to the Speect system were the recorded utterances and the phoneme inventory
(see T1, in Fig. 1.2). The recorded utterances have been used in larger Ibibio
synthesis experiments (Ekpenyong 2013) and were processed from resources collected in
three different projects, namely the West African Language Archive (WALA) project,
the Local Language Speech Technology Initiative (LLSTI, http://www.llsti.org)
project, and a World Bank/Science and Technology Education Post-Basic (STEP-B)
project. The processed corpus contained a total of 1140 utterances, samples of
which are shown in Fig. 1.3.
To ensure efficient speech preprocessing, and avoid loss of information, the
speech corpus was coded using the Speech Assessment Method for Phonetic
Alphabet (SAMPA) notations, and tailored to suit the ergonomic needs of the lan-

1. bONakam kuukpamba
2. akefeefeRe ajak ikOt abasi
3. eJe amaanam aNwaNa ke mme owo enie ntreubOk ke usVN OmmO keed keed
4. abasi amaasiak usVN ubOkkO OnO nditO isred
5. ejIn OmO adiben eJe akaaisaN
6. eteidVN kiNsidi
7. idooRo akpanikO owo edinIm mbo ke owo emi ataaRa Nwet abasi ke ido nsu
isibOOhO ufen
:::
1137. daNa ebod odo akekan odu OJVN asaNa OJON akewOd ke ekON ikikemme usen
1138. imam isinemme akaN iba mme owo emaesak imam tutu eJe ebeek OmmO NkaN
1139. ete ifOn akedo amaasak mmOONOjId afeRe ke mfVk nte ndiON edIm
1140. daNa ekedikOppO mbVk ebod imam amamaana asakka

Fig. 1.3 Sample input utterances



guage and christened after Ibibio. Ibibio SAMPA is a machine-readable phonetic


script using 7-bit printable ASCII characters based on the International Phonetic
Alphabet (IPA), and was developed during a language documentation project (c.f.
Gibbon et al. 2004). The SAMPA notation has become a universally acceptable
standard of encoding the IPA symbols. A list of Ibibio phonemes and their SAMPA
equivalents is shown in Table 1.3.
The phoneme inventory describes the phonological structure of the intended lan-
guage, Ibibio (c.f. Ekpenyong and Udoh 2014). As shown in Fig. 1.4, the inventory
consists of phonemes of the language, their word classification, as well as their
place and manner of articulation.
The phone inventory used for extracting the HRG for Ibibio is typically an
n × 9 matrix, whose columns are defined as follows:
col. 0 – the phonemes of the intended language (e.g., a, aa, b, p, kp, etc.)
col. 1 – the word classification (e.g., Sonorant, Consonantal, etc.)
col. 2 – the word segmentation (e.g., Syllabic, Non-Syllabic, etc.)
col. 3 – the sound type (e.g., Vowel, Consonant)
col. 4 – the sound duration (e.g., Short, Long, etc.)
col. 5 – the height, or the manner where height is not defined (e.g., Low, Medium,
High, Stop, Nasal, Fricative, Approximant, etc.)
col. 6 – the lip posture (e.g., Rounded, Unrounded, Alveolar, etc.)
col. 7 – the sound production (e.g., Voiced, Voiceless, etc.)
col. 8 – the position (e.g., Front, Back, Central, etc.)
Note: more than one specification may be combined in one column to avoid
redundancy or empty columns, but the combined sequence must be consistent with
the specification in Fig. 1.4.
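
One way to load such an n × 9 inventory is to read each row into a named record whose fields follow the column order given above. The field names, the `parse_inventory` helper and the use of Python's `csv` module are choices made for this sketch; the two rows are copied from the inventory excerpt in Fig. 1.4.

```python
import csv
from typing import NamedTuple

class Phone(NamedTuple):
    symbol: str            # col. 0
    word_class: str        # col. 1
    segmentation: str      # col. 2
    sound_type: str        # col. 3
    duration: str          # col. 4
    height_or_manner: str  # col. 5
    lip_posture: str       # col. 6
    production: str        # col. 7
    position: str          # col. 8

def parse_inventory(lines):
    """Parse quoted, comma-separated rows into Phone records keyed by symbol."""
    inventory = {}
    for row in csv.reader(lines, skipinitialspace=True):
        if len(row) != 9:
            raise ValueError(f"expected 9 columns, got {len(row)}: {row}")
        inventory[row[0]] = Phone(*row)
    return inventory

rows = [
    '"a", "Sonorant", "Syllabic", "Vowel", "Short", "Low", "Unrounded", "Voiced", "Back"',
    '"b", "Consonantal", "Non-Syllabic", "Consonant", "Short", "Stop", "Unrounded", "Voiced", "Front"',
]
inventory = parse_inventory(rows)
```

Rejecting rows that do not have exactly nine columns enforces the consistency requirement stated in the note above.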
A Python script was then written to implement the Ibibio syllable Finite State
Transducer (FST) (Ekpenyong and Udoh 2014). The syllable structure component
of the HRG for the sixth input utterance, eteidVN kiNsidi ‘Chief Kingsley’ (see
Fig. 1.3), written in Ibibio SAMPA, a machine-readable notation customized for the
intended language, is shown in
Fig. 1.5.

1.3.4 Context Features Modeling

The context-dependent features for English (Zen 2006) were modified for tone
languages. The modified version is shown in Fig. 1.6. Features specific to stress
languages were suppressed using a ‘not applicable’ or NULL letter ‘x’.
In Fig. 1.6, p1, p2, …, t3 represent linguistic features or model states of the text
utterances (recorded corpus data), which model features such as the phonetic
context, syllable, word, phrase and utterance statistics, and the prosodic and tone
patterning of the intended language. Table 1.4 describes the linguistic features of
the HMM labels in Fig. 1.6.

Table 1.3 Complete list of Ibibio phonemes and their SAMPA equivalents


S/no. Ibibio phone Sound type SAMPA equivalent
1 Vowel
2 Vowel
3 Consonant
4 Consonant
5 Vowel
6 Vowel
7 Vowel
8 Consonant
9 Consonant
10 Consonant
11 Vowel
12 Vowel
13 Vowel
14 Consonant
15 Consonant
16 Consonant
17 Consonant
18 Consonant
19 Consonant
20 Consonant
21 Consonant
22 Consonant
23 Consonant
24 Consonant
25 Vowel
26 Vowel
27 Vowel
28 Vowel
29 Vowel
30 Consonant
31 Consonant
32 Consonant
33 Consonant
34 Consonant
35 Consonant
36 Vowel
37 Vowel
38 Vowel
39 Consonant
40 Consonant
1. "a", "Sonorant", "Syllabic", "Vowel", "Short", "Low", "Unrounded", "Voiced", "Back"
2. "aa", "Sonorant", "Syllabic", "Vowel", "Long", "Low", "Unrounded", "Voiced", "Back"
3. "b", "Consonantal", "Non-Syllabic", "Consonant", "Short", "Stop", "Unrounded", "Voiced", "Front"
4. "d", "Consonantal", "Non-Syllabic", "Consonant", "Short", "Stop", "alveolar", "Voiced", "Front"
5. "e", "Sonorant", "Syllabic", "Vowel", "Short", "Medium", "Unrounded", "Voiced", "Front"
:::
36. "u", "Sonorant", "Syllabic", "Vowel", "Short", "High", "Rounded", "Voiced", "Back"
37. "uu", "Sonorant", "Syllabic", "Vowel", "Long", "Medium", "Rounded", "Voiced", "Back"
38. "U", "Sonorant", "Syllabic", "Vowel", "Short", "Medium", "Rounded", "Voiced", "Back"
39. "w", "Consonantal", "Non-Syllabic", "Consonant", "Short", "Approximant", "Unrounded", "Voiceless", "Back"
40. "j", "Sonorant", "Non-Syllabic", "Consonant", "Short", "Approximant", "Unrounded", "Voiced", "Central"

Fig. 1.4 A section of Ibibio phone inventory

Fig. 1.5 Extracted syllable structure for a sample HRG generated using Speect

Utterance:
Feature: _id => 27L
Feature: input => 'eteidVN kiNsidi'
Feature: utterance-type => 'text-to-segments'
:::
Relation 'SylStructure':
Item: [ _id => 3L, name => 'eteidVN']
  Daughter: [ _id => 6L, name => 'syl']
    Daughter: [ _id => 7L, name => 'e']
  Daughter: [ _id => 8L, name => 'syl']
    Daughter: [ _id => 9L, name => 't']
    Daughter: [ _id => 10L, name => 'e']
    Daughter: [ _id => 11L, name => 'i']
  Daughter: [ _id => 12L, name => 'syl']
    Daughter: [ _id => 13L, name => 'd']
    Daughter: [ _id => 14L, name => 'V']
    Daughter: [ _id => 15L, name => 'N']
Item: [ _id => 4L, name => 'kiNsidi']
  Daughter: [ _id => 16L, name => 'syl']
    Daughter: [ _id => 17L, name => 'k']
    Daughter: [ _id => 18L, name => 'i']
    Daughter: [ _id => 19L, name => 'N']
  Daughter: [ _id => 20L, name => 'syl']
    Daughter: [ _id => 21L, name => 's']
    Daughter: [ _id => 22L, name => 'i']
  Daughter: [ _id => 23L, name => 'syl']
    Daughter: [ _id => 24L, name => 'd']
    Daughter: [ _id => 25L, name => 'i']
:::

{duration} p1^p2-p3+p4=p5@p6_p7
/A:x_x_a3/B:x-x-b3@b4-b5&b6-b7#x-x$x-x!x-x;x-x|b16/C:c1+c2+c3
/D:d1_d2/E:e1+e2@e3+e4&e5+e6#e7+e8/F:f1_f2
/G:g1_g2/H:h1=h2@h3=h4|x/I:i1_i2
/J:j1+j2-j3/TL:t1_TC:t2_TR:t3

Fig. 1.6 Modified context-dependent HMM labels for tone languages
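
As a sketch of how a label in the format of Fig. 1.6 might be assembled, the template below fills each feature slot from a dictionary and writes ‘x’ for anything not supplied. The `{duration}` prefix is left out here and would be prepended separately; the function name and the dictionary-based interface are assumptions of this example, not the chapter's implementation.

```python
from collections import defaultdict

# Slot names follow Fig. 1.6; literal 'x' fields are the suppressed stress features.
LABEL_TEMPLATE = (
    "{p1}^{p2}-{p3}+{p4}={p5}@{p6}_{p7}"
    "/A:x_x_{a3}/B:x-x-{b3}@{b4}-{b5}&{b6}-{b7}#x-x$x-x!x-x;x-x|{b16}"
    "/C:{c1}+{c2}+{c3}/D:{d1}_{d2}"
    "/E:{e1}+{e2}@{e3}+{e4}&{e5}+{e6}#{e7}+{e8}/F:{f1}_{f2}"
    "/G:{g1}_{g2}/H:{h1}={h2}@{h3}={h4}|x/I:{i1}_{i2}"
    "/J:{j1}+{j2}-{j3}/TL:{t1}_TC:{t2}_TR:{t3}"
)

def format_label(features):
    """Fill the template; any feature that is not supplied is written as 'x'."""
    filled = defaultdict(lambda: "x", {k: str(v) for k, v in features.items()})
    return LABEL_TEMPLATE.format_map(filled)

# Only the phonetic context is known for the initial pause of an utterance.
first = format_label({"p3": "pau", "p4": "e", "p5": "t"})
```

Defaulting to ‘x’ mirrors the ‘not applicable’ convention described in the text, so partially specified contexts still yield a well-formed label.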

To obtain statistics and position-related information for the other linguistic features,
excluding tone features, a Shell script was written to flatten the HRG files into
a single text file. A typical output of the processed HRGs, showing only the syllable
annotations of the sample utterances in Fig. 1.3, is given in Fig. 1.7.
Notice in Fig. 1.7 that symbols are used as boundary separators: ‘#’
represents the word boundary separator, ‘-’ the syllable boundary separator, and
‘|’ the phrase boundary separator.
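
Given these three separators, a flattened utterance can be turned back into a nested phrase/word/syllable structure with plain string splitting. The sketch below is an assumption about how such a line could be read, not the chapter's Shell script; it also drops stray whitespace, which appears in some of the printed sample lines.

```python
def parse_flat(line):
    """Split a flattened utterance into phrases -> words -> syllables."""
    text = "".join(line.split())            # drop stray whitespace
    phrases = []
    for phrase in text.split("|"):          # '|' separates phrases
        words = [w.split("-") for w in phrase.split("#") if w]   # '#' words, '-' syllables
        phrases.append(words)
    return phrases

structure = parse_flat("#e-tei-dVN#kiN -si-di")
syllable_count = sum(len(word) for phrase in structure for word in phrase)
```

Counts such as the number of syllables per word or words per phrase, which the context features require, can then be read directly off the nested lists.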
To obtain statistics for tone-related features, close-copy tone annotation of the
input utterances (see Fig. 1.3) was carried out manually. The annotated
file was then automatically processed into a syllabified equivalent of Fig. 1.7. The
output of this processing is shown in Fig. 1.8.
Notice in Fig. 1.8 that only tones are processed. In the case of Ibibio, the numbers
1–5 were used to represent the various tones as follows: 1 for the High (H) tone, 2 for
the Low (L) tone, 3 for the Downstepped (!) tone, 4 for the High-Low (HL) or falling tone,
and 5 for the Low-High (LH) or rising tone. Zeros (0’s) were automatically generated
to fill the consonant slots.
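
The numeric tone coding can be decoded with a simple lookup, one digit per segment slot. In this sketch the digit 3 is written as `!H`, following the downstepped-High notation used earlier in the chapter, and `None` marks a consonant slot; both are representation choices made for the example.

```python
# Digit-to-tone mapping as described in the text; 0 marks a consonant slot.
TONE_CODES = {"0": None, "1": "H", "2": "L", "3": "!H", "4": "HL", "5": "LH"}

def decode_syllable(code):
    """Return the tone symbol for each segment slot; None marks a consonant."""
    return [TONE_CODES[digit] for digit in code]

def syllable_tones(code):
    """Keep only the tone-bearing slots of one syllable code."""
    return [tone for tone in decode_syllable(code) if tone is not None]

slots = decode_syllable("020")    # consonant, Low-toned vowel, consonant
```

A digit outside 0–5 raises a `KeyError`, which is a convenient way to flag annotation errors in the tone-tagged file.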
To implement context-dependent tone modelling, we propose a data-driven technique
for feature specification, to make our framework flexible at the front end and
minimise the hard-coding of linguistic evidence into the synthesiser, hence making
the system generic and adaptable to other tone languages. Figure 1.9 shows the
proposed context-dependent HMM algorithm for tone and prosody feature labelling,
which is expected to be fully implemented in future research.
From Fig. 1.9, we can conveniently extract the tone and prosodic features of any
corpus by clustering the input patterns and emitting the expected transitions at each
state of the HMM, thus:
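$$
\begin{aligned}
T_{label} ={} & \overrightarrow{\theta}^{f}_{0,\,tone(i,1)} + \overrightarrow{\theta}_{tone(i,1)} + \dots + \overrightarrow{\theta}_{tone(i,n-1)} + \overrightarrow{\theta}^{f}_{c(i,n-1),\,tonepat(i,n-1)} \\
& + \overrightarrow{\theta}_{tone(i,n)} + \overrightarrow{\theta}^{f}_{c(i,n),\,tonepat(i,n)} + \overrightarrow{\theta}_{tone(i,n+1)} + \dots + \overrightarrow{\theta}_{tone(i,N)} + \overleftarrow{\theta}^{b}_{C+1,\,tone(i,N)} \\
& + \overleftarrow{\theta}^{t}_{pros(i,N)} + \dots + \overleftarrow{\theta}_{pros(i,n+1)} + \overleftarrow{\theta}^{b}_{c(i,n),\,tonepat(i,n)} + \overleftarrow{\theta}_{pros(i,n)} + \overleftarrow{\theta}^{b}_{c(i,n-1),\,tonepat(i,n-1)} \\
& + \overleftarrow{\theta}_{pros(i,n-1)} + \dots + \overleftarrow{\theta}_{pros(i,1)}
\end{aligned}
\tag{1.1}
$$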

Equation (1.1) is useful for modelling the state features of a HMM-based tone
language synthesis system. It describes the context-dependent features and details
the prosodic factors necessary for tone language synthesis. The prosodic features

Table 1.4 Template summarizing the context features


Feature Description
Phoneme
p1 The phoneme identity before the previous phoneme
p2 The previous phoneme identity
p3 The current phoneme identity
p4 The next phoneme identity
p5 The phoneme after the next phoneme identity
p6 Position of the current phoneme identity in the current syllable (forward)
p7 Position of the current phoneme identity in the current syllable (backward)
Syllable
b3 The number of phonemes in the current syllable
b4 Position of the current syllable in the current word (forward)
b5 Position of the current syllable in the current word (backward)
b6 Position of the current syllable in the current phrase (forward)
b7 Position of the current syllable in the current phrase (backward)
Word
d1 Gpos (guess part-of-speech) of the previous word
d2 The number of syllables in the previous word
e1 Gpos (guess part-of-speech) of the current word
e2 The number of syllables in the current word
e3 Position of the current word in the current phrase (forward)
e4 Position of the current word in the current phrase (backward)
e5 The number of content words before the current word in the current phrase
e6 The number of content words after the current word in the current phrase
e7 The number of words from the previous content word to the current word
e8 The number of words from the current word to the next content word
f1 Gpos (guess part-of-speech) of the next word
f2 The number of syllables in the next word
Phrase
g1 The number of syllables in the previous phrase
g2 The number of words in the previous phrase
h1 The number of syllables in the current phrase
h2 The number of words in the current phrase
h3 Position of the current phrase in utterance (forward)
h4 Position of the current phrase in utterance (backward)
i1 The number of syllables in the next phrase
i2 The number of words in the next phrase
Utterance
j1 The number of syllables in this utterance
j2 The number of words in this utterance
j3 The number of phrases in this utterance
Tone
t1 The tone of previous phoneme
t2 The tone of current phoneme
t3 The tone of next phoneme

1. #bO-Na-kam|kuu-kpam-ba
2. #a-ke-fee-fe-Re#a-jak#i-kOt#a-ba-si
3. #e-Je#a-maa-nam#aN-wa-Na#ke#m-me#o-wo#e-nie#n-treu-bOk#ke#u-sVN#Om-mO#keed#keed
4. #a-ba-si#a-maa-siak#u-sVN#u-bOk-kO#O-nO#n-di-tO#i-sred
5. #e-jIn#O-mO#a-di-ben#e-Je#a-kaai-saN
6. #e-tei-dVN#kiN-si-di
7. #i-doo-Ro#a-kpa-ni-kO#o-wo#e-di-nIm#m-bo#ke#o-wo#e-mi#a-taa-Ra#N-wet#a-ba-si#ke#i-do#n-su#i-si-bOO-hO#u-fen
:::
1137. #da-Na#e-bod#o-do#a-ke-kan#o-du#O-JVN#a-sa-Na#O-JON#a-ke-wOd#ke#e-kON#i-ki-kem-me#u-sen
1138. #i-mam#i-si-nem-me#a-kaN#i-ba#m-me#o-wo#e-mae-sak#i-mam#tu-tu#e-Je#e-beek#Om-mO#N-kaN
1139. #e-te#i-fOn#a-ke-do#a-ma-a-sak#m-mOO-NO-jId#a-fe-Re#ke#m-fVk#n-te#n-diON#e-dIm
1140. #da-Na#e-ke-di-kOp-pO#m-bVk#e-bod#i-mam#a-ma-maa-na#a-sak-ka

Fig. 1.7 Processed HRGs for sample Ibibio utterances

1. #02-02-020|012-001#2-02
2. #2-02-021-01-02#1-020#1-010#1-02-02
3. #2-01#1-033-010#1-001-02#01#2-02#1-01#1-033#2-0022-010|01#1-010#20-04#0220#0220
4. #2-02-02#1-033-0110#1-010#1-020-01#1-02#2-02-02#1-00210
5. #1-010#1-02#1-03-010#2-01#1-021#1-010
6. #2-01#1-020#010-02-02
7. #1-011-01#1-001-01-02#1-01#2-01-040#2-01|01#1-01#1-02#1-011-01#2-020#1-02-02|01#1-01#1-02|1-01-022-01#1-040
:::
1137. #01-01#1-010#1-02#1-01-030#1-04#1-010#3-01-01#3-010#1-01-020#01#1-010#2-02-020-01#1-040
1138. #1-040#1-01-020-01#2-010#2-02#2-02#1-01#1-021-030#1-040#02-02#2-01#1-0330#50-02#1-020
1139. #2-01#2-010#1-01-02#1-01-3-010#1-011-01-020#1-02-01#01#1-010#2-01#1-0220#1-020
1140. #01-01#1-01-03-020-01#2-020#1-020#1-040#1-05-022-01#1-030-01

Fig. 1.8 Processed tone-tagged version of Fig. 1.7

[Figure 1.9: left-to-right HMM state sequence S(i,1), …, S(i,n−1), S(i,n), S(i,n+1),
…, S(i,N), with context nodes c(n−1) and c(n) between neighbouring states. The
forward tone parameters of Eq. (1.1) label the upper side of the chain and the
backward prosody parameters label the lower side.]

Fig. 1.9 A generic context-dependent HMM labelling of tone and prosody



(a)
1. 0 1088240 x^x-pau+e=t
2. 1088240 2176480 x^pau-e+t=e
3. 2176480 3264720 pau^e-t+e=i
4. 3264720 4352960 e^t-e+i=d
5. 4352960 5441200 t^e-i+d=V
6. 5441200 6529440 e^i-d+V=N
7. 6529440 7617680 i^d-V+N=k
8. 7617680 8705920 d^V-N+k=i
9. 8705920 9794160 V^N-k+i=N
10. 9794160 10882400 N^k-i+N=s
11. 10882400 11970640 k^i-N+s=i
12. 11970640 13058880 i^N-s+i=d
13. 13058880 14147120 N^s-i+d=i
14. 14147120 15235360 s^i-d+i=pau
15. 15235360 16323600 i^d-i+pau=x
16. 16323600 17411840 d^i-pau+x=x

(b)
0 1088240 x^x-pau+e=t@x_x/A:x_x_x/B:x-x-x@x-x&x-x#x-x$x-x!x-x;x-x|x/C:x+x+x/D:x_x/E:x+x@x+x&x+x#x+x/F:x_x/G:x_x/H:x=x@x=x|x/I:x_x/J:x+x-x/TL:x_TC:x_TR:x
1088240 2176480 x^pau-e+t=e@1_1/A:x_x_0/B:x-x-1@1-3&1-6#x-x$x-x!x-x;x-x|e/C:x+x+3/D:x_0/E:CONTENT+3@1+2&x+1#x+1/F:CONTENT_3/G:0_0/H:6=2@1=1|x/I:0_0/J:6+2-1/TL:x_TC:2_TR:1
:::

Fig. 1.10 Context dependent labels for the input utterance ‘ete idVN kiNsidi’. (a) Phonetic
context model; (b) Full context model for first three phoneme-sequences

describing each vector of Eq. (1.1) are given in Table 1.4. A Python script was
written to extract the features in Table 1.4 from a text corpus, and a snippet of the
full context model for a sample sentence is given in Fig. 1.10b.

1.3.5 HTS Label Generation

To generate the HTS label (or lab) files, context interactions between the various
features in Fig. 1.7 were modeled in a supervised manner using a simple FST. First,
the speech durations and phonetic contexts were handled, to establish the initial
quin-phone labels. In labeling the speech durations, we do not manually annotate the
speech corpus, but assume that each phone in a given utterance has the same duration.
This initial assumption was used to pre-annotate the speech corpus, by dividing the
total duration of each utterance by the total number of phonemes in that
utterance. This gave all phones equal durations, with fixed increments
(Fig. 1.10a). We have shown that the resultant voice from quin-phone labels
sounds poor, but improves greatly when a rich/full context model is used
(Ekpenyong et al. 2014). A full context-dependent model for the first two phoneme
sequences in Fig. 1.10a is shown in Fig. 1.10b. To obtain consistent duration
modeling, a re-alignment process using the Viterbi algorithm was adopted. The Viterbi
algorithm is the most common algorithm for implementing n-gram search. It is an
efficient dynamic search technique that avoids the polynomial expansion of a
breadth-first search, by ‘trimming’ the search tree at each level using the best n
MLEs. The re-alignment process was achieved using the HMM edit (HLEd) tool of the
HTS toolkit.
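
The equal-duration pre-annotation and the initial quin-phone labels can be sketched as follows. The function name is illustrative, and the phone sequence and total duration are taken from Fig. 1.10a; boundary contexts are padded with ‘x’, as in that figure.

```python
def uniform_quinphone_labels(phones, total_duration):
    """Give every phone an equal share of the utterance and build p1^p2-p3+p4=p5."""
    step = total_duration // len(phones)          # equal duration per phone
    padded = ["x", "x"] + list(phones) + ["x", "x"]
    labels = []
    for k in range(len(phones)):
        p1, p2, p3, p4, p5 = padded[k:k + 5]      # two left and two right neighbours
        labels.append((k * step, (k + 1) * step, f"{p1}^{p2}-{p3}+{p4}={p5}"))
    return labels

phones = "pau e t e i d V N k i N s i d i pau".split()
labels = uniform_quinphone_labels(phones, 17411840)
```

Each tuple holds the start time, end time and quin-phone context of one phone. These uniform boundaries are only a starting point, which the re-alignment step is then expected to correct.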
Another random document with
no related content on Scribd:
Lameyran
Gmelin,
and Fremy,
Gas. Pflüger. from Lassaigne. Vogel. Raiset.
from
clover
clover.
CO 60–80 5 29 5 27 74.23
CO 28–40
CH 15 6 15 48 23.46
HS 80 80
O 14.7
N 50.3 25 2.22
The most elaborate observations on this subject are those made by
Lungwitz on the different aliments kept in closed vessels at the body
temperature, and on similar agents fed for days as an exclusive
aliment to oxen provided with a fistula of the rumen for purposes of
collection. He found carbon dioxide to be the predominating gas in
all cases, but that it was especially so in extreme tympanies and
varied much with the nature of the food. The following table gives
results:

Percentage of CO2.
Buckwheat (Polygonum fagopyrum) 80
Alfalfa (Mendicago Sativa) 70–80
Clover (Trifolium pratense) 70–80
Meadow grass 70–80
Indian corn (Zea Maïs) 70–80
Spurry (Spergula arvensis) 70–80
Hay of alfalfa or clover 70–80
Oats with cut straw 70–80
Yellow Lupin (Lupinus luteus) 60–70
Vetch (Vicia sativa) 60–70
Oats cut green 60–70
Potato tops 60–70
Potatoes 60–70
Meadow hay 60–70
Leaves of beet 50–60
Leaves of radish 50–60
Cabbage 40–50
The marsh gas varied from 16 to 39 per cent., being especially
abundant in cases of abstinence. It should, therefore, be in large
amount in the tympanies which accompany febrile and other chronic
affections. Hydrogen sulphide was found only in traces, recognizable
by blackening paper saturated with acetate of lead. Oxygen and
nitrogen were in small amount and were attributed to air swallowed
with the food. In the work of fermentation the oxygen may be
entirely used up.
Lesions. These are in the main the result of compression of the
different organs, by the overdistended rumen. Rupture of the rumen
is frequent. The abdominal organs are generally bloodless, the liver
and spleen shrunken and pale, though sometimes the seat of
congestion or even hemorrhage. Ecchymoses are common on the
peritoneum. The right heart and lungs are gorged with black blood,
clotted loosely, and reddening on exposure. The right auricle has
been found ruptured. Pleura, pericardium and endocardium are
ecchymotic. The capillary system of the skin, and of the brain and its
membranes, is engorged, with, in some instances, serous
extravasations.
Prevention. This would demand the avoidance or correction of all
those conditions which contribute to tympany. In fevers and
extensive inflammations, when rumination is suspended, the diet
should be restricted in quantity and of materials that are easily
digested (well boiled gruels, bran mashes, pulped roots, etc.,) and all
bulky, fibrous and fermentescible articles must be proscribed. In
weak conditions in which tympany supervenes on every meal, a
careful diet may be supplemented by a course of tonics, carminatives
and antiseptics such as fœnugrec, oxide of iron, hyposulphite of soda
and common salt, equal parts, nux vomica 2 drs. to every 1 ℔. of the
mixture. Dose 1 oz. daily in the food, or ½ oz. may be given with each
meal.
Musty grain and fodder should be carefully avoided, also
mowburnt hay, an excess of green food to which the stock is
unaccustomed, clover after a moderate shower, or covered with dew
or hoarfrost, frosted beet, turnip, or potato tops, frosted potatoes,
turnips or apples, also ryegrass, millet, corn, vetches, peas with the
seeds fairly matured but not yet fully hardened. When these
conditions cannot be altogether avoided, the objectionable ration
should be allowed only in small amount at one time and in the case
of pasturage the stock should have a fair allowance of grain or other
dry feed just before they are turned out. Another precaution is to
keep the stock constantly in motion so that they can only take in
slowly and in small quantity the wet or otherwise dangerous aliment.
When it becomes necessary to make an extreme transition from
one ration to another, and especially from dry to green food,
measures should be taken to make the change slowly, by giving the
new food in small quantities at intervals, while the major portion of
the diet remains as before, until the fæces indicate that the
superadded aliment has passed through the alimentary canal.
Another method is to mix the dry and green aliments with a daily
increasing allowance of the latter. Some have avoided the morning
dew and danger of fermentation by cutting the ration for each
succeeding day the previous afternoon and keeping it in the interval
under cover.
Treatment. Various simple mechanical resorts are often effective
in dispelling the tympany. Walking the animal around will
sometimes lead to relaxation of the tension of the walls of the
demicanal and even to some restoration of the movements of the
rumen with more or less free eructation of gas. The dashing of a
bucket of cold water on the left side of the abdomen sometimes
produces a similar result. Active rubbing or even kneading of the left
flank will sometimes lead to free belching of gas. The same may be at
times secured by winding a rope several times spirally round the
belly and then twisting it tighter by the aid of a stick in one of its
median turns.
A very simple and efficient resort is to place in the mouth a block
of wood 2½ to 3 inches in diameter and secured by a rope carried
from each end and tied behind the horns or ears. This expedient
which is so effective in preventing or relieving dangerous tympany in
choking appears to act by inducing movements of mastication, and
sympathetic motions of the œsophagus, demicanal and rumen. It not
only determines free discharge of gas by the mouth, but it absolutely
prevents any accession of saliva or air to the stomach by rendering
deglutition difficult or impossible. A similar effect can be obtained
from forcible dragging on the tongue but it is difficult to keep this up
so as to have the requisite lasting effect. Still another resort is to
rouse eructation by the motions of a rope introduced into the fauces.
The passing of a hollow probang into the rumen is very effective as
it not only secures a channel for the immediate escape of the gas, but
it also stimulates the demicanal and rumen to a continuous
eructation and consequent relief. Friedberger and Fröhner advise
driving the animals into a bath of cold water.
Of medicinal agents applicable to gastric tympany the best are
stimulants, antiseptics and chemical antidotes. Among stimulants
the alkaline preparations of ammonia hold a very high place. These,
however, act not as stimulants alone, but also as antacids and
indirectly as antidotes since the alkaline reaction checks the acid
fermentation which determines the evolution of the gas. They also
unite with and condense the carbon dioxide. Three ounces of the
aromatic spirits of ammonia, one ounce of the crystalline
sesquicarbonate, or half an ounce of the strong aqua ammonia may
be given to an ox, in not less than a quart of cold water. Next to this
is the oil of turpentine 2 oz., to be given in oil, milk, or yolk of egg.
But this too is an antiferment. The same remark applies to oil of
peppermint (½ oz.), the carminative seeds and their oils, and the
stronger alcoholic drinks (1 quart). Sulphuric or nitrous ether (2 oz.)
may be given in place. Pepper and ginger are more purely stimulant
and less antiseptic. Other alkalies—carbonate of potash or soda, or
lime water may be given freely.
Among agents that act more exclusively as antiseptics may be
named: muriatic acid 1 to 1½ drs. largely diluted in water; carbolic
acid, creosote or creolin, 4 drs. largely diluted; sulphite, hyposulphite
or bisulphite of soda 1 oz.; kerosene oil ½ pint; chloride of lime 4
drs.; chlorine water 1 pint; wood tar 2 oz. The latter agent is a
common domestic remedy in some places being given wrapped in a
cabbage leaf, and causing the flank to flatten down in a very few
minutes as if by magic. The extraordinarily rapid action of various
antiseptics is the most conclusive answer to the claim that the
disorder is a pure paresis of the walls of the rumen. The affection is
far more commonly and fundamentally an active fermentation, and
is best checked by a powerful antiferment. Even chloride of sodium
(½ lb.) and above all hypochlorite of soda or lime (½ oz.) may be
given with advantage in many cases.
Among agents which condense the gases may be named
ammonia, calcined magnesia, and milk of lime for carbon dioxide,
and chlorine water for hydrogen.
Among agents used to rouse the torpid rumen and alimentary
canal are eserine (ox 3 grs., sheep ½ gr. subcutem), pilocarpin (ox 2
grs., sheep ⅕ gr.), barium chloride (ox 15 grs., sheep 3 to 4 grs.),
tincture of colchicum (ox 3 to 4 drs.). Trasbot mentions lard or butter
(ox 4 oz., sheep ½ oz.), as in common use in France.
In the most urgent cases, however, relief must be obtained by
puncture of the rumen, as a moment’s delay may mean death. The
seat for such puncture is on the left side, at a point equidistant from
the outer angle of the ilium, the last rib and the transverse processes
of the lumbar vertebræ. Any part of the left flank might be adopted to
enter the rumen, but, if too low down, the instrument might plunge
into solid ingesta, which would hinder the exit of gas, and would
endanger the escape of irritant liquids into the peritoneal cavity. In
an extra high puncture there is less danger, though a traumatism of
the spleen is possible under certain conditions. The best instrument
for the purpose is a trochar and cannula of six inches long and ⅓ to
½ inch in diameter. (For sheep ¼ inch is ample.) This instrument,
held like a dagger, may be plunged at one blow through the walls of
the abdomen and rumen until stopped by the shield on the cannula.
The trochar is now withdrawn and the gas escapes with a prolonged
hiss. If the urgency of the case will permit, the skin may be first
incised with a lancet or pen knife, and the point of the instrument
having been placed on the abdominal muscles, it is driven home by a
blow of the opposite palm. In the absence of the trochar the puncture
may be successfully made with a pocket knife or a pair of scissors,
which should be kept in the wound to maintain the orifice in the
rumen in apposition with that in the abdominal wall, until a metal
tube or quill can be introduced and held in the orifices.
When the gas has escaped by this channel its further formation can
be checked by pouring one of the antiferments through the cannula
into the rumen.
When the formation of an excess of gas has ceased, and the
resumption of easy eructation bespeaks the absence of further
danger, the cannula may be withdrawn and the wound covered with
tar or collodion.
When the persistent formation of gas indicates the need of
expulsion of offensive fermentescible matters, a full dose of salts may
be administered. If the presence of firmly impacted masses can be
detected, they may sometimes be broken up by a stout steel rod
passed through the cannula. If the solid masses prove to be hair or
woolen balls, rumenotomy is the only feasible means of getting rid of
them.
In chronic tympany caused by structural diseases of the
œsophagus, mediastinal glands, stomach or intestines, permanent
relief can only be obtained by measures which will remove these
respective causes.
CHRONIC TYMPANY OF THE RUMEN.

Causes: catarrh of rumen, impaction of manifolds, debility, paresis, peritoneal
adhesions, neoplasms, concretions, sudden change in diet, gastric congestion,
lesions of gullet, or of mediastinal glands. Symptoms are usually after feeding only,
inappetence, rumbling, costiveness, rumen indentable. Treatment: obviate causes,
give salines, acids, bitters, and water, laxative food, carminatives, antiseptics,
electricity, emetic tartar, eserine, pilocarpin, barium chloride, apomorphin.

Causes. The persistence of causes of acute tympany may lead to
the appearance of the condition after each meal, or even in the
intervals between meals. Among the more specific causes may be
named catarrhal inflammation of the rumen, impaction of the third
stomach, paresis of the rumen, general debility, peritoneal adhesions
affecting the viscus, tuberculosis, actinomycosis or other morbid
productions in its walls, hernia of the reticulum into the chest, hard
stercoral, hair or wool balls, or masses or foreign bodies in the
rumen, and the ingestion of a very fermentescible quality of food.
When the rumen is affected by catarrh or paresis or debility, even
ordinary food will lead to tympany, but much more so any food to
which the animal has been unaccustomed (green for dry, or dry for
green, grain for grass or hay, or beans or peas for grain). Also food in
process of fermentation, or the seat of fungoid growth.
Again, so intimately related are the different stomachs that
derangement of one instantly impairs the functions of the other, and
thus a slowly progressive impaction of the third stomach leads to
torpor of the first, and the aggregation of more or less of its contents
into solid, fermenting masses. In the same way congestion of either
the third or fourth stomach impairs the functions of the rumen and
induces tympany.
Morbid conditions affecting the functions of the œsophagus and
interfering with rumination and eructation of gas are familiar causes.
For example, strictures and saccular dilations of the tube, and
enlargements—tubercular, sarcomatous, actinomycotic,—of the
mediastinal glands.
The symptoms do not differ from those of acute tympany
excepting that they are less severe; and are continuous or remittent,
suffering a material aggravation after feeding. Rumination may be
suppressed or tardy, the bowels also are torpid, the fæces glazed, and
the ordinary intestinal rumbling little marked. When the tympany
has temporarily subsided, the knuckles, pressed into the left side, can
often be made to strike against the hard, solid impacted mass of
ingesta. Symptoms of impacted manifolds may also be patent and
the patient steadily loses condition.
Treatment must be directed toward the removal of the special
cause of the trouble, and if this cannot be secured, as in tuberculosis,
the case is hopeless. In cases of solid masses in the rumen the free
use of common salt with a drachm of hydrochloric acid, and one
grain strychnine with each meal, and a free access to water may
succeed. The food had best be restricted to gruels and sloppy mashes.
The daily use of electricity through the region of the paunch is an
important accessory. The common salt may be increased as required,
so as to keep up a very relaxed condition of the bowels.
In obstinate cases of this kind puncture may be resorted to and an
attempt made to break down the impacted masses with a steel rod
introduced through the cannula. Should this also fail the solid
masses or foreign bodies may be extracted by rumenotomy.
In simple catarrh of the rumen the continued use of strychnine
with gentian, and sulphate of iron, may prove successful under a
carefully regulated diet. Oil of turpentine, balsam of copaiba, or
balsam of tolu may also prove useful, or in other cases extract of
hamamelis, or of wild cherry bark. While strychnine and electricity
are to be preferred to rouse the muscular activity of the viscus, such
agents as tartar emetic, emetine, apomorphin, eserine, pilocarpin
and barium chloride are recommended and may be resorted to in
case of necessity.
OVERLOADED (IMPACTED) RUMEN.

Definition. Causes, excess of rich unwonted food, gastric torpor, paresis,
starvation, debility, partially ripened, poisonous seeds, paralyzing fungi or
bacteria, lead, cyanides, congestion of rumen, chlorophyll, acrids, dry, fibrous
innutritious food, lack of water, enforced rest on dry food, over-exertion, salivary
fistula or calculus, diseased teeth or jaws, senility. Symptoms, suspended
rumination, inappetence, anxious expression, arched back, bulging pendent left
flank, impressible, no friction sounds, excessive crepitation, hurried breathing,
colics, grunting when moved, diarrhœa, stupor, cyanosis. Signs of improvement.
Phrenic rupture. Diagnosis from tympany, pneumonia, or gastro-intestinal catarrh.
Treatment, hygienic, antiseptic, stimulants, puncturing, purgation, rumenotomy.

Definition. The overdistension of the rumen with solid food is
characterized by two things, the excess of ingesta which produces the
torpor or paresis which is common to all over-filled hollow viscera,
and the comparative absence of fermentation and evolution of gas. If
the ingesta is of a more fermentescible nature the rapid evolution of
gas occurs before this degree of repletion with solid matters can be
reached, and the case becomes one of tympany, but if the contents
are comparatively lacking in fermentability they may be devoured in
such quantity as to cause solid impaction.
Causes. Overloading of the rumen is especially common as the
result of a sudden access to rich or tempting food to which the
animal has been unaccustomed. Accidental admittance to the
cornbin, breaking into a field of rich grass, clover, alfalfa, corn,
sorghum, vetches, tares, beans, peas, or grain, or into a barrel of
potatoes or apples will illustrate the common run of causes. A pre-
existing or accompanying torpor or paresis of the stomach is a most
efficient concurrent cause, hence the affection is especially common
in animals debilitated by disease or starvation, but which have
become convalescent or have been suddenly exposed to the
temptation of rich food. For the same reason it is most likely to occur
with food which contains a paralyzing element, as in the case of the
following when they have gone to seed but are not yet fully ripened:
Rye grass, intoxicating rye grass, millet, Hungarian grass, vetches,
tares and other leguminosæ, and to a less extent, wheat, barley, oats
and Indian corn. The same may come from the paralyzing products
of fungi or bacteria in musty fodder or of such chemical poisons as
lead, and the cyanides.
A catarrhal affection of the rumen, and the congestion produced by
irritant plants, green food with an excess of chlorophyll, and the
whole list of irritants and narcotico-acrids, will weaken the first
stomach and predispose to overdistension.
Anything which lessens the normal vermicular movements of the
rumen and hinders regurgitation and rumination tends to impaction,
and hence an aliment which is to a large extent fibrous, innutritious,
and unfermentable, such as hay from grass that has run to seed and
been threshed, the stems of grasses that have matured and withered
in the pastures, fodder that has been thoroughly washed out by heavy
rains, sedges, reedgrass, rushes, chaff, finely cut straw, and in the
case of European sheep, the fibrous tops of heather contribute to this
affection. Lack of water is one of the most potent factors, as an
abundance of water to float the ingesta is an essential condition of
rumination. Hence pasturage on dry hillsides, prairies or plains,
apart from streams, wells or ponds is especially dangerous unless
water is supplied artificially, and the winter season in our Northern
states, when the sources of drinking water are frozen over, and when
the chill of the liquid forbids its free consumption, is often hurtful.
Gerard attributes the affection to constant stabulation. This,
however, has a beneficial as well as a deleterious side. It undermines
the health and vigor, and through lack of tone favors gastric torpor
and impaction, but it also secures ample leisure for rumination,
which is so essential to the integrity of the rumen and favors the
onward passage of its contents. With dry feeding and a restricted
water supply it cannot be too much condemned, but with succulent
food and abundance of water the alleged danger is reduced to the
minimum.
Active work and over exertion of all kinds must be admitted as a
factor. At slow work the ox can still ruminate, but in rapid work or
under heavy draft this is impossible, and the contained liquids may
pass over from rumen to manifolds conducing to impaction of the
former, or fermentations may take place, swelling up the mass of
ingesta and distending the walls of the first stomach. Similarly, cattle
and sheep that are hurried off on a rapid march with full stomachs
are greatly exposed to both tympany and impaction.
In speaking of dry, fibrous food and lack of water as factors, we
must avoid the error of supposing that succulent or aqueous food is a
sure preventive. In a catarrhal condition of the rumen or in a state of
debility, impaction may readily occur from the excessive ingestion of
luscious grass, wheat bran, potatoes, apples, turnips, beets, or
cabbage.
Finally defects in the anterior part of the alimentary tract may tend
to impaction. Salivary fistula or calculus cutting off the normal
supply of liquid necessary for rumination, tends to retention and
engorgement. Diseased teeth and jaws interfering with both the
primary and secondary mastication have the same vicious tendency.
Old cows, oxen and sheep in which the molar teeth are largely worn
out, suffer in the same way, especially when put up to fatten or
otherwise heavily fed. In this case there is the gastric debility of old
age as an additional inimical feature.
Symptoms. These vary with the quantity and kind of ingesta, also
to some extent with the previous condition of the rumen, sound or
diseased. They usually set in more slowly than in tympany. On the
whole the disease appears to be more common in the stable than at
pasture. The animal neither feeds nor ruminates, stands back from
the manger, becomes dull, with anxious expression of the face,
arching of the back and occasional moaning especially if made to
move. The abdomen is distended but especially on the left side,
which however hangs more downward and outward and tends less to
rise above the level of the hip bone than in tympany. If it does rise
above the ilium this is due to gas and it is then elastic, resilient and
resonant on percussion at that point. The great mass, and usually the
whole of the paunch is nonresonant when percussed, retains the
imprint of the fingers when pressed, and gives the sensation of a
mass of dough. The hand applied on the region of the paunch fails to
detect the indication of movements which characterize the healthy
organ. The ear applied misses the normal friction sound, but detects
a crepitant sound due to the evolution of bubbles of gas from the
fermenting mass. This is especially loud if the impaction is one of
green food or potatoes, even though the gas remains as bubbles
throughout the entire fermenting mass, instead of separating to form
a gaseous area beneath the lumbar transverse processes.
The respiration is hurried, labored and accompanied with a moan,
the visible mucosæ are congested, the eyes are protruded and glassy
from dilatation of the pupils, the feet are propped outward, and the
head extended on the neck. There may be signs of dull colicky pains,
movements of the tail and shifting of the hind feet, in some cases the
patient may even lie down but never remains long recumbent. There
may be occasional passages of semi-liquid manure, though usually
the bowels are torpid and neither passages nor rumbling sounds on
the right side can be detected. When moved the animal usually
grunts or moans at each step, and especially when going down hill,
owing to the concussion of the stomach on the diaphragm. In cases
due to green food the irritation may extend to the fourth stomach
and intestines and a crapulous diarrhœa may ensue. The
temperature remains normal as a rule. The disease is more
protracted than tympany, yet after several hours of suffering and
continual aggravation the dullness may merge into stupor, the
mucosæ become cyanotic and death ensues from shock, asphyxia, or
apoplexy.
Course. Termination. Many cases recover in connection with a
restoration of the contractions of the rumen, the eructation of gas, in
some rare cases vomiting or spasmodic rejection of quantities of the
ingesta, and the passage of gas by the bowels. This may be associated
with a watery diarrhœa, and loud rumbling of the right side, which
may continue for twenty-four hours or longer. With the subsidence
of the diarrhœa there comes a return of health, or there may remain
slight fever, inappetence, suspended or impaired rumination,
dullness, listlessness, and a mucous film on the fæces. This indicates
some remaining gastro-enteritis.
In some instances there is rupture of the diaphragm with marked
increase in the abdominal pain and the difficulty of breathing. In
others there is a laceration of the inner and middle coats of the
rumen so that the gas diffuses under the peritoneum and may even
be betrayed by an emphysematous extravasation under the skin.
Diagnosis. From tympany this is easily distinguished by the
general dullness on percussion, the persistence of the indentation
caused by pressure, the outward and downward rather than the
upward extension of the swelling, and the slower development of the
affection.
It is far more likely to be confounded with pneumonia, which it
resembles in the hurried, labored breathing, the moans emitted in
expiration, in the dullness on percussion over the posterior part of
the chest, it may be even forward to the shoulders, and in the
cyanotic state of the mucosæ. The distinction is easily made by the
absence of hyperthermia, and of crepitation along the margins of the
nonresonant areas in the lungs, by the fact that the area of chest
dullness covers the whole posterior part of the thorax to a given
oblique line, and by the history of the case and the manifest
symptoms of overloaded stomach, not with gas but with solids. From
gastro-intestinal catarrh it may be distinguished by the more rapid
advance of the symptoms and by the absence of the slight fever
which characterizes the latter.
Treatment. Slight cases may be treated by hygienic measures only.
Walking the animal uphill, injections of cold water, friction on the
left side of the abdomen to rouse the rumen to activity, antiseptics as
in tympany to check further fermentation, and stimulants to
overcome the nervous and muscular torpor, may be employed
separately or conjointly. When it can be availed of, a rubber hose
may be wound round the abdomen and a current of cold water forced
through it.
When further measures are demanded we should evacuate any gas
through the probang or a cannula, as in tympany, and thus relieve
tension and then resort to stimulants and purgatives. Common salt 1
℔. is of value in checking fermentation, and may be added to 1 ℔.
Glauber salts in four or five quarts of warm water. A drachm of
strong aqua ammonia or 2 oz. oil of turpentine and ½ drachm of nux
vomica may be added. Bouley advocated tartar emetic (2 to 3
drachms), and Lafosse ipecacuan (1 oz. of the wine) to rouse the
walls of the rumen, and more recently pilocarpin (ox 3 grs.), eserine
(ox 2 grs.) and barium chloride (ox 15 grs.), have offered themselves
for this purpose. The three last have the advantage of adaptability to
hypodermic use, and prompt action. The repetition of stimulants and
nux vomica may be continued while there appears any prospect of
restoring the normal functions of the paunch, and when all other
measures fail the only hope lies in rumenotomy.
Rumenotomy. The warrant for this operation is found in the
entire lack of movement in the rumen, the absence of eructation, the
cessation of rumbling and motion of the bowels, and the deepening
of the stupor in which the patient is plunged. The longer the delay
and the deeper the stupor and prostration the less the likelihood of a
successful issue from the operation. The animal is made to stand
with its right side against a wall, and its nose held by the fingers or
bulldog forceps. If judged necessary a rope may be passed from a
ring in the wall in front of the shoulder around the animal to another
ring behind the thigh and held tight. Or a strong bar with a fulcrum
in front, may be pressed against the left side of the body, and well
down so as to keep the right side fast against the wall. A line may be
clipped from the point of election for puncture in tympany down for
a distance of six inches. A sharp pointed knife is now plunged
through the walls of the abdomen and rumen in the upper part of
this line, and is slowly withdrawn, cutting downward and outward
until the opening is large enough to admit the hand. The lips of the
wound in the overdistended stomach will now bulge out through to
the wound in the abdominal walls, and three stitches on each side
may be taken through these structures to prevent displacement as
the stomach is emptied and rendered more flaccid. A cloth wrung out
of a mercuric chloride solution may be laid in the lower part of the
wound to guard against any escape of liquid into the peritoneal
cavity. The contents may now be removed with the hand, until the
organ has been left but moderately full. Two or three stable
bucketfuls are usually taken, but it is by no means necessary nor
desirable that the rumen be left empty, as a moderate amount of food
is requisite to ensure its functional activity. As a rule at least fifty
pounds should be left. Before closing the wound and especially in
cases due to dry feeding, it is well in a tolerably large animal to
introduce the hand through the demicanal to ascertain if impactions
exist in the third stomach and to break up these so far as they can be
reached. This done, the edges of the wound in the stomach are to be
carefully cleansed, washed with the mercuric chloride solution and
sewed together with carbolated catgut, care being taken to turn the
mucosa inward and to retain the muscular and peritoneal layers in
close contact with each other. It will usually be convenient to cut first
the two lower stitches through the abdominal walls, and suture from
below upward. When finished the peritoneal surface of the gastric
wound may be again sponged with the mercuric chloride solution,
together with the edges of the wound in the abdominal walls. Finally
the abdominal wound is sutured, the stitches including the skin only
or the muscular tissues as well. The smooth surface of the paunch
acts as an internal pad and support, and with due care as to
cleanliness, antisepsis and accuracy of stitching, it is rare to find any
drawback to continuous and perfect healing. It is well to restrict the
animal for three days to well boiled gruels, and for ten days to soft
mashes in very moderate amount lest the wound in the paunch
should be fatally burst open before a solid union has been effected.
RUMINITIS. INFLAMMATION OF THE RUMEN.
Prevalence in different genera. Causes, as in tympany and impaction, irritants,
specific fevers. Symptoms: impaired rumination, tympanies, impactions, depraved
appetite, fever, nervous disorders. Lesions: hyperæmia, petechiæ, exudates, ulcers,
desquamation, swollen or shrunken papillæ. Treatment: remove cause,
mucilaginous food, or gruels, sodium sulphate, or chloride, bismuth, bitters,
mustard cataplasm, electricity.
This is not a prevalent disease but affects animals at all periods of
life and is a cause of tardy and difficult digestion and rumination. It
usually shows itself as a catarrhal inflammation and by favoring
fermentation in the food, and torpor of the muscular walls of the
organ contributes to tympany and impaction. It is more common in
the ox than in the sheep owing, perhaps, to the more habitual
overloading of the stomach and to the hurried, careless manner of
feeding. In the goat it is rare.
Causes. Among the causes may be named tympany and
overloading, so that all the dietary faults that lead to these may be set
down as causes of inflammation. Irritants taken with the food,
whether in the form of acrid plants (ranunculaceæ, euphorbiaceæ,
etc.), musty fodder, irritant products in spoiled fodder, aliments
which are swallowed while very hot or in a frozen state, and foreign
bodies of an irritating kind are especially liable to induce it.
Congestions of the paunch are not uncommon in specific infectious
diseases like Rinderpest, malignant catarrh, anthrax, and Texas
fever, and specific eruptions sometimes appear in aphthous fever
and sheeppox.
Symptoms. Rumination is slow and irregular, appetite capricious,
tympanies appear after each feed, and there is a marked tendency to
aggregation of the ingesta in solid masses, which resist the
disintegration and floating which is necessary to rumination, and
favor the occurrence of putrid fermentation. There is usually a
tendency to lick earth, lime from the walls, and the manger, and a
depraved appetite shown in a desire to chew and swallow foreign
bodies of many kinds. Vomiting or convulsive rejection of the
contents of the rumen is not unknown (Vives, Pattaes). There is
slight fever with heat of the horns and ears, dry muzzle, and
tenderness to pressure on the left flank. The bowels may be
alternately relaxed and confined, and bad cases may end in a fatal
diarrhœa. In other cases the disease may become acute and develop
nervous symptoms, as in tympany and impaction. When the disease
takes a favorable turn, under a careful ration, recovery may be
complete in eight or ten days.
Lesions. These are violet or brownish patches of hyperæmia on the
mucosa of the rumen, circumscribed ecchymoses, exudates in the
form of false membranes, and even pinhead ulcerations. On the
affected portions the mucosa is swollen, puffy, dull and covered with
mucus, and epithelium may desquamate. The papillæ are often red,
and thickened or shrunken and shortened. In the specific affections
like aphthous fever and sheeppox the lesions are rounded vesicles
containing liquid. The ingesta are more or less packed in masses.
Treatment. If irritant foreign bodies have been taken, rumenotomy
is demanded. If caustic alkalies have been swallowed, give acetic or
other mild acid; if acids, lime water or magnesia. Feed well-boiled
flax seed, or farina gruels, and wheat bran or middlings in limited
quantity. Solids may at first be withheld; coarse or indigestible
food must be. It may be
necessary to rouse the organ by 10 or 12 ozs. of sulphate of soda with
a little common salt and abundance of thin gruels as drink. As a tonic
the animal may take nitrate of bismuth ½ oz., powdered gentian ½
oz., and nux vomica 20 grains, twice a day. The application of a
mustard pulp or of oil of turpentine on the left side of the abdomen
may also be resorted to. A weak current of electricity through the
region of the paunch for twenty minutes daily is often of great
service.
HAIR BALLS IN THE RUMEN AND
RETICULUM. EGAGROPILES.
Balls of hair, wool, clover hairs, bristles, paper, oat-hair, feathers, chitin, mucus,
and phosphates. Causes: Sucking and licking pilous parts, eating hairy or fibrous
products. Composition. Symptoms: Slight, absent, or gulping, eructation,
vomiting, tympany; in the young, putrid diarrhœa, fœtid exhalations, emaciation.
Diagnosis. Treatment.
Definition. The term egagropile, literally goat-hair, has been given
to the felted balls of wool or hair found in the digestive organs of
animals. The term has been applied very widely, however, to
designate all sorts of concretions of extraneous matters which are
found in the intestinal canal. In cattle the hair licked from their own
skin and that of their fellows, rolled into a ball by the action of the
stomach and matted firmly together with mucus and at times traces
of phosphates, is the form commonly met with. In sheep two
forms are seen, one consisting of wool matted as above and one
made up of the fine hairs from the clover leaf similarly matted and
rolled into a ball.
In pigs the felted mass is usually composed of bristles
(exceptionally of paper or other vegetable fibre), and in horses felted
balls of the fine hairs from the surface of the oat, mingled with more
or less mucus and phosphate of lime, make up the concretion. These
are found in the stomach and intestines. In predatory birds masses of
feathers, and in insectivorous birds chitinous masses, are formed in
the gizzard and rejected by vomiting.
Causes. Suckling animals obtain the hair from the surface of the
mammary glands; hence an abundance of hair or wool on these parts
favors their production. The vicious habit of calves of sucking the
scrotum and navel of others is another cause. In the young and adult
alike the habit of licking themselves and others, especially at the
period of moulting, is a common factor.
Composition. Hair, wool, and the fine hairs of clover are the
common predominant constituents, but these are matted together
more or less firmly by mucus and phosphates, the ammonia-
magnesian phosphate uniting with the mucus and other matters in
forming a smooth external crust in the old standing balls of adult
animals. The centre of such balls is made up of the most densely
felted hair. In balls of more recent formation the external crust is
lacking and the mass is manifestly hairy on the surface, and the
density uniform throughout. These have a somewhat aromatic odor,
contain very little moisture, and have a specific gravity
approximating 0.716 (sheep) to 0.725 (ox). Ellagic and lithofellic acids,
derivatives of tannin, are usually present, and are abundant in the
egagropiles of antelopes.
In the balls of recent formation, as seen especially in sucking
calves, the hair is only loosely matted together, and often intermixed
with straw and hay, and is saturated with liquid and heavier than the
old masses. These are usually the seat of active putrefactive
fermentation, and being occasionally lodged in the third or even the
fourth stomach, the septic products act as local irritants, and general
poisons. They are therefore far more injurious than the consolidated
hairballs of the adult animal, and often lay the foundation of septic
diarrhœas and gastro-enteritis.
The balls may be spherical, elliptical, ovoid, or, when flattened by
mutual compression, discoid.
Symptoms. Generally these balls cause no appreciable disturbance
of the functions of the stomach. This is especially true of the large,
old and smoothly encrusted masses. The museum of the N. Y. S. V.
College contains specimens of 5½ inches in diameter, found after
death in a fat heifer, which had always had good health and which
was killed for beef. This is the usual history of such formations: they
are not suspected during life, and are only found accidentally when
the rumen is opened in the abattoir.
The smaller specimens, the size of a hen’s or goose’s egg, or a
billiard ball, have produced severe suffering, with gulping,
eructation, vomiting and tympany from obstruction of the demicanal
or gullet, and such symptoms continued until the offending agents
were rejected by the mouth (Caillau, Leblanc, Prevost, Giron). Again
they may block the passage from the first to the third stomach
(Schauber, Feldamann, Adamovicz, Tyvaert, Mathieu).
In calves on milk they are especially injurious, as besides the
dangers of blocking the passages already referred to, the unencrusted
hairs and straws irritate the mucous membranes and, still worse, the
putrid fermentations going on in their interstices produce irritant
and poisonous products, and disseminate the germs of similar
fermentations in the fourth stomach and intestine. Here the
symptoms are bloating, colics, impaired or irregular appetite, fœtid
diarrhœa, fœtor of the breath and cutaneous exhalations, and rapidly
progressive emaciation.
Diagnosis is too often impossible. Tympanies, diarrhœa, colics,
etc., may lead to suspicion, but unless specimens of the smaller hair
balls are rejected by the mouth or anus there can be no certainty of
their presence. If arrested in the cervical portion of the gullet they
may be pressed upward into the mouth by manipulations applied
from without. The looped wire extractor may be used on any portion
of the œsophagus. If lodged in the demicanal the passage of a
probang will give prompt relief. If retained in the rumen and
manifestly hurtful, rumenotomy is called for as soon as a diagnosis
can be made.
