Victor Lazzarini

Computer Music Instruments II
Realtime and Object-Oriented Audio

Victor Lazzarini
Department of Music
Maynooth University
Maynooth, Kildare, Ireland

ISBN 978-3-030-13711-3 ISBN 978-3-030-13712-0 (eBook)


https://doi.org/10.1007/978-3-030-13712-0

Library of Congress Control Number: 2017953821

© Springer Nature Switzerland AG 2019


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or
information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt
from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, expressed or implied, with respect to the material contained herein
or for any errors or omissions that may have been made. The publisher remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Foreword

Today’s tools for music production have become increasingly democratised. Since
the advent of the personal computer in the 1980s, means of audio synthesis, recording,
editing and processing have become available to the general public. Before that
time, a composer or other creative individual would need to go to a big studio or
a computer centre to be able to work professionally with sonic creations. Likewise,
means of content distribution and tools for reaching audiences have become generally
available, both for passive and for interactive listening media. Seen together,
these technological changes have deeply affected the conditions for creative audio
work. With wider and more affordable access, many more individuals from diverse
backgrounds can work in this manner, and also the possible outcomes have multiplied.
In tandem with this evolution, we have seen that the tools have become easier
and easier to use. Many aspects of the expert knowledge of audio practitioners of
earlier decades have been coded into the tools. Any piece of technology will affect
the possible outcomes of a production process utilising it. This is also the case
with audio production tools, by means of the affordances given to the creative
individuals working with them. With ease of use comes also a delimitation of possible
outcomes: some of the tools offered to the broad mass of creative consumers can
be said to offer ‘off-the-shelf creativity’. The individual using these tools is not so
much creating but instead recombining the elements offered in pleasing ways. With
some of the creative decisions being aided by the properties of the tools used, it
becomes increasingly important to be able to make our own tools. This book provides
a solid basis for doing so by introducing computational concepts and audio
programming paradigms together with a firm foundation in programming.
As the book starts with the basics of the operating system, we are never lost for
context. We then deal with compiling and running programs, getting to know C and
C++ from the ground up and then proceed directly into realtime audio programming.
There’s as much DSP as we need to get to work and make things. Then, by
the time the need for more occurs, the reader’s general acquaintance with the field
through practical work means that they should be well equipped to understand the
literature needed to solve specialised problems outside the scope of this book. The
interleaving of programming languages, by means of interfacing them with each
other, allows freedom to choose the best tool for the job. This ability to create freely
also allows freedom from the imperatives of commercial actors, as well as freedom
to create commercial products should one wish to do so.
I got to know Victor through international communities for open source audio
programming, first and foremost through the Csound community. I deeply respect
Victor’s skills as a programmer, composer, musician, researcher and writer. His
productivity seems to know no limit. I had the good fortune of contributing to the book
‘Csound: A Sound and Music Computing System’ together with Victor, John ffitch,
Steven Yi, Iain McCurdy and Joachim Heintz in 2016. I also count myself lucky
to be working with Victor in a current research project on crossadaptive processing,
where we have also developed new methods of live convolution together with
Sigurd Saue.
With all of the creative freedom afforded by the knowledge presented in this
book, one could easily forget an additional benefit of this manner of working:
transparency. For any future research on the creation process, to be able to trace the steps
taken in the production, and to be able to study the intentions and incentives invested
in the process of a work’s creation could be of great value. Many of today’s tools for
the creative industry are closed source commercial products that are not compatible
across versions of the same tool. This makes archiving for one’s own purposes a
hard task, and archiving for longer-term purposes nearly impossible. This is not to
say that all our current creations deserve to be studied in the future, but it might
just happen that someone sometime may be interested in knowing what we did and
how we worked. Working with open source software does not in any way guarantee
that our projects can be run on future versions of the same software. It merely
allows the possibility for someone interested to be able to decode how the software
was supposed to work, and then by careful reconstruction to be able to create the
environment to open those saved projects. Reconstruction will always be time
consuming, but by using open source, at least we offer the opportunity to do so.

Trondheim, March 2018 Øyvind Brandtsegg


Preface

This book can be read in a number of different ways. First and foremost, it is a
companion volume to Computer Music Instruments: Foundations, Design and
Development. Here, many ideas and concepts introduced in that book are broken down and
explored at a lower level. Another way to read this book is to take it as a fairly
complete course on C11 programming, with a slant towards sound and music computing,
and an added introduction to key concepts of C++ and object-oriented programming
(OOP). It is also possible to take this as an applied Digital Signal Processing text,
which uses programming to discuss mathematical concepts. I would also think that
a number of other readings can be attempted.
In any case, this book is complementary to its companion, but can also be taken
on its own, as an independent text. It is true that many ideas explored here at an
implementation level work out the elements of what was described there in more
formal ways. There is however a conscious choice (in both books actually) to
develop everything from first principles. In this text, we will also pay some attention
to the discipline involved in writing code, and for this reason, programming
problems are suggested in each chapter. It is my belief that we can only achieve fluency
with plenty of practice, and readers who want to achieve a good level of C/C++
programming skills should attempt to solve every exercise proposed.
The book is divided into two parts, the first of which, as I have outlined above,
is a comprehensive exploration of the C programming language and fundamental
programming concepts, from the ground up. The fact that this language can be
discussed fully in this space is one of the great attributes of C: being small. Part I traces
a journey from zero to complete realtime audio programming. It equips readers with
all the tools necessary to create realtime audio instruments at a reasonably low level.
From early on, it prioritises examples and applications that have direct relevance to
making sound with computers.
Chapter 1 introduces the reader to the desktop programming environment. In
some ways, it picks up where we left off in the first Computer Music Instruments
book, where a description of modern computing platforms for music making was
offered. In the following chapters, we introduce all the components of C programming
in a stepwise manner: data types, variables, arithmetic, input and output, control of
flow, arrays, pointers, functions, and data structures. By the time we reach Chapter
8, all of the language has been dealt with, and we start looking at key elements of
the C standard library, such as memory allocation, and file input and output.
From Chapter 10 onwards, the focus turns completely to sound computing.
In fact, principles of audio signals are introduced as early as Chapter 4. As
soon as we find some means of iterating operations, we are off producing sound
waveforms. We discuss realtime audio synthesis and processing in Chapter 11 and
complement it with MIDI control in the last chapter of Part I. At this stage, many
key concepts of audio programming have been explored and we are ready to dive
into DSP components, which is one of the main themes of Part II.
The other theme, of course, is OOP. Throughout the chapters in Part II, we
continuously demonstrate how this paradigm is extremely useful for the modelling of
computer music instruments. In Chapter 13, we introduce it gently by applying its
principles to the development of a cornerstone of sound synthesis: the oscillator.
Each chapter in Part II is devoted to a set of instrument components that are paired
with key C++ programming concepts. Midway through, we are able to discuss the
development of a fully-fledged object-oriented library, AuLib, which is used to
illustrate the discussion of DSP algorithms, as well as OOP.
The following two chapters are devoted to specific audio processing concepts:
delay lines and spectral manipulation. The latter connects very firmly with its
companion text, Chapter 7 of Computer Music Instruments, and provides a complementary
perspective to it. It covers similar ground, but uses programming as the main
means to explore frequency-domain processing in a mostly non-mathematical way.
The book closes with a look at the concept of plugins, also from an object-oriented
perspective. At this point, we return, full circle, to Csound and study the means of
developing the building blocks of instruments, opcodes, using C++. This final chapter
connects very closely with the topics in the companion text, as it provides the
means to implement in a native form many of the principles outlined in that earlier
book.
The target audience of this book is aligned with that of its predecessor. While
some understanding of acoustics and electronic music would be helpful in assisting
the reader to understand some applications, it is not strictly necessary to have prior
knowledge of audio DSP or even programming. Familiarity with other languages is
also not a requirement, but may allow a faster progression through the first part of
the book. C/C++ programmers with no experience with audio may be able to jump
into the specific sections dealing with sound and music computing. Together with its
companion volume, the present book aims to provide a comprehensive discussion
of computational instruments for sound and music.

Maynooth, March 2018 Victor Lazzarini


Acknowledgements

Much of this book is the result of over fifteen years of teaching audio programming
at postgraduate level to music technology students. The flow and balance
of topics have been tested in a large number of classes and seminars. So
I am deeply indebted to all of the students who have worked with me over the years,
some of whom have gone on to become researchers, lecturers, and developers, and
have made great contributions to the field themselves. In particular, I would like to
thank Rory Walsh for taking the time to read some of the trickier sections of this
book, helping me to pitch them at the right level, and providing useful comments.
I would also like to acknowledge the help and encouragement of the computer
music community, as well as the various contributions to software development,
ideas, and concepts that have arisen from them. Special thanks should go to
colleagues in the Csound development team, John ffitch, Steven Yi, Tarmo Johannes,
Joachim Heintz, Stephen Kyne, François Pinot, Alex Hoffmann, and Bernt Isak
Waerstad, for their input into this open-source project and also for the
enlightening discussions on all matters to do with audio programming and beyond.
I am very grateful for the endorsement given by Øyvind Brandtsegg, who very
kindly wrote the foreword for this book. Our collaboration stretches back many
years, and recently I have had the chance to work closely with him and Sigurd
Saue on some very interesting musical signal processing bits and pieces, which have
indirectly contributed to elements in this book.
It is important to note the continued support of Ronan Nugent at Springer, who
has been very helpful in facilitating the editorial process for this book.
As ever, the work for this book has been thoroughly supported by the patience
and help I get from my wife Alice, and our children Danny, Ellie, and Chris. They
are an integral part of any achievements I might be in a position to claim.

Contents

Part I Towards Realtime Audio in C

1 Introduction to the Programming Environment
1.1 The Operating System
1.1.1 The File System
1.1.2 The Terminal
1.1.3 Processes
1.1.4 The Manual
1.1.5 The POSIX Standard
1.2 The C/C++ Toolchain
1.2.1 Compilers and Interpreters
1.2.2 Compiling
1.2.3 Running Programs from the Terminal
1.3 Introduction to C Programming
1.3.1 Character and Keyword Sets
1.3.2 Entry Point
1.3.3 The shin Program
1.3.4 Summary
1.4 Conclusions
Problems

2 Data Types and Operators
2.1 Variables and Types
2.1.1 Encoding
2.1.2 Integers
2.1.3 Real Numbers
2.1.4 Characters
2.2 Initialisation, Assignment and Arithmetic Operations
2.2.1 Variable Scope
2.2.2 Constants
2.2.3 Operations
2.2.4 Conversion
2.2.5 Arithmetic Order
2.2.6 The sizeof Operator
2.3 Conclusions
Problems

3 Standard Input and Output
3.1 Printing to the Terminal
3.1.1 The Format String
3.2 Getting Input from the Terminal
3.2.1 Pattern Matching
3.3 Character Input and Output
3.4 The calc Program
3.5 Conclusions
Problems

4 Control of Flow
4.1 Conditional and Logical Expressions
4.2 Conditional Execution
4.2.1 Conditional Operator
4.3 Switch
4.4 Iteration
4.4.1 The while and do – while Loops
4.4.2 The for Loop
4.4.3 The break and continue Statements
4.5 A First Synthesis Program
4.5.1 Plotting the Waveform
4.5.2 Playing the Sound
4.5.3 Other Waveforms
4.6 Conclusions
Problems

5 Arrays and Pointers
5.1 Arrays
5.1.1 Two-Dimensional Arrays
5.2 Strings
5.3 Pointers
5.4 Pointers and Arrays
5.4.1 Pointer Arithmetic
5.4.2 Pointers and Strings
5.5 Conclusions
Problems

6 Functions
6.1 Function Definition
6.1.1 Arguments
6.1.2 Variable Lifetime
6.1.3 Call Semantics
6.1.4 Function Prototypes
6.1.5 Parametrised Macros and Inline Functions
6.1.6 Variable Argument Lists
6.1.7 Recursive Calls
6.2 Modular Programming
6.3 Pointers to Functions
6.4 The C Standard Library
6.5 Another Synthesis Program
6.5.1 Plotting
6.5.2 Realtime
6.6 Arguments to main()
6.6.1 Translating Arguments
6.7 Conclusions
Problems

7 Structures
7.1 Defining a New Type
7.1.1 Member Access
7.1.2 Pointers to Structures
7.2 Functions in Structures
7.3 Unions
7.4 Enumerations
7.5 Bitwise Operations
7.5.1 Bitwise Logic
7.5.2 Bitshift Operators
7.6 Conclusions
Problems

8 Memory Management
8.1 Allocating Memory
8.1.1 Reallocation
8.1.2 Freeing Memory
8.1.3 Setting and Copying Memory Blocks
8.2 Dynamic Arrays
8.3 Linked Lists
8.4 Conclusions
Problems

9 File Input and Output
9.1 Standard C Library File IO
9.2 Text File Functions
9.3 Direct File IO Functions
9.3.1 Reading/Writing Position
9.3.2 Error Reporting
9.4 File System Functions
9.5 Programming Examples
9.5.1 The tobin Program
9.5.2 External Score Generation for Csound
9.6 Conclusions
Problems

10 Soundfiles
10.1 Digital Audio
10.1.1 Sampling Frequency
10.1.2 Sample Precision
10.1.3 Audio Channels
10.2 Basic Operations on Signals
10.2.1 A Synthesis Example
10.2.2 Byte Order
10.2.3 Self-Describing Soundfile Formats
10.3 The libsndfile Library
10.3.1 Opening Files
10.3.2 Reading and Writing
10.3.3 Seeking
10.3.4 An Example Program
10.4 Conclusions
Problems

11 Realtime Audio
11.1 Portaudio
11.1.1 Listing Devices
11.1.2 Stream Parameters
11.1.3 Opening Devices
11.1.4 Synchronous Mode
11.1.5 Asynchronous Mode
11.1.6 Closing Up
11.1.7 The todac Program
11.1.8 An Audio Effect
11.2 The Jack Connection Kit
11.2.1 Opening a Client
11.2.2 Registering Ports
11.2.3 The Processing Callback
11.2.4 Connecting Ports
11.2.5 Closing a Client
11.2.6 Application Example
11.3 Conclusions
Problems

12 Realtime MIDI
12.1 The Protocol
12.1.1 Hexadecimal Notation Revisited
12.1.2 MIDI Messages
12.1.3 Packing and Unpacking the Status Byte
12.2 MIDI Programming Basics
12.2.1 MIDI on MacOS
12.3 MIDI Programming with Portmidi
12.3.1 Timers
12.3.2 Opening Devices
12.3.3 Output
12.3.4 Input
12.3.5 A MIDI Synthesiser
12.4 MIDI on Jack
12.4.1 Example
12.5 Conclusions
Problems

Part II Object-Oriented Audio in C++

13 Oscillators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
13.1 Moving to C++ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
13.1.1 C++ Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
13.1.2 Overloading and Optional Parameters . . . . . . . . . . . . . . . . . . . 190
13.1.3 Memory Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
13.2 The Table Lookup Oscillator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
13.3 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196

14 Interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
14.1 Linear Interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
14.2 Cubic Interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
14.3 Inheritance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
14.3.1 Polymorphism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
14.3.2 Oscillator Inheritance Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
14.4 Function Table Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
14.5 Reference Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
14.5.1 Copy Constructors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
14.5.2 Object Reference Arguments . . . . . . . . . . . . . . . . . . . . . . . . . . 212
14.5.3 Self References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
14.6 Phase Generators and Table Readers . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
14.6.1 The Phasor
14.6.2 Table Reader
14.7 Conclusions
Problems

15 Envelopes
15.1 Envelope Generators
15.1.1 Linear Envelopes
15.1.2 Exponential Envelopes
15.2 Access Control and Classes
15.2.1 Namespaces
15.2.2 A Line Class
15.3 Operator Overloading
15.3.1 Standard IO Revisited
15.4 An Audio Output Class
15.5 Conclusions
Problems

16 Filters
16.1 Feedback Filters
16.1.1 First-Order Tone Filters
16.1.2 Second-Order Filters
16.1.3 Fourth-Order Filters
16.1.4 Balancing
16.2 Templates
16.2.1 Templates in the Standard C++ Library
16.2.2 Range-Based Loops
16.3 Conclusions
Problems

17 AuLib
17.1 Object-Oriented Audio Systems
17.2 Library Design
17.2.1 Stateful versus Stateless Representations
17.2.2 Abstraction and Encapsulation
17.2.3 Code Reuse
17.2.4 Connectivity
17.3 A Tour of the Library
17.3.1 Signal Generators
17.3.2 Signal Processors
17.3.3 Audio Input and Output
17.4 Synthesis and Processing Control
17.5 An AuLib Instrument
17.6 Conclusions
Problems

18 Delay Line Processing
18.1 Circular Buffers
18.2 Fixed-Delay Effects
18.2.1 Comb Filters
18.2.2 All-Pass Filters
18.3 Variable Delay Lines
18.4 Multiple Taps
18.4.1 Convolution
18.5 Lambda Functions
18.5.1 Auto Types
18.6 Conclusions
Problems

19 Frequency-Domain Processing
19.1 Fundamental Principles
19.1.1 Complex Numbers
19.1.2 Spectral Analysis
19.2 The Fast Fourier Transform
19.2.1 Real-to-Complex and Complex-to-Real Transforms
19.3 Fast Convolution
19.3.1 Overlap Add
19.3.2 Overlap Save
19.3.3 Multiple Partitions
19.3.4 Convolution Reverb
19.4 Streaming Spectral Processing
19.4.1 Spectral Analysis
19.4.2 Resynthesis
19.4.3 Spectral Manipulation
19.5 Conclusions
Problems

20 Plugins
20.1 Plugins in Csound
20.2 Framework Design
20.2.1 The Base Classes
20.2.2 Deriving Opcode Classes
20.2.3 Registering Opcodes with Csound
20.3 The Csound Engine Object
20.4 Opcode Programming
20.4.1 Delay Line
20.4.2 Table-Lookup Oscillator
20.4.3 Text Processing
20.4.4 Spectral Processing
20.4.5 Array Processing
20.4.6 External Resources
20.4.7 Multithreading Opcodes
20.5 Conclusions
Problems

Appendix

A AuLib Reference
A.1 Library-Wide Definitions
A.2 AudioBase
A.3 Deriving New Classes
A.4 Audio DSP Classes
A.5 Control Classes
A.5.1 MIDI Synth Example
A.6 Other Classes
A.7 Building AuLib

References

Index
Acronyms

0dbfs Zero decibel full scale


ADC Analogue-to-Digital Converter
ADSR Attack-Decay-Sustain-Release
AP All Pass
API Application Programming Interface
BP Band Pass
BR Band Reject
cps cycles per second
DAC Digital-to-Analogue Converter
dB Decibel
DFT Discrete Fourier Transform
DSP Digital Signal Processing
FFT Fast Fourier Transform
FIFO First In First Out
FIR Finite Impulse Response
FS File System
GUI Graphical User Interface
HAL Hardware Audio Layer
HP High Pass
Hz Hertz
IDFT Inverse Discrete Fourier Transform
IF Instantaneous Frequency
IIR Infinite Impulse Response
IO Input-Output
IR Impulse Response
ISTFT Inverse Short-Time Fourier Transform
LFO Low Frequency Oscillator
LP Low Pass
LSB Least Significant Byte
MIDI Musical Instrument Digital Interface
MSB Most Significant Byte
OLA Overlap-Add
OLS Overlap-Save
OOP Object-Oriented Programming
OS Operating System
PCM Pulse Code Modulation
PID Process Identifier
PV Phase Vocoder
RMS Root Mean Square
STFT Short-Time Fourier Transform
Part I
Towards Realtime Audio in C
Chapter 1
Introduction to the Programming Environment

Abstract The desktop programming environment is explored, from the perspective
of its major software components. We begin by discussing the concept of operating
systems, and their main components: file system, terminal, and commands. The
C/C++ toolchain is introduced as the fundamental collection of software that will
support all the work in this book. Finally, we take a first look at the C language and
its basic elements.

The C/C++ programming environment of a modern desktop computer comprises
a complex collection of software sometimes called the compiler toolchain. It includes
programs to transform code written in plain text into a form that can be executed,
as well as a number of utilities to help the development process. In addition
to these, two other key components are essential. The first of these is a program
called a text editor, which is widely employed to create the plain text files that
contain program source code. The other one is a command interpreter, sometimes
called the terminal, used by the developer to invoke the different programs needed to
build software. In this chapter, we will introduce these components of the programming
environment, which will be used throughout the book to develop software in
C/C++.

1.1 The Operating System

In order to run any programs, computers generally depend on a fundamental software
set called the operating system (OS) [60], which is made of several components
that provide the support for applications to run. At the core of the OS sits the
kernel, which provides the basic functionality for the operation of a computer, for
instance, the instructions to communicate with the different peripherals, memory,
input devices (e.g. keyboard, mouse), output (screens, etc.), and disk files, and to
load and run programs, among other things. For personal computers, the most commonly
used OSs are MS Windows, MacOS and Linux. In mobile environments, iOS
and Android are fairly ubiquitous. This book will concentrate on development under
and for UNIX-like operating systems [27], which include MacOS and Linux on the
desktop side¹ as well as iOS and Android for mobile devices². From now on, all
discussion will assume this type of development environment.

© Springer Nature Switzerland AG 2019
V. Lazzarini, Computer Music Instruments II,
https://doi.org/10.1007/978-3-030-13712-0_1

1.1.1 The File System

A key component of the OS is the file system (FS) [53], which is the software
responsible for the storing of data in a permanent (or in some cases temporary)
form. There are different types of FS, but we normally do not need to worry about
the specific characteristics of these in normal use. Most of them operate in a similar
way to organise stored data in terms of its logical units, files and directories. The
former store data (of various types, such as text, images, sound, etc.), and the latter
are used as containers for files or directories themselves.
The FS is organised hierarchically as a tree, starting from a root, with forward slashes (/)
representing symbolically the separation between levels. The root will generally
contain files and directories that have a system-wide relevance, such as user pro-
grams and configuration data. Under this, we will also find a directory containing
a set of user directories, one for each user registered in the system. A user direc-
tory for a given username is known as its home directory. That is where all files or
directories created and manipulated by that user will be stored.
For example, a user directory for the username jane in the MacOS FS is denoted
by /Users/jane, with the different directory levels separated by the forward
slash symbol /. Its location in the FS tree is shown in Fig. 1.1.

/Users

/jane

Fig. 1.1: The directory /Users/jane in the MacOS FS tree.

¹ It is possible to emulate a UNIX-like environment under Windows, using the Msys/MinGW or
Cygwin software tools. See http://www.mingw.org and https://www.cygwin.com for more details
on these tools.
² The common practice with mobile applications is to develop them on a desktop system, rather
than directly on the devices themselves. Thus, we will concentrate on the use of desktop OSs in this
book.

Its parent directory is /Users (holding all user directories), and the parent directory
to that is / (the root directory, which contains all directories). The unique
directions under the FS to that given home directory, called the path, are given as
/Users/jane. Thus, each file or directory in the FS has a given path to it, for
example:
• /: the root directory
• /Users/jane/mysrc.c: a file in Jane’s home directory
• /usr/bin/cc: the cc command in the /usr/bin directory
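
As a minimal sketch of how a path encodes a position in the FS tree, the following lines split the example path into its parent directory and final component. The commands dirname and basename are standard POSIX utilities that are not introduced in the text, and the shell variable names are only illustrative.

```shell
# Split an absolute path into its parent directory and its final component.
path=/Users/jane/mysrc.c
parent=$(dirname "$path")    # everything before the last slash
name=$(basename "$path")     # the last component of the path
echo "$parent"               # prints /Users/jane
echo "$name"                 # prints mysrc.c

# Walking upwards, one level at a time, eventually reaches the root.
grandparent=$(dirname "$parent")
root=$(dirname "$grandparent")
echo "$grandparent"          # prints /Users
echo "$root"                 # prints /
```

These commands only manipulate the text of the path, so they work even if /Users/jane does not exist on the machine where they are run.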

As hinted above, files can be of various types, but a fundamental distinction can
be made between two types of files:

1. Those that hold data (text, sound, photos, etc.).
2. Programs: executables.

The basic difference is that program files are marked by the FS in a way that
identifies them as executables, i.e. containing code that can be loaded and run. Data
files are not marked in this way and thus cannot be run (but can be opened in pro-
grams for viewing, editing, playing, etc.). Another distinction can be made between
two types of data file with regards to the format of their contents:

1. Plain text.
2. Other unspecified data (sound, photos, word-processed text, etc.).

The first type is very important for us, as we will use it to hold the source code
for programs. It holds only text encoded using a given character set. For C/C++ programs,
these files should use the ASCII character set and nothing else. This means
that we need to be careful that the files we are using are produced correctly, without
any extraneous characters. To ensure this, we should always edit source code using
a plain text editor (and not, for instance, a word processor³).

1.1.2 The Terminal

The terminal [3, 46, 59] is an application that contains a command interpreter program,
called the shell, which allows the user to type in and execute programs (or
commands). In general, the OS allows any user program to be run from the terminal,
including graphical and non-graphical programs. The former will in most cases
be launched as a separate window, whereas the latter will run under the shell. A
terminal can hold more than one command interpreter, either at the same time (separately
in a different window or tab) or as a subprocess of a parent shell. An OS
often has several different shell programs, which can be chosen by the user. The
most common of these is bash (the default in MacOS), or /bin/bash (full path),
which is based on the original UNIX Bourne shell⁴ [27]. Once the terminal is open,
it will start the default system shell. The following discussion assumes this to be
bash.

³ Word processors produce files in different formats that often include a mixture of plain text and
other formatting information, which puts these in the second category of data files as described
above.

The shell gives you a prompt denoted by a symbolic character⁵ (for instance, $ is
commonly used for this purpose, and we will adopt it throughout this book) where
you can type commands and press the enter key to execute them. In most shell
programs, the up and down arrow keys allow you to recall older command lines.
Commands are made up of
$ [command] [argument] [argument] ...
where [command] stands for the program you want to run, and is followed by
a number of optional or required arguments (depending on the command), which
are passed as parameters to the program. Programs and arguments are separated by
blank spaces.
The shell is always opened in a given directory of the FS. This is called the
working directory, and it is normally the user home directory when the shell is
started. The working directory can also be identified by a dot (./); its parent (the
one that contains it) is denoted by a double dot (../). We can get the path to the
working directory with the command pwd (print working directory):
$ pwd
/Users/jane
It is possible to navigate around the FS using the command cd (change direc-
tory):
$ cd [directory]
where [directory] is the path of the directory we want to go to. The path can be
relative to the current (working) directory or absolute from the FS root. For instance:
$ pwd
/Users
$ cd /
$ pwd
/
$ cd
$ pwd
/Users/jane
Note that the cd command with no arguments always brings us back to the home directory.
We can navigate to anywhere in the FS where we have the right permissions
to do so. In particular, we should be able to go anywhere in our user directory.
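
The difference between relative and absolute paths can be sketched with a short script. It assumes a scratch directory created with mktemp (a common utility not covered in the text) so that nothing in the home directory is touched; the directory names audio and takes are hypothetical.

```shell
# Create a scratch directory and make it the working directory.
start=$(mktemp -d)
cd "$start"
base=$(pwd)             # absolute path of the scratch directory

# Build a small tree: audio/takes under the scratch directory.
mkdir audio
mkdir audio/takes

cd audio/takes          # relative path: two levels down from here
deep=$(pwd)
cd ..                   # double dot: up to the parent (audio)
up=$(pwd)
cd ./takes              # single dot: start from the working directory
again=$(pwd)
cd "$base"              # absolute path: works from anywhere
back=$(pwd)

echo "$deep"
echo "$up"
echo "$again"
echo "$back"
```

The first and third paths printed are identical, since audio/takes and ./takes (from inside audio) name the same directory.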

⁴ The command /bin/sh can also be used, generally invoking the default system shell.
⁵ In some cases this is preceded by the machine name, working directory, and/or username. For
example, the full prompt for where I am working now is ligeti:src victor$.

A number of commands are going to be useful for looking at and manipulating
files through the shell. The command ls is used to list files in a directory. You can
check that it matches the names that you can get using the graphical file finder/manager
program in your system (e.g. Finder on MacOS). The ls command can
also show hidden files and a long listing of file names and attributes if you use the
optional arguments -a (all) and -l (long). These types of options that are given to
some commands are also known as flags. The long listing shows us the owner of the
file, its group, and the permissions associated with the owner, members of the group,
and all other users in the system. For instance, the following two entries
drwxr-xr-x 6 jane staff 204 13 Jun 13:44 audio
-rw-r--r-- 1 jane staff 2371 12 Jul 2016 voice.txt
can be interpreted as follows:
1. The first letter: d (directory) or - (file).
2. The first group of three letters, rwx: permissions to read, write or execute (if
present) for the owner. In order for directories to be opened, they need to have
the x permission.
3. The second group of three letters: permissions for the members of the group
staff.
4. The third group of three letters: permissions for all other users.
The other information in the long list provides the owner, group, size (in bytes),
date and name. Generally speaking, files created by the user will be owned by her
and will give read-only permissions to the group and others. Executable
files (programs) will have x permissions.
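
The following sketch inspects the first character of a long listing and checks the x permission on a newly created file. It relies on a few things not introduced in the text and used here as assumptions: the chmod command (which changes permissions), the -d flag of ls (which lists a directory itself rather than its contents), the cut utility, and the shell test [ -x ... ]. The file and directory names are taken from the listing above but are created in a scratch directory.

```shell
# Work in a scratch directory with one data file and one subdirectory.
dir=$(mktemp -d)
cd "$dir"
echo "some notes" > voice.txt
mkdir audio

# The first character of a long listing tells files and directories apart.
file_type=$(ls -l voice.txt | cut -c1)    # - for a plain file
dir_type=$(ls -ld audio | cut -c1)        # d for a directory
echo "$file_type $dir_type"

# A freshly created data file is not marked as executable.
if [ -x voice.txt ]; then before=yes; else before=no; fi

# Adding the x permission for the owner (u) changes that.
chmod u+x voice.txt
if [ -x voice.txt ]; then after=yes; else after=no; fi
echo "executable before: $before, after: $after"
ls -l voice.txt
```

Marking a text file as executable does not make it a program, of course; the sketch only shows where the x permission appears and how it is set.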
The OS provides commands for moving (renaming), copying, deleting, and view-
ing contents of files. It also provides means of making new directories and removing
empty directories. Here is a short list of these commands:
• mv: move files from one name (path) to another.
• cp: copy files from one location to another.
• rm: remove files permanently.
• cat: concatenate (show) the contents of a file.
• mkdir: create a new directory.
• rmdir: remove an empty directory.
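
A short session using each of the commands in the list above might look like the following. This is a sketch that works in a scratch directory made with mktemp (an assumption, not a command covered in the text), with hypothetical file names; the > symbol redirects the output of echo into a file.

```shell
# Work in a scratch directory so no real files are affected.
dir=$(mktemp -d)
cd "$dir"

echo "first take" > take.txt     # create a small plain text file
cp take.txt copy.txt             # copy it under a new name
mv copy.txt backup.txt           # rename the copy
mkdir archive                    # make a new directory
mv backup.txt archive/           # move the file into it

contents=$(cat archive/backup.txt)
echo "$contents"                 # prints first take

rm archive/backup.txt            # remove the file permanently
rmdir archive                    # the directory is now empty, so it can go
remaining=$(ls)
echo "$remaining"                # prints take.txt
```

Note that rmdir would have refused to remove archive while backup.txt was still inside it.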
The shell and some of the commands it runs can be configured through the use of
environment variables. These hold values that can influence how the shell or other
programs behave. An important such variable is PATH, which keeps the names of
directories where the shell will look to find executables to run. If a command file is
not in this list of directories, it will not be found and cannot be executed. The system
gives users a basic pre-filled PATH with the most common executable directories in
it. In order for us to check the value of an environment variable we prepend a dollar
sign ($) to it, and pass it as an argument to the echo command:
$ echo $PATH
/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin

The echo program prints to the terminal (shows) all of its arguments, in this case
the value ($) of the variable. Each directory in the PATH is separated from the next
by a colon (:), as we can see in the example. Generally your working directory (.)
is not in the path. This means that any programs in it will not be found and cannot
be executed unless the full path is given. You can type ./ before the program name
in this case to indicate that you want to run an executable file from your directory.
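
To make the role of PATH more concrete, the sketch below lists its directories one per line, asks the shell where it finds ls, and then runs a small program from the working directory with the ./ prefix. The tr utility, the command -v lookup, chmod and the script name hello are assumptions for illustration, as none of them is introduced in the text.

```shell
# Work in a scratch directory.
dir=$(mktemp -d)
cd "$dir"

# PATH entries are separated by colons; show one directory per line.
echo "$PATH" | tr ':' '\n'

# Ask the shell which file it would run for the ls command.
ls_path=$(command -v ls)
echo "$ls_path"

# Create a tiny executable script (hypothetical name: hello).
printf '#!/bin/sh\necho "hello from the working directory"\n' > hello
chmod +x hello

# The working directory is normally not in PATH, so we give ./ explicitly.
out=$(./hello)
echo "$out"
```

Typing hello on its own would normally fail with a "command not found" message, because the shell only searches the directories listed in PATH.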

1.1.3 Processes

When programs are executed by the OS, they do so under a process. For example,
the shell program, which takes in input from the user and can start other programs, is
a process run by the OS. Several such processes are being executed concurrently in
a system, each one with their own access to resources, memory space, etc. A process
has one or more threads executing at the same time, which run independently but
share resources. Processes have an owner, which for programs started by the user,
is generally the user herself, and a number called the process identifier (PID). It is
possible to get a list of active processes, their PIDs, as well as their owners using the
command ps. For instance the following line prints the PIDs and full pathnames to
all processes running on a system:
$ ps -A
A user may kill her own processes using the kill command and the relevant
PID. Alternatively, a process can be stopped by name, using killall:
$ kill PID
$ killall program_name
Finally, a process can be started in the background, returning to the shell immedi-
ately, before it completes execution. This is often used with graphical user interface
(GUI) programs, when run from the shell. In this case they will open the program
window and return to the shell for the user to continue to type commands into it. To
run a process in the background we use an ampersand (&) at the end of the command
line. Once the program starts, the shell reports its PID,
$ emacs &
[1] 20331
which can also be used to stop the program if we need to:
$ kill 20331
[1]+ Stopped emacs
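
The same steps can be collected in a small script. This is only a sketch: sleep stands in for a real program, and it uses a few standard shell features that are not covered in the text, namely the special parameter $! (the PID of the most recent background process), the wait command, and kill -0 (which sends no signal and only checks that a PID exists).

```shell
# Start a long-running command in the background.
sleep 30 &
pid=$!                       # PID of the background process
echo "started PID $pid"

# List just that process, if ps is available on this system.
ps -p "$pid" || echo "ps could not list the process"

# Ask the process to terminate, then let the shell collect it.
kill "$pid"
status=0
wait "$pid" 2>/dev/null || status=$?
echo "status reported by wait: $status"   # non-zero, typically 143

# Check whether the PID is still in use.
if kill -0 "$pid" 2>/dev/null; then alive=yes; else alive=no; fi
echo "still running: $alive"
```

A status above 128 usually indicates that the process was ended by a signal rather than finishing on its own.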

1.1.4 The Manual

The system manual can be accessed directly from the shell with the command man.
This can be used to print information about commands, as well as C programming
subroutines (as we will see later), and specific topics. The command is
$ man [topic]
where [topic] stands for the topic you want to get information about (e.g. a
command). The manual is arranged into sections, which you can access by passing
the section number (optional) before the topic name.

1.1.5 The POSIX Standard

Many of the concepts introduced here are defined as part of the POSIX (Portable
Operating System Interface) standard [26]. This is a specification that encompasses
much of the programming environment discussed in this book, and in some ways it
can be taken as the basic specification for UNIX-like operating systems. While MacOS
is POSIX-certified, and thus fully compliant, Linux adheres to it very closely,
but does not have a certification. The standard defines the interface, not the implementation,
of a variety of components of the OS. It also aligns closely with the ISO
specification of the C language [24], which is followed by this book.

1.2 The C/C++ Toolchain

In order to make a working program from C/C++ code, we need to build it. This is
a multi-stage process in which compilation is one of the key steps, but not the only
one. Although building is a more accurate term for this, we often use compiling in an
informal way to denote the complete process. To support this, the compiler toolchain
provides a series of programs, which can be invoked with a single command, or in
separate steps.

1.2.1 Compilers and Interpreters

The central component of the C/C++ development toolchain is called the compiler.
This is a program that takes the code as a plain text file and translates it into bi-
nary instructions that can be understood by the computer to execute the intended
computation. The binary file that is produced by the compiler needs to be combined
with other binary data, generally from other system files, in order to produce the full
executable program. This is done in the final stages of the process.

C and C++ are languages designed to be compiled in this way, producing highly
efficient programs. In contrast, there are other languages, such as Python and Lisp,
that are not dependent on compilers, but on an interpreter program, which does the
translation from code text to computation directly, without the need for a compila-
tion stage. These are generally less efficient from a pure computation point of view,
but have an advantage of being generally more interactive and they work at a higher
level (i.e. demand fewer programming steps/number of code lines in a program).
For the type of computation involved in audio and music applications, we often require
the efficiency of compiled code. Languages that are run on optimised virtual
machines, such as Java and JavaScript, can be seen as an in-between solution, where
compilation to an intermediate bytecode representation is used in place of direct
interpretation or machine code.

1.2.2 Compiling

In the first part of this book, we will concentrate solely on the C language, and thus
the discussion from now on will turn to the specific tools used to build programs
written in that language. The command cc is used to invoke the C compiler⁶, to
which we need to pass the name of the file to be compiled, and the name of the output
program we want to create:
$ cc mysource.c -o myprog
where we are passing mysource.c, called the source file, containing the code for
the program. We are also using the flag -o to indicate the name of the output file
myprog, which will hold the compiled program. We can see that this file has been
created in the current directory by listing it:
$ ls -l myprog
-rwxr-xr-x 1 jane staff 8432 13 Jun 21:42 myprog
Note that the file has execute permissions as it was created as a binary executable.
Using the cc command in this way invokes all the toolchain commands in one
single step, behind the scenes, to build the new program. The main stages of this
process can be listed as:
1. Preprocessing.
2. Compiling.
3. Linking.
In the first step, the code text in the source file is manipulated to produce the
input to the compilation process. One of the typical aspects of this preprocessing is
the addition of code taken from other existing files called header files (because they
are often placed at the top of the source file). These files usually have names that use
the extension .h (although this is not mandatory) and contain standard lines of code
that are used by many programs. They are used to facilitate programming, reducing
the need for these lines to be rewritten in every new source file. Other preprocessing
operations can be invoked, such as text substitution (also known as macros).

6 We assume you have the compiler toolchain installed on your system. If it is not
installed, please refer to the instructions for your specific platform in order to do so. You can
check whether the tools are installed by typing the cc command and checking whether it exists in
the system.

Once the final program code in text form is ready, with all preprocessing done,
the compiler translates it into object code. The output from this stage will contain
only the compiled binary version of the code that was written in the source file,
nothing else. In the majority of cases, to make a full executable, we require some
extra chunks of object code to allow the OS to load and run it. These come from
existing pre-compiled components that are kept in library files. Again, much of this
binary code is standard and does not need to be compiled every time a program
is built. To bring in these extra components and combine them with our compiled
object code, we need the third step, linking, from which emerges the full program.
While it is possible to perform these three stages in separate calls to the different
compiler tools, we will not need to do this in most of the examples in the early
part of this book. With larger and more complex projects containing multiple source
files, it will make sense to split the build process into separate compiling and linking
steps.

1.2.3 Running Programs from the Terminal

The compiler places the newly built program in your current directory. It can be run
from the terminal like any other command/program in the system. For this, we give
the full path to the filename, as in the following example,
$ /Users/jane/myprog
Alternatively, we can use the . shorthand,
$ ./myprog
which is more convenient as it will not require us to remember the full path to the
working directory. This, of course, assumes that the working directory is not in the
PATH list.

1.3 Introduction to C Programming

Now that we have introduced the environment in which we will be developing our
programs, we can turn our attention to the C language. In this section, we will
explore the fundamental elements of program structure, layout, compilation, and
execution. This will be done by looking at a simple program, which, although trivial, will
illustrate all of these basic aspects of programming.

1.3.1 Character and Keyword Sets

All C Programs may avail of the following set of distinct characters [24]:

1. The 26 uppercase letters of the Latin alphabet
A B C D E F G H I J K L M
N O P Q R S T U V W X Y Z
2. The 26 lowercase letters of the Latin alphabet
a b c d e f g h i j k l m
n o p q r s t u v w x y z
3. The 10 decimal digits
0 1 2 3 4 5 6 7 8 9
4. The 29 graphic characters
! " # % & ' ( ) * + , - . / :
; < = > ? [ \ ] ^ _ { | } ~
5. The space character and the control characters representing horizontal tab, verti-
cal tab, and form feed.

This list implies that the language is case-sensitive, which means that it pays at-
tention to the capitalisation of identifiers. In addition to this character set, we should
note that the language reserves a series of specific keywords for particular uses. The
following is a list of these that is defined by the C language standard [24]:
auto break case char const continue default do double
else enum extern float for goto if inline int long
register restrict return short signed sizeof static
struct switch typedef union unsigned void volatile while
_Alignas _Alignof _Atomic _Bool _Complex _Generic
_Imaginary _Noreturn _Static_assert _Thread_local

1.3.2 Entry Point

Programs are organised in structural blocks called functions7 . All the code that per-
forms computations, the program statements, is then placed inside these program-
ming units, which are executed by the computer. Programs are executed statement
by statement in a sequential manner, until the last one is performed, when the pro-
gram exits.

7 A more precise definition will be given in Chapter 6.



C programs will consist of at least one function, called main() [24]. The main
function is the first function that is called when your program runs. It is known as its
entry point, from where the OS makes the program start the execution of a process.
From there onwards all instructions in the program source code are executed in
sequence. When the last instruction is performed, the process is exited, returning an
exit code to the OS (to indicate successful completion or otherwise). A flowchart
demonstrating this operation is shown in Fig. 1.2.

[Flowchart: user starts the program → the OS loads the executable → entry point is
found → instructions are executed in sequence → last instruction executed, program
exits → OS takes a return code from program]

Fig. 1.2: Running a program.

A simple program may therefore consist of a single function, called main(),
with a sequence of statements inside it:
int main()
{
    statement_1;
    statement_2;
    ...
    statement_N;
}
Each statement is terminated by a semicolon (;). This serves as a full stop for C
program code. Without it, the compiler will not know where one statement ends and
where the next starts. Statements may span multiple lines, so it is very important to
pay attention to the placement of semicolons.
All our C programs will need at least one of the C standard libraries, which deals
with standard input and output of data. Its associated header file is stdio.h. We
add it to the program code with this preprocessor command at the top of the source
file:
#include <stdio.h>
All lines starting with a # (hash) are preprocessor commands. The include com-
mand effectively copies all the text data from a header file into the position where
the preprocessor finds it in the source file.

1.3.3 The shin Program

The archetypal first program is an analogue to the classic Hello World by Kernighan
and Ritchie [28]. This is simple enough to demonstrate the basic C program structure
and layout introduced in the previous section:
1 #include <stdio.h>
2 int main()
3 {
4 printf("Live Long and Prosper.\n");
5 return 0;
6 /* end */
7 }
This program contains one function, main, which holds two statements: printf
(...); and return 0;, each one duly terminated with a semicolon. Note that,
for the sake of clarity, we have placed each statement on a separate line. This is not
actually required by the C syntax in order to distinguish them. As we have noted
before, only the semicolon is used for this purpose. Single statements can span mul-
tiple lines; a single line can contain multiple statements.
The first statement in line 4,
printf("Live Long and Prosper.\n");
calls the printf() function that is defined outside this program. We did not write
its code; it is provided by a library. The C program knows about printf() because
it is declared in the stdio.h header we are including at the top. This function
is part of the standard C library and is used to display text. The characters that make
up the text are passed to the function inside double quotes. This is called a string,
which is how C programs store text. Arguments to functions are always placed
inside parentheses after the function name. The expected result of this call is that the
text characters are printed to the standard output, which is by default the terminal.
The final statement of the main() function in line 5,
return 0;
is used to yield a result (0) as the output from this function, which is the numeric
code returned to the OS to indicate all went well and the process finished cleanly.
The final line of the main() function (line 6) is a comment, defined by the /* and
*/ delimiters, which contains no program statements and is therefore ignored by
the compiler.

Compiling and running

Using the text editor of choice8 , this code is placed in a file called shin.c and com-
piled with
$ cc -o shin shin.c
producing a program called shin. Note the use of the -o flag, indicating that the
output of cc is a file called shin. The cc command will invoke the preprocessor to
deal with the #include line, then the compiler itself to transform the preprocessed
code into binary form, and finally the linker to insert the extra externally-defined
bits, such as the printf() function.
We can run it with the following command (which is the name of the program
file):
$ ./shin
Live Long and Prosper.
where ./ means the file is in the current working directory.
As we have seen before, in order to run the program, the command-line interface
(shell) looks for executable files (programs) in certain directories indicated by the
environment variable PATH. Only directories in the path will be searched. The
current directory might not be in the path; to make sure you are running the right
file, always give an explicit path to it:
$ ./shin
which is a program file called shin in the current directory.

1.3.4 Summary

The following is a summary of some of the fundamental details of program structure
that we should be aware of:
• Comments: programmers can add comments using the /* and */ delimiters
anywhere in the program source code. Anything placed in between these will not
be read by the compiler. They can span multiple lines:
/* shin.c
   author: V Lazzarini, 2018
*/
#include <stdio.h> /* header file for stdio */
int main() /* main function */
{
    /* this prints a message */
    printf("Live Long and Prosper.\n");
    return 0;
}

8 Gnu Emacs (https://www.gnu.org/software/emacs, also called Aquamacs on MacOS,
http://aquamacs.org) and Atom (http://atom.ie) are good examples of text editors that are
available for a variety of platforms.

The C language standard [24] also allows single-line comments beginning with
//, running to the end of the line:
int main() // this is a comment until the end of the line
Use comments wisely: do not over-annotate. The code should be readable with-
out any external references, if at all possible. Comments can also be used to
isolate (comment out) code statements when diagnosing a problem or trying al-
ternative versions of a program.

• Entering and exiting: as we have discussed above, main() is the entry point
of the program. Thus, when this function reaches its end, the program stops. The
C language standard mandates that we define main() with a return type int9 :
int main() {
...
}
Thus, by this definition, main() is expected to return a numeric code to the OS.
This is generally 0 if everything was OK, and anything else if not. Since int is a
keyword for an integral data type (a whole number), a statement will need to be
provided to return a value of this type. This is what the function does at the end,
using the keyword return:
int main() /* main function returns integers */
{
printf("Live Long and Prosper.\n");
return 0; /* we return 0, meaning 'OK' */
}
• Standard IO: text output to the terminal is handled by the C standard input
output (IO) library. The printf() function is declared in stdio.h and
implemented by the C library. To use it, we have to include that header file. Similarly,
as we will see, to get input from the terminal, we can use other stdio.h
functions.

9 Types will be discussed in the next chapter.

1.4 Conclusions

In this chapter, we have seen that the OS is a collection of software that provides the
environment for programming and running applications. As part of this, it includes
a file system (FS) that organises files and directories (folders) and allows these to
be manipulated. Directories hold files and other directories; files can hold data or
programs (executables). The terminal (through a program called the shell) can be
used to run programs (also known as commands). The PATH is used by the shell to
locate commands.
C programs are built in three major stages, which include preprocessing,
compiling and linking. Header files contain definitions that are required, for instance, by
programs using code from libraries. They are added to programs using a preproces-
sor directive, #include. All programs have an entry point, which is usually the
main() function. C programs are run sequentially, statement by statement. They
terminate by returning a value to the OS.
Next, we will start looking at the fundamental elements of programming in C,
using the tools and concepts developed in this chapter. In particular, we should try
to be comfortable with the development environment described here, and bear in
mind what has been discussed with regard to the overall structure of a program, its
compilation, and execution.

Problems

1.1. Modify the existing lines of the shin program, compile it and observe the
result:
(a) What happens if you add copies of the line containing
printf("Live Long and Prosper.\n");?
(b) What happens if you modify the text inside the double quote marks?
Chapter 2
Data Types and Operators

Abstract In this chapter, some fundamental concepts of programming are discussed.
Data types and variables are introduced, as well as the principles of binary
encoding, bits, bytes, and endianness. We then look at the different built-in types that
are available in the C language and the arithmetic operations that can be applied to
them.

The C language is fundamentally oriented towards executing operations with
numeric data, in particular for the applications we will be targeting in this book.
Everything we program will ultimately be based on arithmetic and logic operations,
even if on the surface, the resulting software might not immediately appear to be so.
This furnishes us with a good starting point to learn the language. We will start by
introducing the concepts of variables, their types and the basic operations we can
apply to them.

2.1 Variables and Types

Variables, in a programming context, are memory locations that we can address
directly or indirectly to store numbers or text characters. They are also called
objects in the C language standard [24], when referring particularly to those that can
be modified. Types are provided to determine the meaning of the contents that are
stored in a variable. The following are some of the fundamental C language types
that can be employed in a program:
• Integer: whole numbers.
• Floating-point: real numbers1 .
• Character: text characters.

1 Actually, a finite representation of a real number, as some of these may have an infinite decimal
expansion [29].


Before any variable can be used in a program, it needs to be declared
appropriately. In addition to being given one of the types above, each variable will be
identified by a symbolic name, which must begin with a letter or a _ (underscore)
character.
Each variable will occupy a certain amount of space in memory, which will be
determined by its type. The bit is the name we will use for a unit of data that can hold
two states, 0 or 1. Based on this, we can define a byte, the implementation-dependent
addressable unit of storage2 [6], which for our present purposes is equivalent to 8
bits. Each specific type defined in the language has a given size in bytes, which is
also implementation dependent.

2.1.1 Encoding

In binary architectures, all numbers are ultimately encoded using base 2 [6].
Although we will not generally use a binary representation directly in our programs,
it is important to know some fundamental principles related to this. For instance, to
translate a non-negative integer from decimal to binary, we can state it in terms of a
series of powers of 2 (see footnote 3):

$13_{10} = (1)2^3 + (1)2^2 + (0)2^1 + (1)2^0 = 1101_2$ (2.1)



More generally, we have, for a decimal integer d and its binary encoding b of
size N in bits [10],

$d = \sum_{n=0}^{N-1} b(n) 2^n$ (2.2)

where the (n + 1)th binary digit of b is given by b(n). Notice that the lower-order bits
(low n) are less significant than the higher-order ones. This means that a change in
one of those bits leads to a smaller change in value than does a change in a
higher-order bit. The least-significant bit is of order zero, associated with $2^0$ in eqs. 2.1 and
2.2, assuming the standard right-to-left positional notation.

2 This means that in normal situations the effective minimum storage size of a variable is a byte.
3 We use the notation $x_N$ to mean x in base N.

Byte order

Data types that hold more than one byte (all types listed above except for characters)
can also be described in terms of their most-significant byte (MSB) and least-significant
byte (LSB) [6]. This follows the same idea as for bits: the LSB is the byte where a
change of 1 will mean a minimum change in value. In
the case of the MSB, a change of 1 will mean a bigger change. For instance, if we
have a 2-byte number in MSB – LSB arrangement, then
0000 0000 0000 0000 = 0
0000 0000 0000 0001 = 1
0000 0001 0000 0000 = 256
Byte ordering in computer memory is system dependent. There are two typical
arrangements: little-endian and big-endian ordering [10]. In the first case, bytes are
addressed in increasing order of significance, LSB to MSB, whereas in the other
case, the MSB comes first. For example, as shown in Fig. 2.1, if we list the memory
positions of the bytes of a 4-byte number from MSB to LSB, a big-endian architecture
will have them ordered 0, 1, 2, 3, whereas in the little-endian case it would be
3, 2, 1, 0. The x86_64/i386 family of processors has a little-endian architecture.

[Figure: memory positions of the bytes of a 4-byte number, listed from MSB to LSB:
3 2 1 0 (little-endian) and 0 1 2 3 (big-endian)]

Fig. 2.1: Little-endian and big-endian byte order for a 4-byte number.

Note also that, since the C language has a byte as its lowest addressable data unit,
we are not concerned in general with how bits are stored inside a byte. Additionally,
the underlying byte order is not relevant when we are denoting literal constants,
which are always written using the right-to-left positional convention from math-
ematics4 . Generally, we only need to be careful with byte ordering when we need
to transfer data from one system to another (e.g. by copying files, see Sect. 10.2.2)
or when accessing individual bytes packed in a multi-byte data type. Therefore this
is an issue that will not be significant immediately, but we will meet a number of
situations where it is, at later stages in this book.

4 We always write the most significant digit to the left of the number. This can be viewed from a
little-endian or big-endian perspective, depending on the way we read it. Cohen [10], for instance,
considers this to be a big-endian order as the ‘wider’ end of the number comes first if we are
reading it as we would do a text in English; a little-endian ordering under this perspective is akin
to Arabic or Hebrew writing. In [6], it is concluded that big-endian ordering is superior in terms of
computer architecture design.

2.1.2 Integers

An int variable is used to store signed whole numbers. For example, the C state-
ment
int a;
declares an int variable and calls it a. There are altogether five standard types of signed
integers: signed char, short int, int, long int, and long long int.
For each one of these, there is a corresponding unsigned type, declared by the
unsigned keyword. The C language standard [24] requires that a signed char
occupies at least a single byte (minimum range: -127 to +127); a short integer
should hold at least two bytes (-32767 to +32767), and the same minimum applies to int.
The long type is defined as
using at least four bytes (-2147483647 to +2147483647) and the long long type,
eight bytes (-9223372036854775807 to +9223372036854775807). Unsigned integers
hold only non-negative values, which allows their maximum to be about twice that of the
corresponding signed type (e.g. at least 0 to 65535 for an unsigned short). The exact size of
each data type in C is always implementation dependent. In most modern 64-bit
architectures, the five standard integer types listed above will be stored in 1, 2, 4, 8,
and 8 bytes, respectively. The following are some examples of type declarations:
unsigned int ua; /* an unsigned integer */
unsigned long ulb; /* an unsigned long integer */
short sample; /* a signed short integer (at least 16 bits) */
The C language standard [24] defines the following exact size integer types in
the stdint.h header file. If we include this file, we can use them in a program:
int8_t
int16_t
int32_t
uint8_t
uint16_t
uint32_t
int64_t
uint64_t
As can be inferred, u* means unsigned, and *N_t means N bits of precision. The
64-bit sizes might not be present on some platforms. If the size of an integer variable is
crucial for an application, we should use these whenever our compiler toolchain is
compliant with the C99 (or later) version of the standard.

2.1.3 Real Numbers

Floating-point numbers are so named because they store a real number in two parts:
an exponent (which tracks the point position) and a mantissa (which holds the actual
numbers over which the point floats). For example,

$2.56 = 256 \times 10^{-2}$ (2.3)

where 256 is the mantissa (or significand) and −2 the exponent5, and this can be
represented as 256e-2.
There are two sizes of floats (as defined by the IEEE 754 standard [21])
commonly used in the C language:

• float: a single-precision floating-point number has about seven digits of
precision. Single-precision floats take four bytes to store, with 24 bits of precision
for the mantissa (one of which is implied rather than stored), eight bits for the
exponent, and a sign bit.
float result;
• double: a double-precision number has about fifteen digits of precision. A
double takes eight bytes to store, with 53 bits of precision for the mantissa (again,
one of them implied), eleven bits for the exponent, and a sign bit.
double value;

A long double type is also defined by the language, which may implement
the ten-byte IEEE extended format on some of the commonly-used computer
architectures (x86, for instance).

2.1.4 Characters

The type char holds a single character, stored in one byte. For example:
char c;
This type is most often used to store ASCII characters (which are themselves
7-bit codes), but can be used for any single-byte numerical use. The type char can
either be signed or unsigned6 .
In the shin program of Sect. 1.3, we used a sequence of characters to print a
message to the screen, and called this a string. We also noted that this is the usual
form for C programs to handle text data. Each character in a string is effectively
a char, but the complete sequence is treated as a single block. We will leave the
details of how strings can be manipulated as variables for later, but will discuss
literal strings in Sect. 2.2.2 of this chapter. For now, we will just determine that
strings will be defined by the char* type (note the asterisk).

5 In this case, we are using 10 as the base for the exponent. Other bases may be employed.
6 The C language does not specify whether char is signed or unsigned. If you are using it for
numeric applications, you might need to explicitly declare it, or use int8_t/uint8_t if you
have them.

2.2 Initialisation, Assignment and Arithmetic Operations

When first declared, variables can be initialised to a given value:


int a = 0;
Multiple variables can be declared and/or initialised in a single statement, sepa-
rated by commas:
int a = 0, b, c = 2, d = 3, e;
In general, the comma can be used to place two or more operations or expressions
in a single statement. Operations are ordered left to right.
If a variable declared inside a function is not initialised, its value will be undefined
until some data is written into it. You can store a value in a variable using an assignment operation:
name = value;
For instance,
a = 10;
stores the value 10 in the variable a, which was previously declared7 .

2.2.1 Variable Scope

The scope of a variable is the extent of a program in which it is relevant. Variables
declared within a program block are valid, and in existence, only inside that block
(and in all enclosed blocks). A program block is delimited by braces ({ ... });
thus, a function is a program block. In general, blocks can be used freely to define
variable scope, if needed.
Variables declared inside a function are known as local, to separate them from
variables declared outside functions, which are global. Global variables are seen by all
functions within a source code file. It is best practice to avoid global variables whenever
possible.
The lifetime of a C variable is generally automatic (implying the storage class
auto), that is, it comes into being when declared and is destroyed when it goes
out of scope. Local variables will have function or block lifetime, whereas global
variables will last until the program exits. It is possible to make a local variable
have program lifetime by marking it as static (instead of the default auto),
which will also mean that it refers to a single memory location that is shared by all
accesses to that particular variable.

7 Note that = is the assignment operator and does not mean identity (or equality) (which is denoted
by ==, as we will see later).

2.2.2 Constants

Constants are numeric values that cannot be changed throughout a program. Literal
integer constants are normally written in base-10 format (decimal system): 1, 2.
For long integer constants, an L is added: 2L, 10L. For explicitly unsigned con-
stants we can use a U: 2U, 10UL8 . Literal floating-point constants will have two
forms: with an f at the end, for floats and just with a decimal point somewhere for
doubles (2.f is a float; 2.0 is a double).
Integer literals can also be written as either hexadecimals (base 16) or octals
(base 8):

1. Octal constants are preceded by a 0. The decimal 31 (= $00011111_2$) can be
written as:
int a = 037; // 037 in octal is 31 in decimal
Octal digits will range from 0 to 7. Each one can hold 3 bits ($000_2$ to $111_2$).
2. Hexadecimal constants are preceded by an 0x:
int a = 0x1F; // 0x1F in hexadecimal is 31 in decimal

Hexadecimal digits will range from 0 – F, with A – F representing the decimals
10 – 15. Each digit holds 4 bits, so two of them encode 1 byte. A digit F in
hexadecimal represents four set (1-valued) bits. For instance, the 16-bit (2-byte) bitmask9
0xFF00 is a series of 8 set bits followed by 8 zeros (1111 1111 0000 0000).
Floating-point literals may be written in exponential form. For example, the dou-
ble constant 0.004 can be notated as
double f = 4e-3;
and an f may be appended to it to make it a single-precision float.
Macros10 can also be used to give names to constants. The preprocessor state-
ment #define will do this for you, and so
#define VALUE 10000
will substitute the integer literal 10000 for any instances of the word VALUE, so that
you can use VALUE as a constant in your code. The preprocessor takes care of all
replacements for you.
Single-character literals are defined by single quotes:
char c = 'a';
will store the code for the single ASCII character a in the variable c.
8 Lower-case u and l can also be used.
9 Bitmasks are used in bitwise operations, which we will see later in the book.
10 Macro is the general name given to the token replacement operation supported by the
preprocessor.

Literal strings are defined inside double quotes " ":
"Live Long and Prosper."
is an example. They are used to define constant text objects to be employed in
programs, such as a message printed by the printf() function. String literals are
read-only, and any attempt to modify them leads to undefined behaviour. C string
constants cannot span multiple lines inside a single pair of double quotes, but can
be split into two or more sets inside multiple pairs of double quotes, which are con-
catenated by the compiler. For instance,
"Live "
"Long "
"and "
"Prosper.";
is a valid string literal. Alternatively, and more generally, the backslash character \
can be used as a line continuation character to indicate the absence of a line break
at that point:
"Live \
Long \
and \
Prosper.";
Finally, C also includes a const keyword which can be used to declare variables
that are read only, which effectively makes them constants:
const int end = 0;
in which case we require an initialisation (since the identifier end is not modifiable).
Read-only variables and literal constants are distinct: in some cases where a constant
is called for, compilers might require a literal to be given explicitly instead of a
constant that is defined by a const object.

2.2.3 Operations

The fundamental arithmetic operators are:

• addition: a + b
• subtraction: a - b
• multiplication: a * b
• division: a / b
• remainder: a % b

For both division and remainder, if the value of the second operand (b) is zero,
the behaviour of the operation is undefined [24].
When mixing variable types, as in
a = 20.0/6

care needs to be taken. The actual result will depend on the types involved (in this
case, we know there is a double constant being divided by an int constant). If a
is an integral variable, then the result will be truncated to 3. If it is a floating-point
variable, it will be expanded up to the type precision (single or double). Note that:

1. Integer division may truncate the result (in which case the remainder will be
non-zero).
2. If a floating-point type is included in the expression, an integer variable will be
upgraded to an equivalent floating-point type before the operation is carried out.

The operator % returns the remainder of an integer division:


int a = 5, b = 2;
int q, r;
q = a / b; /* q = 2 */
r = a % b; /* r = 1, thus a = b*q + r */
For unsigned numbers, it can also be interpreted as a modulo operator. In general,
this is defined to match the following relation, for r = a mod b, and q = a/b,

r = a − bq (2.4)
with non-negative integers (and b > 0) [29]. We can think of it as counting up from
0 to b − 1 and then starting back at 0, and repeatedly to b − 1, until we have counted
a + 1 numbers: 5 mod 2 is 1 (0, 1, 0, 1, 0, 1). Conversely, 2 mod 5 is only 2 (0, 1,
2), which is the same as 7 mod 5 (0, 1, 2, 3, 4, 0, 1, 2). This is sometimes called
clock arithmetic, as it follows the idea that the hours are calculated modulo 12, and
minutes modulo 60.

2.2.4 Conversion

Data types can be explicitly converted into one another by using a cast, defined by
the operator (type):
int a = 1;
float b = 1.f;
a = (int) b;
b = (double) a;
Conversions from floating-point to integral types may cause truncation, as
the fractional part of the number is lost. It is also important, when converting types,
to ensure that the recipient has enough range to hold the data, or overflow might
occur.

2.2.5 Arithmetic Order

Arithmetic ordering puts multiplication, division and remaindering at a higher
precedence than addition and subtraction. All of these operations are left-to-right
associative, so operators of the same level of precedence are executed in that
order of appearance. To eliminate any confusion, we can use parentheses, ( and ), to
group operations. These have the highest precedence of all, so whatever is placed
inside them is evaluated first:
1. Addition and subtraction:
1 - 2 + 3 /* 2 */
1 - (2 + 3) /* -4 */
2. Multiplication and division:
18 / 2 * 3 /* 27 */
18 / (2 * 3) /* 3 */

2.2.6 The sizeof Operator

As we have noted, most of the data types defined in the C language standard have
implementation-dependent sizes. To get the exact size of a variable or a type, we
can employ the sizeof operator. This can be used with any operand whose size
is known at the time of compilation. The result of this operation is the size in bytes
occupied by the operand, and the type of this result is the unsigned integer type
size_t11 (itself an implementation-dependent type) [24]. For example,
size_t int_size = sizeof(int);
can be used to get the size of an integer in the system. Likewise, we can check the
size of a given variable:
float f;
size_t f_size = sizeof(f);
This operator will allow us to verify requirements in certain situations when we
will need to manage memory space ourselves in a program.

11 Defined in stddef.h.

2.3 Conclusions

We have examined some of the most fundamental aspects of C programming in this
chapter. In particular, the concepts of variable and type are crucial to the
functioning of a program. We should try to make sure the general principles outlined
here are well understood as they will serve as the basis for the remainder of this book.
Unfortunately, however, what we have explored so far does not allow us to write
our first fully-functional program, as we are missing one key element: the capacity
to interact with the external world. This is what we call input/output, and we will
introduce it in the next chapter.

Problems

2.1. As a pen and paper exercise, do the following:


(a) Write 32, 55 and 102 in binary form (using as many bits as you need).
(b) For each of these binary numbers, shift all bits by one position to the left (adding
a zero to the new lowest order bit, i.e. 101 → 1010). Convert the results into decimal
form and compare with the original numbers.
(c) Do a similar operation with the same original binary numbers, but instead shift
by one to the right, i.e. 101 → 10. Convert them to decimal and compare.
(d) What is the effect of these shifting operations?

2.2. What are the results of these operations with C constants?


(a) 1 + 2 / 3 * 4
(b) 3 * 3 / 4.5
(c) 10 / 3 / 2
Chapter 3
Standard Input and Output

Abstract This chapter covers the basic means of input and output that are available
to C programs. We introduce the principles of formatted input and output, which
will provide the most generic methods of getting data in and out of programs. In
addition, we also explore other methods of single character input and output, and
string output. With the ideas presented in this and the previous chapter, we are able
to start writing our first straight-line programs.

Before we are able to write our first programs, we need to find a means of
interfacing them with the outside world. For this purpose, we have a variety of input
and output (IO) means, the simplest of these being the standard IO functions. With
them, we will be able to feed data into our program and display the results. This
functionality is tightly integrated with the shell, which means it can be used for
more than just typing inputs and printing data.

3.1 Printing to the Terminal

The most general way to output results from a program is through the printf
function, which we first encountered in Chapter 1. It takes a constant string and
a number of optional arguments. The function prototype, which tells us its overall
form, is
int printf(const char *format, ...);
where the ellipsis indicates that we can use zero or more extra parameters at the end
of the argument list, all separated by commas. The format string1 determines how
many parameters we will need. If it contains any format specifiers [24], introduced
by the % character, it will call for one or more extra arguments.
1 As we indicated earlier in Sect. 2.1.4, the char* type defines a string, and the const key-
word indicates it will be used read-only (constant) in the function. More details on strings will be
furnished later in the book.


We have seen the case of
printf("Live Long and Prosper.");
where we only have the format string and nothing else. As this contains no %
characters, it results in the string literal being printed without anything extra. The
function returns the number of characters printed (or a negative value if an error
occurs), but we can ignore this value if we want.

3.1.1 The Format String

In the format string, the characters following the % indicate how the value of a
corresponding parameter is displayed. This is called the conversion specification,
and it is defined by a sequence containing, in the following order:

1. Zero or more flags, modifying the meaning of the conversion specification.
2. An optional minimum field width, determining the minimum number of characters
to be displayed. The field width is defined by an asterisk (*) or a non-negative
decimal number. If the converted value has fewer characters than the field width,
it will be padded with spaces.
3. An optional precision, whose effect depends on the type of conversion. The
precision is defined by a period (.) followed by an asterisk or an optional
decimal integer.
4. An optional length modifier, specifying the size of the argument.
5. The actual conversion specifier character to determine the type of conversion.

For each conversion specifier, we need to supply an argument to be converted.
The conversion specifier determines the type of the argument expected. If you use a
specifier with the wrong argument type, printf() will not work properly (the
behaviour is undefined). The basic conversion specifiers that you can use in the C
language are shown in Table 3.1.

Table 3.1: Basic format specifiers.

specifier    type              printed output
%c           char              single character
%d (%i)      int               signed integer
%e (%E)      float or double   exponential format
%f           float or double   signed decimal
%s           char*             sequence of characters
%u           unsigned int      unsigned integer
%x (%X)      unsigned int      unsigned hex value
%o           unsigned int      unsigned octal value

The optional length modifiers are:

• hh: specifies that the integer conversion applies to a char argument.
• h: specifies that the integer conversion applies to a short argument.
• l: specifies that the integer conversion applies to a long argument.
• ll: specifies that the integer conversion applies to a long long argument.
• z: specifies that the integer conversion applies to a size_t argument.
• L: specifies that the floating-point conversion applies to a long double argument.

When the field width, precision, or both are indicated by an asterisk, an extra int
argument needs to be supplied for each asterisk to set its value. These arguments
must be provided before the corresponding argument that will be converted. The
precision gives the minimum number of digits for integer conversions, the number of
digits after the decimal point for floating-point conversions, and the maximum
number of bytes for the string conversions.
The optional flags are as follows:

• -: left justify.
• +: always display sign.
• space: display space if there is no sign.
• 0: pad with leading zeros.
• #: use alternate form of specifier.

The alternate form (#) of the specifier can be used as follows:

• %#o: adds a leading 0 to the octal value.
• %#x: adds a leading 0x to the hex value.
• %#f or %#e: ensures the decimal point is printed.
• %#g: displays trailing zeros.

Format strings may contain any ASCII characters, including some special for-
matting codes. These are always escaped with a backslash:

• \b: backspace.
• \f: formfeed.
• \n: newline.
• \r: carriage return.
• \t: horizontal tab.
• \v: vertical tab.
• \': single quote.
• \": double quote.
• \0: null character.
• \a: sound/bell alert.

Examples:
– A message including an integer, followed by a newline:

int a = -10;
printf("This is an integer: %d \n", a);
– Two unsigned integers separated by a tab and followed by a newline:
unsigned int a = 1, b = 4;
printf("%u \t %u \n", a, b);
– A long integer with ten characters of field width, right justified (no newline):
long int a = 100;
printf("%10ld", a);
– A floating-point number with three decimal digits of precision, that is the result
of an expression, followed by a newline:
int a = 100;
printf("%.3f\n", a/3.);
– A vertical tab and three characters inside double quotes, followed by a newline:
printf("\v\"%c%c%c\"\n", 'h', 'i', '!');

3.2 Getting Input from the Terminal

Data from the standard input can be retrieved with scanf(), which has a similar
prototype to printf(),
int scanf(const char *format, ...);
This function will return the number of items assigned, but this value can be
ignored if not needed. In some cases, as we will see later, it can return a special
code defined by the macro EOF, indicating that there is no more input to be read.

3.2.1 Pattern Matching

The format string in the scanf() case will perform pattern matching, reading what
has been typed at the input and placing it in one or more corresponding variables.
The main difference here is that each argument will receive data (rather than provid-
ing it, as in the case of printf()), and for this reason we will need to expose the
memory address of each parameter. This will allow scanf() to use these parame-
ters as output rather than input. Once an address has been passed, the function can
place data in it. In the C language, the address of a variable can be obtained using
the & operator:
int a; // variable a
&a; // the address of a

Thus, to get two integers from the input, we can use


int i, j;
scanf("%d %d",&i,&j);
which will read in two whole numbers into i and j, and ignore any whitespace or
new lines in the input. The following rules apply as far as the formatting string is
concerned:

• Any format specifiers will be used to translate a given input into a variable ad-
dress provided. For example the %c places a single character typed at the terminal
into a char variable:
char c;
scanf("%c",&c);
• Any whitespace characters in the formatting string will match any number of
such characters typed at the input. For instance
char c;
scanf("%c ",&c);
will ignore any number of spaces, newlines or tabs typed after a single character.
• Any ordinary character (except %) will match a corresponding character in the
input. This means that a scanf() call will attempt to match an input to a format
string. If it cannot, it will return without scanning any further inputs. For instance
char c;
scanf("hello %c ",&c);
will look for an input that matches the string "hello" followed by any number
of spaces and a single character.

3.3 Character Input and Output

In addition to the formatted IO functions outlined above, which provide a
comprehensive means of IO for programs, we can avail of single and multi-byte
character functions provided by the C library. These are
int putchar(int c);
int getchar(void);
for single characters (which are converted to/from int), and
int puts(const char *s);
for character strings. With the latter function, in particular, we could have written
the shin program as

#include <stdio.h>

int main()
{
    /* puts() appends a newline to the string it prints */
    puts("Live Long and Prosper.");
    return 0;
}
These functions have more limited applications than the general-purpose
printf() and scanf(). However, they might be more appropriate for some spe-
cific tasks such as retrieving individual characters from the standard input, printing
user messages, and character-by-character output.

3.4 The calc Program

The following program implements an interactive calculator that outputs the sum of
two whole numbers:
1 #include <stdio.h>
2 int main()
3 {
4   int a,b;
5   printf("\n Please enter the two numbers: ");
6   scanf("%d %d",&a, &b);
7   printf("%d + %d = %d \n", a, b, a+b);
8   return 0;
9 }
Line 1 includes the stdio.h header, which contains the declarations for the
functions printf() and scanf(). Two variables, used as memory to hold each
input number separately, are declared in line 4. The next line prints an instruction
to the terminal, which is followed in line 6 by a call to scanf() to get the input
data. This will block execution until the pattern in the format string (two numbers
separated by spaces) is matched by the user input. Once this happens, the numbers
are placed in variables a and b. Line 7 prints the two numbers and their sum using a
format string.
If we place this program in a file called calc.c, we can compile and run it as
shown below:
$ cc -o calc calc.c
$ ./calc

 Please enter the two numbers: 2 3
2 + 3 = 5
$
Note that a newline is printed at the start of the program, as we had \n as the
first character of the message string, followed by a white space. This string did not
terminate with a newline, so the program waited for input on the same line it printed
to the shell. Two numbers were typed followed by an ‘enter’, leading to the result
being printed out in the next line.

3.5 Conclusions

We are now in good shape to attempt to program some of our first software. This
will be very simple at first, but we should be paying a lot of attention to the details of
getting data into the program, performing the required computation, and producing
the output. These first programs are based on straight-line code: we start at the top of
the main() function, and perform a sequence of steps, exiting at the last statement.
Once we are comfortable with this, we will be able to start adding detours and
repeats, which are collectively known as control of flow, as we will see next.

Problems

3.1. Ask for a distance in feet, convert it to metres and print out the result. (1 ft =
0.3048 m).

3.2. Calculate the average of three numbers input at the terminal.

3.3. Write a program to calculate travel expenses. Request the payable rate (cents
per kilometre), then the start and finish odometer readings and output the payable
expenses in euros.

3.4. A winery produces N litres of wine per kilogram of grapes. Calculate (1) how
many 50-litre barrels will be needed to store a certain weight of produce; and (2)
the remaining volume in the last barrel (if not completely full). Request as input the
yield N and the weight of fruit.
Chapter 4
Control of Flow

Abstract The methods of controlling and directing the flow of execution of a pro-
gram are the main topics of this chapter. We first look at branching, which can be
controlled by logical tests, or by pattern matching. Then we introduce the principle
of iteration and the three types of loop constructs available in the C language. With
this, we are able to start generating audio waveforms that can be displayed in graphs
or played back after a minor conversion step.

Computer programs normally require means of selecting statements (or blocks
of statements) for execution while ignoring others, in order to provide more
flexibility for developers. Straight-line code, such as the one employed in the previous
chapter, is very rarely used. We also need to provide means of iterative (repeating)
computation to implement loops, which are fundamental for certain applications.
All of these aspects of programming are provided by control-of-flow constructs. In
all of these, we will need to provide a decision procedure that will determine what
gets executed. This is called a condition, which is made out of a logical expression.

4.1 Conditional and Logical Expressions

Conditional and logical expressions are made up of operations that result in a binary
outcome: they are either false (0) or true (1). Unlike arithmetic expressions, they
only evaluate to one of these two values. Thus they can be used to test a condition
and provide a means of selecting the subsequent sequence of execution in a program.
The basic operators in such an expression are called relational operators,
>, <, >=, <=, ==, !=
evaluating a condition of greater than, less than, greater than or equal to, less than or
equal to, equal to, or not equal to, respectively1. The result of any of these operations
is either 0 or 1. The first four operators in the list above have the same precedence
level, which is higher than that of the next two, which also share the same priority.
All relational operators have a lower priority than arithmetic ones.
To these conditional expressions, we can add a set of logical operators, which
are used to combine relational expressions. The two fundamental operations are AND
and OR, denoted by && and ||, respectively. A truth table can be constructed for
each one of them, indicating the outcome of the expression for two operands. Note
that in C, while false (F) is only represented by 0, true (T) can actually be given by
any non-zero value. Tables 4.1 and 4.2 show the truth values for each combination
of operands.

1 The equality operator is == and not simply =, which is used for assignment instead.

Table 4.1: Truth table for AND.

a    b    a && b
T    T    T
T    F    F
F    T    F
F    F    F

Table 4.2: Truth table for OR.

a    b    a || b
T    T    T
T    F    T
F    T    T
F    F    F

The priority of logical AND is higher than that of logical OR. Both operators
have less precedence than relational expressions. In addition to these two operators,
we should also mention the unary negation operator !, which returns false for a true
operand and true for a false operand. The result of a relational or logical expression
has type int. The C language standard [24] also defines the type _Bool for uses
where the value can only be 0 or 1, which could also be employed to hold the results
of such expressions, known as Boolean expressions.

4.2 Conditional Execution

The if() expression is the most basic means of conditional execution, allowing us
to select one or more statements depending on the result of a logical expression:
if(a < 0) printf("%d is negative\n", a);

If the result of the test is false, then the program skips the execution of that
particular statement. In general, the if() expression is defined as
if(logical_expression) ...

Fig. 4.1: The if ... statement (flowchart: the block is executed when the condition is true, i.e. non-zero, and skipped when it is false).

where what follows the expression is either a single statement or a group of them
inside a program block2 (Fig. 4.1). To the single if() expression we may add a
complement that will be executed when the condition defined by the logical expres-
sion turns out to be false:
if(logical_expression) ...
else ...
where the else keyword labels the statement or block that will be executed when
the logical expression evaluates to false (Fig. 4.2). For instance,
if(a == 0) printf("zero \n");
else printf("%d is nonzero \n", a);
Fig. 4.2: The if ... else ... statements (flowchart: one block is executed when the condition is true, the other when it is false).

We can also have an alternative form of the if() expression that checks for
several alternatives (Fig. 4.3), with an optional catch-all, none-of-the-above else at
the end:
if(logical_expression1) ...
else if(logical_expression2) ...
...
else if(logical_expressionN) ...
else ...
Note that all of these expressions can be nested inside others, using block
delimiters if needed.

2 We have already noted that blocks are defined by curly brackets, { and }.

4.2.1 Conditional Operator

The if() expression has an operator version, whose result can be assigned to a
variable,
logical_expression ? true_expression : false_expression;
the result of which is defined by the logical expression: if it is true (non-zero), the
result is the value of true_expression, otherwise it is the value of false_expression.
This can be used as a means of selecting a value to be assigned to a variable:
a = b > c ? b : c;
Here, if b is bigger than c, then b is assigned to a, otherwise c is (an example
of how to select the maximum value of two inputs). This is equivalent to:
if(b > c) a = b;
else a = c;
Fig. 4.3: The if ... else if ... else ... statements (flowchart: the conditions are tested in turn and the block of the first true one is executed; the last block is the catch-all else).

4.3 Switch

The switch block is another example of conditional execution. Here we will have
a series of discrete options defined by labels that will be compared with a value. If
one of them is equal to it, the program executes from that point. If no options match,
it looks for a default label. The expression passed to the switch statement needs to
evaluate to an integral type. Each label is composed of the keyword case followed
by an integer constant and completed by a colon. The default case is
defined by the keyword default.
The break statement can be used to exit the switch block after the desired
statement has been executed, to prevent the execution from continuing on to the next
statement (called a fall through). The most common form of this construct is
switch(expression)
{
case constant1:
    ...
    break;
case constant2:
    ...
    break;
...
case constantN:
    ...
    break;
default:
    ...
}
For instance, we can use this mechanism to select the result in a multiple-option
question:
switch(i)
{
case 1:
printf("option one selected\n");
break;
case 2:
printf("option two selected\n");
break;
case 3:
printf("option three selected\n");
break;
default:
printf("no selection\n");
}
Note that it is perfectly possible to use switch statements that include legiti-
mate uses of fall through. It is also possible to use multiple cases mapping to a single
statement:
switch(i)
{
case 3:
    printf("the selection is bigger than 2\n");
    /* fall through */
case 2:
    printf("the selection is bigger than 1\n");
    /* fall through */
case 1:
    printf("the selection is positive\n");
    break;
case 4:
case 5:
    printf("the selection is 4 or 5 \n");
    break;
default:
    printf("the selection is > 5 or < 1\n");
}

4.4 Iteration

In complement to conditional execution, it is possible to write programs whose flow
of control produces iterations of the same computation sequence. This is enabled by
two types of loop constructs, both of which will depend on the result of a logical
expression following similar principles to those observed in the if() statement.
Loops are essential for many applications. For example, graphical user interface
software, such as the applications we commonly use on a daily basis, requires some
sort of loop to stay open and ready to receive input from the user; otherwise it
would reach the end of the program statements and exit.

4.4.1 The while and do – while Loops

The while loop will repeat a statement or block depending on the result of a logical
expression:
while(logical_expression) ...
Effectively, it is a version of if() that will carry on executing until the condition
becomes false (Fig. 4.4). If the logical expression is constant and true, the program
will enter an infinite loop. If there are no other means of exiting the loop built into
the program or the loop block, it may be hard to close the application. Thankfully,
operating systems have means of signalling to a program to make it interrupt execu-
tion, so in most cases, this should not be an issue.

Fig. 4.4: The while loop (flowchart: the block is repeated for as long as the condition is true).

The do – while loop has the following structure (Fig. 4.5):


do ...
while(logical_expression);
which allows the program to execute the body of the loop (its statement or block) at
least once before checking the result of the condition.

Fig. 4.5: The do – while loop (flowchart: the block is executed first, and then repeated for as long as the condition is true).

The iterations of a loop are generally controlled by a variable that will make the
logical expression false at some point. A typical way of doing this is to use a counter
that can control the number of iterations. This will keep track of how many repeats
the program has gone through and exit the loop at the expected time. For instance,
int cnt = 0;
while(cnt < 10) {
...
cnt = cnt +1;
}
will iterate ten times and then exit. The expression cnt = cnt +1 can be under-
stood as taking the value of the variable, adding one to it and storing it back in the
same place. This is called an increment (by one). It is so common that two shorthand
forms exist, one with a prefix operator, and the second with a postfix one:
++cnt; // prefix increment
cnt++; // postfix increment

The difference between these is that while ++cnt increments the variable before
using its value, cnt++ will use the value of the variable, and then increment it. This
only has an impact if we are using the value (assigning or checking it). For instance:
int cnt = 0;
while(++cnt < 10) {
printf("%d \n", cnt);
}
will print the numbers 1 to 9, whereas if we had used cnt++, the printing would
go one step further, to 10, as the check would be made before the variable was
incremented. Decrement operators (--) can also be used in a similar way. Postfix
operators have a higher precedence level than prefix ones, which themselves have
higher priority than normal arithmetic expressions.
Assignment operators, += and -=, can also be used for increment or decrement.
They have a right-hand side step value or expression (e.g. cnt+=2 for an
increment of 2). Such operators have a lower priority than the relational and
arithmetic operators, so they need to be placed inside parentheses if we want to check
their value correctly. Similarly, we have a *= b (for a = a * b), as well as a /= b
and a %= b.
In addition to counting variables, there are other ways of controlling a loop that
can be used. We could request the user to enter specific values via scanf(), which
are then checked for a given condition. We could also examine the value of an
arithmetic expression and trigger a new iteration based on it, and so on.

4.4.2 The for Loop

Given the widespread use of counting variables in loops, a specialised version is
available to facilitate this use. The while loop
cnt = 0;
while(cnt < 10) {
...
cnt++;
}
can be implemented in the compact form of the following for loop:
for(cnt = 0; cnt < 10; cnt++) ...
As with conditional execution statements, loops can be nested within the body of
other loops. This is particularly useful if we have to execute repeated operations for
each iteration of a loop (for instance, to trace a two-dimensional figure).

4.4.3 The break and continue Statements

As we have seen before, the break statement makes a program exit a switch block
from anywhere within it. It can also be used as a means of exiting a loop in the
middle of its body if we require it. In addition to this, loops can avail of the
continue statement, which skips any statements after it in the loop body and jumps
directly to the next evaluation of the logical expression (in a for loop, the
increment expression is executed first).

4.5 A First Synthesis Program

With loops and branching, we can write programs that do a lot of work with only
a few lines. This allows us to have our first go at sound synthesis. The principle is
very simple: we will generate a sequence of numbers that can be interpreted as a
digital audio signal. When we do that, we will hear a tone.
So let’s approach this in parts. First we will write a program to print a series of
numbers to the terminal. This sequence will have a repeating pattern: at regular
intervals, it will look the same. Each repeated set of numbers is called a period, and
if we interpret this series of numbers as a signal, we have a periodic signal. The
pattern we will create first is a ramp, numbers that will increase from zero to a
maximum. We can do this by using the modulo operator in a loop:
while(n < END) {
s = n % max;
n++;
}
This is the core of our synthesis program. Let’s complete the rest around it and
call the resulting executable saw:
#include <stdio.h>
#define END 44100

int main(){
    unsigned int n = 0, max = END/441;
    float fmax = (float) max, s;

    while(n < END) {
        s = (n % max) / fmax;
        printf("%f \n", s);
        n++;
    }
    return 0;
}

Note that we have made sure the numbers are output as floats in the 0 to 1 range.
This will facilitate the later translation into a digital signal.

4.5.1 Plotting the Waveform

If we run this program, we will see the following pattern at the terminal: a series of
floating-point numbers moving from 0.0 to close to 1.0, repeatedly:
$ ./saw
0.000000
0.010000
0.020000
...
0.980000
0.990000
0.000000
0.010000
Now we can interpret this as a digital audio signal. In doing so, we can, for
instance, plot the waveform it produces. A simple graphic display can be made
with a separate standard IO program, which can feed off the data we produced with
the saw program. For this purpose, we introduce two important concepts of shell
operation:

1. Redirection: the output of printf(), i.e. the standard output, or stdout,
is normally directed to the terminal screen by the shell. We can redirect it to a
different destination, for instance to a file in the file system (FS), which will be
filled with the contents produced by printf(). To do this, we use the output
redirection symbol > after the program name, and the name of the file after that.
For instance,
$ ./myprog > output.txt
Likewise, the input to scanf(), i.e. the standard input, or stdin, normally
comes from the terminal, but we can redirect it from a file. The process is similar:
we use the input redirection symbol < to take the input from a named file:
$ ./myprog < input.txt
So we could write a program to plot this output as a waveform, vertically, on the
terminal, using this principle (let’s call it plot):
#include <stdio.h>
#include <math.h> /* round() is declared here */

int main(){
    float sample;
    int s, nsamp = 0;
    /* read samples until the input ends (or cannot be converted) */
    while(scanf("%f", &sample) == 1) {
        s = (int) round(sample * 100); /* scale it */
        printf("[%5d]", nsamp++); /* sample index */
        while(--s >= 0) printf("-"); /* plot the value */
        printf("*\n");
    }
    return 0;
}
This program scans the standard input for float samples and then prints an
equivalent number of dashes to the terminal, terminating the line with an asterisk.
Each line also receives the corresponding sample index as a time reference. Note
that, in order to keep the plot aligned, we print enough spaces to hold up to 5
digits (by setting the field width to 5 in the formatting string, "%5d"), since the
biggest index we will print, 44099, contains 5 digits. The loop stops as soon as
scanf() fails to assign a sample, which happens when it returns the special
end-of-file code (the constant EOF) once the stream of characters is finished3.
Checking the return value before plotting avoids printing a spurious extra line
after the last sample. With this in hand, we can now produce a simple plot of the
waveform:
$ ./saw > wave.txt
$ ./plot < wave.txt
[    0]*
[    1]-*
[    2]--*
[    3]---*
[    4]----*
[    5]-----*
[    6]------*
[    7]-------*
[    8]--------*
[    9]---------*
[   10]----------*
...
While this is not a standard way of plotting data, and the program can only cope
with non-negative numbers, it is about the best we can do at the moment. In
Chapter 6 we will develop a better terminal plotting program.
2. Pipes: in addition to redirection, we can send the standard output of one program
into the standard input of another using the pipe symbol |:
$ ./saw | ./plot

3 The EOF condition can also be signalled to a program by typing the ctl-d key sequence at the
terminal.

These same principles can be applied to more advanced plotting programs, such
as gnuplot. For example, the waveform graph in Fig. 4.6 was created from the
data produced by the saw program using the following command line:
$ ./saw | gnuplot -p -e "set xrange[0:400]; \
plot '-' with lines"

Fig. 4.6: The sawtooth waveform generated by the saw program, as produced by gnuplot.

This pipes the output of saw to gnuplot, with commands to create a line
plot using the first 400 numbers taken from the standard input4. This particular
gnuplot command is fairly general-purpose, and can be used with any
single-column standard input data.

4 For further information, see http://www.gnuplot.info/.



4.5.2 Playing the Sound

Since our program generates an audio waveform, we can just as easily listen to
the sound it produces. To do this, we have to first convert the numbers from text
(ASCII) to a binary encoding, place them into a file and then open that file with a
sound editor. The following are the steps to run the synthesis program, perform the
text-to-binary conversion, and produce an audio file for listening:
1. The conversion is done by another program, tobin.c5 , which we compile as
tobin.
2. We connect the output of our synthesis program, let’s call it saw, to the input of
tobin using a pipe (|):
$ ./saw | ./tobin
3. We redirect the output of tobin from stdout to a file (e.g. output.raw)
using >:
$ ./saw | ./tobin > output.raw
4. We import the file as raw data into the sound editor, with the encoding set to
32-bit floating-point data, the sampling rate to 44100, and channels to 1.
What we have done in the last step is the interpretation of the sequence as making
up an audio signal with 44100 samples (numbers6 ) in one second, containing one
channel of audio, with each number to be read as a 32-bit float with little-endian
byte order7 . So, the 44100 numbers we generated will constitute a 1-second tone,
whose frequency is going to be 441 Hz8 (because the ramp pattern is repeating
every 100 samples, there will be 441 periods, or cycles, in one second). This is a
very simple digital sawtooth wave [36].

4.5.3 Other Waveforms

If we replace the synthesis loop with this:


float s = 1.f;
while(n < END) {
if((n % max) == 0) s *= -1.f;
printf("%f \n", s);
n++;
}
5 We will study this code in Chapter 9, where you can find the source code for it.
6 A sample is the name we give to each individual element of a sequence that represents the digital
audio signal.
7 We are assuming this is being built and run in a little-endian architecture, such as the x86.
8 1 Hz = 1 cycle per second [36].

we can generate a digital square wave. When this is played back, note that the pitch
will have dropped by one octave. This is because the square wave we generated has
a period that is twice the size of the original sawtooth. Note that the loop alternates
between −1 and 1 every max samples, so the whole cycle takes twice the time to
complete. Problem 4.4 prompts you to think about how you could generate another
one of these waveforms based on simple geometric shapes.

4.6 Conclusions

This chapter has introduced some important concepts of structured programming,
such as conditional execution and loops. We are now at the stage where we can create
programs that generate sequences of numbers which can be interpreted as digital
audio signals. This is a significant development. To build on it, we will move on to
a deeper level, where we can manipulate the program memory and compute larger
blocks of data. This will be the topic of the next chapter, where we will encounter
another set of fundamental programming concepts.

Problems

4.1. Write a program to read in three numbers and write the smallest.

4.2. Travel expenses are paid as follows: 15c per km for cars up to and including
1.5 litre engines; and 20c per km for cars with engines above that size. Write a
program to calculate travel expenses which takes as input the car engine size and
the distance travelled.

4.3. Add N input numbers and write out the result. Ask for the number of inputs (N)
first.

4.4. Write a version of the synthesis program that can generate a triangle wave.
Chapter 5
Arrays and Pointers

Abstract This chapter introduces the principles behind the composite data types
called arrays. It discusses their memory layout and how to manipulate them. We
then introduce the more advanced topic of memory addresses and pointer variables,
showing how they relate to arrays. Finally, strings are presented as a special kind
of character array. The chapter concludes by exploring ways of manipulating string
variables.

In this chapter, we will look at how we can create lists or sequences of the various
built-in data types, called arrays, and manipulate memory addresses through the
specially-defined pointer variables. These objects will be fundamental to many of
the sound and music computing applications we will be working with throughout
this book. In particular, they will allow us to access contiguous blocks of digital
audio data, which will be essential for all synthesis and processing techniques.

5.1 Arrays

All the variables we have so far used have been able to store only a single value (of a
given type). In many applications, however, it is common to group a whole block
of data together, so that we can store multiple values of a certain type. In order to
do this, we introduce the concept of arrays. For example, let’s say we would like to
hold ten integers together. The following declaration
int numbers[10];
declares an array called numbers with ten elements. The general form of an array
declaration is
type name[size];


where the variable name is an array of type type and contains size elements,
which in general needs to be a constant expression1 . Arrays declared in this way are
not initialised, and might contain garbage. We can initialise them using the following
notation:
int numbers[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
where each integer literal will be stored in the array with respect to the order in the
initialisation list. In fact, with such a list, we do not need to declare the array size
directly, as it will be implied by the number of items in it:
int numbers[] = {1, 2, 3, 4, 5};
Alternatively, we can also initialise members out of order using designators,
which are indices declared inside square brackets, as in
int numbers[5] = { [4] = 1, [1] = 2 };
and, in general, initialiser lists (whether using designators or not) may be incom-
plete.
Once we create an array, we can use the array indexing notation to select in-
dividual items, e.g. a[n], where n can be an integer variable or a constant. Array
indices are zero-based; that is, the first element is in index 0, and the last in size-1
(Fig. 5.1). Arrays are stored as contiguous memory locations; thus the indices are
used to select a given offset from a start location in memory. We should never try
to access data beyond the end of an array, as this leads to undefined behaviour, such as a
segmentation fault during execution. This is a different problem from a syntax or
compilation issue, which is caught when we are trying to build the program, and
it can be more difficult to fix. It is important to know that the C compiler does not
check for these mistakes, so the programmer should always be aware of them.

a[0] a[1] a[2] a[3] a[4]

1 2 3 4 5
Fig. 5.1: A graphic representation of the array int a[5] = {1, 2, 3, 4, 5}.

Arrays can be manipulated very easily via loops. In particular, the for loop is
well suited to accessing values in them. For instance, we use
for (i=0; i<10; i++) a[i] = i+1;
to fill the array with 1, 2, ..., 10.

1 The C language standard [24] defines the concept of a variable-length array (VLA), which can
optionally be implemented (but is not strictly required). The C++ language does not define these.
To avoid compatibility problems with some compilers, and to provide a better forward link to C++, we should
always treat array sizes as constant.

5.1.1 Two-Dimensional Arrays

Two-dimensional arrays are also possible in C, where, instead of a single row of
elements, we have several, stacked. They are declared by passing the number of
rows and columns required:
type name[rows][columns];
For example, if we want to create a 10 by 10 array, we use
int matrix[10][10];
which is a 10-element array of 10-element arrays of int. Thus, two-dimensional
arrays are initialised and accessed in row-column order. For instance, a 2 by 3 array
is effectively an array of two elements, each of which is itself an array of three
numbers:
int mat[2][3] = {{0, 1, 2},
{3, 4, 5}};
where mat[0][0] = 0, mat[0][1] = 1, mat[0][2] = 2, mat[1][0]
= 3, mat[1][1] = 4, and mat[1][2] = 5. In these cases, we have an array
of arrays of a given fundamental type. Likewise, higher-dimensional arrays are also
defined by the C language standard [24], and can be declared by adding
further array size specifications.

5.2 Strings

We have seen that in C, text is kept in strings. The underlying storage of this pseudo-
type is an array of type char. The convention employed in the language is that each
string is terminated by a null character ('\0'). This is automatically added by the
compiler in the case of a string constant. For example:
char s[2] = "a";
where we note that the array has two elements. The first one is initialised by the
character 'a', whereas the second holds '\0'. Notice also that we are using dou-
ble quotes to declare a string constant, which means it will always contain one extra
character in addition to those inside the quote marks (Fig. 5.2). Remember that a
single character constant will always be declared in single quotes.

s[0] s[1] s[2] s[3] s[4] s[5]

'H' 'e' 'l' 'l' 'o' '\0'


Fig. 5.2: A graphic representation of the string char *s = "Hello".

Since strings are arrays, we have to be careful when manipulating them. We
cannot assign them directly and expect that all of their elements will be magically
copied from one character array to another. It is possible, however, to initialise a
character array with a constant string, as we have seen above. In order to manipulate
them, we will need to access each character individually. As we will see later, there
are a number of C library functions that do just that, copying, checking, duplicating,
and concatenating strings.

5.3 Pointers

We have seen that a variable declared in a program reserves a certain storage space
to hold a value of a given type. We give this memory location a name and we
can proceed to assign and retrieve data to/from it. The OS organises all such lo-
cations through unique addresses that are used to identify them. We have seen that
the address-of operator (&) can be used to obtain this location reference so that some
functions such as scanf() can place their results directly there.
This is complemented in the C language by the ability to store such addresses in
special variables called pointers. These are not the usual regular types we have seen
before, but ones designed to hold the memory location for a variable of a given type.
We declare them by using an asterisk. For instance:
int *p;
declares a pointer that can hold the address of an integer. The position of the asterisk in the
declaration is not important; we can equally well use int* p or int * p, as long
as there is an asterisk somewhere after the type name. In any case, the variable p we
declared is not pointing to anything, because it has not been initialised or assigned
to a location yet. We can initialise it with a given memory address:
int n;
int *p = &n;
where the pointer variable p is initialised to the address of the variable n. Once
a pointer is holding a memory location, we can access it using the dereferencing
operator (also known as indirection), as illustrated in Fig. 5.3. This is represented,
again, by an asterisk, but now with a different meaning. A * placed in front of a
pointer accesses the value stored in the memory address that is held by the variable
(rather than its contents, which are the address itself). For example
/* n is declared and initialised
with 10, k is just declared */
int n = 10, k;
/* a pointer p is initialised
with the memory address of n */
int *p = &n;
/* k is assigned the contents of n, 10 */
k = *p;
/* the contents of n now hold 12 */
*p = 12;

Fig. 5.3: A pointer int *p = &n, whose contents are the address of n (NN). The pointer memory
address is LL. The arrow indicates the indirection operation, which yields the value of n, 10.

Note that the asterisk has two functions in this context, which have slightly dif-
ferent semantics. These are some basic principles of pointer syntax:
1. When declaring a variable, it marks the variable as a pointer to a data type:
int *, float *. It is possible to have a generic pointer with type void *,
when it is not clear what pointer type we need. However, in this case we cannot
do much apart from storing an address. It cannot be used to access data because
we do not know the size of the allocated space we are pointing at.
2. When using a pointer variable, it dereferences it, so that the expression refers to
the contents of the address pointed at. In this case, it is a unary operator applied to the variable or
expression to the right of it.
3. Precedence and associativity: the unary dereferencing * operator binds to the ex-
pression to its right, and has a higher priority than all binary arithmetic operators
(+,-,*,/, %). It has the same priority level as the prefix increment/decrement,
but lower priority than postfix increment/decrement.
The asterisk is also the symbol used as the multiplication sign. However, this is
a binary operator, applying to both sides. Care needs to be taken not to confuse the
two. Parentheses could be used to clarify their application, if they happen to appear
together in an expression.
Pointers can be assigned to other pointers, in which case, as indirection is not
involved, we are assigning memory addresses:
int n, k;
int *p = &n;
int *q;
q = p; /* q now points to n */
p = &k; /* p now points to k */
Location identifiers are just unsigned integers (of a given size, dependent on
the OS and hardware). However, we should not confuse the two: addresses should always be
kept as pointers, and never as integers. The pointer type is also very important,
as it tells the compiler the size of the data held at the memory location it points to. This is important
in some operations, such as those involving pointer arithmetic, which we will study
below.

5.4 Pointers and Arrays

A variable declared as
int a[10];
is also, in most expressions, treated as a constant pointer to the start address of the array memory; thus, in practice,
a and &a[0] refer to the same address. As it is constant, we cannot point it anywhere else.
But we can use it like any other pointer. We can dereference it, for instance, to obtain
the value of the first position of the array (e.g. *a), and apply offsets, as described
in the next section.
Similarly, a two-dimensional array can be treated as a pointer to its first row,
giving two levels of indirection. Dereferencing it once will yield a row, which itself acts as a pointer to its first element, and
dereferencing it twice will give us access to the actual data stored in memory.

5.4.1 Pointer Arithmetic

We can manipulate pointers through integer arithmetic operations. For example, let’s
say we have the following declarations:
/* an array */
int a[5];
/* p is pointing to the start of the array */
int *p = a;
We have already seen that we can use indexing to access an array, so in this case
p[n] and a[n] are exactly equivalent. However, given that p is a variable, we can
then manipulate this pointer to access the different locations in the array (Fig. 5.4).

a a+1 a+2 a+3 a+4 a+5

? ? ? ? ? ?

Fig. 5.4: The array a[5] and the various offsets to the location of each of its elements. Note that
the memory range of this array is [a, a+5), and thus a+5 points to an address outside it.

The following operations are commonly used:

• +: adding an integer moves the pointer a certain number of memory positions
ahead. For instance,
p = a + 4;
places p 4 int-size memory locations beyond a. This means that, after this, *p
and a[4] will yield the same value.
• -: likewise, subtraction will move the pointer a number of locations backwards.
• ++, +=, --, -=: these are also used to increment and decrement pointer
positions.

This means that we can access a given space in memory using either pointer
arithmetic or array indexing. For instance,
for(i=0; i<n; i++) *(p+i) = i;
is equivalent to:
for(i=0; i<n; i++) p[i]= i;
as well as
for(i=0; i<n; i++) a[i]= i;
and
for(i=0; i<n; i++) *(a+i) = i;
We can also use increment operators with pointers if they are not constant. For
instance, we can use *p++ = i but not *a++ = i, because the latter is a constant
pointer, and cannot be moved (but an offset can be added to it to yield a pointer to
a different location). Pointer arithmetic is useful in some situations, but can in most
cases be replaced by indexed access.
With regard to two dimensional arrays, it is important to observe how the data is
stored in memory, if we wish to access it via a pointer. For instance:
int a[3][2] = {{0, 1}, {2, 3}, {4, 5}};
will be stored as a sequence of 3 rows of 2 elements each (Fig. 5.5).
If we point to the first position (note the necessary dereferencing, as a is a two-
dimensional pointer, see Fig. 5.6),
int *p = *a;
then we can traverse the array by incrementing p, which will give us access to the
values 0, 1, 2, 3, 4, 5 if we increment by one each time, as if we were accessing a
one-dimensional array. However, if we offset a, we will move from row to row, as
the variable is a pointer to arrays of two elements.
The following statements illustrate these principles (Fig. 5.7):
/* first row, second column is modified through p */
*(p+1) = -1;
/* same, through array indexing */
p[1] = -1;
/* p now points to &a[1][0], which holds 2 */
p = *(a+1);
/* b is assigned 3 */
int b = *(++p);
/* c is assigned 4, NB: double dereferencing */
int c = **(a+2);
/* pp is a pointer to two-int arrays, NB: parentheses */
int (*pp)[2] = a;
/* d is assigned 5, pp[2][1] */
int d = *(*(pp+2)+1);

(i)   0  1
      2  3
      4  5

      a[0][0] a[0][1] a[1][0] a[1][1] a[2][0] a[2][1]
(ii)     0       1       2       3       4       5

Fig. 5.5: Two graphic representations of the array int a[3][2] = {{0, 1},{2, 3},{4, 5}}:
(i) two-dimensional row-column arrangement; (ii) flat memory layout.

Fig. 5.6: Graphic representation of int *p = *a, for the array a in Fig. 5.5.
The final three lines require some further explanation. We need to double dereference
the pointer expression a+2 because it points to a row, not to an int: the first dereference
yields the row, and the second its first element. Since a
is an array of arrays of two int elements, it can be represented by a pointer, the
type of which is int (*)[2]. We can assign a to a variable pp of this type. The
variable declaration needs an extra set of parentheses around the name, otherwise
the type would be int *[2], which is an array of two pointers to int. It can then
be accessed as normal through array indexing, or through dereferencing, as shown
in the last line. For that, we need to first dereference to get the pointer to the row,
and then dereference again to get the element value at the desired column.

Fig. 5.7: Graphic representation of: (i) *(p+1) = -1; (ii) p = *(a+1); (iii) b = *(++p);
and (iv) c = **(a+2), using the array a in Fig. 5.5. After step (i), the array holds 0, -1, 2, 3, 4, 5.

5.4.2 Pointers and Strings

As we have already seen, strings are arrays of characters. This means that we can use
a char* variable to refer to strings in a program in a convenient manner. For exam-
ple, we can declare a pointer and initialise it with a string literal without necessarily
needing to set aside a certain amount of memory in an array (since the compiler will
deal with allocating the space for the constant):
char *string = "Live Long and Prosper.";
In this case, since string is a variable, it can be pointed to a
different address at a later time. However, as long as it points to a string literal,
we cannot modify its contents, since that memory location is read-only. To avoid
problems, we might like to mark the variable as such. This is done with the addition
of the const keyword to the variable declaration, for instance
const char *string = "Live Long and Prosper.";
This means that the string pointed to by the variable cannot be modified (since the
pointer is to a const char). However, the pointer variable itself is not constant,
and can be reassigned. In this case, the contents of whatever address the variable points to cannot
be modified via this variable. Note, however, that
char string[] = "Live Long and Prosper.";
is a different case. Here, we have declared a char array with enough memory to
hold the characters in the string literal (including the terminating null character),
and initialised it. We are able to modify the contents of that array, but we cannot
point string elsewhere, as it is a constant pointer.
While initialising a string to a literal is one of the basic operations we can do with
strings, it is clearly not enough to fully manipulate text. If we were, for instance to
copy a string from one place to another, the code to do that would look like this:
const char *src= "Live Long and Prosper";
char dest[30]; /* enough space for 29 characters */
int i = 0;
do {
/* bounds check */
if(i == 29) {
dest[i] = '\0';
break;
}
dest[i] = src[i];
} while(src[i++] != '\0');
where we copy each character until we find a null character or we reach the end of the
destination array (note the use of the postfix increment operator). To facilitate these types of
operations, a number of subroutines are offered by the C library under the header
string.h:
• strlen(const char *s): checks for the null terminator and returns the number of
characters in the string (excluding the terminator).
• strcpy(char *dest, const char *src): copies strings, from a source
src to a destination dest, which needs to contain enough space for the full
string to be copied. The strncpy() variant allows for a limit on the number of
characters copied and is therefore safer, although it does not add a terminator
if the limit is reached before the end of src.
• strcat(char *dest, const char *src): concatenates two strings into
dest (which needs to have enough space for the resulting string). Similarly,
strncat() is its length-limited version.
• sprintf(char *dest, const char *fmt, ...): prints a formatted
string to a destination. This is a version of printf() that outputs to a string instead
of the standard output. A snprintf() variant is available, which checks
the size of the destination memory.

5.5 Conclusions

In this chapter, we have focused on data types that can hold sequences and how we
can manipulate these. We have also explored some fundamental issues of memory
access by introducing the idea of pointers, which are data types that operate at a
lower level. They are, however, widely used in C programming, as they provide a
level of flexibility that makes this language very well suited to sound and music
computing. Complementing this, we will see in the next chapter another key aspect
of structured programming, subroutines, in which pointers also have an important
role to play.

Problems

5.1. Write a program to read in ten integers and write them in reverse order. Use
loops to read and write the numbers.

5.2. Sort a sequence of ten integers in ascending order. Here’s a simple sorting al-
gorithm for a sequence of N elements [17]:
(a) Find the location I of the largest element from A[0] to A[N-1].
(b) Interchange A[I] with A[N-1].
(c) Decrease N by 1.
(d) If N == 0 finish else repeat from (a).
Chapter 6
Functions

Abstract In this chapter, functions are introduced as the fundamental organising el-
ement in the C language. Topics related to their definition, argument passing, and
call semantics are presented first. This is followed by a discussion of the principle
of recursion. The paradigm of modular programming as implemented in C is dis-
cussed. The standard C library is introduced, allowing us to develop a sine wave
synthesis program. Finally, we develop an ASCII-based terminal plotting function
as an example of the ideas presented in the chapter.

In C programming, the concept of a function may or may not conform to the
mathematical sense, which is narrower. Here, a function is better equated to the idea
of a subroutine, that is, a self-contained section of code that can be invoked by other
parts of the program. Together with control of flow constructs, subroutines support a
paradigm called structured programming, which is a fundamental form of computer
programming.

6.1 Function Definition

A function is defined by four elements:

1. Return type: the type of the result returned by the function. If the function does
not return anything, it can be set to void.
2. Name: a symbolic name that will identify the function. Similarly to variables, it
needs to start with a letter or an underscore character.
3. Argument list: inside parentheses, we have a list of arguments, each one with a
type and a name; multiple arguments are separated by commas. Once declared,
all arguments become local variables for the function.
4. Body: inside curly braces, this is the code block for the function. The function exits
if it finds a return statement or reaches the end of the block.


We have seen all of these elements in the definition of the main() function in
the programs encountered so far. In addition to this, we can define other functions
that we can call whenever we need to. A simple example is given by this version of
the shin program:
#include <stdio.h>

void shin() {
printf("Live Long and Prosper.\n");
}

int main() {
shin();
return 0;
}
In this program, main() delegates all of its work to shin(), which calls
printf() to display the message. Since the function does not have a result, it
is defined with void as return type, and it does not necessarily require a return
statement. This also means that it would be a compilation error to try to assign the return
value of this function to a variable. In some cases, where an early exit is required, we
can use the return keyword, with no arguments as the function has been declared
with the void type.
The function also has no inputs, so its argument list is empty. Alternatively, it
could also have been declared void:
void shin(void) {
printf("Live Long and Prosper.\n");
}
Note that functions are always defined in global space, outside any other func-
tions. The C language standard does not allow for local functions that are placed
inside the body of other functions (including main()).

6.1.1 Arguments

We can pass parameter values to a function via its arguments. For instance, this is a
simple function taking two integers:
int sum(int a, int b) {
return a + b;
}
This calculates the sum of its parameters. If we want to use it, we call it by pass-
ing integer constants, variables, or expressions. For example, we can call sum()
in
a = sum(b,2);
where a variable and a constant are used,
a = sum(x+y,z*w);
where two expressions involving four different variables are used, and
a = sum(sum(a,b),3);
where the result of a function call and a constant are used.

6.1.2 Variable Lifetime

As we have already seen, variables are treated as auto (automatic) by default. In
a function, this means that all local variables, including its arguments, which are
also local variables, will come into being as the function is called and cease to exist
once the function exits. An exception to this is if we explicitly mark a variable as
static. In that case, the variable will persist between calls, which also implies that
a single instance of it exists for all separate invocations made to the given subroutine.
This specific use of the static keyword should be avoided if at all possible.

6.1.3 Call Semantics

In C functions, all arguments are passed by value. A local variable is created in the
function and the value of the input is copied into it (for each argument). Since the
local variable disappears once the function has finished executing, this means that
the function cannot modify the values of variables that are passed to it as arguments.
However, if we are using pointers, since the addresses are being copied rather than the
actual contents, we can modify variable values indirectly. Consider the following
case:
void reset(int *p){
*p = 0;
}
This function will reset to 0 the value stored in a memory address that is passed
to it. This means that a variable that is held outside the function can be modified
through a pointer. The following program demonstrates this:
#include <stdio.h>

int main(){
int a = 0;
printf("value of a: %d\n", a); // prints 0
a = 1; // assign 1 to a
printf("value of a: %d\n", a); // prints 1
reset(&a); // pass the address of a to reset()
printf("value of a: %d\n", a); // prints 0
return 0;
}
Arrays, as they have some equivalence to pointers, are also passed through mem-
ory addresses. For instance, let’s extend the function reset() to work on an array:
void reset(int *p, int size){
while(--size>=0) *(p++) = 0;
}
In this case, we also need to pass the size of the array, as it cannot be inferred
directly (only strings are terminated by an ASCII null, ordinary numeric arrays are
not). Note that since p is a pointer to int, we can apply pointer arithmetic to it, or
alternatively we could have used array indexing (p[size] = 0, for example). On
the other hand, if we had declared the function as explicitly taking an array, as in
void reset(int a[], int size){
while(--size>=0) a[size] = 0;
}
then nothing would have changed for the compiler: in an argument list, int a[] is
adjusted to int *a, so both forms declare the same pointer argument. The array notation
is still useful as a signal to the reader that an array is expected, and in that case it is
natural to use array indexing. In any case, we recommend
always using array indexing, which is less confusing. In summary, an argument
type *var is a pointer that can accept any memory address, including that of an
array. An argument type var[] indicates that an array is expected, and receives a pointer
to the start of it.

6.1.4 Function Prototypes

A function prototype is the declaration of its return type, name and argument types.
For example,
int sum(int, int);
tells the compiler that there is a sum() function somewhere in the code that takes
two integers and returns an integer. Once this is declared, we can use the function
in our code. The definition needs to exist somewhere, either in the same source file
where the declaration was placed, in a different file, or in an external library, pre-
compiled. If it is not defined anywhere, the linker will issue an error (‘symbol not
found’).

6.1.5 Parametrised Macros and Inline Functions

The C language preprocessor supports the use of macros containing parameters,
which in some cases can be used as an alternative to function definitions. The fundamental
difference is that macro declaration and use is a text-replacement operation,
rather than a function call. This means that code gets inserted at the point where
the macro is used. The macro definition is done as we have seen before, through
#define, but now it features one or more parameters given as a comma-separated
list inside parentheses. For instance,
#include <stdio.h>
#define SUM(a, b) a + b

int main(){
printf("%d \n", SUM(2,2));
return 0;
}
where the macro instance SUM(2, 2) gets replaced by 2 + 2. We should note
that this replacement is strictly textual, so if we were to use it with two variables,
say i and j, SUM(i, j) would yield i + j. The macro definition is delimited
by a newline (not a semicolon), and we should be careful not to add a semicolon to
the definition if it is not needed. In the case above, it is clear that adding one would
not allow us to use it inside printf(), but we can always use the same definition
in a well-delimited statement such as a = SUM(b, c);.
Macros of this type can have more than one statement, provided that the macro
declaration itself is in a single line. To allow a code line to extend over several lines
(for visual or formatting purposes), we can use a backslash character (\), as already
discussed in Sect. 2.2.2:
#include <stdio.h>
#define SWAPINT(a, b) { \
int tmp = a; \
a = b; \
b = tmp; }

int main(){
int a = 1, b = 2;
printf("original: %d %d \n", a, b);
SWAPINT(a, b)
printf("swapped: %d %d \n", a, b);
return 0;
}
Note that there is no need to place a semicolon after SWAPINT(a, b), since the
final statement of the macro already includes one. Also, the brackets are only added
to the macro definition to allow the code to compile in the pre-C99 standard, where
variable declarations could only happen at the start of a block. This also ensures that
tmp only exists inside that block, which might prevent variable name clashes. We
can use these types of macros for any useful text replacement. For instance,
#include <stdio.h>

#define CAST(type, var) (type) var

int main(){
int a = 1;
float f = 2.0;
printf("%f %d \n", CAST(float, a), CAST(int, f) );
return 0;
}
allows us to pass in a type and an object to the macro (something we cannot define
for a function, for instance).
A similar but not identical mechanism of substitution is defined in the
C standard [24] by the inline function specifier. According to the standard, its
presence suggests to the compiler that calls to the function should be as fast
as possible (the extent to which this is done is implementation-defined). This is usually
achieved through inline substitution, where the body of the function replaces the function
call at the compilation stage (not at preprocessing). This mechanism can provide some
gains in performance (by eliminating the calls), at the expense of executable file
size.

6.1.6 Variable Argument Lists

It is possible to define a function with a variable list of arguments. In this case, we
will use an ellipsis delimiter (...) in place of an argument. This is required to be
in the rightmost position in an argument list (if there are other arguments). Once we
define this, we will need to use some definitions from stdarg.h to access each
argument:

• va_list is the type holding a variable argument list.
• va_start(va_list ap, parmN) is a macro that initialises an argument
list, and parmN is the name of the rightmost parameter before the variable argument
list.
• type va_arg(va_list ap, type) is a macro that retrieves each successive
argument in the list.
• va_end(va_list ap) is a macro used to close the argument list access operation.

The following example demonstrates how to access each argument in a variable
argument list. Note that it is important that the number and type of arguments passed
to the function are known:
#include <stdio.h>
#include <stdarg.h>

void func(int n, ...) {
    va_list ap;
    int i;
    va_start(ap, n);
    for(i = 0; i < n; i++)
        printf("%d", va_arg(ap, int));
    va_end(ap);
    printf("\n");
}

int main() {
func(2, 1, 2);
func(3, 1, 2, 3);
return 0;
}

6.1.7 Recursive Calls

Although not a common practice in C programming, the use of recursion is supported.
This takes place when a function is defined to call itself to repeat some
computation. It is normally the main means of iteration in some other programming
environments, and for certain applications, it allows for very elegant and compact
code.
A typical example is the computation of a factorial, which can be defined recursively
as N! = N × (N − 1)!, with 0! = 1. This can be implemented using recursion,
by splitting the base case (for N = 0) from the rest:
unsigned int fact(unsigned int z) {
if(z == 0) return 1; /* base case */
else return fact(z-1)*z; /* recursion */
}
If we unwrap the calls, we will see that the function will recurse until it gets to the
base case, and then go back executing the multiplication part until it exits the first
call (Fig. 6.1). While this is a very compact way of implementing an algorithm, it
does not always compile to the fastest code. In this particular example, it is more ef-
ficient to calculate the factorial of a number using a loop if it is very large. However,
there will be applications in which recursion is the correct method to use.

[Fig. 6.1: Recursive factorial calculation. Diagram: 5! is computed by the chain of calls
fact(5), fact(4), fact(3), fact(2), fact(1), fact(0); fact(0) returns 1, and the returns
then unwind as 1*1, 1*2, 2*3, 6*4 and 24*5 = 120.]

6.2 Modular Programming

In C, each source file is treated as a separate translation unit. This means that some
functions (and their local variables) and global variables can be made to be internal
to that unit, which we can call a module, and hidden from the rest of the program
code. Conversely, we may be able to open up access to some of the functions that can
provide an interface to the module. In this way, we can separate different components
into separate source code files and provide a means of accessing these through
a set of interfaces. This may be a useful strategy for mid- to large-size projects.
While so far we have only been using a single source file for the whole of our program,
it is very common for code to be split into separate files that are compiled and
then linked together to make up the software (Fig. 6.2).
In practice this is what we need to do:
1. In a given source file, mark all functions that are only accessible locally as
static¹, e.g.
static int my_local_func(int a, float f) {
    ...
}
where the static keyword prevents external access from outside the module
by making the function invisible to the rest of the program. Likewise, any global
variables, that is, those declared outside functions, are marked static to make them
accessible only within the file (that is, their scope is the file). For these purposes, the
source file is understood as the translation unit controlling the scope of objects
that are internal to it.
2. Functions that make up the interface to the module, i.e. those that will open up the
functionality to the rest of the program, should be declared in a header file that can
be included in another file². Additionally, if we need to make any global variables
accessible outside the translation unit, they should not have the static
specifier, and need to be declared as extern in the outside module where they
are accessed. However, we should in general avoid using global variables of any
kind, preferring instead to pass values cleanly as parameters to functions rather
than accessing them directly from global variables if at all possible.

¹ Note that this is a slightly different use of the keyword from the one we have encountered before.

[Fig. 6.2: Modular programming. Diagram: interfaces (module1.h, module2.h);
implementation (module1.c, module2.c, main.c, with #include "module1.h" and
#include "module2.h"); program.]

A note about the #include statement: you might have noticed that all header
files included from the C library (e.g. stdio.h) are enclosed by angle brackets
(< and >). This is the common procedure when including headers that have inter-
faces to the external libraries we are using. The compiler will search for them in
standard locations, as well as in directories we pass to it using the optional -I flag.
For header files that accompany our own source code and are located in the same
directory as the implementation code, we should use double quotes instead. That
will indicate that the file should be searched for locally. Thus, for a module.h
used by a source file in the same directory, we should include it with the line
#include "module.h".

6.3 Pointers to Functions

In C, functions and numeric data are distinct, so we cannot assign a function directly
to a variable³. However, it is possible to use pointers to refer to subroutines, and
assign these to other pointers. These are called pointers to functions. In fact, just
as the array variable name is a constant pointer to the start address of the array, a
function name is a constant pointer to a given subroutine and can be treated as such.

² Alternatively, they can be declared in parts of the program that will call them. The keyword
extern can be used, but it is not necessary, as functions have external linkage by default.
³ This is possible in some languages, for instance, those where the functional paradigm is
implemented.

A function pointer declaration is a little convoluted:
type (*pointer_name) (arguments);
declares a pointer called pointer_name, which can be used to store the address
of a function with a given type as its return type and arguments as its argument
types. It can only store a function with that prototype (the compiler will reject or
warn about any other). For example,
int (*func)(int, int);
is a pointer to a function with two int arguments that returns an int. We could,
for instance, assign to it an existing function such as sum(), defined earlier
in the chapter:
func = sum;
We can then call the function through the pointer:
a = func(b,c);
The most common application of these ideas is to employ function pointers as
arguments to other functions. Consider the following example. We would like to pro-
cess the contents of two integer arrays, element by element in various ways: adding,
subtracting, multiplying, taking the maximum or minimum of the two, and so on.
This involves repeated application of a function that takes two int parameters and
returns another int as a result.
To implement this, we could design a subroutine that takes four inputs: two ar-
rays, their length and a function to process them. The output can be kept in place in
one of the arrays or placed in another. Let’s use the first option (in-place processing).
Here is the code:
void process(int *data1, int *data2, int len,
int (*func)(int, int)) {
int i;
for(i = 0; i < len; i++)
data1[i] = func(data1[i], data2[i]);
}
This code takes care of the function application. To use it, we have to pass the
arrays, length and an existing function to do the job:
int a[5] = {1,2,3,4,5};
int b[5] = {6,7,8,9,10};
process(a,b,5,sum); /* result: {7,9,11,13,15} */
Other functions can be passed if they match the prototype. For instance,
int prod(int a, int b) { return a*b; }
int max(int a, int b) { return a > b ? a : b; }
...
/* each result below assumes that a holds {1,2,3,4,5} again,
   since process() overwrites its first array */
process(a,b,5,prod); /* result: {6,14,24,36,50} */
process(a,b,5,max); /* result: {6,7,8,9,10} */
This mechanism will be very useful in a number of applications. Another name
given to the routine passed to the function is a callback, in other words, a routine
we are supplying to be called later by a program. This is in contrast to all the other
functions that we call directly in our code.

6.4 The C Standard Library

The C language is very lean. It does not include many built-in resources for
programming beyond its formal syntax. For instance, as we have noted for standard
IO, we need to employ calls to externally-defined functions that are not part of the
language per se. It is common, however, to treat the C standard library as an
integral part of the whole C programming environment, if not of the language itself.
Any compilation toolchain that does not supply it is seriously limited.
The C standard library contains a large number of function, type, and constant
definitions that are widely used in programming. These will include IO, mathe-
matical routines, string manipulation, and a series of other utilities. Each particular
header file will allow us to access the prototypes and declarations for a given set
of functionality. We can find extensive information about each subroutine in the li-
brary in section 3 of the system manual. This can be accessed from the shell by the
command man. For instance,
$ man 3 printf
will print the complete information for the printf() function, including header
file, prototype, arguments, return value, etc.

6.5 Another Synthesis Program

With the standard library functions, we can start doing proper sound synthesis. We
will leave any further details of digital audio theory, such as the
concept of sampling rate (or frequency, fs), for later. For now, as we have done
before, it is sufficient to say that we will be generating a pulse code modulation
(PCM) signal with a certain number of samples per second (so we will have fs
numbers for each second of audio).
We will generate the purest signal of all, a sine wave. We can do that by using
the sine function. For each number we output, we calculate the sine of an angle ω,
and as it increases from 0 to 2π and then to 4π and 6π, we will generate complete
sine wave cycles. This uses the sin() function from the C math library, declared
in math.h, which implements the following expression⁴:

$$x = \sin(\omega) \tag{6.1}$$

If we use a frequency multiplier f, then we can generate as many cycles as we
want over a given period:

$$x = \sin(2\pi f t) \tag{6.2}$$

where t is just the time in seconds. Since we are generating fs numbers per second,
we need a time index n in samples, so $t = n/f_s$ [36]:

$$x = \sin\left(\frac{2\pi f n}{f_s}\right) \tag{6.3}$$

The following program implements this expression directly:
#include <stdio.h>
#include <math.h>
#define FREQ 440
#define SR 44100
#define DUR 2.0
#define TWOPI 6.283185307179586

int main(){
    int n;
    double pi2osr = TWOPI/SR;

    for(n=0; n < DUR*SR; n++)
        printf("%f\n", sin(FREQ*pi2osr*n));
    return 0;
}
This program will generate a 2-second digital sine wave at 44100 Hz sampling
rate, as ASCII formatted floating-point numbers printed to the terminal (stdout).
If you run it, you will see the numbers that compose the digital signal. As before,
we can use pipes, redirection and the tobin program:
$ ./sine | ./tobin > waveform.raw
As we have seen in Sect. 4.5.2, you can open waveform.raw in an editor, as a
32-bit float-encoded raw soundfile with fs = 44100 and one channel of audio.

⁴ Some compilers require the command-line option -lm to link to the standard C math library
(libm). You can add this if an undefined symbol error is reported by the linker.

6.5.1 Plotting

Now that we are able to store data in arrays, we can create a better terminal plotting
program to display this waveform. The idea is that we will use a buffer, which is
a block of memory used to hold data temporarily, to accumulate input samples.
When the buffer is full, we will plot it. The buffer will hold enough numbers to
print the maximum number of columns in the terminal (e.g. 80). To plot the data,
we will check whether each sample matches the number of the line we are currently
printing. The input data is expected to be in the normal range [−1.0, 1.0] and is
scaled up to the plot range.
Since the standard output is line oriented, we have no choice but to print line by
line, even if the intention is to print the data in columns. As the printing position can
only move to the right and downwards, we will need to pay attention to this when
plotting. Here is a function that does this: it takes a data buffer (array), the minimum
and maximum plot values, and the number of samples in the buffer:
void plot(float *data, int ymin, int ymax, int nx) {
    int n, m;
    /* for each value in the range [ymin, ymax] */
    for(m=ymax; m >= ymin; m--) {
        /* on each column */
        for(n = 0; n < nx; n++) {
            /* print zero line */
            if(m == 0) printf("-");
            /* print star if rounded value matches */
            else if(lround(data[n]*ymax) == m)
                printf("*");
            /* else print blank */
            else printf(" ");
        }
        /* move to the next line */
        printf("\n");
    }
}
We proceed from the top left of the figure, from line ymax to line ymin and
plot an asterisk if the value of the waveform at a given column matches the line
number. Since the signal sample is a floating-point number, we use the standard
library function lround() to round it to the nearest integer before we compare
it. When we reach the end of the line (nx columns), we move to the next line,
decrementing the line count.
With this function, we can write a simple program, plot2, to take data from
the standard input and print it to the terminal. In this case, the code assumes an 80
column by 24 line display, but this can be modified by setting the COLS and LINS
constants⁵:
#include <stdio.h>
#include <math.h>

#define COLS 80
#define LINS 24

/* the plot() function shown above should be placed here */

int main(){
    float buffer[COLS];
    int err, n;
    do {
        /* get data input from stdin into buffer */
        for(n=0; n < COLS; n++)
            err = scanf("%f", &buffer[n]);
        plot(buffer, -(LINS-1)/2, (LINS-1)/2, COLS);
        /* clear buffer */
        for(n=0; n < COLS; n++)
            buffer[n] = 0;
    } while(err != EOF);
    return 0;
}
The program will read ASCII float samples from the standard input until an EOF
signal is detected. If we set the sine wave frequency to 550 Hz⁶, by modifying the FREQ
constant, and run the program, piping its output to the plot2 input,
$ ./sine | ./plot2
we will get the plot to the terminal shown in Fig. 6.3.

[Fig. 6.3: A plot to the terminal using plot2.]

6.5.2 Realtime

Furthermore, if we have a program that can send ASCII samples directly to the
soundcard digital-to-analogue converter (DAC), then we can also use the sine program
to generate audio in realtime. This will employ the same pipe mechanism as
in the raw-waveform writing and terminal plotting programs, except that the destination
is now the default soundcard in the system. Supposing this program is called
todac⁷, then
$ ./sine | ./todac
will play a 440 Hz sine wave.

⁵ In fact, we will be using 23 lines. In order to accommodate the [-1,1] range, we need an extra line
to account for values at 0. Therefore the plot requires an odd number of lines.
⁶ This is to line up a single period with the terminal size. Actually, a 551.25 Hz wave would
complete a single cycle in 80 samples at 44100 Hz.
⁷ We will study this program later, in Chapter 11, where we will also find its source code.

6.6 Arguments to main()

C programs can accept initial parameters when they start. These are normally passed
from the shell in the form of separate arguments when the program is invoked. De-
pending on the shell and on the system, there may be other ways to pass these pa-
rameters. However, they are generally accepted in a C program in the same manner,
regardless of their source, as arguments to the main() function, the entry point to
the program.
To give arguments to a program, we use a second form of main(), which we
have not yet discussed. Arguments are passed to any program through two param-
eters declared in the main() function. These are usually called argc and argv,
but these names can be anything. What is important is that the types match what the
linker will expect as the main function prototype:
int main(int argc, const char *argv[]);
The argc parameter gives the number of arguments passed to the program and
is declared as an int. Programs receiving no parameters will have an argument
count of one. The argv parameter is an array of constant strings containing any
arguments passed to the program. The first string in this array is always the program
name. For example,
#include <stdio.h>

int main(int argc, const char *argv[])
{
    int i;
    for (i=0; i<argc; i++) printf("%s\n", argv[i]);
    return 0;
}
This program will print out all of its arguments, starting with the program name.
Note that the argv parameter can also be declared as const char**, which
indicates a two-level indirection (pointer to pointer to char). In a parameter
declaration, the two forms are equivalent.

6.6.1 Translating Arguments

Each argument is a string. In some cases these arguments might need to be converted
or translated into numeric data of different types. The following standard C library
functions declared in stdlib.h can be used for this:
int atoi(const char *string); // string to integer
double atof(const char *string); // string to double
The following example demonstrates their use:
#include <stdio.h>
#include <stdlib.h>

int main(int argc, const char *argv[])
{
    double a,b;
    if (argc < 3) {
        printf("too few arguments \n");
        return 1;
    }
    a = atof(argv[1]);
    b = atof(argv[2]);
    printf("%f \n", a + b);
    return 0;
}
In addition to these functions, conversions from ASCII strings to numeric data
can alternatively be done with strtof() (float), strtod() (double), and
strtol() (long int). These allow an initial portion of a longer string to be
converted. Unlike atof() etc., they also output a pointer to the remainder of the
input string so that further conversions or other operations on it can be carried out.
This is useful when a string contains several numbers that need to be retrieved. See
also sscanf(), which has a similar form to scanf() but takes its input from a
string.

6.7 Conclusions

This chapter set out to discuss subroutines as the final element to complement struc-
tured programming in C. We have examined all relevant aspects of functions, from
definitions to call semantics and prototypes. We saw how arrays and pointers can
be used as arguments, allowing functions to reference externally-defined memory
in addition to local variables. An alternative form of the main() function was also
introduced, together with the mechanism for passing arguments to a program. We
concluded the chapter with a new digital synthesis example, which was an improve-
ment on the earlier example, since we were able then to use the C standard library
mathematical functions to generate accurately pure tones of a given frequency. In
the next chapter, we will be able to conclude the study of the C language per se, so
that we can move on to looking at specific libraries that will be relevant to musical
applications.

Problems

6.1. Write a program that takes any number of arguments and reports the number of
characters in each of them.

6.2. Write a version of the synthesis program presented in Sect. 6.5 that takes a
frequency value as a command-line parameter.

6.3. Write a function to print the first N numbers of the Fibonacci sequence, defined
as $F_0 = 0$, $F_1 = 1$, $F_{n+1} = F_n + F_{n-1}$ [29]: {0, 1, 1, 2, 3, 5, 8, 13, ...}.

6.4. Write a function that will take an input pressure amplitude in N/m² and
convert it into sound pressure level (SPL) values in decibels. Write a program that
will print an SPL value given a certain pressure value at the command line. Use
the expression $\mathrm{SPL}(a) = 20 \log_{10}\left(\frac{a}{20 \times 10^{-6}}\right)$
and the math library function log10() (header file: math.h).
Chapter 7
Structures

Abstract User-defined types are the main topic of this chapter. We look at how
these can be defined via C-language structures and unions. We show how to manip-
ulate these new types, and how they can be treated like other built-in types, through
standard variables, arrays, and pointers. The chapter concludes with a look at bit-
oriented operations.

Arrays allow us to store contiguously in memory a number of data items of the
same type. The C language completes this with a means to reserve a non-uniform
block of memory that can contain a combination of elements of various different
types. This is implemented through structures.

7.1 Defining a New Type

In addition to all the built-in data types we have been using so far, including arrays,
it is possible to define new ones based on a combination of these. This is done by
creating a struct block, giving it a name, and adding elements to it called member
variables:
struct name
{
type member_name;
...
};
Once a new type is defined, we can use it in our program to declare any variables
we need. To do this, we again use the keyword struct followed by the name we gave
our new type, and the variable name:
struct name var;


To facilitate the use of the new type in a simpler way, we can employ the
typedef mechanism. This allows us to define new names for already existing
types:
typedef type new_type;
This can be applied to a structure name after it has been defined,
typedef struct name my_type;
or directly to the structure definition,
typedef struct name
{
type member_name;
...
} my_type;
allowing my_type to be used in variable declarations:
my_type var;

7.1.1 Member Access

Structure members can be accessed by concatenating the name of a variable with
a dot and the name of the member we want to select, as in var.member. For
example, let’s consider the case of a data structure that models a synthesiser note.
For this new type, we need one integer to keep a note number (i.e. using the MIDI
protocol¹), and two floating-point members to keep the amplitude and duration (in
seconds) of the note:
typedef struct note {
int number;
float amp;
float dur;
} NOTE;
Once that is defined, we can declare one or more variables of this type:
NOTE a = {0, 0.f, 0.f}, b = {60, 0.f, 0.f};
noting that we can initialise each member using a comma-separated list of con-
stants/variables inside brackets, matching the order of declaration of the structure
members. Alternatively, we can initialise members out of order by quoting their
names following a dot:
NOTE d = { .amp = 0.f, .dur = 0.f, .number = 62};

¹ See Chapter 12.

To assign a value to a member variable, or to access it, we use a dot notation
similar to that used for initialisation, but prefixing it with the variable name instead:
a.amp = 1.f;
a.number = b.number;
b.dur = a.dur + 1.f;
It is also possible to assign a whole structure to another, which will copy all of
its members across to the destination:
a = b;
This also means that structures, like other types, can be used as function ar-
guments. As we have seen in Chapter 6, this implies copying parameters into the
arguments, which work as local variables for the function. For example, if we want
to implement operations on the NOTE data type, we need to supply a set of functions
to do that:
/* transpose pitch */
NOTE transpose(NOTE x, int semitones) {
x.number += semitones;
return x;
}

/* scale amplitude */
NOTE ampscale(NOTE x, float gain) {
x.amp *= gain;
return x;
}

/* change duration */
NOTE temposcale(NOTE x, float amt) {
x.dur *= amt;
return x;
}
Note that all of this copying in and out of the function (the return value
is also copied out) is probably acceptable for a structure that is small in size like NOTE,
but larger ones might create an overhead that is not ideal. In this case, we should
consider keeping the structures in place and passing pointers to variables as parameters.

7.1.2 Pointers to Structures

A pointer to a structure works under the same principles as built-in variables. We
can declare it, as usual, by using an asterisk,
my_type *p;
and assign it to an existing memory address, dereference it, etc.:
NOTE *p;
NOTE a = {60, 1.f, 0.25f};
NOTE melody[7];
p = melody;
while(a.number < 67) {
*p++ = a;
a.number++;
}
To access the members, we need to dereference it first, and then apply dot selec-
tion. Since the latter operation has higher priority than the former, we need to use
parentheses to ensure the correct order:
(*p).amp = 0.f;
This is slightly awkward, but thankfully there is a simpler version provided by
the -> selector, which is the dot counterpart for pointers:
p->amp = 0.f;

7.2 Functions in Structures

Structure members can be of any built-in or user-defined type. This excludes func-
tions, which are not types themselves, but allows pointers of any kind, including
pointers to functions. Sometimes it is useful to pack together a series of operations
inside a data structure on which they are supposed to work. For instance, it would
be nice to be able to have a function that outputs the frequency in Hz (cycles per
second, cps) corresponding to a note number. We could include this as part of the
NOTE type to keep things together:
typedef struct note {
int number;
float amp, dur;
double (*cps)(struct note);
} NOTE;
This only creates a slot to hold the function. We now need to define the function
and then add it to an instance of the type as part of its declaration²:
/* pow() is declared in math.h */
double func(NOTE x){
    return 440.*pow(2., (x.number - 69.)/12.);
}
...
/* initialise a */
NOTE a= {60, 1.f, 1.f, func}, b;
/* get the pitch of the note in Hz */
double hz = a.cps(a);
/* copy a to b */
b = a;
Note that the function pointer func is copied from variable a to variable b as
part of the assignment in the last line. The operation is then available for that variable
also. While it looks a bit awkward in this trivial example, adding function pointers to
structures can facilitate some important means of coding that will lead us to
object-oriented programming.

² See Sect. 12.3.5 for more details on the expression used for the note number to cps conversion.

7.3 Unions

Similarly to structures, C has a mechanism to create a hybrid type that can have two
or more different interpretations, called a union. In this case all members share the
same memory space, so, if one of them gets modified, this will be reflected in the
others. For instance,
typedef union _conv {
unsigned char bytes[4];
int whole;
float real;
} converter;
makes a union of four bytes, an integer and a floating-point number. It allows us
to access the memory as an integer, a real, or four individual bytes:
converter a;
a.whole = 0; /* sets it to 0, as an int */
a.real = 3.5; /* sets it to 3.5 as a float */
a.bytes[3] = 255; /* sets the fourth byte (index 3) */
Note that each access above will modify the variable memory in some way. The
first one resets it to zero, the second sets its four bytes to carry a floating-point
number, and the third modifies only the fourth byte (index 3) by setting all of its bits.

7.4 Enumerations

C provides a means of easily making enumerations, i.e. sequential lists of integer
constants:
enum {ZERO, ONE, TWO, THREE};

This creates four constants set to 0,1,2,3, which can be used in the program as
ZERO, etc. This is what we call an anonymous enumeration. We can also give it a
name:
enum numbers {ZERO, ONE, TWO, THREE};
and declare variables of the type enum numbers to use in the program. A new
type can also be created with typedef, as before:
typedef enum numbers {ZERO, ONE, TWO, THREE} nums;
nums b = ZERO;

7.5 Bitwise Operations

As a final C language topic, we will look at a set of low-level facilities that allow us
to work on individual bits of an integer. These are known as bitwise operations, and
differ fundamentally from the kinds of expression we have seen so far. Two main
groups of operators exist: those dealing with binary logic and those implementing
the shift of bits in a variable.

7.5.1 Bitwise Logic

A number of operators are defined for bitwise logic operations, which treat integers
as bit fields rather than a binary representation of a given decimal number. They
compare each bit of one operand with the corresponding bit of another operand.

1. &: bitwise AND.
2. |: bitwise inclusive OR.
3. ^: bitwise exclusive OR.
4. ~: bitwise negation (one’s complement, unary operator).

The bitwise AND (&) returns a set bit (1) only when both sides of the operation
have that bit set. It is often used with bitmasks to filter bytes off an integer:
unsigned short mask = 0xFF00, value, masked;
value = 0x0111;
masked = mask & value;
In the example above, the mask will only let the higher byte pass, filtering off the
lower one. So the value of masked will be 0x0100:

  0000 0001 0001 0001
& 1111 1111 0000 0000
  0000 0001 0000 0000
The bitwise OR (|) returns a set bit when either of the operands has a set bit. It
is used to turn bits on (and to combine bytes).
masked = mask | value;
will turn the higher-order byte to 0xFF, resulting in 0xFF11:

  0000 0001 0001 0001
| 1111 1111 0000 0000
  1111 1111 0001 0001

The bitwise exclusive-OR (^) returns a set bit when only one operand has a set bit,
otherwise it will return a zero. The unary one’s complement operator (~) converts
each set bit into a zero and vice versa. Bitwise logic operators can be combined in
shorthand expressions with the assignment operator, for the updating of variables,
for example:
value &= mask; // same as value = value & mask;
There are several uses for bitwise logic. The most common of them is to use each
bit of a number to determine whether an option is turned on or off in a program. For
example, the following program uses an 8-bit integer to hold eight different options
that can be selected individually. If a given bit is set, the option is selected. We have
a list of constants in an array, each defining one bit. When an option is selected, we
OR it with the options list, so that the given bit is set. Later, when we want to check
which options have been chosen, we AND the list of options and each different
option constant:
#include <stdio.h>

int main()
{
  unsigned int i = 1;
  unsigned char options = 0;
  unsigned char opt[8] = {0x01,  // 0000 0001
                          0x02,  // 0000 0010
                          0x04,  // 0000 0100
                          0x08,  // 0000 1000
                          0x10,  // 0001 0000
                          0x20,  // 0010 0000
                          0x40,  // 0100 0000
                          0x80}; // 1000 0000

  while(i != 0) {
    printf("select an option 1-8 (0 to quit): ");
    if(scanf("%u", &i) != 1) break; // stop on invalid input
    if(i && i <= 8)
      options |= opt[i-1]; // select the option
  }

  for(i=0; i < 8; i++)
    if(options & opt[i]) // if the option was selected
      printf("selected option %u \n", i+1);

  return 0;
}
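The same idea can be packaged as small helper functions, which makes the three basic operations on a set of options (select, clear, and test) explicit. This is only a sketch: the function names are hypothetical, and options are numbered from 1 to 8 as in the program above.

```c
/* one constant per option, each defining a single bit */
static const unsigned char opt_bit[8] = {0x01, 0x02, 0x04, 0x08,
                                         0x10, 0x20, 0x40, 0x80};

/* returns options with option n (1-8) switched on */
unsigned char option_select(unsigned char options, unsigned int n) {
  if (n >= 1 && n <= 8)
    options |= opt_bit[n - 1];
  return options;
}

/* returns options with option n (1-8) switched off */
unsigned char option_clear(unsigned char options, unsigned int n) {
  if (n >= 1 && n <= 8)
    options &= (unsigned char) ~opt_bit[n - 1];
  return options;
}

/* returns 1 if option n (1-8) is on, 0 otherwise */
int option_is_set(unsigned char options, unsigned int n) {
  if (n < 1 || n > 8) return 0;
  return (options & opt_bit[n - 1]) != 0;
}
```

Clearing an option combines two operators: the complement of the option constant has every bit set except the one of interest, so an AND with it switches that single bit off and leaves the others untouched.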

7.5.2 Bitshift Operators

Two operators can be used to shift bits in an integer:


<< /* left shift */
>> /* right shift */
They shift bits by a number of positions specified by the right-hand operand:
x << 1 // shifts all bits by 1 position to the left
x >> 2 // shifts all bits by 2 positions to the right
Left shifts fill the vacated bits with 0-bits. Right shifts will depend on the type
of the operand: for unsigned types, bits will be filled with 0s; for signed types
holding a negative value, the result is implementation-defined, but on the systems
we use the vacated bits are filled with copies of the sign bit (the first bit). These
systems employ a representation for signed integers called two's complement. In it,
the first bit (sign) is 1 for negative numbers and 0 for non-negative ones. Left shifts
do not preserve the sign bit: any bits shifted out at the top are lost, and left-shifting
a negative value is undefined in C. For this reason, shifts are best applied to
unsigned types.
This means that, as long as no set bits are shifted out, left shifts are equivalent to
multiplication (a fast way of doing it; see Fig. 7.1):
x << n // multiplication by 2^n
Likewise, right shifts are equivalent to division (with rounding, see Fig. 7.2):
x >> n // division by 2^n
So, a fast way of multiplying or dividing by 2 is to left or right shift a number by
one position. The division will be rounded down to an integer.

0 1 0 1 1 0 1 1 a = 91;

1 0 1 1 0 1 1 0 a << 1; //182

Fig. 7.1: Bitwise left shift.

0 1 0 1 1 0 1 1 a = 91;

0 0 1 0 1 1 0 1 a >> 1; //45

Fig. 7.2: Bitwise right shift.
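The two figures can be reproduced with a few small functions. The sketch below uses unsigned fixed-width types (an assumption made here to keep the behaviour well defined) and also shows a shift combined with an OR to pack two bytes into one 16-bit word. All function names are hypothetical.

```c
#include <stdint.h>

/* multiply by 2^n; valid for n < 32 and while the result fits in 32 bits */
uint32_t mul_pow2(uint32_t x, unsigned int n) {
  return x << n;
}

/* divide by 2^n, discarding the remainder; valid for n < 32 */
uint32_t div_pow2(uint32_t x, unsigned int n) {
  return x >> n;
}

/* combine two bytes into a 16-bit word: shift, then OR */
uint16_t pack_bytes(uint8_t high, uint8_t low) {
  return (uint16_t) (((unsigned int) high << 8) | low);
}

/* extract the higher byte of a 16-bit word */
uint8_t high_byte(uint16_t word) {
  return (uint8_t) (word >> 8);
}
```

Here mul_pow2(91, 1) gives 182 and div_pow2(91, 1) gives 45, as in Figs. 7.1 and 7.2.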

7.6 Conclusions

In this chapter, we have seen the final elements of the C language syntax. It is a
wonder that we can introduce the whole of the language in a few chapters, but that
is a significant characteristic of C: it is small. From now on, we will be concerned
with the libraries that make up a modern computing environment, in particular those
that deal with sound and music computing. The power of the C language resides in
the combination of this simple, small set of rules, with the huge variety of system
libraries that provide specific functionality for particular tasks. In the next chapter,
we will start the next stage of our journey by looking at memory management.

Problems

7.1. Using a bitwise operation, write a program that checks if a user-provided num-
ber is a power of two.

7.2. Algorithmic Music Composer: the task in this problem is to develop a pro-
gram that can generate scores using Stochastic Music principles. The music will be
written as a numeric score for a system such as Csound (or equivalent). This score
should be printed to the terminal (using printf()).

(a) General outline:

– The program should ask for three inputs from the user: (i) the total number of
notes; (ii) the initial note; and (iii) the random walk interval (> 1).
– The program should generate five parameters for the score:
(1) the instrument number: a discrete random choice of a minimum of 2 instruments
(2) the note start time: random number values (starting from 0 secs). The sequence
of notes will have to be increasing in time: each note should not start earlier than the
previous one (but can start at the same time). The random values should be limited so
that the next note never starts more than 1 sec after the current one.
(3) the note duration: a random value between 0.5 and 1.5 (secs)
(4) the note amplitude: a random value between 0.0 and 1.0.
(5) the note number (pitch): apply a random walk algorithm (see footnote 3) over a
closed range from 0 to 127 (MIDI note numbers; see footnote 4).

Notes:
– The C standard library function rand() can be used for all random number gen-
eration. See the relevant manual page for more details on how it works. Note that you
will need to keep the random numbers to various ranges (use the modulo operation).

– The score can use any numeric format, but should contain the five parameters
as outlined above. We suggest the use of the Csound standard numeric score as the
output as it provides a simple but structured format, which can be played directly.

– A data structure holding note parameters might be useful for modelling each note
in a sequence/list.

3 See [15, 365–8] for details on this algorithm.
4 See Chapter 12.

Chapter 8
Memory Management

Abstract With most of the C language already covered, this chapter looks at the
fundamental principles of dynamic memory allocation and management. The main
C standard library functions designed to create, expand, and dispose of free-store
memory are introduced. We employ these in two basic applications: dynamic arrays
and linked lists.

Up to now, we have not been concerned with how memory is allocated in a
program. All we know is that when we declare a variable in a block, it comes into
existence while that block is active (i.e., if it is a function, during a call) and then
gets destroyed when the program leaves the block. This is the type of storage called
automatic. The mechanisms for it are managed by the compiler at compile time,
regardless of whether we are using a single variable, an array, or a structure. We do
not have to worry too much about the details of memory allocation, as it is generally
seamless.
However, this can be problematic in two particular cases:
1. When we do not know how much memory we will need at compile time. As we
have seen, for instance, it is not possible to use a variable to define the size of an
array.
2. When the memory space required is substantial. Automatic variables and arrays
are allocated in a part of the program memory space called the stack, which might
not have enough space for very large memory blocks.
To cover these cases, we need to be able to manage the program memory in a
more precise way. This is done through dynamic memory allocation.

8.1 Allocating Memory

Memory management is provided by the C standard library, whose stdlib.h
header file supplies functions to allocate and dispose of memory space. These use
a different part of the program memory space, called the heap, which can handle
larger blocks in a dynamic way.

© Springer Nature Switzerland AG 2019
V. Lazzarini, Computer Music Instruments II,
https://doi.org/10.1007/978-3-030-13712-0_8

The basic allocation function is malloc():
void *malloc(size_t size);
This allocates a certain number of bytes (size) (size_t is an unsigned integer type)
and returns the address of that location as a generic pointer (type void*), or NULL
if the request cannot be satisfied. With the built-in sizeof() operator, we can
retrieve the size of any data type at compile time. For example, with
int *pa = (int *) malloc(sizeof(int)*N);
we can create an int array dynamically, where N is the number of items in it.
Because malloc is used to allocate an unspecified memory block, it returns a void
pointer, which is then converted to the right type (int *, in this case). The explicit
cast is optional in C, where void pointers are converted implicitly, but it is required
in C++. After the memory is allocated, you can use pa as an array. For example,
pa[n]
is the element n of the array.
In addition to malloc, we have calloc:
void *calloc(size_t count, size_t size);
which allocates a count number of items of size size and resets the memory to
zero.
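A short sketch of both allocation functions follows. The function names (make_ramp and make_silence) are hypothetical, and the check for a NULL return is added here because an allocation request can fail.

```c
#include <stdlib.h>

/* allocate n ints with malloc() and fill them with 0, 1, 2, ...
   returns NULL if the allocation fails */
int *make_ramp(size_t n) {
  size_t i;
  int *pa = (int *) malloc(sizeof(int) * n);
  if (pa == NULL) return NULL;
  for (i = 0; i < n; i++)
    pa[i] = (int) i;  /* the block is used as an ordinary array */
  return pa;
}

/* allocate n floats, already reset to zero, with calloc() */
float *make_silence(size_t n) {
  return (float *) calloc(n, sizeof(float));
}
```

In both cases the caller owns the block that is returned and is responsible for releasing it with free() once it is no longer needed.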
Note that when allocating space for strings, we need to take account of the
terminating '\0' character, so we should always add one extra character to the length
of the string. The strlen() function returns the length of a given string without
its terminating character and can be used in calculating the necessary space. The
function strdup() duplicates a string, allocating memory for it:
char *strdup(const char *s1);
The returned string should be disposed of with free() after use. Note that strdup()
originates in POSIX and was only added to the C standard in its C23 revision.
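The calculation of string space can be sketched as follows. The function copy_string() is a hypothetical stand-in that does by hand what a string duplication function would do, so that the extra character for the terminator is visible.

```c
#include <stdlib.h>
#include <string.h>

/* returns a newly allocated copy of s, or NULL on failure;
   the caller must free() the result */
char *copy_string(const char *s) {
  size_t len = strlen(s);                 /* does not count the '\0' */
  char *copy = (char *) malloc(len + 1);  /* one extra for the '\0' */
  if (copy == NULL) return NULL;
  strcpy(copy, s);                        /* copies the '\0' as well */
  return copy;
}
```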

8.1.1 Reallocation

If the memory has already been allocated, but it needs to be expanded or contracted,
the realloc() function can be used. It resizes the block, if necessary allocating new
space and copying the existing data to it, and returns a pointer to the resulting
location:
void *realloc(void *ptr, size_t size);
where ptr is the original memory address. If the block has been moved, the original
is disposed of once the reallocation is completed. If the reallocation fails, NULL is
returned and the original block is left untouched, so the result should be checked
before the old pointer is overwritten. When a region allocated with calloc() is
extended, there is no guarantee that the extra memory will also be filled with zeros.
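A common pattern for reallocation, sketched below under the assumption that the new size is greater than zero, is to keep the result in a temporary pointer until it has been checked. The function name resize_ints() is hypothetical. It also clears any items added to the block, since realloc() does not do this.

```c
#include <stdlib.h>

/* resize *block from old_n to new_n ints (new_n > 0), zeroing any
   added items; returns 0 on success, or -1 on failure, in which
   case *block is left unchanged and still valid */
int resize_ints(int **block, size_t old_n, size_t new_n) {
  size_t i;
  int *tmp = (int *) realloc(*block, new_n * sizeof(int));
  if (tmp == NULL) return -1;
  for (i = old_n; i < new_n; i++)
    tmp[i] = 0;      /* the extra memory is not cleared for us */
  *block = tmp;      /* only now replace the old pointer */
  return 0;
}
```

Assigning the result of realloc() directly to the original pointer would lose the only reference to the block if the call failed, producing a leak.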

8.1.2 Freeing Memory

It is left to the programmer to dispose of memory that has been allocated dynami-
cally. If this is not done, the program will leak memory, which is never a good state
of affairs. To free memory we use
void free(void *ptr);

8.1.3 Setting and Copying Memory Blocks

The C standard library provides functions to reset and copy whole memory blocks.
These functions are declared in string.h. To set each byte of a memory location
to a given value we can use memset:
void *memset(void *b, int c, size_t len);
This function writes len bytes of value c, converted to an unsigned char,
to the memory b. It returns b. Note that this function is almost exclusively used with
c = 0, to set an area of memory to 0.
To copy data from one block to another, we can use
void *memcpy(void *dst, const void *src, size_t n);
This function copies n bytes from the memory area src to the memory area
dst. The memory blocks should not overlap. If this is the case, then memmove()
should be used instead.
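The three functions can be compared in a short sketch operating on blocks of audio samples. The helper names are hypothetical. Note that clearing floats with memset() relies on the all-bits-zero pattern representing 0.0, which holds for the IEEE 754 format used on the systems we normally work with.

```c
#include <string.h>

/* set n floats to zero */
void clear_floats(float *buf, size_t n) {
  memset(buf, 0, n * sizeof(float));
}

/* copy n floats between two separate (non-overlapping) blocks */
void copy_floats(float *dst, const float *src, size_t n) {
  memcpy(dst, src, n * sizeof(float));
}

/* delay the contents of buf by one position, in place: the source
   and destination regions overlap, so memmove() is required */
void shift_floats(float *buf, size_t n) {
  if (n == 0) return;
  if (n > 1)
    memmove(buf + 1, buf, (n - 1) * sizeof(float));
  buf[0] = 0.f;
}
```

In all three cases the length is given in bytes, hence the multiplication of the item count by sizeof(float).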

8.2 Dynamic Arrays

We can take advantage of the memory management functions provided by the C
library to implement storage that can be dynamically resized. It is often the case
that we need to expand an array according to changes in program state. We can thus
design a module to provide this facility to our programs. For example, let's consider
a data structure to model a variable-size floating-point array:
typedef struct _dynarray {
unsigned int size;
unsigned int length;
float *array;
} dynarray;
With this, a dynamic array can be created to have a given initial size. The array
should be allocated with some space to spare for future growth, and this is
determined by the underlying length of the memory location we are using (Fig. 8.1).
This allows its size to grow without the need for reallocation, which can be an
expensive operation. Under these conditions, the module can define a function that
will create a dynamic array, as well as another one to release the allocated memory:
dynarray *dynarray_create(unsigned int size) {
  dynarray *p = (dynarray *) malloc(sizeof(dynarray));
  p->size = size;
  p->length = size * 2;
  p->array = (float *) calloc(p->length, sizeof(float));
  return p;
}

void dynarray_delete(dynarray *p) {
  free(p->array);
  free(p);
}

Fig. 8.1: Dynamic array, showing its size within the allocated length.

We also need to provide means to access the data (getter and setter functions).
Since we are holding the size of the array, we can protect against fencepost errors
(i.e. accessing beyond the array size):
float dynarray_get(unsigned int index, dynarray *p) {
  if(index < p->size) return p->array[index];
  else return 0.f;
}

void dynarray_set(unsigned int index, dynarray *p,
                  float val) {
  if (index < p->size)
    p->array[index] = val;
}
Finally, we need to provide a means of resizing the array that will trigger a
reallocation if we exceed the underlying memory space:
void dynarray_resize(unsigned int size, dynarray *p) {
  if (size <= p->length) p->size = size;
  else {
    unsigned int oldlen = p->length;
    float *tmp = (float *) realloc((void *) p->array,
                                   size*2*sizeof(float));
    if (tmp == NULL) return; /* failed: array is unchanged */
    p->array = tmp;
    p->length = size * 2;
    /* clear the newly allocated space */
    memset((char *) (p->array + oldlen), 0,
           (p->length - oldlen)*sizeof(float));
    p->size = size;
  }
}
Note that we make sure the newly allocated space is cleared (set to 0), as we did
in the dynarray_create() function (by using calloc()). With this module
in place, we should have enough flexibility to manipulate arrays that need to grow
(or indeed shrink).
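A related design, sketched here as a variation and not as part of the module above, grows the storage automatically when items are appended. The type and function names (fvec, fvec_append, and so on) are hypothetical, as is the initial allocation of four items. It relies on the fact that realloc() behaves like malloc() when given a NULL pointer.

```c
#include <stdlib.h>

typedef struct {
  size_t size;    /* number of items in use */
  size_t length;  /* number of items allocated */
  float *array;
} fvec;

/* start empty; memory is obtained on the first append */
void fvec_init(fvec *v) {
  v->size = v->length = 0;
  v->array = NULL;
}

/* add val at the end, doubling the allocation when it is full;
   returns 0 on success, or -1 if memory could not be obtained */
int fvec_append(fvec *v, float val) {
  if (v->size == v->length) {
    size_t newlen = v->length ? v->length * 2 : 4;
    float *tmp = (float *) realloc(v->array, newlen * sizeof(float));
    if (tmp == NULL) return -1;  /* existing data is untouched */
    v->array = tmp;
    v->length = newlen;
  }
  v->array[v->size++] = val;
  return 0;
}

/* release the memory and return to the empty state */
void fvec_free(fvec *v) {
  free(v->array);
  fvec_init(v);
}
```

Doubling the length each time keeps the number of reallocations low: appending ten items under these assumptions triggers only three of them (to lengths 4, 8, and 16).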

8.3 Linked Lists

As we have seen above, the combination of dynamic memory allocation and struc-
tures allows us to design a new data type that can be grown or shrunk. However, for
some applications, array-style storage, where we use contiguous memory locations
for each data object, is not always ideal. This is especially the case if we need to
insert, delete, or reorder items. For these applications, we can avail of a linked list
[29].
Each element of a linked list is defined by a structure that will normally hold two
kinds of members: the data it holds and one or more link addresses (Fig. 8.2). These
are used to connect elements together (hence the name) so that we can manage the
list more cohesively.

Fig. 8.2: Linked list (each element links to the next, the last one to NULL).

For example, a singly-linked list of integers would look like this:
typedef struct _elem {
  int data;
  struct _elem *next;
} elem;
To create a list we start with an empty list (see footnote 1):
elem *head = NULL;
We can add items to the list (appending them):
elem *append_elem(elem *p, int data){
  elem *newp = (elem *) calloc(1,sizeof(elem));
  if(p != NULL) {
    /* find the last element */
    while(p->next != NULL)
      p = p->next;
    /* link the new element in */
    p->next = newp;
  }
  newp->data = data;
  return newp;
}
The function above returns a pointer to the last element of the list. Note the use
of calloc(), which ensures that the structure pointers are reset at the start. It is
also important to be able to delete each element (from the end of the list):
elem *remove_last(elem *p){
  elem *r = NULL;
  if(p != NULL){
    /* find the last element */
    while(p->next != NULL){
      r = p;
      p = p->next;
    }
    /* free the memory */
    free(p);
    /* unlink the deleted element */
    if(r != NULL) r->next = NULL;
  }
  return r;
}
This also returns the last element so we can keep track of the end of the list. The
last element to be removed returns NULL, so we could use this function to destroy
the whole list (in a loop).
Lists are particularly flexible for inserting, as well as removing links, without the
need to move elements around (Fig. 8.3). To do this, once we have created a new
link, we only need to modify the links at the relevant position:
elem *insert_elem(elem *p, unsigned int pos, int data){
  if(p != NULL) {
    unsigned int n = 0;
    elem *newp = (elem *) calloc(1,sizeof(elem)),
         *head = p;
    newp->data = data;
    if(pos == 0) {
      /* insert before the first element, which makes
         the new element the head of the list */
      newp->next = p;
      return newp;
    }
    /* find the insert position */
    while(++n < pos && p->next != NULL)
      p = p->next;
    /* insert the element */
    newp->next = p->next;
    p->next = newp;
    return head;
  } else return NULL;
}

Fig. 8.3: Inserting a new item into a linked list.

1 The NULL pointer is used to define that it is not pointing to any address.

The following program demonstrates these principles:
/* requires stdio.h and stdlib.h, as well as
   the type and functions defined above */
int main(){
  elem *head = NULL, *p;
  int i = 0;

  head = append_elem(head, 0);
  printf("head: %d \n", head->data);
  while(++i < 5) {
    p = append_elem(head, i);
    printf("added %d to list\n", i);
  }

  head = insert_elem(head, 2, -2);

  do
    printf("deleting %d from list \n", p->data);
  while((p = remove_last(head)) != NULL);

  return 0;
}
When this program is run, it will print the numbers appended to the list, insert
one new element, and then print the numbers deleted from it. Note how we proceed
by removing items from the end of the list, in this case:
$ ./list
head: 0
added 1 to list
added 2 to list
added 3 to list
added 4 to list
deleting 4 from list
deleting 3 from list
deleting 2 from list
deleting -2 from list
deleting 1 from list
deleting 0 from list
Other operations can be added to navigate, search, set and get elements, etc.
The example provided here is of a singly-linked list, which is the simplest kind.
It is also possible to add a double link (both forward and backward), which can
be more useful for some applications. The principle of linked lists is very useful in
applications where we want to work with a variable-size collection of data elements.
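Some of the other operations mentioned can be sketched as follows. The code repeats the element type so that it is self-contained, and the function names (push_front, list_length, list_find, and list_free) are hypothetical additions, not part of the examples above.

```c
#include <stdlib.h>

typedef struct _elem {
  int data;
  struct _elem *next;
} elem;

/* add a new element at the front; returns the new head
   (or the old head, unchanged, if the allocation fails) */
elem *push_front(elem *head, int data) {
  elem *newp = (elem *) calloc(1, sizeof(elem));
  if (newp == NULL) return head;
  newp->data = data;
  newp->next = head;
  return newp;
}

/* navigate the list, counting its elements */
unsigned int list_length(const elem *p) {
  unsigned int n = 0;
  for (; p != NULL; p = p->next) n++;
  return n;
}

/* search: returns the first element holding data, or NULL */
elem *list_find(elem *p, int data) {
  while (p != NULL && p->data != data)
    p = p->next;
  return p;
}

/* free every element in a single pass, starting from the head */
void list_free(elem *p) {
  while (p != NULL) {
    elem *next = p->next;  /* save the link before freeing */
    free(p);
    p = next;
  }
}
```

Adding at the front does not require a search for the end of the list, and freeing from the head visits each element only once, whereas repeated removal from the end walks the list anew for every element.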

8.4 Conclusions

In this chapter, we have introduced some key mechanisms of memory management.
We have seen that it is possible to access large quantities of memory space, from
an area called the heap, to use them in a program. It is very important to take care,
when allocating space, that we avoid leaks: areas of unused or unreachable
memory that we have reserved for our programs but never managed to release. We
have also seen how dynamic memory allocation can be used in the creation of linked
lists that can grow and shrink as required. Memory management will also be very
important when we start dealing with file data, in the next chapter. We will see that
in many applications we need to set aside specific portions of memory to copy data
into for processing. Since we might not know how much of it we need, we will have
to use dynamic memory allocation.

Problems

8.1. Write a program that takes in any number of non-negative integers as command-
line arguments and sorts them in ascending order. Use dynamic memory allocation
and arrays, and check for valid inputs, and free the memory when finished.

8.2. Write a monophonic sine wave synthesis program that will read a sequence of
pitches in Hz from the standard input and play them in a sequence (each one of them
lasting for one second). Use a linked list to store the pitch data and check for EOF
(ctl-d) to signal the end of input.
Chapter 9
File Input and Output

Abstract This chapter expands our means of input and output by introducing file
operations defined by the standard C library. We first look at formatted text output
and then explore the principles of generic binary file access. The chapter concludes
with an application example of file IO that is supported by the sound and music
computing system Csound.

As with other types of IO, file access is not provided directly by the C language.
This type of service relies on libraries or system calls provided by the OS. The
low-level form of file access in UNIX-like systems is given by the open(), read(),
and write() functions (the first declared in fcntl.h, the other two in unistd.h).
These are often not portable to other platforms. However, where the C standard
library is present, we can use a higher-level interface provided by that library,
which is more programmer friendly (and portable). This chapter will concentrate on
the major functions for file manipulation found in the standard C library.

9.1 Standard C Library File IO

All file IO functions, data structures, macros and type definitions in the C library are
defined in stdio.h along with the other standard IO functions we have already
seen. They provide means for reading and writing text files and/or, more generally,
binary data files, such as sound and MIDI files.
The C standard [24] defines that any IO operation, whether it is directed to or
from various types of hardware, or from files on storage devices, is mapped through
logical data streams. Two distinct types of mapping are identified, text and binary.
The latter is an ordered sequence of characters that matches the internal data used
by the computer, whereas the former is a line-oriented sequence of characters, each
line being made up of zero or more characters terminated by a newline character.
Implementations may or may not distinguish between these two types, but they are
commonly treated separately. A stream can also have an orientation, which may
be byte-oriented or wide-oriented. The orientation of a stream is determined by the
first use of either a byte-oriented IO function, or a wide-character IO function. In
this book, we will only discuss byte-oriented streams.

V. Lazzarini, Computer Music Instruments II,
https://doi.org/10.1007/978-3-030-13712-0_9

Independently of the type of file we want to open, we use fopen():
FILE *fopen(const char *filename, const char *mode);
This function opens a file stream defined by a FILE structure. The name of the
file to be opened is filename (which must be a valid name). The mode string
determines how the file may be accessed [24]:
• "r": open file for reading.
• "r+": open for reading and writing. The file must already exist.
• "w": truncate to zero length or create text file for writing.
• "w+": open for reading and writing. The file is created if it does not exist,
otherwise it is truncated.
• "a": open for appending, write-only. The file is created if it does not exist.
• "a+": open for appending, read and write. The file is created if it does not exist.
The stream is positioned at the beginning of the file for the reading and writing
modes and at the end of the file for the appending modes. The C standard [24] also
asks for the inclusion of the letter b in the case of opening non-text (binary) files
(as in "rb" or "w+b"). Some systems do not require this, making no distinction
between file types. The standard also provides for an exclusive mode, denoted by
the letter x, which will require a new file to be created for the writing modes ("wx",
"w+x"), and will make the function return an error if the file already exists. With a
file opened using the append mode, all subsequent writes to the file are forced to
the then current end of file, regardless of any calls to fseek() or similar functions
(see Sect. 9.3.1 for these).
If the open operation is successful, fopen() returns a valid file stream handle.
This FILE* handle will be used with all other functions that operate on the open
file and it is opaque, i.e. should not be touched or changed directly. If fopen()
fails, it returns a NULL pointer so the return value must always be checked for this.
For example,
FILE *fp;

if ((fp = fopen("myfile", "r")) == NULL) {
  printf("Error opening file\n");
}
To close a file, we use fclose(), whose prototype is
int fclose(FILE *fp);
This function closes the file stream associated with fp, which must be a valid
handle previously obtained using fopen(), and disassociates the stream from the
file. The fclose() function returns 0 if successful and EOF (the end-of-file con-
stant) if an error occurs. Any open file streams are also closed when the main()
function returns.
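Since the results of both fopen() and fclose() should be checked, it can be convenient to wrap them in small helpers. The following is only a sketch, with hypothetical function names, of how this could be done.

```c
#include <stdio.h>

/* open name with the given mode, reporting a failure on stderr;
   returns the stream handle, or NULL if the file cannot be opened */
FILE *open_checked(const char *name, const char *mode) {
  FILE *fp = fopen(name, mode);
  if (fp == NULL)
    fprintf(stderr, "could not open %s (mode \"%s\")\n", name, mode);
  return fp;
}

/* close a stream; returns 0 on success and -1 otherwise */
int close_checked(FILE *fp) {
  if (fp == NULL) return -1;       /* nothing to close */
  return fclose(fp) == 0 ? 0 : -1; /* fclose() returns EOF on error */
}
```

Checking fclose() matters particularly for files opened for writing, as any data still held in the stream buffer is only written out at that point.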

The OS provides three open file streams that can be used with the file-writing
or reading functions. These correspond to the standard input, stdin (open for
reading); the standard output, stdout (open for writing); and the standard error,
stderr. Programs should not open or close these streams, as they are provided by
the system.

9.2 Text File Functions

A number of functions are provided for text file IO. First we have fputs() and
fgets(), which write and read a string to and from a file, respectively. Their
prototypes are:
int fputs(const char *str, FILE *fp);
char *fgets(char *str, int num, FILE *fp);
The fputs() function writes the string str to the file stream fp. It returns
EOF if an error occurs and a non-negative value if successful. The null that
terminates str is not written. The fgets() function reads characters from the file fp
into a string str until num-1 characters have been read, a newline character is
encountered, or the end of the file is reached. The string is null-terminated and the
newline character is retained. The function returns str if successful, or NULL if an
error occurs or the end of the file is reached before any characters are read.
Single-character functions are also available:
int fputc(int c,FILE *fp);
int fgetc(FILE *fp);
where the character in c gets written into the stream after conversion to unsigned
char, both functions returning the character written or read, or EOF if an error
occurred (or the end of the file was reached). A character can also be pushed back
into the stream using
int ungetc(int c,FILE *fp);
and subsequent reads to the stream after calls to this function will retrieve the pushed
characters in reverse order.
The two remaining text IO functions are fprintf() and fscanf(). These
functions operate in a similar fashion to printf() and scanf() except that they
work with files. Their prototypes are
int fprintf(FILE *fp, const char *fmt, ...);
int fscanf(FILE *fp, const char *fmt, ...);
and they write to and read from an open file stream, respectively. Note that the calls
fscanf(stdin, fmt, ...);
fprintf(stdout, fmt, ...);
are equivalent to scanf() and printf(), since they use the stdin and
stdout streams, respectively, for input and output.
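As an illustration of the behaviour of fgets(), the sketch below counts the lines in a text stream. The function name and the buffer size of 64 characters are arbitrary choices made for this example. Because fgets() stops after num-1 characters, a long line may arrive in several pieces, and only the piece that ends in a newline completes it.

```c
#include <stdio.h>
#include <string.h>

/* count the lines in a text stream, reading it in pieces with
   fgets(); a final line without a newline is counted as well */
long count_lines(FILE *fp) {
  char buf[64];
  long lines = 0;
  int partial = 0;  /* non-zero while inside an unfinished line */
  while (fgets(buf, sizeof(buf), fp) != NULL) {
    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
      lines++;      /* the newline was retained: line complete */
      partial = 0;
    } else partial = 1;
  }
  return lines + partial;
}
```

The function works with any stream open for reading, so count_lines(stdin) can be used to count lines typed at the terminal or piped into the program.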

The following is a simple example of a text-writing program:
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv) {
  FILE *fp;
  char buffer[1024];

  if(argc < 2) {
    printf("usage: %s filename \n", argv[0]);
    return 1;
  }
  fp = fopen(argv[1], "w");
  if(fp != NULL) {
    printf(" Type in your text (use 'end' to finish) \n");
    /* read one word at a time, limited to the buffer size */
    while(scanf("%1023s", buffer) == 1) {
      if(strcmp("end", buffer) == 0) break;
      fprintf(fp, "%s ", buffer);
    }
    fclose(fp);
    return 0;
  }
  else printf("could not open the file %s \n", argv[1]);
  return 1;
}

9.3 Direct File IO Functions

The standard C library includes two general-purpose direct file IO functions,
fread() and fwrite(). These functions can read and write any type of data.
Their prototypes are:
size_t fread(void *buffer, size_t size, size_t num,
             FILE *fp);
size_t fwrite(const void *buffer, size_t size, size_t num,
              FILE *fp);
The fread() function reads from the file fp num number of items, each of them
size bytes long, into buffer. It returns the number of items actually read, which
may be less than num if the end of the file is reached or an error occurs. If this
value is 0, no objects have been read.
The fwrite() function does the opposite of fread(). It writes to the file fp
num number of items, each item size bytes long, from buffer. It returns the
number of items written. This value will be less than num only if an output error has
occurred. The buffer argument in these functions holds the address of a block of
memory with enough space to hold the data that will be read into or written from.
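The sketch below wraps the two functions for the common case of blocks of floats, such as audio samples. The function names are hypothetical. The data is written in the internal format of the machine, so a file produced this way is only guaranteed to be readable on a system with the same float representation and byte order.

```c
#include <stdio.h>

/* write n floats as raw binary data; returns the number written,
   which is less than n only if an output error occurred */
size_t write_floats(FILE *fp, const float *data, size_t n) {
  return fwrite(data, sizeof(float), n, fp);
}

/* read up to n floats into data; returns the number actually
   read, which is less than n at the end of the file or on error */
size_t read_floats(FILE *fp, float *data, size_t n) {
  return fread(data, sizeof(float), n, fp);
}
```

A typical reading loop calls read_floats() repeatedly with a fixed-size buffer and stops when the returned count is zero.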

9.3.1 Reading/Writing Position

We can position the file stream reading/writing position to the start of the file using
rewind(). Its prototype is
void rewind(FILE *fp);
It is possible to place the stream pointer at a certain position in bytes in a file, by
using
int fseek(FILE *fp, long offset, int whence);
This will position the read/write pointer at the offset position (in bytes), relative
to the value of whence parameter, which can be one of:
1. SEEK_SET: the offset is the absolute position from the beginning of file.
2. SEEK_CUR: the offset is the position from the current read/write pointer
position.
3. SEEK_END: the offset is calculated in relation to the end of the file. The offset
can then be negative or positive (extending the length of the file).
The function returns 0 if successful, or a non-zero value if not.
We can find the current position by using
long ftell(FILE *fp);
which returns -1L if an error occurs.
The position of a stream can also be manipulated via fgetpos() and
fsetpos():
int fgetpos(FILE *restrict fp, fpos_t *restrict pos);
int fsetpos(FILE *fp, const fpos_t *pos);
These work with an opaque object pos of type fpos_t, which is unspecified.
The first function records stream positions and the second can set the stream to an
earlier recorded position. It is not possible to increment or decrement the position
given by fgetpos(), but we can use it to position the stream with fsetpos().
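These functions can be combined, for instance, to find the size of an open file without disturbing the current position. The following is a sketch with a hypothetical function name. It assumes a binary stream on a system where seeking relative to SEEK_END is supported, which the C standard does not strictly guarantee but is the case on the platforms we normally use.

```c
#include <stdio.h>

/* size of an open binary stream in bytes, or -1 on error; the
   read/write position is restored before returning */
long stream_size(FILE *fp) {
  fpos_t pos;
  long size;
  if (fgetpos(fp, &pos) != 0) return -1;        /* record position */
  if (fseek(fp, 0L, SEEK_END) != 0) return -1;  /* move to the end */
  size = ftell(fp);                             /* bytes from start */
  if (fsetpos(fp, &pos) != 0) return -1;        /* move back */
  return size;
}
```

Knowing the size in advance allows a program to allocate a block of memory large enough to hold the whole file before reading it.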

9.3.2 Error Reporting

Diagnostics on IO operations are provided by three functions:
int feof(FILE *fp);
int ferror(FILE *fp);
void perror(const char *s);
The first of these reports on the end-of-file (EOF) indicator for the stream,
whereas the second checks for the error indicator, both returning non-zero if these
are set, or zero if not. The final function prints an error message to the standard
error stream (stderr), with an optional prefix message taken from the string s. This
message describes the error code currently held in errno, so it will be relevant to
the latest operation that failed and set it.
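Because reading functions return the same value (EOF, or a short count) for both the end of the file and an error, these diagnostics are used after the fact to tell the two apart. A minimal sketch, with a hypothetical function name, is given below.

```c
#include <stdio.h>

/* count the bytes remaining in a stream; returns the count,
   or -1 if reading stopped because of an error */
long count_bytes(FILE *fp) {
  long n = 0;
  while (fgetc(fp) != EOF)
    n++;
  if (ferror(fp)) {         /* EOF was returned due to an error */
    perror("count_bytes");  /* message is prefixed by this string */
    return -1;
  }
  /* otherwise the end-of-file indicator is set for the stream */
  return n;
}
```

Note that feof() only becomes true after a read has been attempted at the end of the file, so it should be tested after a reading function has failed and not used as a loop condition before reading.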

9.4 File System Functions

The standard C library also includes means to manipulate the file system, so that pro-
grams can remove, rename, or create temporary files. Under the stdio.h header
file, we have:
1. The remove() function, which deletes a file, preventing any subsequent access
to it:
int remove(const char *filename);
2. The rename() function, which changes the name of a file from old to new:
int rename(const char *old, const char *new);
3. The tmpfile() function, which creates and opens a temporary file in mode
wb+. This file is removed when the stream is closed or the program terminates
normally:
FILE *tmpfile(void);
According to the standard [24], it should be possible to open at least TMP_MAX
temporary files during the lifetime of a program. This constant is defined in the
header file.

9.5 Programming Examples

In this section, we look at two examples of file reading and writing. The first
is the implementation of a text-to-binary conversion program. This is followed
by a computer-aided composition application that is designed to work with the
Csound [7, 39] software.

9.5.1 The tobin Program

We now present the code for the tobin program, with which, in Chapter 4, we
were able to convert a stream of audio data as text-character floats into a sequence
of binary numbers (32-bit floats). The input is read from stdin and the output is
written to stdout (Fig. 9.1).
The code to realise this is minimal: it takes data from the input until the stream
is finished (EOF) and places it in the output, one number at a time:
#include <stdio.h>

int main(){
  float f;
  while(fscanf(stdin, "%f", &f) > 0)
    fwrite(&f, sizeof(float), 1, stdout);
  return 0;
}

Fig. 9.1: ASCII to binary conversion in tobin: fscanf() reads text from stdin into
the float f, which fwrite() then writes as binary data to stdout.

9.5.2 External Score Generation for Csound

Csound is a sound and music computing system and a domain-specific language [34],
which can be used in a variety of ways. One of these is to furnish a numeric score
for its instruments [36] to perform. Scores, alongside the sound synthesis code the
system uses, are provided via XML-like script files called CSD files. We can con-
figure these to call an external score generator program to provide a new numeric
score every time we run the CSD file through the system [39]. This allows us to use
the C language directly in computer-aided or algorithmic composition applications.
This is done using the bin attribute of the score tag in the CSD file (as
demonstrated below). This attribute names an external executable which is expected
to take in an input text file name as its first argument, and to write to another text
file whose name is the second argument. Csound will invoke this user-supplied program
passing these files as arguments. The input file will receive the contents of the score
section of a CSD file. This allows the program to receive any text parameters defined
there. The output of the program has to be a score in the standard numeric format,
which is written to the file named as the second argument. Csound will then use this
file as its score.
In the example below, the program will look for a single floating-point num-
ber in the score. With this in hand it will write 10 lines, each one containing an
i-statement [39] that will run instrument 1 defined in Csound code. The input param-
eter is used as the starting pitch (in octave.pitchclass notation) of a chromatic-scale
sequence:
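#include <stdio.h>

int main(int argc, char *argv[]){
  int i;
  FILE *fp;
  float f;

  if(argc < 3) {
    fprintf(stderr, "usage: %s infile outfile \n", argv[0]);
    return 1;
  }

  /* read the starting pitch from the input file */
  if((fp = fopen(argv[1], "r")) == NULL) {
    fprintf(stderr, "could not open input file \n");
    return 1;
  }
  i = fscanf(fp, "%f", &f);
  fclose(fp);
  if(i != 1) {
    fprintf(stderr, "no starting pitch found \n");
    return 1;
  }

  /* write the score to the output file */
  if((fp = fopen(argv[2], "w")) != NULL) {
    for(i=0; i < 10; i++) {
      fprintf(fp, "i1 %d %d %f %f \n",
              i, 1, 0.1+i/10.0, f+i/100.);
    }
    fclose(fp);
  }
  else {
    fprintf(stderr, "could not open file \n");
    return 1;
  }
  return 0;
}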
If the program above is compiled to a command named scoret, then the fol-
lowing Csound CSD code can be used with it:
<CsoundSynthesizer>
<CsOptions>
-odac
</CsOptions>
<CsInstruments>
0dbfs=1
instr 1
out oscili(p4,cpspch(p5))
endin
</CsInstruments>
<CsScore bin="./scoret">
8.00
</CsScore>
</CsoundSynthesizer>

9.6 Conclusions

This chapter has introduced the fundamental means to manipulate text and binary
files in a program. We saw how they are opened for reading or writing, and how we
can get or store data from or to them. We saw that the OS provides three special
streams that we can use to write to the standard IO in the same way as we write to
files, and we demonstrated this in our tobin program, which we used in Chapter 4
to convert from text to a binary representation so that our synthesis data could be
read by a sound editor. We will see in the next chapter how we can do this directly
via soundfiles.

Problems

9.1. Write a program that writes the command-line arguments to a file called test.txt.

9.2. Write a program that can open a text file (such as test.txt above) and print its
contents to the terminal.

9.3. Write a version of the tobin program that reads from a file and writes to
another. Take the names of the input and output files from the command line.

9.4. Write a version of the sine wave synthesis program in Chapter 6 that writes
directly to a binary file. Take as arguments the frequency and the output filename.
Chapter 10
Soundfiles

Abstract The specific case of soundfile IO is discussed in detail in this chapter.
Some principles of digital audio are outlined: sampling, digital-to-analogue and
analogue-to-digital conversion, data precision, channels, and basic operations. To
complement this discussion, a widely used library for soundfile IO, libsndfile, is
introduced.

In this chapter, we will be discussing the basic aspects of sound storage in com-
puter files. Soundfiles are very important for music programming, as they provide a
medium for manipulating audio in a computer. Historically, they were the first type
of support for computer music and until very recently they were the typical means of
input and output for a sound-generating program. Soundfiles provide a way of im-
plementing computer musical signal processing in a platform/device-independent
way, without the need to consider more complex issues relating to realtime perfor-
mance, audio device access, etc.

10.1 Digital Audio

We have seen in the examples of sound synthesis developed in Chapters 4 and 6 that
an audio waveform is treated by the computer as a sequence of numbers defining it
at regular points in time. This is a type of digital encoding called pulse code modulation
(PCM) [36]. In addition to this, there are other ways to represent an audio waveform
in digital form, but these are not generally used directly in audio synthesis and pro-
cessing. Some of them are designed for data compression, reducing the size of the
information that is required to be stored or transmitted. In these applications, data
is converted from PCM into one of these formats as needed (and back to PCM for
manipulation). The process of encoding a waveform into a digital form is called
analogue-to-digital conversion, and its converse is digital-to-analogue conversion.
PCM encoding provides us with a transparent and straightforward way to treat a
waveform. It is based on the principle of periodic sampling (Fig. 10.1), that is, taking
measurements of a waveform at regular time intervals, and quantising (Fig. 10.2),
which is finding an output number that will best represent the instantaneous value
of the waveform at the sampling point. Each number of a digital signal is called a
sample, and a sequence of these will make up a digital waveform that is the
computational form of the real-world acoustic waveform. This sequence can take many
forms: floating-point numbers, integers, ASCII-encoded (text) or binary. In the case
of the programs we developed earlier on, we were using a text encoding of
floating-point numbers, which we then translated to binary for storage or playback. This
form is the most common way of handling digital audio, although, as we have seen,
text can be used as well, for simplicity (and portability).

© Springer Nature Switzerland AG 2019
V. Lazzarini, Computer Music Instruments II,
https://doi.org/10.1007/978-3-030-13712-0_10

Fig. 10.1: Sampling a waveform (adapted from [36]).

A program that can open binary files for reading and writing can be used to
manipulate digital audio data directly. However, interpreting the contents of a digital
audio signal will depend on some knowledge about its characteristics. In particular
three aspects are significant:
1. How often the samples are taken: the sampling frequency.
2. How the samples are encoded: the sample precision.
3. How many channels the audio signal carries.

10.1.1 Sampling Frequency

The fundamental parameter that defines how we are supposed to interpret an audio
signal is the sampling frequency, or rate. This is actually a form of playback speed:
how fast the different numbers are supposed to exit the computer through the DAC.
In synthesis, this will also determine the pitch of a signal, since changing the sam-
pling rate will speed up or slow down the playback. Normally, the sampling rate is
set as a constant, and we can then calculate all other parameters in relation to it. We
determine it in terms of samples per second (also written as Hz). The CD standard
demands a sampling frequency of 44,100 Hz, but it is also common to see higher
rates such as 48,000 and 96,000 used in production settings.
The choice of sampling frequency has two implications:
1. In accordance with the Sampling Theorem [58, 49], it determines the frequency
range of a system. No signal with frequencies over half the sampling rate can
be encoded properly in a digital signal. Any such signals will be aliased to frequencies
below this threshold; that is, they will be indistinguishable from other
signals originally present at those frequencies.
2. The storage and data processing rate will increase with the sampling frequency.
Higher rates will demand more storage space, faster processing, faster transmis-
sion, etc.
The frequency threshold of half the sampling rate is known as the Nyquist fre-
quency and it is a very important constant in digital signal processing. The range of
frequencies below this threshold is also known as the digital baseband [61].
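The folding of out-of-range frequencies into the baseband can be illustrated with a small function. This is only a sketch: alias_freq() is a name introduced here for illustration, and it assumes an ideal sampler with no anti-aliasing filter in front of it.

```c
/* Return the baseband frequency (0 to sr/2, in Hz) at which a sinusoid
   of frequency f is represented when sampled at sr Hz.
   alias_freq() is a hypothetical helper, not part of any library. */
double alias_freq(double f, double sr) {
    if (f < 0.0) f = -f;               /* negative frequencies mirror positive ones */
    f -= sr * (long long)(f / sr);     /* the sampled spectrum repeats every sr Hz */
    return f > sr / 2.0 ? sr - f : f;  /* reflect around the Nyquist frequency */
}
```

For example, with sr = 44100, a 30,000 Hz component lies above the Nyquist frequency of 22,050 Hz and is represented at 44100 − 30000 = 14,100 Hz, whereas a 1000 Hz component is left where it is.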

10.1.2 Sample Precision

Digital audio samples can be encoded in integral or floating-point formats [36]. The
type of encoding will determine how much precision is available to the quantiser to
represent the sample. For instance, 8-bit integers can be used to hold 256 different
values. The quantising stage of the ADC will divide the range of values of a wave-
form between its minimum and maximum into however many regions are available
in a given format (see the example in Fig. 10.2, where for a 5-bit number there are
32 distinct regions). This discretisation process will be more error prone if there are
fewer steps, and the result will include a higher level of noise [66]. Integral encoding
precision is determined directly by the number of bits, and the maximum signal-to-noise
ratio is roughly defined as 6 dB per bit, improving as we increase the size
of storage (e.g. 48 dB for 8 bits, 96 dB for 16 bits, 144 dB for 24 bits) [48, 61]. The
performance of floating-point encoding is generally at least as good as 24-bit integer
for single precision, and much better in the case of double precision [36]. Note also
that increasing the number of bits in each sample will require more capacity for the
storage of an audio signal block.
For integral encodings, the maximum amplitude of a signal will also vary accord-
ing to the number of bits employed. For instance, for 8 bits, the maximum absolute
amplitude of a bipolar signal is 128 (a range of −128 to 127). In the case of 16
bits, this maximum is 32768. In the case of floating-point formats, the amplitude is
always expected to be in the normal range of −1.0 to 1.0. This is another reason
why it is preferable to handle audio signals as floating-point numbers, which can
then ultimately be scaled and converted to one of the integer formats if required.

Fig. 10.2: Linear quantisation into 32 regions (5-bit samples; levels numbered 0 to 31).
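The scaling from the normalised floating-point range to a 16-bit integer format can be sketched as follows. The function names are introduced here for illustration only, and the choice of scaling by 32768 with clipping at the top of the range is one common convention among several.

```c
#include <stdint.h>

/* Convert a normalised float sample (-1.0 to 1.0) to signed 16-bit PCM.
   Values outside the range are clipped. Hypothetical helper. */
int16_t float_to_pcm16(float x) {
    double y = (double)x * 32768.0;      /* scale to the 16-bit range */
    if (y > 32767.0) y = 32767.0;        /* clip positive overshoot */
    if (y < -32768.0) y = -32768.0;      /* clip negative overshoot */
    return (int16_t)(y >= 0.0 ? y + 0.5 : y - 0.5);  /* round to nearest */
}

/* Convert a signed 16-bit PCM sample back to a normalised float. */
float pcm16_to_float(int16_t s) {
    return s / 32768.0f;
}
```

Note that +1.0 cannot be represented exactly in this scheme, as the largest positive 16-bit value is 32767, which is why the conversion clips at that point.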

10.1.3 Audio Channels

Finally, audio signals can hold one or more independent channels. When it comes
to computing a multichannel stream, there are two ways we may treat it:

1. In an interleaved form, whereby each sample point refers to a frame of samples,
one for each channel.
2. As completely separate single-channel data, each channel in its own location
(non-interleaved).

The first form is fairly common. In this case the audio stream is made up of a
sequence of frames, and the sampling frequency then refers to a frame rate. Each
frame is composed of a series of samples in ascending channel order. In this case,
if we want to access channel n of N channels, we need to start with an offset of n
samples and then pick every Nth sample after that.
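The indexing described above can be written as a short function. This is a minimal sketch with names chosen for this example (extract_channel() and its parameters), and it assumes that channels are counted from 0.

```c
#include <stddef.h>

/* Copy channel chn (counting from 0) out of an interleaved buffer holding
   nframes frames of nchnls samples each. Hypothetical helper. */
void extract_channel(const float *interleaved, float *out,
                     size_t nframes, int nchnls, int chn) {
    for (size_t i = 0; i < nframes; i++)
        out[i] = interleaved[i * nchnls + chn];  /* offset chn, stride nchnls */
}
```

For a stereo stream laid out as L0 R0 L1 R1 L2 R2, calling this with chn set to 1 collects R0, R1 and R2 into out.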

10.2 Basic Operations on Signals

Some basic operations can be summarised as follows:

• Gain: gain scaling, or changing the amplitude of an audio signal, is done by applying
a multiplier (called the gain) to each sample in the stream, e.g.:
out[n] = in[n] * gain;
If the gain value changes (slowly) over time, we can have an amplitude envelope
(or modulation, if the variation is periodic).
• Mix: mixing signals is equivalent to adding them together (summing):
out[n] = in1[n] + in2[n];
• Stereo pan: to place a signal between two stereo channels, we can apply proportionally
different amplitudes to each. This is called amplitude panning. For
instance, to place a signal at the left speaker, we apply 1 and 0 to L and R samples,
respectively. For a midway placement, we use 0.5 and 0.5. A
simple pan control between 0 and 1 (L and R) could be implemented with this
code (Fig. 10.3):
left[n] = in[n]*(1.0 - pan);
right[n] = in[n]*pan;
While this algorithm does not provide equal-power panning from centre to left
or centre to right, it demonstrates the principle of amplitude panning in a simple
way.

[Figure: the input signal is multiplied by (1 − p) to produce the L output and by p to produce the R output.]
Fig. 10.3: Simple amplitude panning flowchart.

When scaling or mixing two or more streams we have to be careful that the re-
sulting signal does not exceed the maximum amplitude for the given sample format.

10.2.1 A Synthesis Example

The following example shows a simple synthesis program, which will generate a
single-channel soundfile containing a fixed-frequency sine wave using the method
demonstrated in Section 6.5. It illustrates the principles of digital signals outlined
above and produces a soundfile directly, as shown in Listing 10.1. Note that we
generate data in blocks rather than single samples, as it is in general more efficient
to do so [13].

Listing 10.1: Soundfile sine wave synthesis program.


#include <math.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char** argv){
  FILE *fpout;            // output file pointer
  float *audioblock;      // audio memory pointer
  int end, i, j;          // dur in frames, counter vars
  int sr = 44100;         // sampling rate
  int blockframes = 441;  // audio block size in frames
  unsigned int ndx = 0;   // phase index for synthesis
  float dur, freq;        // duration, frequency
  double twopi;           // 2*PI

  if(argc != 4) {
    printf("usage: %s outfile dur freq \n", argv[0]);
    exit(-1);
  }

  /* command line parameters */
  dur = atof(argv[2]);
  freq = atof(argv[3]);
  /* set the value of 2*PI */
  twopi = 8*atan(1.);
  /* set the total duration in frames */
  end = (int)(dur*sr);
  /* open the file in binary mode */
  fpout = fopen(argv[1], "wb");
  /* allocate memory for one block of float samples */
  audioblock = (float *) malloc(sizeof(float)*blockframes);
  if(fpout == NULL || audioblock == NULL) {
    fprintf(stderr, "could not open file or allocate memory \n");
    if(fpout != NULL) fclose(fpout);
    free(audioblock);
    exit(-1);
  }

  /* this is the synthesis loop */
  for(i=0; i < end; i+=blockframes){
    for(j=0; j < blockframes; j++, ndx++){
      /* calculate the samples of a sinewave */
      audioblock[j] = (float)(0.5*sin(ndx*twopi*freq/sr));
    }
    /* write to the output */
    fwrite(audioblock, sizeof(float), blockframes, fpout);
  }

  /* de-allocate memory and close file */
  free(audioblock);
  fclose(fpout);
  return 0;
}
In order to interpret the audio data stored in the resulting file, we need to provide
the sampling rate, the encoded format, and the number of interleaved channels in
the stream, as well as the byte order (44100, 32-bit little-endian float, 1). Without
this information, it is hard to interpret the raw data.

10.2.2 Byte Order

Raw soundfile data is generally not portable across multiple platforms. As we have
seen in Sect. 2.1.1, multi-byte numbers can be stored in different byte orders, depending
on the hardware. Little-endian ordering puts the least significant byte (LSB) first and
then the remaining bytes in increasing order of significance. Big-endian ordering puts
the most significant byte (MSB) first and the other bytes in decreasing order of
significance. This is yet another reason to avoid the use of raw data as the sole means of
audio storage.
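To make the issue concrete, the following sketch detects the byte order of the machine it runs on and reverses the bytes of a 32-bit sample, which is what a program would need to do when reading raw data written on a machine of the opposite kind. All function names here are made up for this example.

```c
#include <stdint.h>
#include <string.h>

/* Return 1 if the least significant byte is stored first, 0 otherwise. */
int is_little_endian(void) {
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);   /* look at the first byte in memory */
    return first == 1;
}

/* Reverse the order of the four bytes in a 32-bit word. */
uint32_t swap32(uint32_t x) {
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Reverse the byte order of a 32-bit float sample. */
float swap_float(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);    /* reinterpret the bits safely */
    u = swap32(u);
    memcpy(&f, &u, sizeof f);
    return f;
}
```

A reader of raw 32-bit float files could apply swap_float() to every sample whenever the byte order recorded for the file differs from the result of is_little_endian().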

10.2.3 Self-Describing Soundfile Formats

The fact that sample data is meaningless without any information as to how it rep-
resents a digital signal points to the need for additional elements to be stored with
the sound itself. So far, we have been handling raw soundfiles, because we know
what to expect from the sample data. However, if we want to make our soundfiles
more flexible and portable, we will need to use a self-describing soundfile format.
This will store along with the audio, information about the sampling rate, the num-
ber of channels, the sample width (precision), the number of sample frames in the
file and other useful information. Each soundfile type will also imply a certain byte
ordering, which will be adopted across all platforms. Programs handling these formats
will have to be prepared to read and write all this extra information alongside the
audio data in a standard binary form for a particular format. Supporting the huge va-
riety of soundfile types that are available to users is a significant issue for software
developers.

10.3 The libsndfile Library

The best way to handle different file formats is to use a dedicated library that can ma-
nipulate them seamlessly. Currently, libsndfile [40] is one of the best such libraries,
supporting several soundfile types with a transparent interface. All the different el-
ements that make up the various formats are hidden away and the library provides
a unified way of accessing all of them. There is no need to write code that targets a
specific format, as the library will take care of that for us.

10.3.1 Opening Files

The libsndfile application programming interface (API) provides a single function
to open files for reading or writing. This takes a name (or full path) string, an open-
ing mode SFM_READ, SFM_WRITE or SFM_RDWR, and a pointer to an existing
SF_INFO variable (defined, alongside all libsndfile functions, in sndfile.h):
SNDFILE *sf_open(const char *path, int mode,
SF_INFO *sfinfo);
It returns an opaque pointer¹ to a SNDFILE structure. The reading or writing
operations will depend heavily on the contents of the SF_INFO variable, whose
type is the following structure:
typedef struct SF_INFO{
sf_count_t frames;
int samplerate;
int channels;
int format;
int sections;
int seekable;
} SF_INFO;

¹ Opaque here means we will use it as a black box only, not accessing its contents directly.

Each call to the open function should refer to a separate instance of this data
structure. If we are to open a file for reading, then we need to pass a pointer to
an empty variable of this type, which will then be filled with information on the
various parameters from the data in the file. If we are opening a file for writing, then
we need to fill the variable with the desired values for its members before calling
sf_open(). Not all structure members are relevant to our discussion here. We
need only be concerned with samplerate (sampling frequency), channels, and
format. While the first two are self-evident and will carry the values for sampling
frequency and number of channels, the third requires some further explanation.
The format, in the case of libsndfile, is a code to determine two things: (a) the
soundfile format we want to write, or are reading, and (b) the sample and encoding
format used in storage. The first corresponds to the major format and the second,
to the subtype. We combine these options together using a bitwise OR (|). The
following list comprises a selection of the most important formats and subtypes
supported by libsndfile:

• Major formats:
SF_FORMAT_WAV /* Microsoft WAV */
SF_FORMAT_AIFF /* Apple/SGI AIFF format */
SF_FORMAT_AU /* Sun/NeXT AU format */
SF_FORMAT_RAW /* RAW PCM data. */
SF_FORMAT_PAF /* Ensoniq PARIS file format. */
SF_FORMAT_SVX /* Amiga IFF / SVX8 / SV16 format. */
SF_FORMAT_NIST /* Sphere NIST format. */
SF_FORMAT_VOC /* VOC files. */
SF_FORMAT_IRCAM /* Berkeley/IRCAM/CARL */
SF_FORMAT_W64 /* Sonic Foundry's 64 bit RIFF/WAV */
SF_FORMAT_MAT4 /* Matlab (tm) V4.2/GNU Octave 2.0 */
SF_FORMAT_MAT5 /* Matlab (tm) V5.0/GNU Octave 2.1 */
SF_FORMAT_PVF /* Portable Voice Format */
SF_FORMAT_XI /* Fasttracker 2 Extended Instrument */
SF_FORMAT_HTK /* HMM Tool Kit format */
SF_FORMAT_SDS /* Midi Sample Dump Standard */
SF_FORMAT_AVR /* Audio Visual Research */
SF_FORMAT_WAVEX /* MS WAVE with WAVEFORMATEX */
SF_FORMAT_SD2 /* Sound Designer 2 */
SF_FORMAT_FLAC /* FLAC lossless file format */
SF_FORMAT_CAF /* Core Audio File format */
• Subtypes:
SF_FORMAT_PCM_S8 /* Signed 8 bit data */
SF_FORMAT_PCM_16 /* Signed 16 bit data */
SF_FORMAT_PCM_24 /* Signed 24 bit data */
SF_FORMAT_PCM_32 /* Signed 32 bit data */
SF_FORMAT_PCM_U8 /* Unsigned 8 bit data (WAV/RAW) */
SF_FORMAT_FLOAT /* 32 bit float data */
SF_FORMAT_DOUBLE /* 64 bit float data */
SF_FORMAT_ULAW /* U-Law encoded. */
SF_FORMAT_ALAW /* A-Law encoded. */
SF_FORMAT_IMA_ADPCM /* IMA ADPCM. */
SF_FORMAT_MS_ADPCM /* Microsoft ADPCM. */
SF_FORMAT_GSM610 /* GSM 6.10 encoding. */
SF_FORMAT_VOX_ADPCM /* OKI / Dialogix ADPCM */
SF_FORMAT_G721_32 /* 32kbs G721 ADPCM encoding. */
SF_FORMAT_G723_24 /* 24kbs G723 ADPCM encoding. */
SF_FORMAT_G723_40 /* 40kbs G723 ADPCM encoding. */
SF_FORMAT_DWVW_12 /* 12 bit Delta Width Var Word */
SF_FORMAT_DWVW_16 /* 16 bit Delta Width Var Word */
SF_FORMAT_DWVW_24 /* 24 bit Delta Width Var Word */
SF_FORMAT_DWVW_N /* N bit Delta Width Var Word */
SF_FORMAT_DPCM_8 /* 8 bit differential PCM */
SF_FORMAT_DPCM_16 /* 16 bit differential PCM */

A WAVE file with float (single precision) encoding is defined by the following
format code:
sfinfo.format = SF_FORMAT_WAV | SF_FORMAT_FLOAT;
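The principle of packing two codes into one integer, and taking them apart again, can be demonstrated without the library. In the sketch below, the DEMO_ constants are stand-ins defined only for this example, and their numeric values are assumptions that should not be relied on. In real code the constants from sndfile.h should be used, together with the masks SF_FORMAT_TYPEMASK and SF_FORMAT_SUBMASK that the library supplies for separating the two parts.

```c
/* Stand-in constants for illustration only (values are assumptions):
   the major format occupies the upper bits, the subtype the lower 16. */
enum {
    DEMO_FORMAT_WAV    = 0x010000,
    DEMO_FORMAT_AIFF   = 0x020000,
    DEMO_FORMAT_PCM_16 = 0x0002,
    DEMO_FORMAT_FLOAT  = 0x0006,
    DEMO_TYPEMASK      = 0x0FFF0000,
    DEMO_SUBMASK       = 0x0000FFFF
};

/* Extract the major format from a combined format code. */
int major_of(int format) { return format & DEMO_TYPEMASK; }

/* Extract the sample subtype from a combined format code. */
int subtype_of(int format) { return format & DEMO_SUBMASK; }
```

With these definitions, DEMO_FORMAT_WAV | DEMO_FORMAT_FLOAT yields a single code from which major_of() recovers the WAV part and subtype_of() the float part. A program can inspect the format member of an SF_INFO variable filled in by sf_open() in the same manner, using the library's own masks.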

10.3.2 Reading and Writing

The libsndfile reading and writing functions are defined in two ways:
• By the type of audio data buffer we are supplying to it.
• By how we are counting the data, in samples or in frames.
Tables 10.1 and 10.2 list the names of the functions for each of these categories.
Their general form is

sf_count_t sf_xxxxx_type(SNDFILE *sf, type *data,
sf_count_t n);

where xxxxx determines whether it is a write or a read function, and whether we
are counting in frames or samples. The argument sf is a handle to an open soundfile,
data is an array from which we will read or to which we will write, and n is the
size of the data in samples or frames, depending on the specific function employed.
The read/write functions will return the number of samples or frames read/written,
as sf_count_t, which is an integer type defined by the library to hold values up
to SF_COUNT_MAX.

Table 10.1: libsndfile reading functions

type     samples             frames
short    sf_read_short()     sf_readf_short()
int      sf_read_int()       sf_readf_int()
float    sf_read_float()     sf_readf_float()
double   sf_read_double()    sf_readf_double()

Table 10.2: libsndfile writing functions

type     samples             frames
short    sf_write_short()    sf_writef_short()
int      sf_write_int()      sf_writef_int()
float    sf_write_float()    sf_writef_float()
double   sf_write_double()   sf_writef_double()

As we noted in Sect. 10.1.2, floating-point data will default to the normalised
(−1.0, 1.0) range, whereas the two integer formats will have a range that depends
on their minimum and maximum signed values. Regardless of the type of data we
are using when reading or writing, libsndfile will make sure it is converted correctly
to the format and range defined by the subtype we are using in storage. It is also
possible to configure the behaviour of libsndfile so that the floating-point range is
not normalised by default.

10.3.3 Seeking

It is possible to move the reading or writing position to any existing position in the
file. We can do this using the sf_seek() function, which will offset the current
position in a similar way to fseek(), but specifically in relation to the start of the
audio data:
sf_count_t sf_seek(SNDFILE *sndfile, sf_count_t frames,
int whence);
The offset is always calculated in frames, and the whence parameter can be ei-
ther SEEK_SET, SEEK_CUR or SEEK_END, determining that the offset refers to
the start of the waveform data, the current position, or the end of the data, respec-
tively.
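The idea of counting offsets in frames rather than bytes can be illustrated with the standard library alone. The sketch below is not libsndfile code: it assumes a headerless (raw) file of interleaved 32-bit floats, and raw_seek_frame() is a name introduced for this example. It shows the conversion from frames to bytes that a frame-based seek has to carry out.

```c
#include <stdio.h>

/* Move the file position to the start of the given frame in a raw file of
   interleaved float samples with nchnls channels.
   Returns 0 on success, as fseek() does. Hypothetical helper. */
int raw_seek_frame(FILE *fp, long frame, int nchnls) {
    long bytes_per_frame = nchnls * (long)sizeof(float);
    return fseek(fp, frame * bytes_per_frame, SEEK_SET);
}
```

In a self-describing soundfile, the audio data does not begin at byte 0, so a fixed header offset would have to be added to this calculation. Handling such details is one of the services a soundfile library provides.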

10.3.4 An Example Program

The following program opens an input soundfile and pans it into a stereo output.
The input and output formats will be the same, except for number of channels.
The program checks for a minimum number of arguments (three plus the program
name), that both files have been opened, and that the input is mono. If one of these
conditions is not true, it will exit with an error message.
The processing core is composed of this loop:
do {
cnt = sf_readf_double(fin, inbuf, bframes);
for(i = j = 0; i < cnt; i++) {
outbuf[j++] = inbuf[i] * (1. - pan);
outbuf[j++] = inbuf[i] * pan;
}
sf_writef_double(fout, outbuf, cnt);
} while (cnt > 0);
where we read a number of frames of one channel into the array inbuf, which
is the input buffer. As we have seen in Sect. 6.5, this is a block of memory we use
to keep data in temporarily before processing. Then we enter an inner loop, which
processes every sample of the input, placing it in the two channels of the
output buffer, scaled appropriately to implement the amplitude panning (Fig. 10.3).
Note that while the input buffer is indexed with the variable i, the output uses j, which
increases by two in each iteration of this loop.
The output buffer is written to the open file. We only process and output as many
frames as we have read (cnt). Once the input data is exhausted, the program frees
the memory, closes the files and exits. The full program is shown in Listing 10.2.

Listing 10.2: Soundfile panning program.


#include <stdio.h>
#include <stdlib.h>
#include <sndfile.h>

int main(int argc, const char *argv[]){
  const int bframes = 512;   /* buffer size */
  double *inbuf, *outbuf;    /* buffers */
  SNDFILE *fin, *fout;       /* file ptrs */
  SF_INFO info_in = {0};     /* input format, empty before opening */
  SF_INFO info_out = {0};    /* output format */
  double pan;                /* pan position */
  if(argc > 3) {
    if((fin = sf_open(argv[1],
                      SFM_READ, &info_in)) != NULL) {
      if(info_in.channels == 1) {
        /* same format as the input, but in stereo */
        info_out.format = info_in.format;
        info_out.samplerate = info_in.samplerate;
        info_out.channels = 2;
        if((fout = sf_open(argv[2],
                           SFM_WRITE, &info_out)) != NULL) {
          sf_count_t cnt, i, j;
          inbuf = (double *) calloc(bframes,
                                    sizeof(double));
          outbuf = (double *) calloc(bframes*2,
                                     sizeof(double));
          pan = atof(argv[3]);
          do {
            /* read a block of mono frames */
            cnt = sf_readf_double(fin, inbuf, bframes);
            for(i = j = 0; i < cnt; i++) {
              outbuf[j++] = inbuf[i] * (1. - pan); /* left */
              outbuf[j++] = inbuf[i] * pan;        /* right */
            }
            /* write as many stereo frames as were read */
            sf_writef_double(fout, outbuf, cnt);
          } while (cnt > 0);
          free(inbuf);
          free(outbuf);
          sf_close(fin);
          sf_close(fout);
        } else {
          sf_close(fin);
          printf("ERROR: could not open %s \n",
                 argv[2]);
          return 1;
        }
      } else {
        sf_close(fin);
        printf("ERROR: input %s not mono\n", argv[1]);
        return 1;
      }
    } else {
      printf("ERROR: could not open %s \n", argv[1]);
      return 1;
    }
  } else {
    printf("usage: %s input output pan \n", argv[0]);
    return 1;
  }
  return 0;
}

Compiling and linking

Since now we are using external libraries, and not only the C standard library, we
have to tell the compiler where to find the headers and the library. To compile and
link to libsndfile, first we need to know where it is installed. If it is in the system
directories, then we only need to add the linker flag -lsndfile, which will cause
the program to be linked to the library routines. If, however, the library is not installed
there, we need to indicate where its files are to be found. For headers, we
can give a directory to be searched with -I /path/to/includes, where
/path/to/includes should be replaced by the path to the directory where
sndfile.h is located. For library binaries, we need to do the same, but using
the -L flag instead. For example, if the library is installed in /usr/local/lib
and the headers in /usr/local/include, the full command will be
$ cc -o pan pan.c -I/usr/local/include \
-L/usr/local/lib -lsndfile

10.4 Conclusions

The libsndfile API is also very well documented; its website www.mega-nerd.com/libsndfile
contains excellent reference documentation on the programming interface.
We strongly advise readers to refer directly to this information as a complement to
the basic principles outlined in this chapter. Since the library is always evolving, the
details of any slight change in the interface or addition of new features will be fully
documented there. With this library under our belt, we are now ready to start writing
complete offline applications to process audio. This capacity will be enhanced by
realtime audio, which will be explored in the next chapter.

Problems

10.1. Write a program that synthesises two sine waves of different frequencies last-
ing one second, each one panned midway from centre to the left and right sides, pro-
ducing a raw binary soundfile with 16-bit 44,100 Hz samples. The program should
take three arguments: filename, left frequency and right frequency.
10.2. Write a program using libsndfile that changes the gain of an input file, writing
a new file as its output.
10.3. Write a program using libsndfile that mixes the two channels of a stereo file
into a mono file output.
10.4. Write a program for mixing soundfiles, with the following characteristics:
(a) Accepting any soundfile formats supported by libsndfile.
(b) Taking only uncompressed PCM format, in any (integer or floating-point) preci-
sion (8-bit (signed/unsigned), 16-bit, 24-bit, 32-bit, floats, doubles).
(c) Accepting only matching sampling rate values (print an error message other-
wise).
(d) Producing stereo files from mono and/or stereo input files; mono files should be
panned, stereo files are mixed as they are.
(e) Expecting a mix gain to be set for each soundfile.
Chapter 11
Realtime Audio

Abstract This chapter discusses the fundamental aspects of realtime audio program-
ming and access to sound devices. Two widely-used APIs are introduced and con-
trasted: Portaudio and the Jack connection kit. Programming examples are offered
for each, demonstrating realtime processing in C.

Realtime audio synthesis and processing depend on a number of components of
a computer system:
1. Hardware: the right kind of hardware containing a fast central processing unit and
peripherals, that can feed a digital-analogue converter with enough data to ensure
an uninterrupted audio stream. For audio processing, we also need an analogue-
to-digital converter, which will provide the source data for computation. The
hardware should ideally provide very small latencies (time delays) between input
and output, on the order of a few milliseconds. Some latency is inevitable as
data is processed in blocks rather than in single units (samples), but it should be
minimal.
2. Software: an operating system that can communicate with the ADC/DAC with
very little latency, which depends on fast and flexible switching of tasks, some-
times also referred to as realtime preemption; a suitable API to allow program-
mers to write applications that access the audio hardware (soundcards/devices)
directly.
In order to provide realtime audio capabilities to a program, we will need to call
on the services of system libraries that allow access to the audio devices. These are
platform-dependent: each OS will provide a different library to do the low-level device
communication. In Linux, this is normally done by the ALSA (Advanced Linux
Sound Architecture) subsystem. In MacOS, the CoreAudio and AudioUnit frameworks
are responsible for this functionality. These libraries are also called hardware
audio layers (HALs), as they work very closely with the OS components that man-
age the audio devices. Programs can use these services directly or use higher-level
APIs that provide an intermediary layer. The advantage of operating at this level
is that the APIs will most likely be implemented across various platforms. In this

© Springer Nature Switzerland AG 2019 131


V. Lazzarini, Computer Music Instruments II,
https://doi.org/10.1007/978-3-030-13712-0_11
132 11 Realtime Audio

case, we do not need to rewrite any of the audio IO code when porting a program
from one OS to another. Among these APIs, we can cite Portaudio [5] and the Jack
Connection Kit [14].
Regardless of the choice of API, we will see that a number of things are con-
stant across different systems. Audio signals are, as we have seen in Chapter 10,
a sequence of frames of sample data. These will be produced as a stream by the
soundcard at the rate of fs frames per second (where fs is the sampling frequency).
Each sample in a frame will be encoded in some way, as an integer or floating-point
number, depending on the system options available. The job of a realtime audio
program is to pick up this data stream, process it as efficiently as possible and then
send a corresponding audio signal to the output device. The program has to deliver
enough data, at a speed that must exceed the sampling rate, to keep the stream con-
tinuous, without gaps. If it cannot keep up, the result will be drop-outs: the output
buffer will contain silent or garbage frames that will interrupt the audio waveform
with clicks and pops.
Buffering (i.e. placing the audio data in memory blocks for processing) is re-
quired for continuous and smooth IO operation. Generally speaking, the larger the
buffer size, the less likely that the stream will be interrupted by gaps in computation.
On the other hand, buffering introduces a degree of latency between input and out-
put, and for true realtime operation, we should attempt to limit this to a minimum.
IO latencies of over 20 ms are likely to be perceived by users, depending on the type
of processing applied. The amount of buffering required will depend significantly on
the computation load, on the OS, and on the audio subsystem. A well-tuned Linux or
MacOS computer should be capable of achieving latencies close to the millisecond
mark.
We can determine the latency l introduced by buffering as the total number of
buffer frames in the input (n) and the output (m) divided by the sampling frequency
fs:

\[ l = \frac{m + n}{f_s} \tag{11.1} \]
In addition to this latency, which is attributed to the program code, there can be
other latencies introduced by software and hardware buffers in the layers below the
user code. Generally speaking, accessing the HAL directly should minimise these,
but that will depend on the OS and its audio subsystem.
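Eq. 11.1 translates directly into code. The function below is a minimal sketch, with a name and parameter names chosen for this example, and it accounts only for the buffering done by the program itself.

```c
/* Latency in seconds due to program buffering (Eq. 11.1):
   in_frames (n) and out_frames (m) are the buffer sizes in frames and
   sr is the sampling frequency in Hz. Hypothetical helper. */
double buffer_latency(int in_frames, int out_frames, double sr) {
    return (in_frames + out_frames) / sr;
}
```

For instance, input and output buffers of 512 frames each at 44,100 Hz give 1024/44100 seconds, or about 23 ms, which is already above the 20 ms mark mentioned earlier. Reducing both buffers to 64 frames brings the figure down to about 3 ms.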

11.1 Portaudio

In this section, we will introduce Portaudio¹ as an example of a cross-platform realtime
audio IO library. This API allows users to write programs that can take advantage
of various lower-level audio host libraries implemented across different OSs. It
also supports interfacing with other higher-level systems such as Jack (in both Linux
and MacOS) and Pulseaudio (Linux) (Fig. 11.1). The Portaudio functions, constants,
and data structures are defined in its public header portaudio.h, which should
be included in any source code employing them.

¹ http://www.portaudio.com.

[Figure: client software calls Portaudio, which in turn uses one of the host APIs:
MS Windows APIs, Jack, ALSA (Linux), Pulseaudio (Linux), or Coreaudio (MacOS).]

Fig. 11.1: Portaudio and its underlying APIs.

Prior to its use, we initialise the library with a call to Pa_Initialize(). If
this call is successful, we can go ahead and call other functions. The library
defines a type PaError for error codes, and the constant paNoError indicates
success:
PaError err;
err = Pa_Initialize();
if(err == paNoError) printf("Portaudio initialised\n");
If, on the other hand, an error code is returned, we can retrieve a diagnostic
error string with Pa_GetErrorText(err):
else printf("%s \n", Pa_GetErrorText(err));

11.1.1 Listing Devices

The library provides a means of listing existing devices in a system. We can get
the total number of logical devices (which are mapped to existing physical audio
devices) with Pa_GetDeviceCount(). Devices may be configured for input or
output only, or both (bidirectional). By checking the number of channels in a device,
we can tell whether it is capable of one or more directions. This is one of the fields
in the PaDeviceInfo structure,
typedef struct PaDeviceInfo
{
int structVersion;
const char *name;
PaHostApiIndex hostApi;
int maxInputChannels;
int maxOutputChannels;
PaTime defaultLowInputLatency;
PaTime defaultLowOutputLatency;
PaTime defaultHighInputLatency;
PaTime defaultHighOutputLatency;
double defaultSampleRate;
} PaDeviceInfo;
which holds other characteristics of a given logical device. We can query each device
listed by calling Pa_GetDeviceInfo() and passing the device number to it. The
following code demonstrates this:
ndev = Pa_GetDeviceCount();
for(i=0; i<ndev; i++){
info = Pa_GetDeviceInfo((PaDeviceIndex) i);
if(info->maxOutputChannels > 0)
printf("output device: ");
if(info->maxInputChannels > 0)
printf("input device: ");
printf("%d: %s\n", i, info->name);
}
From this list and the information provided, it is possible to choose one of the
devices by selecting its numerical index. The functions Pa_GetDefaultInputDevice()
and Pa_GetDefaultOutputDevice() can also be used to retrieve the indices of the
respective default input and output devices.
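The direction test based on channel counts can be isolated in a small function, which makes the bidirectional case explicit. This is only a sketch: device_direction() is a hypothetical helper, not a Portaudio function, and its two arguments stand for the maxInputChannels and maxOutputChannels fields of PaDeviceInfo.

```c
/* Describe a device's direction from its maximum channel counts
   (the maxInputChannels and maxOutputChannels fields of PaDeviceInfo).
   device_direction() is a hypothetical helper, not a Portaudio function. */
const char *device_direction(int max_in, int max_out)
{
    if (max_in > 0 && max_out > 0) return "bidirectional";
    if (max_in > 0)  return "input";
    if (max_out > 0) return "output";
    return "none";
}
```

Inside the device listing loop, a call such as device_direction(info->maxInputChannels, info->maxOutputChannels) would then give a single label per device.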

11.1.2 Stream Parameters

Before opening a device, we will need to configure it with the desired stream pa-
rameters. This determines the characteristics of the audio signals we are going to be
processing. The parameters include the chosen device number, number of channels,
sample format, and estimated latency, and are kept in a PaStreamParameters
data structure:
typedef struct PaStreamParameters
{
PaDeviceIndex device;
int channelCount;
PaSampleFormat sampleFormat;
PaTime suggestedLatency;
void *hostApiSpecificStreamInfo; /* NULL */
} PaStreamParameters;
For example, if we wish to select the default devices for mono, using a single-
precision floating point data format, and with a buffer containing bufframes
frames, the data structures should be filled as follows:
PaStreamParameters inparam, outparam;
inparam.device = Pa_GetDefaultInputDevice();
inparam.channelCount = 1;
inparam.sampleFormat = paFloat32;
inparam.suggestedLatency = (PaTime)
(bufframes/(double) sr); /* floating-point division */
inparam.hostApiSpecificStreamInfo = NULL;

outparam.device = Pa_GetDefaultOutputDevice();
outparam.channelCount = 1;
outparam.sampleFormat = paFloat32;
outparam.suggestedLatency = (PaTime)
(bufframes/(double) sr); /* floating-point division */
outparam.hostApiSpecificStreamInfo = NULL;
Stream parameters are defined separately for the input and output streams. Note
that by employing a float format we imply that the audio data will range from
−1.0 to 1.0.
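The suggested latency is a time in seconds, obtained by dividing a frame count by the sampling rate. When both quantities are held in integer variables, at least one operand needs to be converted to floating point before dividing. The sketch below contrasts the two cases; both function names are hypothetical.

```c
/* Duration in seconds of a buffer of bufframes frames at sr Hz.
   The cast makes the division floating-point. */
double buffer_duration(unsigned long bufframes, unsigned long sr)
{
    return bufframes / (double) sr;
}

/* The same computation with integer division, shown only for contrast:
   the quotient is truncated to 0 whenever bufframes is smaller than sr. */
double buffer_duration_truncated(unsigned long bufframes, unsigned long sr)
{
    return (double) (bufframes / sr);
}
```

For 512 frames at 44100 Hz, the first function gives about 0.0116 s, whereas the second gives 0.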

11.1.3 Opening Devices

We call Pa_OpenStream() to open devices for input and/or output, passing to it
the address of an opaque pointer to PaStream, which is the stream handle:
PaError Pa_OpenStream(PaStream** stream,
const PaStreamParameters *inputParameters,
const PaStreamParameters *outputParameters,
double sampleRate,
unsigned long framesPerBuffer,
PaStreamFlags streamFlags,
PaStreamCallback *streamCallback,
void *userData );
This is slightly different from what we have seen with other similar functions
(like, for instance, sf_open()), where the function returns an opaque handle, but
it works in a similar way. By passing the pointer address (a pointer to a pointer),
we allow the function to fill it with the correct pointer value and the net result is
the same: we end up with a handle for use in other functions. As can be seen, the
function returns an error code, and for this reason it has been designed to provide
the handle via a pointer.
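The pointer-to-pointer idiom can be shown in isolation from any audio library. In the following minimal sketch, all names (demo_stream, demo_open(), demo_close(), and the DEMO_ error codes) are hypothetical; the structure contents would normally be hidden from the caller.

```c
#include <stdlib.h>

/* Hypothetical opaque handle and error codes, for illustration only. */
typedef struct demo_stream demo_stream;
struct demo_stream { double sr; };   /* would be hidden inside a library */
enum { DEMO_OK = 0, DEMO_BAD_ARG = -1, DEMO_NO_MEMORY = -2 };

/* Return an error code and deliver the handle through a pointer to a
   pointer; on a bad argument the caller's pointer is left untouched. */
int demo_open(demo_stream **stream, double sr)
{
    if (stream == NULL || sr <= 0.0) return DEMO_BAD_ARG;
    *stream = malloc(sizeof **stream);
    if (*stream == NULL) return DEMO_NO_MEMORY;
    (*stream)->sr = sr;
    return DEMO_OK;
}

/* Release the handle obtained from demo_open(). */
void demo_close(demo_stream *stream)
{
    free(stream);
}
```

The caller declares a pointer, passes its address, and checks the returned code before using the handle, which is the same sequence used with Pa_OpenStream().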
The other parameters in Pa_OpenStream() are, in order:

• Stream parameters for input and output respectively. Devices can be opened for
input, output, or both. By supplying stream parameters for input and/or output,
we are determining how we want the streams to be opened. By passing a NULL
instead of an address to a PaStreamParameters variable, we are choosing
not to open the device for a given direction.
• Sample rate.
• Buffer size in frames.
• Stream options, via constants that can be combined with a bitwise OR.
• Callback, a function that will be invoked to process input and/or output buffers.
This is used only in asynchronous mode, otherwise it is set to NULL.
• Callback user data, a data structure that will be passed to the callback function;
it can be NULL if the callback is not defined.

For example, the following call opens devices for input and output streams,
passing sr and frames as the sampling rate and buffer size, respectively. It does
not use any special stream options and its IO mode is defined as synchronous (no
callback required).
PaStream *handle;
err = Pa_OpenStream(&handle,
&inparam,&outparam,
sr, frames,
paNoFlag,
NULL,NULL);
On return, the function will give an error code, which should be checked before
proceeding. If successful, the call will place a valid stream handle in handle. This
can be used to start audio IO through the following code line:
err = Pa_StartStream(handle);

11.1.4 Synchronous Mode

The synchronous IO operation is very similar to what we have seen for file reading
and writing. It is sometimes called the push form of audio IO. Functions are
provided to take data from a stream and place it in a buffer, and conversely, to put
the contents of a buffer into a stream. They take the handle, a pointer to the buffer,
and the number of frames in it:
PaError Pa_ReadStream( PaStream* stream,
void *buffer,
unsigned long frames);
PaError Pa_WriteStream( PaStream* stream,
const void *buffer,
unsigned long frames);
The following code shows a direct-through example, where the data is copied
from the input to the output without any changes. It can be used to test the IO of a
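system, as well as give an aural indication of the latencies involved. The function
Pa_GetStreamTime() can be used to get the current stream time in seconds
and check if we have reached the end of processing. Since the origin of this clock
is unspecified, we measure the time elapsed from a reading taken before the loop:
PaTime start = Pa_GetStreamTime(handle);
while(Pa_GetStreamTime(handle) - start < duration){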
err = Pa_ReadStream(handle, buffer, frames);
if(err == paNoError){
err = Pa_WriteStream(handle, buffer, frames);
if(err != paNoError)
printf("%s \n", Pa_GetErrorText(err));
} else
printf("%s \n", Pa_GetErrorText(err));
}
Synchronous mode is blocking: the program will not continue until the read or
write operation has returned (this is also the behaviour in file IO). It is less respon-
sive and requires more buffering than the asynchronous mode, resulting in longer
latencies.

11.1.5 Asynchronous Mode

Using a callback is non-blocking and tends to be the recommended way to implement
low-latency realtime audio. It is also called pull mode, because the system will
seek audio data when it needs it, rather than have it supplied regularly by a program.
At the core of this method, we have an audio callback whose signature is defined by
typedef int PaStreamCallback(
const void *input, void *output,
unsigned long frameCount,
const PaStreamCallbackTimeInfo* timeInfo,
PaStreamCallbackFlags statusFlags,
void *userData );
where we have as arguments the input and output data buffers, the number of frames
in these buffers, a timestamp indicating the stream time of the buffer data, options
(flags), and a pointer to a user data structure variable, which is used to communi-
cate between the program and the callback. The callback is executed in a separate
thread2 , which is started and managed by Portaudio. Since this thread is running
under the same process as the main program thread, it will share resources with it,
such as memory.
The equivalent direct-through processing is implemented by the following call-
back:
int audio_callback(const void *input, void *output,
unsigned long frameCount,
const PaStreamCallbackTimeInfo *timeInfo,
PaStreamCallbackFlags statusFlags, void *userData){
int i;
float *inp = (float*) input, *outp = (float*) output;
for(i=0; i < frameCount; i++) outp[i] = inp[i];
return paContinue;
}
The callback should be written in such a way that it does not block execution and
does not perform any operations that might be too onerous, such as memory alloca-
tion, printing to terminal, reading/writing to files, etc. We call this approach realtime
safe. As a rule of thumb, we should only use code that involves signal processing
computation, so that the callback can be invoked regularly without compromising
the continuous operation of input and output. Any other types of action should be
placed in a different thread (e.g. the main program thread). If communication is
needed between the callback and the rest of the program, it should be done in a
non-blocking way to ensure smooth realtime operation, as will be demonstrated by
examples in this and later chapters.
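One possible way of passing a parameter to the callback without blocking is sketched below, under the assumption that a C11 compiler with stdatomic.h is available. This is not the approach taken in the listings of this chapter, and the names shared_params, params_set_gain(), and process_block() are hypothetical. Whether a given atomic type is lock-free depends on the platform, and can be checked with atomic_is_lock_free().

```c
#include <stdatomic.h>

/* Hypothetical shared state: written by the main thread,
   read by the audio callback. */
typedef struct {
    _Atomic float gain;
} shared_params;

/* Main-thread side: publish a new value without taking a lock. */
void params_set_gain(shared_params *p, float gain)
{
    atomic_store_explicit(&p->gain, gain, memory_order_relaxed);
}

/* Callback side: read the value once per block, then process. */
void process_block(shared_params *p, const float *in, float *out,
                   unsigned long nframes)
{
    float gain = atomic_load_explicit(&p->gain, memory_order_relaxed);
    unsigned long i;
    for (i = 0; i < nframes; i++) out[i] = in[i] * gain;
}
```

Reading the parameter once at the top of the block keeps the value consistent for all the frames processed in that call.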
Note that when employing an asynchronous IO approach, we will need to provide
means of keeping the program open while audio processing is happening. This is
because we are not directly calling the IO function, but instead the audio subsystem
is, in parallel to what is happening in the main() function. As we have noted above,
the two parts of the program are run on separate threads (the main program and the
Portaudio IO callback thread).
If the program falls through the main() function, for instance, it will exit before
there is a chance for the callback to start processing. As we have seen before, a
program will start at the top of this function and finish when it returns, so we have
to delay reaching the end until we are ready to quit. A simple means of achieving
this is to have an empty loop (maybe with a call to usleep()3 to avoid
excessive use of resources) that checks for the time elapsed since a starting point:
PaTime start = Pa_GetStreamTime(handle);
while(Pa_GetStreamTime(handle) - start < duration) usleep(1000);

2 These are sections of code that are made to execute in parallel. The audio callback function is
an example of a separate thread that is started and managed by the Portaudio library. There is also
dedicated support for programs to do this in their own code if required. This is provided by the
pthread library [22, 26].
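The waiting loop can also be written against the system clock instead of the stream clock. The following is a minimal sketch, assuming a POSIX system; monotonic_seconds() and wait_for() are hypothetical names. It uses nanosleep(), which recent POSIX editions specify in place of usleep(), and measures time from the moment of the call.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Seconds read from the monotonic system clock (POSIX clock_gettime()). */
static double monotonic_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

/* Keep the calling thread idle for about `duration` seconds, waking every
   millisecond; the time is measured from the moment of the call, so it
   does not depend on the origin of the clock. */
void wait_for(double duration)
{
    const struct timespec tick = { 0, 1000000L };  /* 1 ms */
    double start = monotonic_seconds();
    while (monotonic_seconds() - start < duration)
        nanosleep(&tick, NULL);
}
```

In a callback-based program, a call such as wait_for(duration) would be placed between starting and stopping the stream.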

11.1.6 Closing Up

The following sequence of calls can be used to stop processing, close the devices,
and terminate the use of the library:
Pa_StopStream(handle);
Pa_CloseStream(handle);
Pa_Terminate();

11.1.7 The todac Program

In Chapter 6, we discussed a program that took ASCII samples from the standard
input and placed them directly in the audio device. This program can easily be im-
plemented using Portaudio, following the principles outlined above. It uses the syn-
chronous/blocking IO mode, since it is more suited for picking data using a function
such as fscanf(), which is itself blocking. This program can be used with any
floating-point generating software. It can take as parameters the desired number of
channels and sampling rate, which should match what the input stream contains.
The full code for the program is shown in Listing 11.1.
As with libsndfile, we need to pass the name of the library, as well as its location,
to the compiler in the command line. With the library installed, the flag for Portaudio
is -lportaudio. Assuming the library exists in /usr/local/, the command
line will then be:
$ cc -o todac todac.c -I/usr/local/include \
-L/usr/local/lib -lportaudio

3 A system call defined in unistd.h that suspends processing for a number of microseconds (1
second = 1000000 microseconds).

Listing 11.1: The todac program.

#include <stdio.h>
#include <stdlib.h>
#include <string.h> /* memset() */
#include <portaudio.h>
#include <math.h>
#define BUFFRAMES 4096

void usage() {
fprintf (stderr,
"usage: todac [sr] [channels] < input\n");
exit(1);
}

int main(int argc, const char* argv[]){
PaError err;
PaStreamParameters outparam;
PaStream *handle = NULL;
int chn=1,bufsize,sr=44100, dev;
float *buf;

if(argc > 1) sr = atoi(argv[1]);
if(argc > 2) chn = atoi(argv[2]);
if(argc > 3) usage();

err = Pa_Initialize();
if(err == paNoError){
dev = Pa_GetDefaultOutputDevice();
bufsize = BUFFRAMES*chn;
buf = (float *) malloc(sizeof(float)*bufsize);
memset(buf, 0, sizeof(float)*bufsize);
outparam.device = (PaDeviceIndex) dev;
outparam.channelCount = chn;
outparam.sampleFormat = paFloat32;
outparam.suggestedLatency = (PaTime)
(BUFFRAMES/(double)sr);
outparam.hostApiSpecificStreamInfo = NULL;

err = Pa_OpenStream(&handle,NULL,&outparam,
sr,bufsize,paNoFlag,
NULL, NULL);
if(err == paNoError){
err = Pa_StartStream(handle);
if(err == paNoError){
long cnt, i;
do{
cnt = 0;
for(i = 0; i < bufsize; i++) {
/* count successful conversions only: fscanf()
returns EOF (negative) at the end of the input */
if(fscanf(stdin, "%f", &buf[i]) != 1) break;
cnt++;
}
if(cnt > 0) {
err = Pa_WriteStream(handle, buf, cnt/chn);
if(err != paNoError)
printf("write error: %s \n",
Pa_GetErrorText(err));
}
else break;
} while(cnt > 0);
Pa_StopStream(handle);
} else printf("%s \n", Pa_GetErrorText(err));
Pa_CloseStream(handle);
} else printf("%s \n", Pa_GetErrorText(err));
free(buf);
Pa_Terminate();
} else printf("%s \n", Pa_GetErrorText(err));
return 0;
}
Note that because we are using fscanf() amongst the realtime audio output
processing, this program is not realtime safe. If that function is not provided with
input for a long period, we will have interruptions in the audio stream. However, in
the simple applications for which it is designed, it performs reasonably well, and it
has the advantage of being conceptually very simple.
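The reading step can be separated into a small function that counts only the samples actually converted. fscanf() returns the number of items converted, or the negative value EOF once the input ends, so testing for a return value of 1 avoids miscounting a final, partially filled buffer. This is a sketch; read_samples() is a hypothetical helper and is not taken from the listing.

```c
#include <stdio.h>

/* Read up to n whitespace-separated ASCII samples from fp into buf.
   Returns the number of samples converted; stops at end of file or at
   the first item that is not a number. */
size_t read_samples(FILE *fp, float *buf, size_t n)
{
    size_t cnt = 0;
    while (cnt < n && fscanf(fp, "%f", &buf[cnt]) == 1)
        cnt++;
    return cnt;
}
```

The number of frames to write is then the returned count divided by the number of channels.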

11.1.8 An Audio Effect

The next example implements an audio effect: amplitude modulation (or tremolo)
[15, 36]. The principle is straightforward; we take in an audio signal and multiply
it by a sine waveform. This makes the amplitude of the signal vary according to the
modulating wave. If the sine wave frequency is in the audio range (> 20 Hz), we
will have an amplitude modulation effect, which results in the sum and difference
of the input signal and sine wave frequencies. If the frequency is in the sub-audio
range, we will hear a tremolo (fluctuating amplitude). The amount of modulation
is controlled by an amplitude parameter a (Fig. 11.2). If this is 1, we have the full
effect. If it is 0, we have just the original input. The expression implementing this is

\[ y(t) = x(t)\,\bigl(1 - a\,(0.5 + 0.5 \sin(2\pi f_m t))\bigr) \tag{11.2} \]


where a is the effect amplitude in the [0,1] range and fm the modulation frequency
(Fig. 11.3).
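The sum and difference frequencies mentioned above can be made explicit with a short worked case. As an assumption for illustration, take the input to be a single sinusoid of frequency $f_c$ (a symbol not used in the text), $x(t) = \sin(2\pi f_c t)$. Substituting into Eq. 11.2 and applying $\sin A \sin B = \frac{1}{2}[\cos(A-B) - \cos(A+B)]$ gives:

```latex
\begin{align*}
y(t) &= \left(1 - \tfrac{a}{2}\right)\sin(2\pi f_c t)
        - \tfrac{a}{2}\,\sin(2\pi f_m t)\,\sin(2\pi f_c t) \\
     &= \left(1 - \tfrac{a}{2}\right)\sin(2\pi f_c t)
        - \tfrac{a}{4}\cos\bigl(2\pi (f_c - f_m) t\bigr)
        + \tfrac{a}{4}\cos\bigl(2\pi (f_c + f_m) t\bigr)
\end{align*}
```

With a = 1, fc = 440 Hz, and fm = 100 Hz, for instance, the output contains the original component at 440 Hz with amplitude 0.5 and two side components at 340 and 540 Hz with amplitude 0.25 each. With a = 0, only the input remains.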
This example employs a callback to enable low-latency IO and realtime safety.
All of the processing is implemented in this function. It uses a user data structure
UDATA to get the parameters from the main program and also to store the sine wave
generator time index from call to call:
int audio_fn(const void *input, void *output,
unsigned long frameCount,
const PaStreamCallbackTimeInfo *timeInfo,
PaStreamCallbackFlags statusFlags,
void *userData){
int i;
UDATA *p = (UDATA *) userData;
float *inp = (float*) input, *outp = (float*) output;
float fr = p->freq;
float amp = p->amp;
float sr = p->sr;
unsigned long n = p->n;
for(i=0; i < frameCount; i++, n++)
outp[i] = inp[i]*(1
- amp*(0.5 + 0.5*sin(n*TWOPI*fr/sr)));
p->n = n;
return paContinue;
}

[Figure: output amplitude (−1.0 to 1.0) plotted against time in samples (500 to 2000).]
Fig. 11.2: Tremolo effect with a = 0 (black dots), a = 0.5 (blue), and a = 1 (red), using a sine wave
input.

[Figure: 2πfm t feeds sine(); the result is scaled by 0.5, offset by 0.5, multiplied by a, and
subtracted from 1; the resulting gain multiplies the input (in) to give the output (out).]
Fig. 11.3: Tremolo effect flowchart.

Note that the callback uses only signal processing code and that the operation
is fully non-blocking, as per the realtime requirement. The only function call is to
sin(), which incurs very little computational overhead. The full program is shown
in Listing 11.2. It can be built with the following compiler options:
$ cc -o tremolo tremolo.c -I/usr/local/include \
-L/usr/local/lib -lportaudio -lm

Listing 11.2: Tremolo program.


#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <portaudio.h>
#include <math.h>

typedef struct udata {
float amp; // effect amplitude
float freq; // effect frequency
float sr; // sampling rate
unsigned long n; // time ndx
} UDATA;

int usage();

int audio_fn(const void *input, void *output,
unsigned long frameCount,
const PaStreamCallbackTimeInfo *timeInfo,
PaStreamCallbackFlags statusFlags,
void *userData);

int main(int argc, const char *argv[]){
PaError err;
const PaDeviceInfo *info;
PaStreamParameters inparam, outparam;
PaStream *handle = NULL;
int i, chn = 1, frames = 128, sr = 44100;
float duration;
UDATA parms;

if(argc > 3) {
parms.amp = atof(argv[1]);
parms.freq = atof(argv[2]);
parms.sr = sr;
parms.n = 0;
duration = atof(argv[3]);
} else return usage();

err = Pa_Initialize();
if(err == paNoError){
inparam.device = Pa_GetDefaultInputDevice();
outparam.device = Pa_GetDefaultOutputDevice();
inparam.channelCount =
outparam.channelCount = chn;
inparam.sampleFormat =
outparam.sampleFormat = paFloat32;
inparam.suggestedLatency =
outparam.suggestedLatency = (PaTime)
(frames/(double) sr);
inparam.hostApiSpecificStreamInfo =
outparam.hostApiSpecificStreamInfo = NULL;

err = Pa_OpenStream(&handle,&inparam,&outparam,
sr,frames,paNoFlag,
audio_fn, &parms);

if(err == paNoError){
err = Pa_StartStream(handle);
if(err == paNoError){
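/* the stream clock has an unspecified origin:
measure from a reading taken at the start */
PaTime start = Pa_GetStreamTime(handle);
while(Pa_GetStreamTime(handle) - start < duration)
usleep(1000);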
Pa_StopStream(handle);
} else printf("%s \n", Pa_GetErrorText(err));
Pa_CloseStream(handle);
} else printf("%s \n", Pa_GetErrorText(err));
Pa_Terminate();
} else printf("%s \n", Pa_GetErrorText(err));
return 0;
}

#define TWOPI 6.283185307179586

int audio_fn(const void *input, void *output,
unsigned long frameCount,
const PaStreamCallbackTimeInfo *timeInfo,
PaStreamCallbackFlags statusFlags,
void *userData){
int i;
UDATA *p = (UDATA *) userData;
float *inp = (float*) input, *outp = (float*) output;
float fr = p->freq;
float amp = p->amp;
float sr = p->sr;
unsigned long n = p->n;
for(i=0; i < frameCount; i++, n++)
outp[i] = inp[i]*(1
- amp*(0.5 + 0.5*sin(n*TWOPI*fr/sr)));
p->n = n;
return paContinue;
}

int usage() {
fprintf (stderr,
"usage: tremolo amp freq dur\n");
return 1;
}
As indicated by the message in the usage() function, the program takes three
arguments, the amplitude, frequency, and duration (in seconds), the latter of which
determines how long the program will run for. Any process can have its execution
interrupted by sending a SIGINT signal to it, through typing the ctl-c key se-
quence at the terminal. Thus, if we wish, we can also stop the tremolo program
in this way, before its run time has elapsed.

11.2 The Jack Connection Kit

The Jack Connection Kit4 is another cross-platform API for audio IO. It is well
supported on UNIX-like systems (Linux, MacOS), and available on Windows, although
its status on that platform is not as firmly established. Jack was originally designed to
overcome the shortcomings of the lower-level audio API on Linux (ALSA), which
was never very well designed to work as a user-level programming interface. In
addition to this, Jack also provides a very robust inter-application routing mechanism.
This in fact has become its most popular feature, allowing users to connect diverse
programs together and use the system as a virtual studio. It has become the de facto
standard for realtime IO in professional audio applications on Linux, and, to a lesser
extent, on MacOS. In fact, in systems where Jack is present, Portaudio can also use
it as one of its listed device sources and destinations.
Jack works as a client-server system. Applications that want to provide audio
IO connect to the server, registering input or output ports. These are then made
available to all other clients running in the system. Connections can be made
programmatically in the client programs, or via patching, using a graphical user
interface (see Fig. 11.4) or a text-based command-line program. A fully-functional
API5 is provided for clients that are to be linked to the Jack library (-ljack). In the
following sections, we outline the basic operations for starting clients, registering and
connecting ports, and processing audio.

4 http://www.jackaudio.org.
5 See http://jackaudio.org/api/index.html for its full reference manual.

Fig. 11.4: Jack patcher window on MacOS (JackPilot).

11.2.1 Opening a Client

A client program can connect to the server through the jack_client_open()
function, defined along with the rest of the API in jack.h:
jack_client_t* jack_client_open(const char *client_name,
jack_options_t options,
jack_status_t *status,
...)
which opens a client session with a server. Its parameters are

• client_name: this provides the name by which this client will be known to
the other clients in the server.
• options: a bitwise-OR combination of options:
– JackNullOption: no options.
– JackNoStartServer: do not attempt to start a Jack server if there is none
running.
– JackUseExactName: always use the exact name requested, otherwise Jack
may generate a unique one.
– JackServerName: connect to a specific server name, passed as an extra
optional argument (const char *).
– JackSessionID: pass a token to allow a session manager program to iden-
tify this client at a later time.
• status: if non-NULL, this provides an address for the server to return infor-
mation about the open operation.
• optional parameter: the Jack server name (if explicitly requested by the option).

Given that we are connecting to a server rather than opening a device, there
is not much else we need to do. System parameters such as sampling rate, sam-
ple type, and channels are determined by the server. Jack defines each sample as
jack_default_audio_sample_t. Each client defines a certain number of
input and output streams, each one containing a single channel. So, for multichan-
nel audio, all we need to do is connect to different client ports. The sampling rate is
given by the server and we can query it using
jack_nframes_t
jack_get_sample_rate(jack_client_t *client)
where jack_nframes_t is an integral type also used to count frames in other
settings.

11.2.2 Registering Ports

Signal connections to other clients on the server are made through ports, which are
handled by opaque objects of type jack_port_t. In order for these to be made
available, we need to register them with Jack. This is done through the following
function
jack_port_t* jack_port_register(jack_client_t *client,
const char *port_name,
const char *port_type,
unsigned long flags,
unsigned long
buffer_size)
where a port on a given client is identified by a port_name string and should
be of a given type (JACK_DEFAULT_AUDIO_TYPE in this case). Options can be
passed via flags (as usual, more than one of these are to be bitwise-OR combined),
which define the characteristics of the port:
• JackPortIsInput: the port can receive data.
• JackPortIsOutput: the port can send data.
• JackPortIsPhysical: the port corresponds to some physical/hardware in-
put and/or output.
• JackPortIsTerminal: for an input port, this means that the data received
by it will not be passed out of the client; for an output port, this means that the
data sent out does not originate from any other port.
The buffer size parameter is only used in the case of non built-in ports (e.g. spe-
cial types of ports), and is ignored otherwise. This is the case for audio data ports6 ,
which are one of the standard port types. Once a port is successfully registered, we
obtain a handle to it, which can be used to read data from it or write data to it.
6 Defined by the JACK_DEFAULT_AUDIO_TYPE port type.
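The way single-bit options are combined and tested with bitwise operators can be sketched without the Jack headers. The constants below are stand-ins with hypothetical names and values; in a real client the Jack port flags listed above would be used instead. The helper has_flags() is also hypothetical.

```c
/* Stand-in flag constants with one bit each; the names and values are
   hypothetical and only mirror the idea of the Jack port flags. */
enum {
    DEMO_PORT_IS_INPUT    = 0x1,
    DEMO_PORT_IS_OUTPUT   = 0x2,
    DEMO_PORT_IS_PHYSICAL = 0x4,
    DEMO_PORT_IS_TERMINAL = 0x8
};

/* Non-zero if every bit in `wanted` is set in `flags`. */
int has_flags(unsigned long flags, unsigned long wanted)
{
    return (flags & wanted) == wanted;
}
```

A terminal output port, for example, would be described by the bitwise OR of the output and terminal constants, and each property can later be tested independently.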

11.2.3 The Processing Callback

Jack operates asynchronously, which means that we will need to supply a callback
function to the server for reading and/or writing audio data7 . This function has the
following signature:
typedef int (*JackProcessCallback)(jack_nframes_t
nframes, void *arg);
In the processing callback, the number of audio frames and the user data are
passed as arguments. This means that we will need to query the server for the loca-
tions of the input and/or output data. Since these are held by each port defined by
the client, we can use an API function to obtain the buffer pointers:
void* jack_port_get_buffer(jack_port_t *port,
jack_nframes_t nframes)
which returns a pointer to a location that can be written to, or that holds data that we
can read from. In the case of audio IO, the pointers are cast to the Jack floating-point
audio sample type (jack_default_audio_sample_t*), which can then be
used to access each sample in the buffer.
The client-defined JackProcessCallback() is registered with the server
using
int jack_set_process_callback(jack_client_t *client,
JackProcessCallback
process_callback,
void *arg)
which takes in the client handle, the callback name, and the location of the user data
arg to be passed to the callback. If successful, the registering function returns 0.
Once the callback is set, we can start processing audio. For this, we need to activate
the client, which is done through
int jack_activate(jack_client_t *client)
Note that, as in the Portaudio case, we will need to limit the code inside the
callback to non-blocking operations in order to ensure smooth realtime operation.

11.2.4 Connecting Ports

When a client is activated, it can connect to any ports in the server. From the client
program itself, we can name a port to connect to, either for input or for output. The
following function does this:
int jack_connect(jack_client_t *client,
const char *source_port,
const char *destination_port)
where ports are referred to by their full name. This is normally a concatenation of
the client and port names, as in client_name:port_name. For the ports defined by the
client, we can use the jack_port_name(const jack_port_t * port)
function to get the full name of a port. The physical ports of a server are often
named system:capture_N for inputs and system:playback_N for outputs, where N is
the channel number.

7 Other callbacks for a variety of operations can also be set; for more details, see
http://jackaudio.org/api/index.html.
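For ports that belong to other clients, the full name may have to be assembled by the program. The following sketch builds such a name from its two parts with snprintf(); full_port_name() is a hypothetical helper, and for a client's own ports jack_port_name() remains the direct route.

```c
#include <stdio.h>

/* Build "client:port" in dst (capacity size). Returns 0 on success, or -1
   if the result would not fit. full_port_name() is a hypothetical helper. */
int full_port_name(char *dst, size_t size,
                   const char *client, const char *port)
{
    int len = snprintf(dst, size, "%s:%s", client, port);
    return (len >= 0 && (size_t) len < size) ? 0 : -1;
}
```

Called with "system" and "capture_1", it produces the string system:capture_1, which is the form expected by jack_connect().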

11.2.5 Closing a Client

When an application is about to exit, we should deactivate and then close its
client(s). This is done using
int jack_deactivate(jack_client_t *client)
and
int jack_client_close(jack_client_t *client)

11.2.6 Application Example

The following example creates a simple program with one input and one output port,
which applies a gain to the signal. It follows the principles outlined in the previous
sections:

1. A client is opened:
client = jack_client_open("MonoGain",
JackNoStartServer,
NULL);
2. Two ports are registered:
state.inport = jack_port_register(client, "input",
JACK_DEFAULT_AUDIO_TYPE,
JackPortIsInput, 0UL);
state.outport = jack_port_register(client, "output",
JACK_DEFAULT_AUDIO_TYPE,
JackPortIsOutput, 0UL);
3. A callback is set:
jack_set_process_callback(client, jackProcess,
(void*) &state);
4. The client is activated:
jack_activate(client);
5. The ports are connected:
jack_connect(client, "system:capture_1",
jack_port_name(state.inport));
jack_connect(client, jack_port_name(state.outport),
"system:playback_1");

The processing callback needs to access the ports to get the audio buffers, so we
define a data structure to hold them. This also holds the gain value that is supplied
by the user:
typedef struct UDATA {
jack_port_t *inport;
jack_port_t *outport;
float gain;
} udata;
The definition of the callback is fairly straightforward. The buffer pointers are
obtained and a loop is used to apply the gain to the input signal, writing the result to
the output:
static int jackProcess(jack_nframes_t nframes,
void *pp) {
jack_default_audio_sample_t *in, *out;
float gain;
int n;
udata *p = (udata *) pp;
in = jack_port_get_buffer(p->inport,
nframes);
out = jack_port_get_buffer(p->outport,
nframes);
gain = p->gain;

for (n = 0; n < nframes; n++)
out[n] = in[n]*gain;

return 0;
}
While the audio is being processed by the server, we need to keep the program
open. In order to do so, we check the current time and loop until a set duration has
elapsed:
now = jack_get_time();
end += now;
while(time < end) {
usleep(500000);
time = jack_get_time();
printf("%.2f \n", (time-now)/1000000.);
}
Time is measured in microseconds (1/1,000,000 sec, as noted earlier). Alterna-
tively, we could have blocked the main program under scanf(), waiting for the
user to close the program by pressing any key. Once the set duration is reached, the
program proceeds to deactivate and close the client. The full code for the Jack gain
program is shown in Listing 11.3. Provided that Jack is installed in the system (e.g.
in /usr/local), we can compile it with the following command line:
$ cc -o jgain jgain.c -I/usr/local/include \
-L/usr/local/lib -ljack

Listing 11.3: Jack example program.


#include <jack/jack.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define MICROS 1000000

typedef struct UDATA {
jack_port_t *inport;
jack_port_t *outport;
float gain;
} udata;

static int jackProcess(jack_nframes_t nframes,
void *pp) {
jack_default_audio_sample_t *in, *out;
float gain;
int n;
udata *p = (udata *) pp;
in = jack_port_get_buffer(p->inport,
nframes);
out = jack_port_get_buffer(p->outport,
nframes);
gain = p->gain;

for (n = 0; n < nframes; n++)
out[n] = in[n]*gain;

return 0;
}

int main(int argc, const char **argv) {

if (argc < 3) {
printf("jgain gain dur \n");
}
else {
jack_client_t *client;
client =
jack_client_open("MonoGain",
JackNoStartServer, NULL);

if (client != NULL) {
udata state;
unsigned long end, time = 0, now;
state.gain = atof(argv[1]);
end = (unsigned long)
(atof(argv[2])*MICROS);

/* register input port */
state.inport =
jack_port_register(client, "input",
JACK_DEFAULT_AUDIO_TYPE,
JackPortIsInput, 0UL);

if (state.inport == NULL) {
jack_client_close(client);
printf("Could not open input port");
return -1;
}

/* register output port */
state.outport =
jack_port_register(client, "output",
JACK_DEFAULT_AUDIO_TYPE,
JackPortIsOutput, 0UL);

if (state.outport == NULL) {
jack_client_close(client);
printf("Could not open output port");
return -1;
}

/* set process callback */
if(jack_set_process_callback(client,
jackProcess,
(void*) &state)
!= 0) {
jack_client_close(client);
printf("Could not set Jack callback");
return -1;
}

/* activate Jack */
if(jack_activate(client) != 0) {
jack_client_close(client);
printf("Could not start Jack processing");
return -1;
}

/* connect ports to system in and out */
if(jack_connect(client, "system:capture_1",
jack_port_name(state.inport))
!= 0)
printf("could not connect %s automatically "
"to system:capture_1 \n",
jack_port_name(state.inport));

if(jack_connect(client,
jack_port_name(state.outport),
"system:playback_1") != 0)
printf("could not connect %s automatically "
"to system:playback_1 \n",
jack_port_name(state.outport));

/* keep track of time */


now = jack_get_time();
end += now;
while(time < end) {
usleep(MICROS/2);
time = jack_get_time();
printf("%.2f \n", (float)(time-now)/MICROS);
}

/* close client */
jack_deactivate(client);
jack_client_close(client);
printf("closed Jack client \n");
return 0;

} else {
printf("Could not open Jack client\n");
return -1;
}

}
return 0;
}
The program, as indicated by the usage message, takes the gain to be applied
and a duration, which determines how long the program runs. To execute it, the
Jack server must already be running, since the program cannot start the server
by itself (the JackNoStartServer option has been used). It is possible, however,
to omit that option (passing JackNullOption instead) so that programs can get the
server running if they need to, which might be more suitable in other applications.

11.3 Conclusions

This chapter has concentrated on the principles of realtime audio IO. We selected
a cross-platform library, Portaudio, and an audio server, Jack, as our main vehi-
cles for exploring audio processing. These allow programs to be easily ported from
one OS to another. We saw the two main modes of realtime IO operation, syn-
chronous (push) and asynchronous (pull). While the latter allows for more reactive,
low-latency, and realtime safe performance, the former is simpler conceptually, as it
follows similar principles to other types of IO such as file access. We presented three
examples, one demonstrating how we can read an ASCII stream from the standard
input and send it to a DAC, another showing a low-latency audio effect, and a third
demonstrating how to connect to a Jack server. Realtime audio is nicely comple-
mented by interactive controls, and the next chapter will introduce a very important
protocol that can be used to implement them.

Problems

11.1. Write a realtime-output sine wave synthesis program that takes the amplitude
and frequency as parameters, in two versions: synchronous and asynchronous.
11.2. Write a program using libsndfile and Portaudio to play back a soundfile.
11.3. Write a version of the tremolo program to work with the Jack server.
Chapter 12
Realtime MIDI

Abstract The MIDI protocol is presented in this chapter as one of the typical ways
in which realtime audio instruments can be controlled. The native MacOS API
CoreMIDI is introduced as a system-dependent means of accessing MIDI devices.
This is complemented by a discussion of cross-platform support for realtime MIDI,
which is provided by Portmidi or Jack.

MIDI (Musical Instrument Digital Interface) [47] is a long-established communication
protocol. It can be used to control synthesisers and other musical equipment,
as well as a range of software applications. Most OSs provide some form of MIDI
support, and some systems also provide internal or built-in MIDI devices (either in
software form or as part of the sound hardware). In this chapter, we will study how
to program MIDI in C with the aim of developing realtime interactive applications.

12.1 The Protocol

The MIDI protocol has the following fundamental characteristics:


• It employs one-way transmission, from a MIDI OUT port to a MIDI IN port.
• The MIDI THRU port copies the data from the MIDI IN port.
• It uses 16 channels per port (or device).
• Start and stop bits frame an 8-bit byte of data (3125 10-bit bytes can be delivered
per second over a physical MIDI connection).
• It supports four channel modes:
1. Mode I: omni on/poly (omni mode): responds to any channel, polyphonically.
2. Mode II: omni on/mono (mono mode): responds to any channel, monophoni-
cally.
3. Mode III: omni off/poly (multi mode): responds to specific channels, poly-
phonically.

4. Mode IV: omni off/mono (mono mode): responds to specific channels, mono-
phonically.

12.1.1 Hexadecimal Notation Revisited

MIDI programs will often make use of hexadecimal constants, which we have already
noted earlier in this book. Hexadecimal digits are useful because each of
them covers a 4-bit range (0–15, 16 states). They are notated 0–9 A–F, as shown in
Table 12.1. A byte can therefore be written very compactly as two hexadecimal
digits. Some examples are presented in Table 12.2.

Table 12.1: Hexadecimal numbers.

BASE 10 BASE 16
0–9 0–9
10 A
11 B
12 C
13 D
14 E
15 F

Table 12.2: Bytes in base-2, 16 and 10.

base 2 base 16 base 10


0000 0000 0x00 0
1111 1111 0xFF 255
0000 1111 0x0F 15
0001 0000 0x10 16
0111 1111 0x7F 127

12.1.2 MIDI Messages

The following is an outline of the main message types defined by the protocol. We
will be mostly interested in the channel messages, which are those that can be used
to control the realtime operation of an application.
1. Channel Messages (Fig. 12.1):

midi message: status byte + message byte1 + message byte2¹

• status byte: message type (4 bits) + channel (4 bits), always starting with a
set bit (1xxx xxxx):
– message type:
0x80 (NOTEOFF) – sent to signal a key up².
0x90 (NOTEON) – sent to signal a key down.
0xA0 (POLY AFTERTOUCH) – encodes key pressure per note (polyphonic).
0xB0 (CONTROL CHANGE) – sent by continuous controllers³.
0xC0 (PROGRAM CHANGE) – sent to request a preset change.
0xD0 (AFTERTOUCH) – encodes key pressure for the whole channel (monophonic).
0xE0 (PITCHBEND CHANGE) – sent by a pitchbend wheel.
– channel: from 0x00 (1) to 0x0F (16)

• message byte1, message byte2: these depend on message type and always
start with a 0 bit (0xxx xxxx). The range of each byte is limited to 0-127
(7 bits). Table 12.3 shows the parameter details for each message type.

status data 1 data 2

Fig. 12.1: MIDI channel message.

2. Global messages:
System exclusive messages: status byte + manufacturer’s ID + data
System-realtime messages.
System-common messages.

¹ In C, we can use the unsigned char type to represent a MIDI byte.
² Also NOTEON with data byte 2 (velocity) = 0.
³ Standard continuous controller numbers (data byte 1): 1 = modulation wheel;
2 = breath controller; 4 = adjustable foot-pedal; 5 = portamento time; 7 = volume;
8 = balance; 10 = pan; 11 = expression; and 121–127, channel-mode messages: reset,
local control, all notes off, omni off, omni on, mono on, and poly on, respectively.
Table 12.3: Channel message types.


message type      data byte 1          data byte 2
NOTE ON           note number          key velocity
NOTE OFF          note number          key velocity
AFTERTOUCH        amount               –
POLYTOUCH         note number          amount
PITCHBEND         amount (fine, LSB)   amount (coarse, MSB)
PROGRAM CHANGE    number               –
CONTROL CHANGE    number               amount

12.1.3 Packing and Unpacking the Status Byte

To get the channel number or the MIDI message type from a MIDI status byte, we
use a bitmask with a bitwise logic AND (&) operator. The bitmask for extracting the
channel is 0x0F (or 0000 1111); the message type is extracted in the same way
with the mask 0xF0 (or 1111 0000). The logic operation for the channel is
status_byte & 0x0F;
For example,
0000 1111 (mask)
& 1001 0001 (NOTEON, channel 2, 0x91)
-------------------
0000 0001 (channel 2, 0x01)
To combine a channel number and a message type to make up a MIDI status byte,
use a bit-wise OR (|) operator to combine the two numbers. For example, with a
message type, say NOTEON (0x90), and the channel number, say channel 9 (0x08),
we have
message_type | channel;

1001 0000
| 0000 1000
-----------------
1001 1000 (0x98)

12.2 MIDI Programming Basics

As in the case of realtime audio, MIDI programming in C is also platform-dependent.
Each system will have its own hardware and software interfaces, which will in general
be different and incompatible with each other. The MIDI messages and the
protocol of communication, of course, will be the same, but the means of sending
and receiving data will depend on the OS.

A system that supports MIDI programming will include libraries (compiled binary
code) and an exposed API to access these. Libraries are there to provide access
to and communication with hardware. As we have seen in the previous chapter, the
API is the public face of these libraries: the functions, data structures, etc. that are
offered to applications that use MIDI. System-provided APIs are often quite low-level,
i.e. they provide fine-grained functionality, which sometimes makes their use
more involved (more lines of code are needed to achieve a particular effect). They
will also provide services to cover all aspects of MIDI use, often offering more than
we need. In modern operating systems, examples of such APIs are found in ALSA
(Linux) and CoreMIDI (MacOS). While it is sometimes advantageous or necessary to
write applications using system APIs, in most cases it is probably best to use a
higher-level API, such as Portmidi, which has the advantage of being cross-platform.
The portability of the code, together with having to learn and deal with only one API,
is a great incentive for this. However, it is useful to look a little closer into a system
API to understand a bit more about MIDI programming.

12.2.1 MIDI on MacOS

As an example of a system API, we will look at developing a program that outputs
MIDI using the CoreMIDI framework.

Frameworks

First, a note about terminology: on MacOS, system libraries and APIs are called
frameworks. These are present in the OS as directories containing the given
(dynamic-link) library, header files and other resources. The names of these
directories are given the extension .framework, which identifies them as such.
Special MacOS-specific compiler flags are used to link to them. For MIDI, MacOS
offers the CoreMIDI framework (located in /System/Library/Frameworks). Other
frameworks that will be used in MIDI programming are CoreAudio (for timing
functions) and CoreFoundation (for text strings). Header files should be in the format:
#include <framework/header.h>
For CoreMIDI, we have:
#include <CoreMIDI/CoreMIDI.h>
To link to the framework, we use -framework followed by the framework name, as in
$ cc ... -framework CoreMIDI

The CoreMIDI API

CoreMIDI treats MIDI streams as separate destinations (for output) and sources (for
input). Sources and destinations are offered by the various physical MIDI devices
that a system can have. Each of these can have one or more streams. The full hierarchy
in CoreMIDI, defined in CoreMIDI/CoreMIDI.h, is shown in Fig. 12.2.

device (physical) - entity (one or more) - destination/source (one or more)

Fig. 12.2: CoreMIDI hierarchy.

Usually, the first thing we should attempt to do when learning about a MIDI API
is to find a way of searching the system for MIDI devices (or here, destinations
and sources). With CoreMIDI, as you would expect, it is possible to query a system
about its devices, the entities in each of these, and the destinations and sources in
each entity, which seems a little unwieldy.
Thankfully, there is also a means of just checking for all destinations and all
sources in a system, directly. Sources and destinations can also be virtual, i.e. created
by applications, and CoreMIDI provides means of creating these. Since these would
not be linked to any device, they would only appear on direct lists of sources/desti-
nations (another reason for using this method of querying).
In order to access a destination or source for IO, we need to create a MIDI client
for our application. This handles general aspects of communication with devices that
span the application lifetime. We will then create a port to handle either
input or output for this client. In our example here, we will create an output port.
With this, we can then package and send MIDI messages to a destination. Note that
the port is also application-wide and we can use it to send MIDI data to separate
destinations. MIDI clients should be disposed of (i.e. closed) when we are finished
with them, whereas ports do not need to be explicitly closed.
In CoreMIDI, messages are packaged in MIDI packet lists. A MIDI packet contains
the given MIDI bytes plus a timestamp value that indicates when the MIDI
message should be sent out. The timestamp unit is the host time, which can be
obtained from the time in nanoseconds (1/1,000,000 ms) using a utility function (in
the CoreAudio framework, CoreAudio/HostTime.h). We can also query the current
host time to synchronise messages correctly. A timestamp of 0 indicates that the
message should be sent immediately. Time is kept in UInt64 (unsigned 64-bit
integer) types. For instance, to convert a time in milliseconds (1/1000 s) to a
timestamp, we have (NANOS is 1000000, the number of nanoseconds in a millisecond)
now = AudioGetCurrentHostTime();
timestamp = now+AudioConvertNanosToHostTime(NANOS*msec);
Packet lists can be built using functions provided by CoreMIDI. The steps are
1. Initialise the list:
cur = MIDIPacketListInit(mlist);
2. Add a packet with a message, keeping the returned packet pointer for the next call:
cur = MIDIPacketListAdd(mlist,sizeof(buffer),
cur,timestamp,3,mess);

Once a packet list has been built, it can then be sent to a destination:
endpoint = MIDIGetDestination(dest);
MIDISend(mport, endpoint, mlist);
If a new set of MIDI messages is to be sent, we need to build a new packet list,
repeating the steps above, before sending it to the output. It is important that the
port and client are still open/active up until the last MIDI message timestamp, oth-
erwise some messages might not be sent. As a precaution, we can send NOTEOFF
messages for each note, with timestamp of 0, before closing the client, to stop any
hanging notes.
Finally a word about some of the types used by CoreMIDI functions. Strings are
expected to be placed in CFString objects (CoreFoundation.h), and there
are functions to convert to and from C strings (null-terminated character arrays).
MIDI messages are placed in unsigned char arrays, which in MacOS are de-
fined as Byte. These and other types used (such as those for clients, ports, packets
etc.) are fully discussed in the CoreMIDI reference documentation; please refer to
it for more details.

Example

The example in Listing 12.1 shows a simple program that demonstrates MIDI output
using CoreMIDI. The program plays a chromatic scale starting from middle C (note
number 60). It can be built with the following compiler options:
cc -o cmidiout cmidiout.c \
-framework CoreMIDI -framework CoreFoundation \
-framework CoreAudio

Listing 12.1: CoreMIDI example.


#include <stdio.h>
#include <unistd.h> /* sleep() */
#include <CoreMIDI/CoreMIDI.h>
#include <CoreAudio/HostTime.h>
#include <CoreFoundation/CoreFoundation.h>

#define NANOS 1000000


#define MD_NOTEON 0x90
#define MD_NOTEOFF 0x80

int main(){
int k, endpoints, dest;
CFStringRef name = NULL, cname = NULL, pname = NULL;
CFStringEncoding defaultEncoding =
CFStringGetSystemEncoding();
MIDIClientRef mclient =
(MIDIClientRef) NULL; /* client object */
MIDIPortRef mport =
(MIDIPortRef) NULL; /* port object */
MIDIEndpointRef endpoint;
Byte buffer[1024];
MIDIPacketList *mlist = (MIDIPacketList *) buffer;
Byte mess[3];
MIDIPacket *cur = MIDIPacketListInit(mlist);
UInt64 timestamp, now, dur;
OSStatus ret;

/* MIDI client */
cname = CFStringCreateWithCString(NULL, "my client",
defaultEncoding);
ret = MIDIClientCreate(cname, NULL, NULL, &mclient);
if(!ret){
/* MIDI output port */
pname = CFStringCreateWithCString(NULL, "outport",
defaultEncoding);
ret = MIDIOutputPortCreate(mclient, pname, &mport);
if(!ret){
/* list destinations */
endpoints = MIDIGetNumberOfDestinations();
for(k=0; k < endpoints; k++){
endpoint = MIDIGetDestination(k);
MIDIObjectGetStringProperty(endpoint,
kMIDIPropertyName, &name);
printf("destination %d = %s\n", k,
CFStringGetCStringPtr(name, defaultEncoding));
}
/* select destination */
dest = 0;
printf("select destination number: ");
scanf("%d", &dest);
dur = 1000; /* 1000 ms */


/* fill MIDI packet list */
for(k=0; k < 12; k++){
mess[0] = MD_NOTEON;
mess[1] = 60+k;
mess[2] = 40;
now = AudioGetCurrentHostTime();
timestamp = now +
AudioConvertNanosToHostTime(NANOS*k*dur);
cur = MIDIPacketListAdd(mlist, sizeof(buffer),
cur, timestamp, 3, mess);
mess[0] = MD_NOTEOFF;
mess[1] = 60+k;
mess[2] = 40;
timestamp = now +
AudioConvertNanosToHostTime(NANOS*(k+1)*dur*2);
cur = MIDIPacketListAdd(mlist, sizeof(buffer),
cur, timestamp, 3, mess);
}
/* send messages */
endpoint = MIDIGetDestination(dest);
MIDISend(mport, endpoint, mlist);
/* wait for messages to play */
sleep(1+((k+1)*dur*2)/1000);
}
/* close MIDI client */
MIDIClientDispose(mclient);
if(name) CFRelease(name);
if(pname) CFRelease(pname);
if(cname) CFRelease(cname);
}

return 0;
}

12.3 MIDI Programming with Portmidi

While CoreMIDI provides a very complete API for MIDI programming, programs
using it will not be portable to other systems. For this reason, using a cross-platform
library that is placed at a slightly higher level might be more useful in certain
situations. One such library is Portmidi [12], a MIDI counterpart to Portaudio, which
provides a common interface to the different platform-dependent MIDI
implementations.
A Portmidi program requires the following headers:
#include <portmidi.h>
#include <porttime.h>
Before Portmidi is used, we need to call Pm_Initialize() to initialise the
library. As part of this process, library code will query the system for existing logical
devices. These can then be searched for and listed. The total number of devices can
be found with Pm_CountDevices(). For each device registered with the library,
we can get its details, stored in a PmDeviceInfo data structure:
typedef struct {
int structVersion;
const char *interf; /*underlying API */
const char *name; /* device name */
int input; /* 1 if input */
int output; /* 1 if output */
int opened;
} PmDeviceInfo;
Using Pm_GetDeviceInfo() we can obtain the details of each MIDI device
in the system. The complete code for listing output devices is
int cnt, i;
const PmDeviceInfo *info;
if((cnt = Pm_CountDevices()) != 0){
for(i=0; i < cnt; i++){
info = Pm_GetDeviceInfo(i);
if(info->output)
printf("%d: %s \n", i, info->name);
}
} else printf("no device found\n");
which will print the names of all available output devices to the terminal, allowing
users to choose one.

12.3.1 Timers

In order to guarantee the correct timing of MIDI messages, we will need to find
a means of keeping track of time. The Porttime library, which accompanies Port-
midi, offers a timer that can be used for that purpose. The timer is started using the
following code, which should be called before attempting to open a device:
Pt_Start(1, NULL, NULL);
Applications can choose to use their own timebase function. If so, this should be
passed to the library when a device is being opened, as a callback.

12.3.2 Opening Devices

As we have seen above, devices are identified using a numeric index. Similarly to
the process we have seen before in Chapter 11, a pointer to an opaque handle is
passed to the Pm_OpenOutput() function, which returns an error code that can
be used to check for success. The prototype for this function is
PmError Pm_OpenOutput( PortMidiStream** stream,
PmDeviceID outputDevice,
void *outputDriverInfo,
int32_t bufferSize,
PmTimeProcPtr time_proc,
void *time_info,
int32_t latency);
Note that the library offers distinct functions for each direction (input or output),
so a given stream can only be opened in one of them. The outputDriverInfo
is normally NULL, and the buffer size determines the amount of buffering used
for MIDI message output. Depending on the platform, Portmidi may not employ
a buffer, and may simply pass the data directly to the lower-level MIDI system
library. If the timing is not to be obtained from Porttime, then a timing callback can
be passed (as time_proc, with an associated data space time_info); otherwise
we just pass null pointers to both parameters. The latency field is used to add an
extra time offset to the output messages (in milliseconds), and is normally 0. As an
example, the following code opens an output device:
int dev;
PmError retval;
PortMidiStream *mstream;
retval = Pm_OpenOutput(&mstream, dev,
NULL,512,NULL,NULL,0);
if(retval != pmNoError)
printf("error: %s \n", Pm_GetErrorText(retval));
When Pm_OpenOutput() returns successfully, the handle to the MIDI output
stream is ready to be used.
12.3.3 Output

To output a MIDI channel message, we can use the function Pm_WriteShort(),
which is designed for non-system-exclusive output, and thus is suited to our purposes
in this chapter. Its prototype is
PmError Pm_WriteShort(PortMidiStream *stream,
PmTimestamp when,
int32_t msg);
Taking an open stream, it outputs a MIDI message encoded as an integer. The
timestamp parameter is only used if we have defined a latency above 0 when opening
the device. Otherwise, messages are sent immediately. If timestamps are used, they
should be non-decreasing (i.e. the message sequence should be sorted in time before
they are passed to successive function calls). The encoding of channel messages is
assisted by the macro Pm_Message(), defined in portmidi.h as
#define Pm_Message(status, data1, data2) \
((((data2) << 16) & 0xFF0000) | \
(((data1) << 8) & 0xFF00) | \
((status) & 0xFF))
with which we can pack the status and data bytes of a message into an integer vari-
able. In addition to this, we can define another macro ourselves to pack a message
type and a channel into a status byte:
#define SBYTE(msg,chn) ((msg) | (chn))
To send messages at the correct time, we can call the Pt_Time() function to
get the current device time and decide whether we need to output a message at that
time. For instance to send messages to play a note for 1 second, we can do the
following:
time = Pt_Time(NULL);
Pm_WriteShort(mstream, 0,
Pm_Message(SBYTE(MD_NOTEON,chan), note, vel));
while(Pt_Time(NULL) - time < 1000) usleep(100);
Pm_WriteShort(mstream, 0,
Pm_Message(SBYTE(MD_NOTEOFF,chan), note, vel));
In this particular example, all we do is wait until the time is right to send the
NOTEOFF message. In other applications, a more sophisticated time management
approach might be needed.
To close a MIDI output stream and finish using Portmidi, we can use the follow-
ing functions:
PmError Pm_Close(PortMidiStream* stream);
PmError Pm_Terminate();
Example

An example program is shown in Listing 12.2. It follows more or less the same lines
as the MIDI generator in Sect. 12.2.1, but also includes a program change message
that is sent before each NOTEON, selecting a different sound for each step of the
scale.

Listing 12.2: Portmidi output example.


#include <stdio.h>
#include <unistd.h>
#include <portmidi.h>
#include <porttime.h>

#define MD_NOTEON 0x90
#define MD_NOTEOFF 0x80
#define MD_PRG 0xC0
#define SBYTE(mess,chan) ((mess) | (chan))

int main() {
int cnt,i,dev;
PmError retval;
const PmDeviceInfo *info;
PortMidiStream *mstream;
Pm_Initialize();

if(cnt = Pm_CountDevices()){
for(i=0; i < cnt; i++){
info = Pm_GetDeviceInfo(i);
if(info->output)
printf("%d: %s \n", i, info->name);
}
printf("choose device: ");
scanf("%d", &dev);
Pt_Start(1, NULL, NULL);
retval = Pm_OpenOutput(&mstream, dev,
NULL,512,NULL,NULL,0);

if(retval != pmNoError)
printf("error: %s \n", Pm_GetErrorText(retval));
else {
char chan = 0;
int prg = 0;
long time = 0;
for(i=60; i < 72; prg+=4, i++){
Pm_WriteShort(mstream, 0,
Pm_Message(SBYTE(MD_PRG,chan), prg, 0));


time = Pt_Time(NULL);
Pm_WriteShort(mstream, 0,
Pm_Message(SBYTE(MD_NOTEON,chan), i, 120));
while(Pt_Time(NULL) - time < 1000) usleep(100);
Pm_WriteShort(mstream, 0,
Pm_Message(SBYTE(MD_NOTEOFF,chan), i, 120));
}
/* close the stream only if it was opened successfully */
Pm_Close(mstream);
}
} else printf("No available output devices\n");

Pm_Terminate();
return 0;
}
Assuming that Portmidi is installed in /usr/local, we can use the following
command line to build this example:
$ cc -o midiout midiout.c -I/usr/local/include \
-L/usr/local/lib -lportmidi

12.3.4 Input

Most of the steps used in MIDI output can be retraced and modified for input.
Searching for devices is just a matter of checking the input member of the
device info structure. Opening the device uses Pm_OpenInput() instead of
Pm_OpenOutput(), with similar parameters:
retval = Pm_OpenInput(&mstream,dev,NULL,512,NULL,NULL);

Polling for data

The main difference between input and output in terms of programming is that we
will need to be listening to the device for incoming messages. These are going to
be intermittent and asynchronous. So we need a method to do this in a clean and
efficient way. Portmidi implements polling, that is, querying the device for new
data, which tells the program whether it needs to go and read it. The function
Pm_Poll() returns true if there is data to be read, and false otherwise. We can
check it regularly and proceed to call Pm_Read() if we need to:
int Pm_Read(PortMidiStream *stream,
PmEvent *buffer,
int32_t length);
This function takes a stream and reads the incoming data into a buffer, which is
an array of length items. Each one of these is a PmEvent:
typedef long PmTimestamp;
typedef long PmMessage;
typedef struct {
PmMessage message;
PmTimestamp timestamp;
} PmEvent;
The timestamp member will provide a non-decreasing value that can be used to
determine the sequence of events. Each message is defined, as before, as a single
item, and we can use the following macros to extract the individual MIDI status and
data bytes from it:
#define Pm_MessageStatus(msg) ((msg) & 0xFF)
#define Pm_MessageData1(msg) (((msg) >> 8) & 0xFF)
#define Pm_MessageData2(msg) (((msg) >> 16) & 0xFF)
The incoming data is copied into a user-supplied buffer. The number of messages
received is returned by Pm_Read() and can be used by the program to loop over
the array data to retrieve each individual item.

12.3.5 A MIDI Synthesiser

As an example of realtime interaction, we present here a very simple MIDI-
controlled synthesiser, which will respond to incoming NOTE messages and play
a sine wave monophonically. Note that, since it has only the bare minimum com-
ponents to make sound, it will not have any means of shaping the amplitude of the
sound over time (envelopes), or responding to pitch bend controls. However, it will
be simple enough to allow us to understand the principles developed in this chapter.
We will use both the Portmidi and the Portaudio libraries to implement MIDI and
audio IO.
The design of this program is as follows.
• The program is launched by the shell and will be kept running for 60 seconds
(once it starts listening to MIDI). The user can optionally pass a parameter to
keep the program open for a set duration in seconds.
• The user is asked to select a MIDI device from a list.
• The program uses the default output audio device.
• Callback audio is used to allow low-latency operation.
• A listening loop will keep the program open, polling for MIDI input:
– If MIDI data is received, its status byte is checked.
– NOTEON and NOTEOFF messages will be responded to by the program,
setting the amplitude and frequency of a sine wave generator (running in the
audio callback).
In the main program, instead of solely counting out time (as in the audio ef-
fect example in Chapter 11), we will be listening for MIDI. When a message (or
messages) comes in, we will respond to it if it matches what we are looking for.
A NOTEON message supplies the current note number and velocity. A NOTEOFF
message sets amplitude to zero if it also matches the current note (to turn it off).
Because some devices send NOTEON with velocity (data byte 2) = 0 instead of
NOTEOFF, we need to check for that as well:
if(Pm_Poll(mstream)) {
unsigned char data1, data2, status;
cnt = Pm_Read(mstream, msg, 32);
for(i=0; i<cnt; i++) {
status = Pm_MessageStatus(msg[i].message);
data1 = Pm_MessageData1(msg[i].message);
data2 = Pm_MessageData2(msg[i].message);
if((status & TYPEMASK) == MD_NOTEON && data2 > 0) {
udata.note = note = data1;
udata.vel = data2;
} else if(((status & TYPEMASK) == MD_NOTEOFF
|| ((status & TYPEMASK) == MD_NOTEON
&& data2 == 0 )) && note == data1) {
/* current note released: set amplitude to zero */
udata.note = note = data1;
udata.vel = 0;
}
}
}
Note and velocity data is shared between the main program and the callback,
which run in parallel. This is done through the user data structure that is passed to
the callback. Generally speaking, we have to be careful when data is shared by two
parts of a program that run in parallel, making sure that it is not accessed at the
same time or out of order. This is particularly important if both parallel threads are
reading and writing to a memory location. It is less problematic if one of them is
only writing while the other is only reading. If the memory location is multi-byte
(anything bigger than a char), we might still have collision issues, but if it is a
single byte, it is likely that we can allow write-only, read-only parallel access without
having to protect the variable. However, in general, if we need to have exclusive
access to an object, atomic operations are one of the basic mechanisms to do this,
as demonstrated in Sect. 12.4.1.
Thus, in the synthesis code implemented in the callback function, we have code
to access the data bytes (note, velocity) and convert them to amplitude and fre-
quency. The latter is converted using 12-tone equal temperament with A4 set to 440
Hz. Since A4 is MIDI note number 69, the expression to do this is, for a given note
number N,
\[ f = 440 \times 2^{\frac{N-69}{12}} \tag{12.1} \]
while the amplitude is just normalised to the range [0, 1.0]. The code for these oper-
ations and the sine wave synthesis is then
fr = 440.*pow(2., (p->note - 69.)/12);
amp = p->vel/128.;
for(i=0; i < frameCount; i++, n++)
outp[i] = amp*sin(n*TWOPI*fr/sr);
As we have mentioned above, this is a very simple and rough implementation
of synthesis. On NOTEON, sound will start immediately, and on NOTEOFF, it will
stop dead. If a NOTEON is followed by another NOTEON, the pitch will jump to
the next value, with no gliding or smoothing. All of these transitions will cause
clicks in the output waveform, which in more advanced examples we will be able to
avoid. The full code for this example is shown in Listing 12.3. Again, assuming that
Portmidi and Portaudio are installed in /usr/local, we can use the following
command line to build the program:
$ cc -o midisynth midisynth.c -I/usr/local/include \
-L/usr/local/lib -lportmidi -lportaudio

Listing 12.3: MIDI synthesiser example.


#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <portmidi.h>
#include <porttime.h>
#include <portaudio.h>

#define TYPEMASK 0xF0


#define MD_NOTEON 0x90
#define MD_NOTEOFF 0x80

typedef struct udata {


unsigned char vel;
unsigned char note;
float sr;
unsigned long n;
} UDATA;

int audio_fn(const void *input, void *output,


unsigned long frameCount,
const PaStreamCallbackTimeInfo *timeInfo,
PaStreamCallbackFlags statusFlags, void *userData);

int main(int argc, const char *argv[]) {


int cnt,i,dev;
PmError retval;
const PmDeviceInfo *info;
PmEvent msg[32];
PortMidiStream *mstream;

PaError err;
PaStreamParameters param;
PaStream *handle;
int bufsize = 128, sr = 44100;
UDATA udata;
unsigned long end =
(argc > 1 ? atof(argv[1]) : 60)*1000;
unsigned char note = 0;

Pa_Initialize();
Pm_Initialize();

dev = Pa_GetDefaultOutputDevice();
param.device = (PaDeviceIndex) dev;
param.channelCount = 1;
param.sampleFormat = paFloat32;
param.suggestedLatency = (PaTime)
(bufsize/(double)sr);
param.hostApiSpecificStreamInfo = NULL;

udata.sr = sr;
udata.n = 0;
udata.note = 0;
udata.vel = 0;

cnt = Pm_CountDevices();
if(cnt == 0) {
printf("No available MIDI devices\n");
return 1;
}

for(i=0; i < cnt; i++){


info = Pm_GetDeviceInfo(i);
if(info->input)
printf("%d: %s \n", i, info->name);
}
printf("choose device: ");
scanf("%d", &dev);

err = Pa_OpenStream(&handle,NULL,&param,
sr,bufsize,paNoFlag, audio_fn, &udata);

if(err != paNoError) {
printf("Error opening audio output\n");
Pa_Terminate();
Pm_Terminate();
return 1;
}
Pt_Start(1, NULL, NULL);
retval = Pm_OpenInput(&mstream, dev, NULL, 512,
NULL, NULL);

if(retval != pmNoError) {
printf("error: %s \n", Pm_GetErrorText(retval));
Pa_CloseStream(handle);
Pa_Terminate();
Pm_Terminate();
return 1;
}

Pa_StartStream(handle);
while(Pt_Time(NULL) < end){
if(Pm_Poll(mstream)) {
unsigned char data1, data2, status;
cnt = Pm_Read(mstream, msg, 32);
for(i=0; i<cnt; i++) {
status = Pm_MessageStatus(msg[i].message);
data1 = Pm_MessageData1(msg[i].message);
data2 = Pm_MessageData2(msg[i].message);
if((status & TYPEMASK) == MD_NOTEON) {
udata.note = note = data1;
udata.vel = data2;
} else if(((status & TYPEMASK) == MD_NOTEOFF
|| ((status & TYPEMASK) == MD_NOTEON
&& data2 == 0 )) && note == data1) {
udata.note = note = data1;
udata.vel = 0;
}
}
}
}

Pa_StopStream(handle);
Pm_Close(mstream);
Pa_CloseStream(handle);
Pa_Terminate();
Pm_Terminate();
return 0;
}

#define TWOPI 6.283185307179586


int audio_fn(const void *input, void *output,
unsigned long frameCount,
const PaStreamCallbackTimeInfo *timeInfo,
PaStreamCallbackFlags statusFlags, void *userData){
int i;
UDATA *p = (UDATA *) userData;
float *inp = (float *) input,
*outp = (float *) output;
float fr, amp, sr = p->sr;
unsigned long n = p->n;
fr = 440.*pow(2., (p->note - 69.)/12);
amp = p->vel/128.;
for(i=0; i < frameCount; i++, n++)
outp[i] = amp*sin(n*TWOPI*fr/sr);
p->n = n;
return paContinue;
}

12.4 MIDI on Jack

As introduced in Sect. 11.2, the Jack Connection Kit is an API and a media server
that can be used to connect applications to physical and software endpoints. As with
audio, the Jack server provides a space where applications can open clients, which
are made available for IO to/from other clients. The API for MIDI is very similar to
what has already been explored in the realtime audio case. Similar steps need to be
performed, namely:

1. Opening a client (jack_client_open()).
2. Registering ports (jack_port_register()), using a port with its type set
to JACK_DEFAULT_MIDI_TYPE.
3. Setting a callback (jack_set_process_callback()).
4. Activating the client (jack_activate()).
5. Optionally, connecting to other clients (jack_connect()).
6. When done, the client should be deactivated and closed (jack_deactivate()
and jack_client_close()).

The main difference is that MIDI data has a different format from audio, and will
be accessed in the callback using a different means, although we will still look to get
the data from a port, as before. A MIDI event is encapsulated by the data structure
jack_midi_event_t, which has the following members:

• jack_nframes_t time: time reference for the MIDI event, in frames.
• size_t size: number of bytes in the buffer.
• jack_midi_data_t *buffer: MIDI message data bytes.

The size of the MIDI event will be determined by its message type. For channel
messages, it will be either two or three bytes. The first item in the buffer will be
the status byte, followed by one or two data bytes. To obtain an event from an input
port, we use:
int jack_midi_event_get(jack_midi_event_t *event,
void *port_buffer,
uint32_t event_index)
This retrieves an event, from a port_buffer, indexed by event_index.
When the process callback is called, there may be one or more events in the port
buffer. By incrementing the index, starting from 0, we can retrieve all events, one by
one; the function will return 0 on success. When there are no events left, ENODATA
is returned.
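Once an event has been retrieved, its bytes can be decoded as described above. The following is a minimal sketch that does not depend on Jack: a plain array of unsigned char stands in for event.buffer (on the assumption that jack_midi_data_t is a byte type), and the names chan_msg and decode_chan_msg() are hypothetical.

```c
#include <stddef.h>

/* Hypothetical container for a decoded channel message. */
typedef struct {
  unsigned char type;    /* status with channel bits cleared, e.g. 0x90 */
  unsigned char channel; /* channel number, 0-15 */
  unsigned char data1;   /* first data byte */
  unsigned char data2;   /* second data byte, 0 if absent */
} chan_msg;

/* Decode a two- or three-byte channel message held in buf.
   Returns 0 on success, -1 if the bytes are not a channel message. */
int decode_chan_msg(const unsigned char *buf, size_t size, chan_msg *m) {
  if (buf == NULL || m == NULL || size < 2 || size > 3) return -1;
  /* below 0x80 is a data byte; 0xF0 and above are system messages */
  if (buf[0] < 0x80 || buf[0] >= 0xF0) return -1;
  m->type = buf[0] & 0xF0;
  m->channel = buf[0] & 0x0F;
  m->data1 = buf[1];
  m->data2 = size == 3 ? buf[2] : 0;
  return 0;
}
```

In a process callback, event.buffer and event.size would be passed as the first two arguments.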
MIDI data is sent as individual bytes (jack_midi_data_t) to the output port.
This is done through
int
jack_midi_event_write(void *port_buffer,
jack_nframes_t time,
const jack_midi_data_t *data,
size_t data_size)
MIDI messages of data_size length should be written as a complete event
(e.g. a status byte followed by one or two data bytes), and can be given a
time offset in frames. However, if offsets are given, messages need to be written
in ascending time order, as Jack will not sort them, and will not store out-of-order
events.
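As an illustration of what a complete event looks like, the sketch below assembles the three bytes of a NOTE ON message. The helper make_noteon() is hypothetical and independent of Jack; unsigned char is used in place of jack_midi_data_t on the assumption that the latter is a byte type.

```c
#include <stddef.h>

/* Fill msg (at least 3 bytes long) with a NOTE ON channel message.
   chan is masked to 0-15; note and vel are masked to 7 bits.
   Returns the message size in bytes. */
size_t make_noteon(unsigned char *msg, unsigned int chan,
                   unsigned int note, unsigned int vel) {
  msg[0] = (unsigned char)(0x90 | (chan & 0x0F)); /* status byte */
  msg[1] = (unsigned char)(note & 0x7F);          /* data byte 1 */
  msg[2] = (unsigned char)(vel & 0x7F);           /* data byte 2 */
  return 3;
}
```

The array and the returned size would then be suitable as the data and data_size arguments of jack_midi_event_write().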
If a program is processing audio and MIDI at the same time (as in the MIDI
synth example in Sect. 12.3.5), then it makes sense to pick up the MIDI input data
in the same processing callback as that used for the audio data. This will be an
optimal arrangement, which will not require any control data sharing between the
main program and the callback. Moreover, the MIDI data coming in may also
have a time offset, which will align the message to a specific sample in the audio
buffer (something that we did not provide for in the Portaudio/Portmidi example).

12.4.1 Example

As we have seen in Chapter 11, a characteristic of Jack operation is that its pro-
cessing is asynchronous. Therefore, alongside the main program, we will have a
processing thread managed by the server that runs parallel to it. Because of this, if
we want to access the data that is sent to the client in our main program, we will
need to proceed carefully. In particular, we will want to avoid problems with access
from two separate threads to the same memory location. Equally important is to
ensure that any communication does not block the callback and that realtime-safe
operation is therefore ensured.
In the example in Sect. 12.3.5, since we were sharing single bytes of data, where
one thread was writing and another one reading, we were able to implement a
simplistic approach. Although the callback thread could potentially read mismatched
data bytes (e.g. a new note number paired with an old velocity), this is very unlikely.
At this point, however, it is worth introducing a more robust approach to dealing
with data being shared between two threads. The idea is still that the callback can
place MIDI data in memory and the main program can read it from that location,
but we will try to synchronise access to avoid concurrency issues known as data
races. For this, we will employ a circular buffer (or queue). This is a data structure
made up of an array which is written to and read from in a circular fashion, using a
first-in first-out (FIFO) access sequence. Besides the block of memory that data is
written to and read from, a single-writer, single-reader queue needs three counting
variables:

1. A writer position tracker.
2. A reader position tracker.
3. The number of items waiting in the queue.

The position trackers will be incremented modulo queue size, to implement cir-
cular access, so that when they reach the end of the array, their position is reset to
the start. The number of items will be incremented on the writer side and decre-
mented on the reader side, and will account for the items written but not read (Fig.
12.3). Since these two operations are not synchronised, we will need to use atomic
operations to ensure that the order of operations is strictly enforced. Atomic access
guarantees that only one side can modify the variable at one time, whereas ordinary
access cannot ensure this. So, if we are incrementing and decrementing a variable,
there is a possibility that the two operations may be attempted concomitantly, which
may lead to undefined results (due to a data race).
The C11 standard [24] defines the type qualifier _Atomic, which marks a vari-
able as having atomic access. Such a variable can then be used with the various
atomic functions provided by the header stdatomic.h. We will use three of
these:
unsigned int atomic_load(_Atomic unsigned int *obj)
unsigned int
atomic_fetch_add(_Atomic unsigned int *obj, int op)
unsigned int
atomic_fetch_sub(_Atomic unsigned int *obj, int op)

Fig. 12.3: Circular buffer (labels: rp, wp, items).

The first of these reads from the atomic variable, and the other two increment
and decrement its value, respectively. They will guarantee that the variable is only
accessed in the respective thread where they are called at any given time. Alongside
the item count, we will be able to increment, independently, the writer and reader
positions. The latter is only incremented if there are any items to be read, and the
former will also only be incremented if there is space available in the buffer. If there
is not, the data is discarded. In situations where there is no realtime pressure, we can
block until there is space; in this case, however, nothing should block the processing
callback, and so the function just carries on without writing to the buffer.
The following excerpt from the process callback demonstrates this. The variable
wp tracks the writing position, and items is a pointer to the atomic item counter.
Note that if the buffer is full, we just drop the data, but do not block the operation,
to ensure realtime safety:
while(jack_midi_event_get(&event,
jack_port_get_buffer(in,nframes),
i++) == 0) {
/* echo input */
jack_midi_event_write(
jack_port_get_buffer(out,nframes),
event.time, event.buffer, event.size);
/* check for overflow */
if(atomic_load(items) <
JACK_MIDI_BUFFSIZE) {
buf[wp] = event;
wp = wp + 1 != JACK_MIDI_BUFFSIZE ?
wp + 1 : 0;
atomic_fetch_add(items, 1);
}
}
Likewise, the reading side in the main program implements a loop that checks
whether any items are waiting in the buffer, reads them, increments the reader posi-
tion rp and decrements the atomic variable state.items:
while(atomic_load(&state.items)) {
int size = state.buf[rp].size;
int offs = state.buf[rp].time;
jack_midi_data_t *mdata =
state.buf[rp].buffer;
...
rp = rp + 1 != JACK_MIDI_BUFFSIZE ?
rp + 1 : 0;
atomic_fetch_sub(&state.items, 1);
}
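The writer and reader excerpts can be condensed into a stand-alone queue, which makes the scheme easier to test away from Jack. The following is a minimal sketch under stated assumptions: the names midi_msg, midi_queue, queue_init(), queue_push() and queue_pop(), as well as the sizes QSIZE and MSGBYTES, are hypothetical. It stores a copy of the message bytes rather than a jack_midi_event_t, on the assumption that the event's buffer pointer should not be relied upon after the process callback returns.

```c
#include <stdatomic.h>
#include <string.h>

#define QSIZE 8     /* hypothetical queue length */
#define MSGBYTES 3  /* channel messages take at most three bytes */

typedef struct {
  unsigned char bytes[MSGBYTES];
  unsigned int size;
} midi_msg;

typedef struct {
  midi_msg buf[QSIZE];
  unsigned int wp;            /* touched only by the writer */
  unsigned int rp;            /* touched only by the reader */
  _Atomic unsigned int items; /* shared count of unread items */
} midi_queue;

void queue_init(midi_queue *q) {
  q->wp = q->rp = 0;
  atomic_init(&q->items, 0);
}

/* Writer side: returns 1 if stored, 0 if the message was dropped
   (queue full or message too long). It never blocks. */
int queue_push(midi_queue *q, const unsigned char *data,
               unsigned int size) {
  if (size > MSGBYTES || atomic_load(&q->items) >= QSIZE) return 0;
  memcpy(q->buf[q->wp].bytes, data, size);
  q->buf[q->wp].size = size;
  q->wp = q->wp + 1 != QSIZE ? q->wp + 1 : 0;
  atomic_fetch_add(&q->items, 1); /* publish after the data is written */
  return 1;
}

/* Reader side: returns 1 and fills *out if a message was waiting. */
int queue_pop(midi_queue *q, midi_msg *out) {
  if (atomic_load(&q->items) == 0) return 0;
  *out = q->buf[q->rp];
  q->rp = q->rp + 1 != QSIZE ? q->rp + 1 : 0;
  atomic_fetch_sub(&q->items, 1); /* free the slot after copying */
  return 1;
}
```

In this arrangement queue_push() would be called from the process callback and queue_pop() from the main program. Only the item count is shared between the two threads; each position tracker belongs to one side alone.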
The example program prints the MIDI data to the terminal, and copies its input
into the output. It runs for a set duration, given in seconds on the command line.
To compile it, we need the Jack library and headers:
cc -o jmidi jmidi.c -I/usr/local/include \
-L/usr/local/lib -ljack
The complete source code for this example is shown in Listing 12.4.

Listing 12.4: Jack MIDI example.


#include <jack/jack.h>
#include <jack/midiport.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdatomic.h>
#include <unistd.h>

#define JACK_MIDI_BUFFSIZE 1024


#define MICROS 1000000

typedef struct UDATA {


jack_port_t *inport;
jack_port_t *outport;
jack_midi_event_t buf[JACK_MIDI_BUFFSIZE];
_Atomic unsigned int items;
unsigned int wp;
} udata;

static int jackProcess(jack_nframes_t nframes,


void *pp) {
udata *p = (udata *) pp;
jack_midi_event_t event;
jack_midi_event_t *buf = p->buf;
int wp, i = 0;
jack_port_t *in = p->inport;
jack_port_t *out = p->outport;
_Atomic unsigned int *items = &p->items;

wp = p->wp;
while(jack_midi_event_get(&event,
jack_port_get_buffer(in,nframes),
i++) == 0) {
/* echo input */
jack_midi_event_write(
jack_port_get_buffer(out,nframes),
event.time, event.buffer, event.size);
/* check for overflow */
if(atomic_load(items) <
JACK_MIDI_BUFFSIZE) {
buf[wp] = event;
wp = wp + 1 != JACK_MIDI_BUFFSIZE ?
wp + 1 : 0;
atomic_fetch_add(items, 1);
}
}
p->wp = wp;
return 0;
}

int main(int argc, const char **argv) {

if (argc < 2) {
printf("jmidi dur\n");
}
else {
jack_client_t *client;
int rp = 0;

client =
jack_client_open("MIDIMon",
JackNoStartServer, NULL);

if (client != NULL) {
udata state;
unsigned long end, time = 0, now;
end = (unsigned long)
(atof(argv[1])*MICROS);
state.items = 0;
state.wp = 0;
/* register input port */


state.inport =
jack_port_register(client, "input",
JACK_DEFAULT_MIDI_TYPE,
JackPortIsInput, 0UL);

if (state.inport == NULL) {
jack_client_close(client);
printf("Could not open input port");
return -1;
}

/* register output port */


state.outport =
jack_port_register(client, "output",
JACK_DEFAULT_MIDI_TYPE,
JackPortIsOutput, 0UL);

if (state.outport == NULL) {
jack_client_close(client);
printf("Could not open output port");
return -1;
}

/* set process callback */


if(jack_set_process_callback(client,
jackProcess,
(void*) &state)
!= 0) {
jack_client_close(client);
printf("Could not set Jack callback");
return -1;
}

/* activate Jack */
if(jack_activate(client) != 0) {
jack_client_close(client);
printf("Could not start Jack processing");
return -1;
}

now = jack_get_time();
end += now;
while(time < end) {
time = jack_get_time();
while(atomic_load(&state.items)) {
int size = state.buf[rp].size;
int offs = state.buf[rp].time;
jack_midi_data_t *mdata =
state.buf[rp].buffer;
printf("%.2f : %d : ", (float)(time-now)/MICROS,
offs);
switch (*mdata & 0xF0) {
case 0x80:
printf("NOTEOFF");
break;
case 0x90:
printf("NOTEON");
break;
case 0xA0:
printf("POLYAFTOUCH");
break;
case 0xB0:
printf("CTLCHG");
break;
case 0xC0:
printf("PGMCHG");
break;
case 0xD0:
printf("AFTOUCH");
break;
case 0xE0:
printf("PBEND");
break;
}
printf(" : CHAN %d : ",
*mdata++ & 0x0F);
size--;
while (size--)
printf("%d :", *mdata++);
printf("\n");
rp = rp + 1 != JACK_MIDI_BUFFSIZE ?
rp + 1 : 0;
atomic_fetch_sub(&state.items, 1);
}
}
/* close client */
jack_deactivate(client);
jack_client_close(client);
printf("closed Jack client \n");


return 0;
} else {
printf("Could not open Jack client\n");
return -1;
}
}
return 0;
}

12.5 Conclusions

This chapter concludes the first part of our journey, from the shin program to
realtime audio synthesis using the C language. We were able to cover all of the
language syntax and semantics, plus a few key libraries. As far as C is concerned,
this is of course only the beginning, as mastering it depends on quite a bit of practice,
as well as some knowledge about the right APIs for a particular job (if anything, to
stop us from reinventing the wheel, but also to be able to access some important
system resources). So it is absolutely essential to be able to consult documentation
(for instance, the system manual with the command man) and to follow it up. At
this point, we should have built enough understanding of how the language and the
systems that underpin it work to allow us to do that when we need to. In the next
part of the book, we will take a detour and move to a different language, C++, and
programming paradigm, object orientation. However, we will do this in a continuous
manner, introducing this new environment as a superset of what we have become
familiar with in this part of the book.

Problems

12.1. Using the MIDI synthesiser example as a starting point, implement an added
tremolo effect to the synthesis, whose amplitude (effect amount) is controlled by
the modulation wheel (controller number 1) and whose frequency is controlled by
another control change message, from a different controller number.

12.2. The MIDI synthesiser example produces a very simple waveform (a sine
wave), which is composed of a single harmonic. How could you add more har-
monics to this waveform? Design a program that would allow the user to control the
number of harmonics in the sound using the modulation wheel.
Part II
Object-Oriented Audio in C++
Chapter 13
Oscillators

Abstract This chapter discusses one of the fundamental components of computer
music instruments, the oscillator. It explores this first from the perspective of
sinusoidal signal generation, discussing the concepts of phase, frequency, and sampling
increment, and then introduces the principles of table lookup. Alongside this, we
deal with the foundations of object-oriented programming, demonstrating how they
can be employed to model sound computing components, such as the oscillator. As
part of this, we swiftly move from C to the C++ language, introducing some of its
basic elements.

Oscillators are used primarily to generate periodic signals, such as waveforms.
Starting with the simplest of signals, the sinusoidal wave, we will introduce some
key concepts that will allow us to design and implement such generators. As we
have seen before, sine waves can be generated by invoking the sin() function of
the standard C library (defined in math.h). This function takes an angle (or phase)
and computes its sine value, which is equivalent to the length of the opposite side of
a right triangle with its hypotenuse measuring 1. To generate a sine wave, we make
the angle increase at a given rate, determined by the ratio of the desired frequency $f$
and the signal sampling rate $f_s$. Since the sine function is periodic in $2\pi$, we will use
this to scale the phase values as they increase. The full expression for the varying
phase becomes $2\pi f t / f_s$, where $t$ is the time in samples. This is translated into C code
as
s[n] = sin(2*pi*f*n/sr);
Such an implementation will work, as we have already shown, but only in cases
in which the frequency f does not change. It fails when f varies, e.g. in a
glide/glissando, vibrato, etc.
We may have noticed this very clearly in the MIDI synthesiser example in Sect.
12.3.5, where a change in frequency from one note to another causes a click, before
stabilising as the frequency becomes fixed again. As the sine function takes in an
angle as input, we need to compute it accurately for each sample. To do that for an
arbitrarily-varying frequency, we need to integrate it, as in [36]

  
$$s(t) = a(t)\sin\left(2\pi \int f(t)\,dt\right) \qquad (13.1)$$

This allows the frequency to assume different instantaneous values at each sample.
To do this integration in a digital signal, we keep an account of the previous
phase and add to it a sampling increment based on the currently calculated
frequency (scaled by $2\pi/f_s$). The C code becomes

s[n] = sin(ph);
ph += 2*pi*f/sr;
It is trivial to show that, for the fixed-frequency case, the two code fragments
are equivalent. However, if f varies, the first example will produce an incorrect
output. Thus, a C implementation of such a function would have to take account of
the sample-by-sample phase values that are produced by the integration of the time-
varying frequency f (n). To make it more widely available to a program, we can turn
this into a function, but the current value of the phase would be kept externally to it,
and modified as a side-effect (Listing 13.1). The memory address of this variable is
passed to the function as a pointer:

Listing 13.1: C function implementing Eq. 13.1.


#define twopi 6.283185307179586
double sineosc(double a, double f,
double *ph, double sr){
double s = a * sin(*ph);
*ph += twopi * f / sr;
return s;
}
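The relationship between the two fragments shown earlier (the index-based form and the phase-accumulating form) can be checked numerically. The following is a minimal sketch with hypothetical helper names: direct_phase() gives the phase of the index-based form, next_phase() advances the phase by one sampling increment, and direct_step() measures the phase step that the index-based form takes when the frequency switches between two consecutive samples.

```c
#define TWOPI 6.283185307179586

/* Phase computed from the sample index, as in sin(2*pi*f*n/sr). */
double direct_phase(unsigned long n, double f, double sr) {
  return TWOPI * f * n / sr;
}

/* Phase advanced by one sampling increment, as in ph += 2*pi*f/sr. */
double next_phase(double ph, double f, double sr) {
  return ph + TWOPI * f / sr;
}

/* Phase step taken by the index-based form between samples n-1 and n
   when the frequency switches from f0 to f1 at sample n (n >= 1). */
double direct_step(unsigned long n, double f0, double f1, double sr) {
  return direct_phase(n, f1, sr) - direct_phase(n - 1, f0, sr);
}
```

With f0 equal to f1, direct_step() matches the increment 2*pi*f/sr at any n, so the two forms agree. When the frequencies differ, the step grows with n and is unrelated to the intended increment, so the waveform generally lands on an arbitrary point of its cycle; the accumulating form always advances by a single increment.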
For each independent oscillator that we would like to have, we will need to pro-
vide a separate variable to hold the current phase. For example, two oscillators play-
ing two sine waves at 220 and 375 Hz would be programmed as follows:

Listing 13.2: Program using the function in Listing 13.1.


int main() {
double ph1 =0., ph2 =0.;
int i;
double sr = 44100.;
for(i = 0; i < sr; i++)
printf("%f \n", sineosc(0.2, 220., &ph1, sr) +
sineosc(0.4, 375., &ph2, sr));
return 0;
}
We can see that this is a little awkward, since we need to remember to keep track
of the phase of each oscillator. It would be much better if we could package up the
oscillator and the memory it needs (called its state) together in one programming
object. The good news is that we can. We could place the oscillator and the phase
variable in a new type defined by a data structure. C only allows functions in data
structures as pointers and so we need to do something like this:

Listing 13.3: Self-contained oscillator types.


#include <math.h>
#include <stdio.h>

typedef struct _osc_ {


double ph;
double sr;
double (*process)(struct _osc_ *, double, double);
} OSC;

#define twopi 6.283185307179586


double sineosc(OSC *p, double a, double f){
double s = a * sin(p->ph);
p->ph += twopi * f / p->sr;
return s;
}

int main() {
int i;
OSC osc1 = { 0., 44100., sineosc},
osc2 = { 0., 44100., sineosc};

for(i = 0; i < osc1.sr; i++)


printf("%f \n", osc1.process(&osc1, 0.2, 440) +
osc2.process(&osc2, 0.4, 375));
return 0;
}
This looks much better, as we have packaged everything together in one single
data type, which we can instantiate many times. We can improve on this, but in order
to do so, we will need to move to another language, C++.

13.1 Moving to C++

C++ [62, 63] is, depending on which angle we approach it from, a completely dif-
ferent language to C, or an extension to it, as its name (C-increment) implies. It is
also a big language, which stands opposed to the simplicity (and elegance) of C. In
this and the following chapters, we will follow a route that takes it as an extended
version of the C language. We will not attempt to cover every single aspect of the
language, as we were able to do with C, but we will learn its most sane and proper
elements, and those that will allow us to program music applications conveniently.
The language
devices will be introduced as we need them. Note also that the compiler command
to be used from now on will be c++ rather than cc, to reflect the change in
language. Most of the C programs we have seen before will also be valid C++ code.
We can continue using the C libraries and most of its syntax. In the case of the stan-
dard library, the only difference is that we normally employ the C++ versions of its
headers. These have an added ‘c’ prefix and no ‘.h’ extension. For example, the C
header file stdio.h becomes cstdio in C++.

13.1.1 C++ Structures

The first main extension we would like to introduce is a significant change in how
structures work to define new types:

1. Variables instantiated from data types defined by structures do not need the
struct keyword to precede them. Once the types are defined, variables can be
instantiated just as variables of any other type are.
2. Functions are allowed in structures. These are called member functions or meth-
ods in this context.
3. Members can belong to instances of structures (objects), which is the general
case, or to the structures themselves (and to no specific instance in particular). In
this case they are marked as static.
4. Non-static methods may access directly all variables defined in the structure
(called member variables or attributes in this context).
5. Structures can contain a special method called a constructor, which is used to
initialise a variable (also called an object in this context). Constructors have the
same name as the structure and are declared with no return type. They can have
any number of parameters, like any other method (including zero). If the structure
does not declare a constructor, the compiler will supply a default one, with no
arguments and no function body.

So, with these extensions, we can rewrite our oscillator code more conveniently
in C++:

Listing 13.4: C++ version of Listing 13.3.


#include <cmath>
#include <cstdio>

const double twopi = 8. * atan(1.);


struct Osc {
double ph;
double sr;
Osc() : ph(0.), sr(44100.) { };
double process(double a, double f){
double s = a * sin(ph);
ph += twopi * f / sr;
return s;
}
};

int main() {
int i;
Osc osc1, osc2;

for(i = 0; i < osc1.sr; i++)


printf("%f \n", osc1.process(0.2, 440) +
osc2.process(0.4, 375));
return 0;
}
As can be seen, the code simplifies somewhat. The data type is more compactly
described and we do not need to pass in pointers to the function, or initialise a
function pointer. In the processing method, we can access and modify the struct
variables directly. The constructor declaration requires some explanation:
Osc() : ph(0.), sr(44100.) { };
In C++, objects of every type can be initialised using constructor syntax. This
includes the fundamental built-in types we have already seen in the C language,
which are also called trivial or trivially constructed. So a double behaves as if
it had a double(double x) constructor built into the language, which constructs
a double variable with initial value x. We can invoke it by giving the name of the
variable followed by the initialisation parameter in parentheses, e.g. ph(0.)
initialises the double ph member variable. A constructor then has the form:

struct-name ( argument-list ) : member-initialisation-list { body }

and the member initialisation list is a comma-separated series of calls to constructors
of each member variable. The function body and argument list can be empty (as in
the present example). We can also declare the constructor to take in parameters to
initialise the object:
Osc(double phs, double esr) : ph(phs), sr(esr) { };
If we declare an object with no initialisation parameters, the default construc-
tor that takes no parameters is used. If only a constructor that takes parameters is
provided, the object will be required to be initialised with those parameters (the
compiler will complain otherwise).
Also note in Listing 13.4 that the headers for C standard library functions are
named slightly differently in C++ (although the C headers would generally also
work here). We also introduced the const keyword, which is used to indicate that
a constant (a read-only object) is created, rather than a variable.
C++ structures are our first step into object-oriented programming, which, as we
will see, is a very convenient way of programming audio and music applications.
The idea is that we can create fully-fledged new types, from which any number of
objects can be instantiated and manipulated. The example in Listing 13.4 demon-
strates the idea fully: a type that encapsulates the model of a sine wave oscillator,
with a method to manipulate it (i.e. generate audio).

13.1.2 Overloading and Optional Parameters

Another feature of C++ that can prove very useful for us is the possibility of supply-
ing the same function name with different implementations for different argument
types. For instance, it is legal to declare
double process ();// no arguments
double process(double amp);// one argument
double process(double amp, double freq);//two arguments
For each one of these we will provide a separate implementation. We could,
for instance, modify our oscillator structure design to incorporate amplitude and
frequency as member variables, and then provide different implementations for fixed
or varying parameters:
struct Osc {
double fr;
double amp;
double ph;
double sr;
Osc(double a, double f) : amp(a), fr(f),
ph(0.), sr(44100.) { };
double process(double a, double f){
amp = a; fr = f;
double s = a * sin(ph);
ph += twopi * f / sr;
return s;
}
double process(double a) {
amp = a;
return process(amp, fr);
}
double process(){
return process(amp, fr);
}
};
The user can then decide which one is needed, depending on whether the fre-
quency, the amplitude, or both need to change. Constructors can also be overloaded,
if we want to create objects with slightly different parameter configurations, or a
default constructor in addition to a constructor taking parameters. In complement to
this, we can make some or all arguments have default values, which are used if a
parameter is not supplied:
double process(double amp = 0.5, double freq = 440.);
This can be used in a constructor to allow for some parameters to be optional; for
example,
Osc(double a, double f, double phs = 0.,
double esr = 44100.) :
amp(a), fr(f), ph(phs), sr(esr) { };
Optional arguments must always be placed towards the right (or the end) of the
parameter list. For instance, the first is not allowed to be optional if the second is not,
as the semantics would not be clear in this case.

13.1.3 Memory Management

C++ has three built-in memory management operators: new, delete, and delete[].
These replace the C library functions malloc() and free(). The two memory
management systems should not be used interchangeably, and in C++ we should
adopt the language standard operators.
An object can be dynamically allocated with the following syntax:
Osc *oscil = new Osc(0.5,440.);
Since this is a pointer, we need to use the correct syntax to access its members:
oscil->process();
When we are done with it, we dispose of the memory using
delete oscil;
One important reason for using new and delete is that this mechanism allows
for correct object construction in all cases. It also implements destruction, which
is the opposite process, when memory is disposed of and resources freed. As you
might expect, a structure will also have a special method to do this, called a destruc-
tor. We do not need to define this in many cases, unless we ourselves have allocated
memory or used any other resources that need to be freed (e.g. file handles, etc.).
The compiler will provide a default destructor for each structure that does not define
one.
However, if we need to implement this, the signature for this method is

~struct-name ( )

that is, it is the structure name with a ~ in front of it and takes no parameters.
Finally, we can also create arrays of objects dynamically using a slightly different
syntax:
double *samples = new double[size];
where size is an integer variable (or a constant). Memory deallocation is effected
with the second version of delete:
delete[] samples;
We need to make sure that the correct version of this operator is used. With these
new C++ extensions, we can now proceed to designing a fully-fledged oscillator.

13.2 The Table Lookup Oscillator

The sinusoidal wave oscillator that we have been exploring so far has a couple of
limitations. It does not allow us to generate an arbitrary waveform, and it makes one
function call per output sample, which is not very efficient. So we can improve on
this by designing a more flexible and general algorithm: the table lookup oscillator,
which generates a vector of samples.
The idea of table lookup is that we have a memory block, which is a table of
values, containing the output of some pre-computed function (e.g. a sine or any
other shape). The table has a size, which is the number of values in memory and
we can read it (look it up) to get the value of a function given an input argument,
which is an index of a position in the table. In programming terms, we have an array,
which we initialise with a set of values, and the oscillator uses it instead of calling a
function directly.
The algorithm is defined by a couple of equations:

$$s(t) = a(t)\,T\left(\lfloor \theta(t) \rfloor \bmod N\right) \qquad (13.2)$$

$$\theta(t + 1) = \theta(t) + \frac{N}{f_s} f(t) \qquad (13.3)$$
You will recognise that the function $T()$, the table lookup, replaces the sin()
function in our previous oscillator design. Also, because the phase $\theta(t)$ has to be
within the bounds of the table used, we need to apply a mod $N$ operation to it, as
we perform the lookup ($N$ is the table size). That will bring the index back between 0
and $N - 1$ if it falls below or above this range. Since the function ranges over these
bounds, we will scale the frequency by $N/f_s$ instead of $2\pi/f_s$. Also, given that we are
looking up an array, the index has to be a whole number. For this we need a floor
operation ($\lfloor x \rfloor$).
Now we have a couple of modifications to make to our previous oscillator code,
such as
double process(double a, double f){
amp = a; fr = f;
double s = a * table[(int)ph];
ph += size * f / sr;
while(ph >= size) ph -= size;
while(ph < 0) ph += size;
return s;
}
to realise the table lookup oscillator, using a double table array as a function
table with int size pre-computed values. Both of these variables are assumed to
be in the scope of this method, placed inside the structure that will hold it.
To complete the algorithm, we will want to process a whole block of samples (a
vector) instead of a single sample per function call. This is a more efficient way to
proceed when computing audio [13]. Processing vectors will require us to loop over
the output array to fill it:
const double *process(double a, double f){
double incr = size * f / sr;
amp = a; fr = f;
for(int i = 0; i < vsize; i++){
s[i] = amp * table[(int)ph];
ph += incr;
while(ph >= size) ph -= size;
while(ph < 0) ph += size;
}
return s;
}
We are assuming that the array s exists inside the data structure (i.e. the object
holds its output), and that it has size int vsize (also a member variable). Note
also that since the frequency fr can change at most once every vsize samples, we
can move the calculation of the amount of phase update needed (the increment) to
outside of the processing loop. This saves a few operations per sample.
In this code we have also introduced a couple of programming devices we have
not yet used:
• C++ allows us to declare a local variable, whose scope is limited to the loop
body, in the for initialiser. Note that although we have not used this before, it is
a feature that is also present in the C99 standard.
• The function signature contains the const keyword. In this case, it means that
we are returning a pointer to const double. It does not mean that the pointer
itself is a constant, but that the data it is pointing at cannot be changed; the double
array returned is read only. This is good practice since we want to prevent the
oscillator output being modified externally.
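The difference between a pointer to const data and a constant pointer can be seen in a small sketch. The Holder structure and the functions sum() and demo() are hypothetical stand-ins for an object that owns its output block; they are not part of the oscillator code.

```cpp
// Sum the values of a read-only block; the function cannot write to it.
double sum(const double *v, unsigned int n) {
  double acc = 0.;
  for (unsigned int i = 0; i < n; i++) acc += v[i];
  return acc;
}

struct Holder {   // hypothetical object owning an output block
  double buf[4];
  Holder() { for (int i = 0; i < 4; i++) buf[i] = i; }
  // returns a pointer to const double, as process() does
  const double *data() const { return buf; }
};

double demo() {
  Holder h, g;
  const double *out = h.data();
  // out[0] = 1.;  // would not compile: the data pointed at is read-only
  out = g.data();  // allowed: the pointer itself is not a constant
  return sum(out, 4);
}
```

The commented-out assignment is the kind of external modification that the const return type is meant to rule out.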
We now have all the pieces that we need to create an Osc type that implements
a general-purpose table-lookup oscillator:
Listing 13.5: Table-lookup oscillator.


struct Osc {
double fr;
double amp;
const double *table;
unsigned int size;
double ph;
double *s;
unsigned int vsize;
double sr;

Osc(double a, double f, const double *t,


unsigned int sz, double phs = 0.,
unsigned int vsz = 64,
double esr = 44100.) :
amp(a), fr(f), table(t), size(sz), ph(phs),
s(new double[vsz]), vsize(vsz), sr(esr) { };

~Osc() { delete[] s; }

const double *process(double a, double f){


double incr = size * f / sr;
amp = a; fr = f;
for(int i = 0; i < vsize; i++){
s[i] = amp * table[(int)ph];
ph += incr;
while(ph >= size) ph -= size;
while(ph < 0) ph += size;
}
return s;
}

const double *process(double a) {


amp = a;
return process(amp, fr);
}

const double *process(){


return process(amp, fr);
}
};
Note that in this code we have employed all the C++ devices we have learned so
far:

• Overloaded methods: process() can be called in three different ways.


• Default parameters: the constructor has a number of defaults, so that the user
does not need to provide them in most cases.
• Read-only variables: the table should not need to be modified by the oscillator,
so we make it read-only. The output of process(), as we’ve outlined above,
is also read-only.
• The output vector is created dynamically, since we do not know at compile time
what size it will be. We use new to allocate it, initialising the pointer.
• Now the structure has some resources it needs to manage, so we have to supply
a destructor, which calls delete[] to free the array (otherwise we would have
a memory leak).

Since we have no built-in function table, we now need to supply one for this
object. Any periodic function will do, but, of course, if we are generating audio, we
should be trying to provide band-limited waveforms, rather than naı̈ve geometric
shapes. The simplest way is to use a Fourier series [18, 36, 56], summing sinusoidal
waves. The example below creates a table with two harmonics and uses an Osc
object to generate an output based on this:

Listing 13.6: Synthesis example.


#include <cstdio>
#include <cmath>

const double twopi = 8. * atan(1.);


int main() {
const unsigned int size = 10000;
double tab[size];
const double *out;
Osc osc(0.5, 440., tab, size);

for(int i=0; i < size; i++)


tab[i] = 0.5*(sin(i*twopi/size) +
sin(2*i*twopi/size));

for(int i = 0; i < osc.sr; i+=osc.vsize){


out = osc.process();
for(int j = 0; j < osc.vsize; j++)
printf("%f \n", out[j]);
}
return 0;
}
To build the program, first the Osc class needs to be added to the code in Listing
13.6, and then we can compile this file with the c++ command:
c++ -o osc osc.cpp
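The Fourier-series construction used for the table in Listing 13.6 can be generalised to any list of harmonic amplitudes. The following is only a sketch: the function name fourier_table() and the normalisation by the sum of the amplitudes are our own choices, and it relies on std::vector and a reference argument, which have not been used in the listings so far.

```cpp
#include <cmath>
#include <vector>

// Fill a table of n points with a sum of sine harmonics; amps[k] is
// the amplitude of harmonic k+1. The result is divided by the sum of
// the absolute amplitudes, so it cannot exceed 1 in absolute value.
std::vector<double> fourier_table(unsigned int n,
                                  const std::vector<double> &amps) {
  const double twopi = 8. * std::atan(1.);
  std::vector<double> tab(n, 0.);
  double total = 0.;
  for (unsigned int k = 0; k < amps.size(); k++)
    total += std::fabs(amps[k]);
  if (total == 0.) return tab;   // nothing to add: silent table
  for (unsigned int i = 0; i < n; i++) {
    for (unsigned int k = 0; k < amps.size(); k++)
      tab[i] += amps[k] * std::sin((k + 1.) * i * twopi / n);
    tab[i] /= total;
  }
  return tab;
}
```

A table built in this way could be handed to the oscillator as Osc osc(0.5, 440., &tab[0], tab.size()), as long as the vector outlives the oscillator object.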
Figure 13.1 shows a plot of the output of this program. We can clearly see that
the presence of two partials creates a wave shape that is different from a simple sine
wave. The plot shows 200 samples, which is just short of two periods at 440 Hz.

[Plot: oscillator output over samples 0 to 200 (horizontal axis), with amplitude between −0.4 and 0.4 (vertical axis).]

Fig. 13.1: A plot of the oscillator output from Listing 13.6.

13.3 Conclusions

Oscillators are the workhorses of digital synthesis. The basic algorithm can be used
to produce any type of periodic waveform. It can be used for sampled-sound play-
back (if we replace the single-waveform table by a whole block of recorded sound)
and for envelope generation (if we use an envelope shape as the function table and
adjust the frequency to be the inverse of the envelope duration).
We have shown that oscillators have state and that keeping it packaged in a struc-
ture is a very good idea. To do this in a convenient form, we have upgraded our
implementation language from C to C++ and introduced some relevant program-
ming devices. We will continue on this path in the following chapters, adding some
more strings to our bow.

Problems

13.1. Write a program using the Osc structure that will produce a band-limited
sawtooth wave with a given frequency and amplitude given as arguments to the
program. Use either libsndfile or Portaudio to implement the audio output.

13.2. Modify the Osc structure to allow for (optionally) audio-rate amplitude and/or
frequency modulation. Write a program using two of these objects to implement
simple (sinusoidal) frequency modulation synthesis, taking the carrier and modulator
frequencies, index of modulation and the signal amplitude as arguments. Use
either libsndfile or Portaudio to implement the audio output.
Chapter 14
Interpolation

Abstract In this chapter we concentrate, on the signal processing side, on the con-
cept of interpolation and how it can be applied to produce better oscillators. We also
look at taking these synthesis components apart into their constituent elements, phase
generation and table reading. From a programming perspective, the discussion of
different kinds of oscillators allows us to introduce inheritance, and the concept of
polymorphism. We also explore a new way of handling addresses of objects, which
is provided by reference types in C++.

The table-lookup oscillator we introduced in the previous chapter is the simplest
one of its kind, and it is not as precise as we would like. If we compare
the output of the original sine wave oscillator (using a direct call to sin()) and
that of an oscillator reading a sine wave table, we will see that there are some small
differences. The main reason for this is that while the sin() call translates an
angle defined in double precision to a double precision result, in the table lookup we
truncate the index to an integral value to be able to access the array memory. The
sine wave that is stored in the function table is sampled at N points (N is the table
size), and the error in the output will be inversely proportional to this size.
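The relationship between table size and truncation error can be checked numerically. The sketch below, whose function name max_trunc_error() and probe count are our own assumptions, measures the largest difference between sin() and a truncating read of an N-point sine table.

```cpp
#include <cmath>
#include <vector>

// Largest absolute difference between sin() and a truncating lookup
// into an N-point sine table, probed at evenly spaced phases.
double max_trunc_error(unsigned int N, unsigned int probes = 100000) {
  const double twopi = 8. * std::atan(1.);
  std::vector<double> tab(N);
  for (unsigned int i = 0; i < N; i++)
    tab[i] = std::sin(i * twopi / N);
  double err = 0.;
  for (unsigned int k = 0; k < probes; k++) {
    double ph = (double) k * N / probes;   // phase in [0, N)
    double exact = std::sin(ph * twopi / N);
    double e = std::fabs(exact - tab[(unsigned int) ph]);
    if (e > err) err = e;
  }
  return err;
}
```

Since the slope of a sine wave never exceeds 1, the error stays below the angular step 2π/N, so a table sixteen times longer should reduce the worst case by roughly the same factor.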
The solution to this problem is to be able to find intermediate values between
table positions, so that we do not need to truncate the position index to get a result.
For instance, if the index is 10.3, we need to be able to find a precise number that
sits in between the values of positions 10 and 11. In order to do this, we interpolate
[30]. While there are various methods we can apply to perform interpolation, the
most common is to use a polynomial. The higher the order of the polynomial, the
more precise the result will be, but this also increases computational complexity.
While there is a balance to be reached between output quality and efficiency, it is
understood that the low computational load of truncation does not justify its poor
precision and that we should use, at minimum, first-order interpolation. In the fol-
lowing sections, we will explore the principles of first and second-order polynomial
methods.

© Springer Nature Switzerland AG 2019
V. Lazzarini, Computer Music Instruments II,
https://doi.org/10.1007/978-3-030-13712-0_14

14.1 Linear Interpolation

Linear interpolation, which uses a first-order polynomial, finds a value that is situ-
ated on a straight line between two table values in adjacent positions. Conceptually,
if we have 10.3 as an index, we will mix 70% of position 10 with 30% of position
11 to get the result. The polynomial expression is

$$ f(x) = ax + b \quad (0 < x < 1) \tag{14.1} $$


where a and b are coefficients calculated from table values at adjacent positions, x
is the fractional position between the two indices, and f (x) is the result we are inter-
ested in. It is easy to demonstrate that the coefficients can be computed as follows

$$ a = y_2 - y_1, \qquad b = y_1 \tag{14.2} $$

with y1 = T(⌊θ(t)⌋ mod N) and y2 = T((⌊θ(t)⌋ + 1) mod N), i.e. the values of two
adjacent lookup positions (for a given phase θ (t)). The extra cost is effectively one
extra multiplication and two sums:
const double *process(double a, double f){
double frac;
int posi;
amp = a; fr = f;
for(int i = 0; i < vsize; i++){
posi = (int) ph;
frac = ph - posi;
s[i] = amp * (table[posi] + frac*(table[posi+1]
- table[posi]));
ph += size * fr / sr;
while(ph >= size) ph -= size;
while(ph < 0) ph += size;
}
return s;
}
To facilitate computation, we will assume that the table will have an extra point
at the end, which is used when interpolating beyond the last point of the range. A ta-
ble can be constructed to this specification. Linear interpolation does not add much
computational load, and, as mentioned above, should be considered the basic oscil-
lator lookup method. Truncation should not be used unless absolutely necessary.
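The 70%/30% mix described earlier in this section, and a table built with the extra point, can be illustrated in isolation. This is a minimal sketch under our own naming (lerp_lookup() and sine_with_guard() are hypothetical helpers, not part of the oscillator structures).

```cpp
#include <cmath>
#include <vector>

// Linearly interpolated read at a fractional index pos. The table is
// assumed to hold one extra (guard) point after its last position.
double lerp_lookup(const double *table, double pos) {
  int posi = (int) pos;        // integral part of the index
  double frac = pos - posi;    // fractional part, 0 <= frac < 1
  return table[posi] + frac * (table[posi + 1] - table[posi]);
}

// An N-point sine table followed by a guard point equal to the first
// value, so that reads between positions N-1 and N wrap correctly.
std::vector<double> sine_with_guard(unsigned int N) {
  const double twopi = 8. * std::atan(1.);
  std::vector<double> tab(N + 1);
  for (unsigned int i = 0; i < N; i++)
    tab[i] = std::sin(i * twopi / N);
  tab[N] = tab[0];
  return tab;
}
```

For example, with values 2 and 4 stored at positions 10 and 11, a read at index 10.3 gives 0.7 × 2 + 0.3 × 4 = 2.6.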
14.2 Cubic Interpolation

The next method of polynomial interpolation that is practical to adopt in table
lookup is of third order, also known as four-point interpolation. Here we take the
values of four points around the target index and trace a non-linear curve that will
pass through all of these points, and get its value at the required position. The poly-
nomial expression is

$$ f(x) = ax^3 + bx^2 + cx + d \quad (0 < x < 1) \tag{14.3} $$


where, again, we have x as the fractional position between table indices. The poly-
nomial coefficients are obtained as follows:
1. Set f(−1) = y0, f(0) = y1, f(1) = y2 and f(2) = y3, where
yn = T((⌊θ(t)⌋ − 1 + n) mod N).
2. Solve the system

$$ \begin{aligned} y_0 &= -a + b - c + d \\ y_1 &= d \\ y_2 &= a + b + c + d \\ y_3 &= 8a + 4b + 2c + d \end{aligned} \tag{14.4} $$

3. The coefficients are

$$ \begin{aligned} a &= (y_3 - 3y_2 + 3y_1 - y_0)/6 \\ b &= (y_2 - 2y_1 + y_0)/2 \\ c &= -y_3/6 + y_2 - y_1/2 - y_0/3 \\ d &= y_1 \end{aligned} \tag{14.5} $$

As is obvious, there are many more operations involved in cubic interpolation.
Coefficient calculation is more complex, and there is also the need to raise the
fractional position x to powers of 2 and 3. It is possible to factorise Eq. 14.5 to avoid repeated
operations and allow for some efficiency gains, but overall there is much more com-
putation involved in this method than in simple linear interpolation.
Considering these points, we can implement a cubic table-lookup oscillator as
follows:
// the amplitude parameter is named am to avoid a clash with the local a
const double *process(double am, double f){
double frac, a, b, c, d;
double tmp, fracsq, fracb;
int posi;
amp = am; fr = f;
for(int i = 0; i < vsize; i++){
posi = (int) ph;
frac = ph - posi;
a = posi == 0 ? table[0] : table[posi - 1];
b = table[posi];
c = table[posi + 1];
d = table[posi + 2];
tmp = d + 3.f * b;
fracsq = frac * frac;
fracb = frac * fracsq;
s[i] = amp * (fracb * (-a - 3.f * c + tmp) / 6.f +
fracsq * ((a + c) / 2.f - b) +
frac * (c + (-2.f * a - tmp) / 6.f) + b);
ph += size * fr / sr;
while(ph >= size) ph -= size;
while(ph < 0) ph += size;
}
return s;
}
Similarly to linear interpolation, we can expect the table to be extended by two
points beyond the nominal range to allow for interpolation beyond the table size.
We need, however, to protect the lookup from the case where the truncated position
is 0 and the sample at position −1 would need to be read; the code above reuses
table[0] in that case. Higher-order interpolation methods can
be devised, but as we can observe, although they will increase the precision of the
lookup, the computational demands will grow significantly. Most of the applications
will probably be covered with linear or cubic interpolation. Fig. 14.1 shows a com-
parison of a test signal (a segment of a sine wave sampled at four points) and its
approximation by linear and cubic interpolation. We can see how cubic interpola-
tion does a good job of modelling the wave in between the two sample positions (1
and 2), while the linear curve is also acceptable in this case.

14.3 Inheritance

To implement the various versions of the table-lookup oscillator, we have two op-
tions: to provide a mode switch in which the object will be constructed to operate
with one of a number of table access algorithms; or, alternatively, to create sepa-
rate structures that will implement them. The second option is probably the clean-
est, since we can keep the different interpolation implementations in well-separated
components. However, once we decide on this, it would also be useful to add as little
as possible to what we have already programmed, reusing as much as we can. How
can we do that beyond cut-and-paste? The answer is to adopt another aspect of the
object-oriented approach: inheritance, which is very well supported by C++.
[Figure: three panels titled "signal", "linear interpolation" and "cubic interpolation", each plotted over positions 0 to 3 with values between 0.0 and 1.0.]

Fig. 14.1: A comparison of a signal sampled at four points, linear interpolation (between points 1
and 2), and cubic interpolation.

This means that we can make a structure become a child or a derived structure
of an existing one, which is its parent or base. We can make the two share the
attributes that were defined in the original structure, and add new elements to it to
complement the process. The C++ syntax for a structure definition that inherits and
can access all members of a base structure is

struct name : base-name { member-declarations };

Let’s see what we could do with the present oscillator cases. Starting from our
existing Osc structure, we can define Osci (linear interpolation) and Oscc (cubic
interpolation):

Listing 14.1: Derived structures.


struct Osci : Osc {
Osci(double a, double f, const double *t,


unsigned int sz,
double phs = 0., unsigned int vsz = 64,
double esr = 44100.) :
Osc(a,f,t,sz,phs,vsz,esr) { };

const double *process(double a, double f);


const double *process(double a);
const double *process();
};

struct Oscc : Osc {

Oscc(double a,double f,const double *t,unsigned int sz,


double phs = 0.,unsigned int vsz = 64,
double esr = 44100.) :
Osc(a,f,t,sz,phs,vsz,esr) { };

const double *process(double a, double f);


const double *process(double a);
const double *process();
};
The inheritance diagram for these three structures is shown in Fig. 14.2. Note
that we have supplied a constructor for each structure, which calls the base structure
constructor (passing, in this case, all parameters to it, since we have no other mem-
bers specific to these derived structures). The sole reason we have created these
structures is to provide new implementations of the processing methods in the base
structure, which we declare here (and can implement elsewhere). These methods
will hide the base structure ones, and take the place of them when an object of the
derived structure is used.

        Osc
       /   \
    Osci   Oscc
Fig. 14.2: Inheritance diagram for Osc, Osci, and Oscc.
14.3.1 Polymorphism

However, we can do better than this. Instead of hiding the base methods, we can
let the compiler decide which one to use, when it is most appropriate. Consider
this case: a pointer to Osc is used to hold a dynamically-allocated object of one
of its substructures. This is perfectly allowed by C++, since the child is just an
extension of the parent and so access to memory is safe. If we use this pointer to
access a process() method, however, the hiding mechanism will defeat us: the
base structure code is used, not the intended derived one. So reimplementing via
hiding is not a good idea as its semantics breaks down in some situations.
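The problem with hiding can be reproduced with two very small structures. Plain and PlainChild are hypothetical stand-ins for Osc and one of its derived structures, reduced to a single method that returns a number identifying which implementation ran.

```cpp
struct Plain {                    // stand-in for the base structure
  int process() { return 0; }     // not virtual
};

struct PlainChild : Plain {       // stand-in for a derived structure
  int process() { return 1; }     // hides Plain::process()
};

// Called on the object itself: the derived method is used.
int direct_call() {
  PlainChild c;
  return c.process();
}

// Called through a pointer to the base: the type of the pointer
// decides, so the base method is used instead.
int call_through_base() {
  PlainChild c;
  Plain *p = &c;
  return p->process();
}
```

Here direct_call() reaches the derived code, whereas call_through_base() silently falls back to the base implementation.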
So, to improve on this, we use virtual methods, which allow the relevant function
to be selected safely, according to the actual type of the object, when the call is
made. It is just a matter of marking the base structure
functions with the keyword virtual to warn that they might be reimplemented in
a child:

Listing 14.2: Virtual methods


struct Osc {

...
virtual ~Osc() { delete[] s; }
virtual const double *process(double a, double f);
virtual const double *process(double a) {
amp = a;
return process(amp, fr);
}
virtual const double *process(){
return process(amp, fr);
}
};
Then, in the derived structures, the functions will not be hidden, but will instead
use the override mechanism. In this case, a pointer to the base structure will not nec-
essarily imply that the functions defined there will be used. It will all depend on the
actual type of the object that it holds. This feature of object-oriented programming
is called polymorphism. The derived object becomes a specialised subtype of the
base.
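The effect of the virtual keyword can be seen with a pair of reduced structures. Poly and PolyChild are hypothetical names; as in the oscillator case, the destructor is also made virtual so that a derived object can be deleted safely through a base pointer.

```cpp
struct Poly {                            // stand-in for the base structure
  virtual ~Poly() { }                    // virtual destructor
  virtual int process() { return 0; }
};

struct PolyChild : Poly {                // stand-in for a derived structure
  int process() { return 1; }            // overrides Poly::process()
};

// The pointer has the base type, but the object is a PolyChild,
// so the overriding method is the one that runs.
int virtual_call() {
  Poly *p = new PolyChild;
  int r = p->process();
  delete p;
  return r;
}
```

In this sketch virtual_call() reaches the derived implementation, even though the call is made through a pointer to the base.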

14.3.2 Oscillator Inheritance Tree

With this in mind, it makes sense to reorganise the three structures in the oscillator
inheritance tree to adopt these principles to reuse code as much as possible:
• In the base, we declare the processing ‘kernel’ as virtual, that is, the oscillator
code is to be reimplemented (specialised) in the derived structures. Let’s call this
method oscillator().
• In the base, we declare various interfaces to it, the overloaded process()
methods, which will call the actual processing code.
• In the derived structures, we just reimplement the processing ‘kernel’.
When an object of any of the three structures is created and calls the process-
ing methods, these will in turn call, through the virtual mechanism, the appropriate
oscillator code. The remodelled structures would look like this:
Listing 14.3: Table-lookup oscillator structures declaration (oscillators.h).
#ifndef _OSCILLATORS_H_
#define _OSCILLATORS_H_

struct Osc {

double fr;
double amp;
const double *table;
unsigned int size;
double ph;
double *s;
unsigned int vsize;
double sr;

virtual void oscillator();

Osc(double a,double f,const double *t,unsigned int sz,


double phs = 0.,unsigned int vsz = 64,
double esr = 44100.) :
amp(a), fr(f), table(t), size(sz), ph(phs),
s(new double[vsz]), vsize(vsz), sr(esr) { };

virtual ~Osc() { delete[] s; }

const double *process(){


oscillator();
return s;
}

const double *process(double a, double f){


amp = a; fr = f;
oscillator();
return s;
}
const double *process(double a) {


amp = a;
oscillator();
return s;
}
};

struct Osci : Osc {


Osci(double a,double f,const double *t,unsigned int sz,
double phs = 0.,unsigned int vsz = 64,
double esr = 44100.) :
Osc(a,f,t,sz,phs,vsz,esr) { };
void oscillator(); // overrides Osc::oscillator()
};

struct Oscc : Osc {


Oscc(double a,double f,const double *t,unsigned int sz,
double phs = 0.,unsigned int vsz = 64,
double esr = 44100.) :
Osc(a,f,t,sz,phs,vsz,esr) { };
void oscillator(); // overrides Osc::oscillator()
};
#endif
In this code, we have not implemented the oscillator() ‘kernel’, but only
declared it. We will define these functions elsewhere. This is a design choice which
has a subtle implication. Any methods defined inside a structure definition are by
default inline: the compiler replaces any code that calls these by a complete copy
of the function, eliminating the function call (see Sect. 6.1.5). This has the potential
to speed up code, but also to make binary executables bigger. We tend to inline
short functions as the potential to improve performance trumps any small increase
in program size.
In the case of the oscillator methods, it is probably better to implement them
outside the structure as they are far larger in size and do a lot of work when called,
which then minimises any function invocation overheads. To do this, we write an
implementation file, generally with the extension .cpp, which will hold this code.
In this case, the structures should be defined in a header file so that they are made
accessible to programs (without having to copy the code to each one using it).
The code implementing a structure method needs to use a qualified name, which
has the form

struct-name :: method ( argument-list )

The oscillator implementation file will look like this:


Listing 14.4: Oscillator implementation (oscillators.cpp)


#include "oscillators.h" // header

void Osc::oscillator(){
for(int i = 0; i < vsize; i++){
s[i] = amp * table[(int) ph];
ph += size * fr / sr;
while(ph >= size) ph -= size;
while(ph < 0) ph += size;
}
}

void Osci::oscillator(){
double frac;
int posi;
for(int i = 0; i < vsize; i++){
posi = (int) ph;
frac = ph - posi;
s[i] = amp * (table[posi] + frac*(table[posi+1]
- table[posi]));
ph += size * fr / sr;
while(ph >= size) ph -= size;
while(ph < 0) ph += size;
}
}

void Oscc::oscillator(){
double frac, a, b, c, d;
double tmp, fracsq, fracb;
int posi;
for(int i = 0; i < vsize; i++){
posi = (int) ph;
frac = ph - posi;
a = posi == 0 ? table[0] : table[posi - 1];
b = table[posi];
c = table[posi + 1];
d = table[posi + 2];
tmp = d + 3.f * b;
fracsq = frac * frac;
fracb = frac * fracsq;
s[i] = amp * (fracb * (-a - 3.f * c + tmp) / 6.f +
fracsq * ((a + c) / 2.f - b) +
frac * (c + (-2.f * a - tmp) / 6.f) + b);
ph += size * fr / sr;
while(ph >= size) ph -= size;


while(ph < 0) ph += size;
}
}
A program using these oscillator structures would need to include the header file.
When building the program, the implementation file should be compiled and linked
to the main program using the standard c++ command. Assuming the header file to
be in the same directory and the main() function in main.c, we have
$ c++ -o program main.c oscillators.cpp -I.
Alternatively, we can compile the two files separately into object code and then
link these separately:
$ c++ -c -o main.o main.c -I.
$ c++ -c -o oscillators.o oscillators.cpp -I.
$ c++ -o program oscillators.o main.o
This is often done in larger projects to avoid the need to recompile every single
file when only one of them has been modified. Build system programs such as make
are used for this purpose.
Note that we could go one step further in code reuse. The modulo operation, used
in all three oscillators, is exactly the same. We can remove it from the code replac-
ing it by a function defined in the base structure. As we have seen, functions defined
inside the data structure are treated as inline. Therefore, making this a separate func-
tion will not incur any overhead due to function calls. As we noted in Sect. 6.1.5,
we can also request that a given function is treated this way by using the inline
attribute, but this is not needed in this case.
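One possible shape for this refactoring is sketched below on a reduced structure. Phase is a hypothetical cut-down version of Osc holding only the phase and table size, and the method names mod() and advance() are our own; in the real structures, mod() would be called from each oscillator() implementation.

```cpp
struct Phase {            // hypothetical reduced version of Osc
  double ph;
  unsigned int size;

  Phase(unsigned int sz, double phs = 0.) : ph(phs), size(sz) { }

  // Defined inside the structure, so it is treated as inline:
  // bring the phase back into the range [0, size).
  void mod() {
    while (ph >= size) ph -= size;
    while (ph < 0) ph += size;
  }

  // Advance the phase by incr (which may be negative) and wrap it.
  double advance(double incr) {
    ph += incr;
    mod();
    return ph;
  }
};
```

With this in place, the two while loops repeated in the three oscillators could be replaced by a single call to mod().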
A modification in the design of an existing structure, such as this one, where we
might move code around, is often called refactoring. We have done this twice in this
chapter: we have added virtual methods and reorganised the code into a processing
kernel and an interface. This is very common in object-oriented programming, and
we will keep doing this to refine the structures we are developing.

14.4 Function Table Objects

Now that we have embarked more incisively on an object-oriented way of designing
code, it might be useful to take a look at other components that could be modelled as
structures for more convenient use. Function tables, as used by oscillators, appear to
be a good target for this. Until now, they have been simple arrays with no particular
regard to their size or contents. It would be useful to package them into a new type
that would not only hold the data and its size but also construct the table properly
according to a given algorithm.
We also know much better than to create isolated one-off structures, so we should
start with a proper base structure design, which will be simple enough to accommodate
the various subtypes that we might require later. Basically we need two
attributes, which are common to all of these:
1. The table data array.
2. The table size.

The simplest type of table, which would serve well as the base, employs a gen-
erating algorithm that just copies data from an array into it:

Listing 14.5: Function table structure


#include <cstring>

struct Func {
double *table;
int size;
Func(int siz, const double *in = NULL) :
table(new double[siz+2]), size(siz) {
if(in) {
memcpy(table, in, siz*sizeof(double));
table[siz+1] = table[1];
table[siz] = table[0];
}
}
~Func() { delete[] table; }
};
Note that, in order to work with cubic and linear-interpolation oscillators, we
allocate two extra points and fill these with the first two positions in the table, ex-
pecting that the oscillator will wrap around the ends of the table in performance. We
only fill in the table if an input is supplied. Any Func-derived structures will in-
herit the basic attributes, but can be constructed differently. Oscillators using a table
object can then access its table pointer and size, which are packaged together. We
should derive the Func structure to implement the various waveforms we require.
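As an illustration of such a derivation, the sketch below builds a table from a number of equally-weighted harmonics. HarmTab is a hypothetical structure, and Func is repeated from Listing 14.5 only so that the example compiles on its own. Note that the base constructor sets the two extra points only when input data is copied, so a derived constructor that generates its own data needs to fill them itself.

```cpp
#include <cmath>
#include <cstring>

// Func as in Listing 14.5, repeated so the sketch is self-contained.
struct Func {
  double *table;
  int size;
  Func(int siz, const double *in = NULL) :
    table(new double[siz + 2]), size(siz) {
    if (in) {
      std::memcpy(table, in, siz * sizeof(double));
      table[siz] = table[0];
      table[siz + 1] = table[1];
    }
  }
  ~Func() { delete[] table; }
};

// Hypothetical derived table: nh harmonics of equal weight, scaled
// by 1/nh so that the sum stays within -1 and 1.
struct HarmTab : Func {
  HarmTab(int siz, int nh) : Func(siz) {
    const double twopi = 8. * std::atan(1.);
    for (int i = 0; i < size; i++) {
      table[i] = 0.;
      for (int k = 1; k <= nh; k++)
        table[i] += std::sin((double) k * i * twopi / size) / nh;
    }
    // fill the two extra points used by the interpolating oscillators
    table[size] = table[0];
    table[size + 1] = table[1];
  }
};
```

An object such as HarmTab tab(10000, 3) could then supply tab.table and tab.size to any of the three oscillators.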

14.5 Reference Types

We can also rewrite or add a new constructor to the oscillator structures to take in
table objects directly, rather than have to look for their pointers and sizes. Given that
we will need to pass a whole structure as a parameter to the constructor, we need to
be careful how we do this. Recalling that arguments are always passed to functions
by copy, we have two options:

1. Use Func as the argument type and then the whole object is copied into the
constructor. This is very wasteful as we do not need copies to be made.
2. Use Func* as the argument and manipulate the address of a table object, which
will just amount to copying a pointer.
Clearly, option 2 is much better as we should avoid at all costs copying structures,
either as arguments or as return types. The only drag with this is that we will need
to work with pointers to structures, addresses and a slightly different syntax. In
C++, there is a third alternative, which is to use a reference to an object. References
are similar to pointers in that we do not operate on an object directly, but through
another variable that is referring to it. The main differences between pointers and
references are:
• A reference binds to a single object at initialisation time; in that sense, it behaves
similarly to a constant pointer (i.e. T* const) in that you cannot change to
where it is pointed (but you can change the contents of the object that you are
referencing).
• It is not possible to have a NULL reference.
• The reference variable does not need to be dereferenced to access the object; we
can do this directly (i.e. no indirection operator is used).
A reference to an object of type T is declared and initialised as

T& reference = object;

We always need to initialise a reference to an existing object. For example,


Func tab(10000);
// make tabref refer to tab
Func &tabref = tab;
// manipulate the object via the reference
for(int i = 0; i < tabref.size; i++)
tabref.table[i] = (double) i;
Most commonly, we use it to pass parameters to functions by reference rather
than by copy:
void swap(int &a, int &b) {
int tmp = a;
a = b;
b = tmp;
}
This is done without having to pass variable addresses and dereference pointers
to access the memory. The function can be called just by using
int n = 1, m = 2;
swap(n, m);
In fact, a similar function, std::swap(), defined for arbitrary argument types, is
provided by the C++ standard library.
14.5.1 Copy Constructors

One of the typical uses of reference type arguments is in the declaration of an ex-
plicit copy constructor, for example
struct A {
...
A(const A& x);
};
where the argument may or may not be marked as const, but it is always of the
structure reference type.
Copy constructors are used to construct objects from other existing objects of
the same type. If not given explicitly, the compiler generates one implicitly for the
structure. However, in some cases, this is not suitable and a specially-written copy
constructor has to be provided. This is the case for structures that include external
resources (such as a dynamically-allocated memory block).
In fact, our Osc and Func structures would require one if we were to copy them,
or pass them as (non-reference) arguments to functions. Since we are not doing that
in the current use of these structures, we may sidestep the question. However, this
issue will need to be dealt with at some point if we are to make their code more
robust.
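To give an idea of what such a specially-written copy constructor involves, here is a minimal sketch for a structure that owns a dynamically-allocated block. Buffer is a hypothetical type, not one of the structures in this chapter. The sketch also supplies a copy assignment operator, which is needed for the same reason but relies on operator overloading and the this pointer, neither of which has been introduced at this point.

```cpp
#include <cstring>

struct Buffer {                  // hypothetical resource-owning type
  double *data;
  unsigned int size;

  Buffer(unsigned int n) : data(new double[n]), size(n) {
    for (unsigned int i = 0; i < n; i++) data[i] = 0.;
  }

  // copy constructor: allocate a new block and copy the contents,
  // instead of sharing the pointer held by x
  Buffer(const Buffer &x) : data(new double[x.size]), size(x.size) {
    std::memcpy(data, x.data, size * sizeof(double));
  }

  // copy assignment: same idea, releasing the old block afterwards
  Buffer &operator=(const Buffer &x) {
    if (this != &x) {
      double *tmp = new double[x.size];
      std::memcpy(tmp, x.data, x.size * sizeof(double));
      delete[] data;
      data = tmp;
      size = x.size;
    }
    return *this;
  }

  ~Buffer() { delete[] data; }
};
```

With the implicitly generated copy, two objects would hold the same pointer and both destructors would try to free it; the explicit version gives each object its own block.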

14.5.2 Object Reference Arguments

The use of reference types for arguments more generally is very welcome. For in-
stance, in the particular case of a typical constructor for the Osc structure, we could
have
Osc(double a, double f, const Func &tab,
double phs=0.0, int vsiz = 64,
double esr=44100.) :
amp(a), fr(f), table(tab.table),
size(tab.size),
ph(phs), s(new double[vsiz]),
vsize(vsiz), sr(esr) { };
The parameter type is const Func&, which means a reference to a const
Func, since we want it to be read-only (the table does not get modified at any
point). It is always good to let the compiler know what your intentions are: if you
are passing a reference and you are not going to modify the underlying object, use
const to make it read-only (the same applies to pointers). Since the table pointer
in Osc is also const, we have no problems initialising it with the table pointer
from a const Func&, as both are read only in this case. Note also that members
of a referenced object are accessed in the same way as before, without the need for
any special indirection syntax.
It is true that we could have modified Osc to hold a const Func& member in-
stead of a const double*. However that would have prevented us from chang-
ing the table we are using at some stage in the lifecycle of the object, since a refer-
ence cannot be assigned to, but a pointer can. Perhaps this is something we do not
want to do at this point. We may, for instance, want to add an Osc::SetTable()
method at some point. Additionally, if we were to use a table object, we would need
to modify the oscillator code to access the data, and this seems unnecessary now.
As a trivial example, we can modify the code in Listing 13.6 to use a function
table object and the new Osc constructor that takes it:

Listing 14.6: Synthesis example with table object


#include <cstdio>
#include <cmath>
#include "oscillators.h"
#include "func.h"

const double twopi = 8. * atan(1.);


struct SinTab : Func {
SinTab(int siz) : Func(siz) {
for(int i=0; i < size; i++)
table[i] = sin(i*twopi/size);
}
};

int main() {
const unsigned int size = 10000;
const double *out;
SinTab tab(size);
Osc osc(0.5, 440., tab);

for(int i = 0; i < osc.sr; i+=osc.vsize){


out = osc.process();
for(int j = 0; j < osc.vsize; j++)
printf("%f \n", out[j]);
}
return 0;
}
In this example we have created a very simple new type that holds a sine wave.
In a more developed context, we would expect that a function table structure im-
plementing waveforms such as this one would be more general, allowing for, say,
multiple harmonics rather than a single component (see also Prob. 14.1). In such a
scenario, the encapsulation of function tables as objects in a program is well worth
our while.
14.5.3 Self References

An object may, if required, reference itself through the use of the implicit member
variable this, which is a pointer to its type. This member holds the address of the
object in which it appears. We are allowed to employ it in any (non-static) method,
as well as in constructors. For example,
struct A {
  int a;
  int b;

  // b is initialised to 0, the value of a
  A() : a(0), b(this->a) { };

  void set(int a) { // parameter a hides member a
    // this pointer explicitly refers to member a
    this->a = a;
  }

  // returns a reference to this object
  A& ref() { return *this; }
};
References to self are very useful in a number of situations, and can be easily
facilitated through the this pointer mechanism.
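One situation where a self reference is convenient is the chaining of method calls. The following is a minimal sketch, assuming a hypothetical Params structure whose setters each return *this; none of these names come from the code developed so far.

```cpp
// Hypothetical parameter holder; the names are illustrative only.
struct Params {
  double amp;
  double freq;
  Params() : amp(0.), freq(0.) { }
  // each setter returns a reference to the object it was called on
  Params &set_amp(double a) { amp = a; return *this; }
  Params &set_freq(double f) { freq = f; return *this; }
};
```

With this, an expression such as p.set_amp(0.5).set_freq(440.) sets both members of p in a single statement, since the second call acts on the reference returned by the first.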

14.6 Phase Generators and Table Readers

Oscillators are actually composite objects made up of three separate operations put
together:
1. Table lookup: the actual reading of the function table values.
2. Phase update: incrementing/decrementing the phase value.
3. Amplitude scaling: applying a gain to the values obtained from table lookup
before the output.
We can separate these into individual steps and model them as signal processing
objects. In some applications this can be useful as it enables certain types of ma-
nipulation that are generally not available for a single-block oscillator. For instance,
if we want to implement phase modulation, as opposed to frequency modulation,
we need to be able to generate the phase as a separate signal to which we can apply
sample-by-sample offsets.
The two main components we need to implement are the phase generator, or
phasor, and the table reader. To allow for interpolation modes, we should actually
implement three types of the latter operator. Both phasors and table readers will
have plenty of applications in synthesis and processing, which will make it worth
our while modelling them as structures.

14.6.1 The Phasor

A phase generator will produce a ramping signal going from 0 to 1 (or from 1 to 0)
at a given rate. It is represented by the following expression:

$$\theta(t + 1) = \left(\theta(t) + \frac{f(t)}{f_s}\right) \bmod 1 \tag{14.6}$$

While this can be programmed recursively, it is more suitable to implement it as
a loop, as in

Listing 14.7: Phasor processing function


const double *Phasor::process(){
  for (int i = 0; i < vsize; i++) {
    s[i] = phs;   // output the current phase
    phs += incr;  // update the phase
    mod1();       // keep it in the 0 to 1 range
  }
  return s;
}

We set the increment to be $f(t)/f_s$, update the phase and apply a mod 1 to it. The
output will be a rising or falling naïve (geometric) sawtooth that can be used as the
(normalised) phase of a periodic function. We can even use this signal directly if we
do not mind the aliasing distortion it produces. More commonly, though, we will
use it as the phase input to table reading.
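As a self-contained illustration of Eq. 14.6, the sketch below generates a block of normalised phase values as a free function. The name phasor_block, its signature, and the use of std::floor() to implement the mod 1 step are assumptions made for this example; they are not the Phasor structure of Listing 14.7.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Sketch of Eq. 14.6 as a free function (hypothetical name and
// signature): fills a vector with n samples of normalised phase.
std::vector<double> phasor_block(double freq, double sr,
                                 std::size_t n, double phs = 0.) {
  std::vector<double> s(n);
  double incr = freq / sr;
  for (std::size_t i = 0; i < n; i++) {
    s[i] = phs;
    phs += incr;
    // mod 1: subtracting floor() wraps both rising (positive
    // increment) and falling (negative increment) ramps
    phs -= std::floor(phs);
  }
  return s;
}
```

A negative frequency gives a falling ramp, since the subtraction of the floor wraps the phase back up towards 1 whenever it drops below 0.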

14.6.2 Table Reader

A table reader object would basically allow a function table to be accessed through
a given index. There are two lookup modes: via raw index (varying from 0 to table
size) or normalised (varying from 0 to 1). There are also two ways to deal with
out-of-range indices:

1. Limiting: keep the phases within the table bounds.


2. Wrap-around: jump back from the ends of the table, implementing effectively a
generalised modulo operation.
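The two ways of dealing with out-of-range indices listed above can be sketched as a pair of helper functions acting on raw (non-normalised) indices. The names limit_index and wrap_index are hypothetical, and the code is only one possible way to realise these operations.

```cpp
#include <cmath>

// Limiting: clamp a raw index to the table bounds [0, size-1].
double limit_index(double ndx, unsigned int size) {
  if (ndx < 0.) return 0.;
  if (ndx > size - 1.) return size - 1.;
  return ndx;
}

// Wrap-around: generalised modulo, mapping any index to [0, size).
double wrap_index(double ndx, unsigned int size) {
  double r = std::fmod(ndx, static_cast<double>(size));
  // fmod() keeps the sign of ndx, so negative results are shifted up
  return r < 0. ? r + size : r;
}
```

For a table of size 8, an index of 9 is limited to 7 but wraps around to 1, while an index of -1 is limited to 0 but wraps around to 7.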

Here is what a skeleton TableRead structure would look like:

Listing 14.8: TableRead constructor


struct TableRead {
  const double *table;
  double phs;
  bool nrm;
  bool wrp;
  unsigned int vsize;
  double sr;

  // constructor
  TableRead(const Func &tab, double phase = 0.,
            bool norm = true, bool wrap = true,
            unsigned int vsz=64,
            double esr = 44100.) :
    table(tab.table), phs(phase),
    nrm(norm), wrp(wrap),
    vsize(vsz), sr(esr) { };

  // process method taking phase indices
  const double *process(const double *ndx);
  ...
};
In this example, we have also introduced a new built-in type: bool, which takes
the values true or false (these convert to the integers 1 and 0, respectively).
Variables of this type are very useful as binary switches. In this case, they turn the
normalised lookup and wrap-around on and off, in that order.

14.7 Conclusions

This chapter has introduced, from the perspective of signal processing and audio
programming, the important concept of interpolation, which is not only used in
table lookup oscillators, but also, as we will see, in many other contexts. From the
perspective of coding practice, we have introduced the twin ideas of inheritance
and polymorphism, which are very useful to create relationships between types that
emphasise common elements and allow us to reuse code. The advantage of this is
that we can implement a feature only once and in one place, which will benefit code
maintenance, bug fixing and improvement. The mechanism of reference variables
was also discussed, which will allow more transparency and simplicity for passing
arguments by reference rather than copy.
Problems

14.1. Derive a structure from Func that implements a Fourier series-based table, to
allow for waves with any number of harmonics of different amplitudes, and an over-
all phase offset. Write a program to demonstrate its use (with libsndfile or Portaudio
for output).

14.2. Design and implement frequency and amplitude modulation support for the
oscillator structures, maximising code reuse via refactoring.

14.3. Implement a phasor structure built around the phasor algorithm of Listing
14.7. Write a program to use it to produce a sine wave.

14.4. Implement the three table reader structures for truncation, linear interpolation,
and cubic interpolation methods, using the same principles and layout introduced
for the oscillator cases and using the constructor signature shown in Listing 14.8.
Chapter 15
Envelopes

Abstract This chapter discusses envelopes as an important component of computer
instruments, which allow the shaping of synthesis and processing parameters over
time. Their basic principles are derived from the ideas of interpolation discussed
in the previous chapter. Two fundamental types are explored, linear and exponential
envelopes, and a complete class example is provided to illustrate the discussion. The
chapter also introduces the concept of data hiding and access control. This is com-
plemented by a look at C++ operator overloading. We finish off with an interface
design for a sound output class.

A key component in audio synthesis and processing is the envelope generator.
This implements time functions that can be used to modify parameters such as am-
plitude and frequency. Most of the interesting and musical sounds are never static
over time, and thus we need a way of making them vary. As a minimum require-
ment, we need to be able to shape the amplitude of a tone so that it does not click
when we start and stop it. For this, we define one of many types of functions that
can produce smoothly-changing gain values, which are then applied to the sound.
As these will apply an external, enveloping, form to the signal waveform, we call
them by the generic name envelopes [36].

15.1 Envelope Generators

Envelopes can be drawn using a variety of mathematical formulae. However, in order
to simplify their specification, we tend to employ a piecewise approach, i.e. we split
the total time function into segments and generate each curve separately. There are
two fundamental methods that we can use to generate these: linear and exponential.

© Springer Nature Switzerland AG 2019 219


V. Lazzarini, Computer Music Instruments II,
https://doi.org/10.1007/978-3-030-13712-0_15
15.1.1 Linear Envelopes

A linear segment is created using the exact same first-order polynomial we em-
ployed for interpolated table lookup. In fact, generating the envelope is nothing
more than interpolating between two points. As we may recall, a linear function is
defined by

$$f(x) = ax + b \tag{15.1}$$

In this case, we make our time position x vary between 0 and 1, and we define
the coefficients a and b as the linear interval we want to cover and the starting point,
respectively:

$$\begin{aligned} a &= y_1 - y_0 \\ b &= y_0 \end{aligned} \tag{15.2}$$

where $y_n$ are the extreme position values in this segment (counting from 0). The
expression we will use then becomes, with t as the time in samples, and $x_n$ the time
in samples corresponding to the value of $y_n$,

$$f(t) = y_0 + (y_1 - y_0)\,\frac{t - x_0}{x_1 - x_0} \tag{15.3}$$

For a single segment starting from time 0, the expression simplifies considerably.
We could also use an iterative method where we calculate an increment that is added
to the current output, making the envelope generation very efficient:

$$\begin{aligned} i &= \frac{y_1 - y_0}{d} \\ y(t + 1) &= y(t) + i \end{aligned} \tag{15.4}$$

In this case $d = x_1$, but, more generally, we could calculate d as the segment
duration $x_1 - x_0$ and subtract $x_0$ from t to offset it. For long, multi-segment envelopes,
it is very important that we hold on to the start position of each segment, instead of
just applying the recursive formula from the start. In other words, we reset the start
of each portion to the previous end position and apply the iteration from there.
With this in mind, we could design a processing function for a single-segment
linear envelope as:

Listing 15.1: Linear envelope.


void generate() {
  for (int i = 0; i < vsize; i++) {
    s[i] = y;
    if (cnt < x1) {
      y += incr;
      cnt++;
    } else y = y1;  // segment finished: hold the target value
  }
}
Note that once the envelope segment time count (cnt) reaches the required
duration, we sustain the last value generated. It will be useful, however, to be able
to reset and retrigger the envelope generation, and thus we could have something
like this in an envelope object,

Listing 15.2: Retriggering method.


virtual void retrig() {
  cnt = 0;
  y = y0;
  incr = (y1 - y0) / x1;
}
plus another method to reset parameters to other values, if necessary.
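The earlier remark about multi-segment envelopes, where each segment should start from the exact end position of the previous one, can be illustrated with a short sketch. The Seg structure and the linear_env function are hypothetical names used only here, and segment durations are assumed to be greater than zero.

```cpp
#include <vector>

// Hypothetical breakpoint: target value and duration in samples.
struct Seg {
  double target;
  unsigned int dur;  // assumed to be greater than 0
};

// Generates a multi-segment linear envelope starting from y0.
std::vector<double> linear_env(double y0, const std::vector<Seg> &segs) {
  std::vector<double> out;
  double start = y0;
  for (const Seg &sg : segs) {
    double incr = (sg.target - start) / sg.dur;  // as in Eq. 15.4
    double y = start;
    for (unsigned int n = 0; n < sg.dur; n++) {
      out.push_back(y);
      y += incr;
    }
    // the next segment starts from the exact target, discarding any
    // rounding error accumulated by the repeated additions
    start = sg.target;
  }
  out.push_back(start);  // the final value is held
  return out;
}
```

Resetting start at every breakpoint means that small errors in one segment are not carried over into the next.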

15.1.2 Exponential Envelopes

While linear curves are very simple to generate, they are not perceptually accurate
if employed to control amplitude or frequency. This is because we take notice of
changes in terms of ratios rather than differences. For instance, a jump of 100 Hz
from 100 to 200 Hz is heard as an interval of an octave, which is perceived as the
same change as that from 500 to 1000 Hz or 1000 to 2000 Hz. What matters is
that we have a ratio of 2:1 between these frequencies. Applying a linear envelope
to control frequencies will translate to a non-linear perception of parameter change.
This is also the case with amplitude envelopes, although there is more tolerance for
the use of linear envelopes (especially for onsets) in these applications.
So, in order to address these issues, we can propose an exponential curve gener-
ator as an alternative to the linear function used before. This is defined by

$$f(x) = a^x b \tag{15.5}$$

As before, the time position x varies between 0 and 1, and the coefficients a and
b are the ratio we want to cover and the starting point, respectively:

$$\begin{aligned} a &= \frac{y_1}{y_0} \\ b &= y_0 \end{aligned} \tag{15.6}$$

Some limitations are naturally imposed by this formula: the envelope end point
values cannot be 0 (or smaller), as they will stop the formula working (and, in the
case of $y_0$, a 0 leads to a singularity). So we have to protect the envelope from that
by either checking for this condition or adding a very small number to each end
point value.
As with the linear case, it is possible to calculate the envelope efficiently by
employing a multiplier in an iterative process:

$$\begin{aligned} m &= \left(\frac{y_1}{y_0}\right)^{\frac{1}{d}} \\ y(t + 1) &= y(t) \times m \end{aligned} \tag{15.7}$$

As we can see, all we needed to do was to transform the value difference into
a ratio and the multiplication into an exponentiation. This gets translated to the
following envelope generator C++ method:

Listing 15.3: Exponential envelope.


void generate() {
  for (int i = 0; i < vsize; i++) {
    s[i] = y;
    if (cnt < x1) {
      y *= incr;  // incr holds the multiplier m of Eq. 15.7
      cnt++;
    } else y = y1;
  }
}
At the end of the segment, the envelope sustains its target value. Similar retrigger-
ing and resetting can be implemented for this envelope. The two envelope methods
can be designed to share/reuse code through inheritance. The choice of linear or
exponential curves will depend on the application: exponential envelopes produce
more realistic amplitude decay curves and frequency glides. Onsets may sometimes
sound better with linear segments, and other control signal applications may require
linear changes. A comparison of linear and exponential envelopes is shown in Fig.
15.1, where we should note that both envelopes start at a non-zero point, which, as
we saw, is a requirement for exponential curve calculation.
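The calculation of the multiplier in Eq. 15.7, together with the protection of the end points against zero values, can be sketched as follows. The names min_val, guard and expon_seg are hypothetical, the minimum value is an arbitrary choice, and the duration d is assumed to be greater than zero.

```cpp
#include <cmath>
#include <vector>

// Smallest end point value allowed (an arbitrary choice here).
const double min_val = 1e-6;

// Protects an end point from being zero or negative.
double guard(double y) { return y > min_val ? y : min_val; }

// Generates d + 1 samples of an exponential segment from y0 to y1.
std::vector<double> expon_seg(double y0, double y1, unsigned int d) {
  y0 = guard(y0);
  y1 = guard(y1);
  double m = std::pow(y1 / y0, 1. / d);  // multiplier, as in Eq. 15.7
  std::vector<double> out(d + 1);
  double y = y0;
  for (unsigned int n = 0; n < d; n++) {
    out[n] = y;
    y *= m;
  }
  out[d] = y1;  // hold the exact target at the end
  return out;
}
```

For instance, a segment from 1 to 16 over four samples uses a multiplier of approximately 2, so that the value doubles at every sample.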

15.2 Access Control and Classes

In previous chapters, we introduced the idea of protecting parts of our new data
structures by using the concept of read-only parameters and return types, which are
a way of building robustness into our code, a form of defensive programming. We
now want to extend this further by putting forward the idea of data hiding, which is
enabled in C++ by its code access mechanisms.
Fig. 15.1: Comparison of linear and exponential envelope segments (two panels:
linear segment and exponential segment).

In all of the types we have designed so far, it is possible to freely access any of
their data members, and do whatever we want with them. This is acceptable in a
small software project and might save us some lines of code. In a project of
medium-to-large complexity, especially one involving more than one programmer,
or targeting a wider user base (e.g. a library), it is dangerous. We should attempt to
protect our code from unwanted modification as much as possible.
When designing a new type, we need to be clear about what is the internal repre-
sentation and what is the public interface. As a rule of thumb, we should not expose
the type attributes (its member variables) to direct access, but should regulate it
through a member function. In object-oriented programming, we will have a
proliferation of getter/setter methods to provide this interface (of course we do not
need, nor is it desirable, to have a means of accessing all attributes).
The C++ language specification allows for three types of access control in new
types, using specific keywords:

1. private: all members declared following this keyword are only accessible or
visible to methods inside the structure to which they belong.
2. protected: all members declared following this keyword are only accessi-
ble or visible to methods inside the structure to which they belong, or to any
substructures derived from it.
3. public: all members declared following this keyword are fully accessible from
outside the structure to which they belong.
In addition to this, we can use the friend qualifier to allow other classes or
functions to access private (or protected) code. Structures have all their members
public by default. Another new type specifier in C++ is class, which is used in
the same way as struct but has its members private by default. In fact, the name
class is the more usual term for a type in object-oriented programming:
• A class is a kind-of thing, the model, type, or embodiment of it.
• An object is a thing, an instance of it.
Within this context, all structures (even the C ones used earlier on) are classes.
We have avoided this terminology until now, but we can adopt it more generally
from this point onwards.
In terms of syntax, we have
class T {
// private members
protected:
// protected members
public:
// public members
};
We can use the access declarations in any order; the only rule is that they will
override the access rules defined before them and act on any members defined after
them. In the case of derived classes, the following applies:
• class X : private Y – all public and protected members of the base class Y
become private members in the subclass X.
• class X : protected Y – all public and protected members of the base class Y
become protected members in the subclass X.
• class X : public Y – all protected members of the base class Y become
protected members in the subclass X, and all public members are also public in
the subclass X.
Classes defined with the class keyword have private inheritance by default and
those defined using struct use public inheritance if this is not specified.
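The access rules described above can be illustrated with a small sketch. The classes Gain and HalfGain are hypothetical and serve only to show a private member reached through a getter/setter pair, a protected member used by a derived class, and public inheritance keeping the interface public.

```cpp
// Hypothetical classes illustrating the three access levels.
class Gain {
  double m_amp;    // private (the default for class)

 protected:
  double m_scale;  // visible to derived classes

 public:
  explicit Gain(double amp = 1.) : m_amp(0.), m_scale(1.) {
    set_amp(amp);
  }
  // getter
  double amp() const { return m_amp * m_scale; }
  // setter, rejecting negative values
  void set_amp(double a) { m_amp = a < 0. ? 0. : a; }
};

// Public inheritance: amp() and set_amp() remain public here.
class HalfGain : public Gain {
 public:
  explicit HalfGain(double amp = 1.) : Gain(amp) {
    m_scale = 0.5;  // allowed: protected member of the base class
    // m_amp = 0.;  // would not compile: m_amp is private to Gain
  }
};
```

Code outside these classes can only reach m_amp through amp() and set_amp(), so the setter is free to enforce the rule that the amplitude is never negative.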

15.2.1 Namespaces

Another mechanism in C++ that allows more robustness in symbol naming is the
principle of namespaces. This is mostly used to prevent name clashes and to help
programmers make sure that the function, class, etc. that is being used is the correct
one. Namespaces are defined using the keyword namespace and can apply to a
range of declarations by enclosing these inside a block:
namespace mine {
void f(int i);
const int d = 1;
class T {
...
};
}
To access names defined in a namespace, we can use a qualified name,

namespace :: name

For example:
mine::f(mine::d);
mine::T obj;
We can also employ the using statement,
using namespace mine;
to import the namespace fully into the current context (which can be a file, function,
etc.). A very common namespace we will see in many examples is std, which
identifies symbols from the standard C++ library (see, for instance, Sect. 15.3.1).

15.2.2 A Line Class

Following these principles, we now give an example of a desirable access control
for one of the signal processing classes we are considering in this chapter. A Line
class, modelling the one-segment linear envelope, can be designed as follows:

Listing 15.4: Linear envelope class


#include <cstdint>

class Line {

 protected:
  double m_y;
  double m_y0;
  double m_y1;
  uint32_t m_x1;
  double m_incr;
  uint64_t m_cnt;
  uint32_t m_vframes;
  double *m_vector;
  double m_sr;

  /** process the output vector
   */
  virtual void generate() {
    for (uint32_t i = 0; i < m_vframes; i++) {
      m_vector[i] = m_y;
      if (m_cnt < m_x1) {
        m_y += m_incr;
        m_cnt++;
      } else m_y = m_y1;
    }
  }

  /** set the increment
   */
  virtual void update() {
    m_incr = (m_y1 - m_y0) / m_x1;
  }

 public:
  /** Line constructor \n\n
      start - start value \n
      end - end value \n
      time - duration(s) \n
      vframes - vector size \n
      sr - sampling rate
  */
  Line(double start = .0, double end = 1.,
       double time = 1.,uint32_t vframes = 64,
       double sr = 44100.)
      : m_y(start), m_y0(start), m_y1(end),
        m_x1(time * sr), m_incr((end-start)/(time*sr)),
        m_cnt(0), m_vframes(vframes),
        m_vector(new double[vframes]),
        m_sr(sr) {};

  virtual ~Line() {
    delete[] m_vector;
  }

  /** process and return the output vector
      as a read-only array.
  */
  const double *process() {
    generate();
    return m_vector;
  }

  /** retrigger
   */
  void retrig() {
    m_cnt = 0;
    m_y = m_y0;
    update();
  }

  /** reset and retrigger
   */
  void reset(double start, double end, double time) {
    m_y0 = start;
    m_y1 = end;
    m_x1 = time * m_sr;
    retrig();
  }
};
Note that we have a clear access control, separating the hidden (protected)
members from the public interface, which is the only means of modifying them.
The choice of protected instead of private members is made to allow derived classes
to be built upon this, reusing code as much as possible. We have nominated
overridable methods very clearly based on where we see scope for specialisation: in
line generation and in increment update. Making these protected allows us to have a
well-defined fixed interface but with the option of specialising the signal processing
operations internally.
Another style matter is the choice to prefix each member variable with an m_ so
we can clearly see what is local to the function or to the object as a whole. Finally,
we are being somewhat more definite about the numeric types we are using. In the
<cstdint> header, which we have already seen in Chapter 2, there are a number
of short-hand type definitions for integers, in terms of signedness and size, which
we are taking advantage of here. Anything that is clearly never going to be negative
will use an unsigned type. For variables that may need an extra range, we also make
them 64-bit wide.
15.3 Operator Overloading

Since we are on the path to creating fully-fledged new types, we should try to make
them behave as much as possible like built-in ones (as these, on the other hand, are
all considered as classes as well). The compiler provides some support for simple
operations such as copying (assigning) objects. However, for many types of manip-
ulation, we will need to define them explicitly. We might wonder, for instance, what
the meaning of standard language operators (such as, for instance, the arithmetic
ones) is when used with an object of a given class. The answer is, of course, that it
is up to us to define this by overloading the operator for our new type.
The way to go about it is to declare a public method named using the following
syntax:

return-type operator op ( arguments )

where op is the operator we want to overload. Here is a trivial example,


Listing 15.5: Overloading arithmetic operators
class MyInt {
  int val;
 public:
  MyInt(int x) : val(x) { };
  const MyInt &operator+=(const MyInt &y) {
    val += y.val;
    return *this;
  }
  MyInt operator+(const MyInt &y) {
    MyInt x(*this);
    x += y;
    return x;
  }
};
This class overloads the binary addition (+) and the addition-assignment (+=)
operators, the latter incrementing the object in place. Note the use of the this
pointer, which holds the address of the object on which the method was invoked,
giving it a means of self-reference. With it, in the addition operator, we create a
local object as a copy of itself and return it by value. In the increment operator, we
use it to return a constant reference to the object itself so that we can chain
operations.
With this class, as defined above, we can write the following code:
MyInt a(1), b(1), c(0);
c = a + b;
Various other operators can be overloaded and we will see how we can use this
mechanism to our benefit to allow for some easy-to-use syntax with signal
processing objects. Depending on the class, we may need to provide assignment
operators as well, since the compiler-generated one is sometimes not suitable.
Classes that allocate external resources (such as the one in Listing 15.4) are among
these, as the copy operations will need to make sure these are dealt with properly.
However, we will actively avoid these types of operations, which in most cases
involve code that is not realtime-safe. If copying an object requires, for instance,
that memory is freed and re-allocated, this is not to be done in realtime-critical
sections, where audio computation is performed. In the examples above, the
in-place increment (+=) is generally safe, as we are only manipulating references.
However, the binary addition is not: its use may lead to copying of data that might
be problematic in a realtime audio context.
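One way to follow this advice is to allow only in-place operators on a class and to disable copying altogether. The sketch below is an illustration under assumptions, not a design taken from the text: the class Sig is hypothetical, it stores its samples in a std::vector from the standard library, and it relies on the C++11 "= delete" syntax.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical signal vector that cannot be copied, only modified
// in place; its storage is allocated once, at construction.
class Sig {
  std::vector<double> m_v;

 public:
  explicit Sig(std::size_t n, double val = 0.) : m_v(n, val) { }
  Sig(const Sig &) = delete;             // no copy construction
  Sig &operator=(const Sig &) = delete;  // no copy assignment

  // in-place scaling: no allocation, returns a reference to itself
  Sig &operator*=(double g) {
    for (double &x : m_v) x *= g;
    return *this;
  }

  // in-place mix of another signal (up to the shorter length)
  Sig &operator+=(const Sig &o) {
    for (std::size_t i = 0; i < m_v.size() && i < o.m_v.size(); i++)
      m_v[i] += o.m_v[i];
    return *this;
  }

  // read-only access to a sample
  double operator[](std::size_t i) const { return m_v[i]; }
};
```

Since no binary operator+ is provided and copies are disabled, an expression that would silently allocate memory in an audio processing loop fails to compile instead.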

15.3.1 Standard IO Revisited

At this point, it is useful to revisit standard IO to see how it can be handled in a
more object-oriented way. The standard C++ library has a number of facilities that
provide an object-oriented interface to common IO operations.
The iostream classes in the library model various ways in which input and output
streams can be handled. In the particular case of standard IO, the standard C++
library provides three objects of these classes, declared in the iostream header, to
facilitate the process:
• std::cout – standard output, equivalent to stdout,
• std::cin – standard input, equivalent to stdin,
• std::cerr – standard error, equivalent to stderr.
The classes std::istream (input streams) and std::ostream (output)
overload operator>> and operator<<, respectively, which are used to input
and output data. Formatted IO is provided through a series of overloaded
operators of these two kinds for various types. For instance, to write to stdout, we
use std::cout:
cout << "Live Long and Prosper.\n";
We can also chain various objects to put them into the stream:
double a = 2.;
cout << "this is a constant: " << 32
     << " and the value of a var: " << a << '\n';
Note how we have employed a mixture of strings, numeric and character con-
stants and a variable to build a formatted stream that is sent to stdout. As another
example, we could use std::cout to write a simple test program for the Line
class in Listing 15.4:
#include <iostream>
int main() {
Line line;
for(int i=0; i < 44100; i+=64)
std::cout << line.process()[0] << "\n";
return 0;
}
For input, to get data from stdin we can use the other operator:
double a;
cin >> a;
Note that we can provide an overloaded operator for these stream operations for
our new types. This is done through the mechanism of friend functions. These are
free functions¹ that are granted direct access to private or protected data in a class.
An operator can be given this access to allow the class data to be put in a stream.
Adding to the example in Listing 15.5, we have:

Listing 15.6: Overloading stream operators


class MyInt {
...
friend std::ostream
&operator<<(std::ostream &os, const MyInt &i) {
return (os << i.val);
}
friend std::istream
&operator>>(std::istream &is, MyInt &i) {
return (is >> i.val);
}
};
allowing the MyInt class to interact with iostream objects:
MyInt a(1), b(1), c(0);
cin >> a;
cin >> b;
c = a + b;
cout << c << "\n";

15.4 An Audio Output Class

The ideas we have discussed so far can be applied to provide us with an interface for
audio output. We would be able to design a generic class that can be implemented
with a range of backends (e.g. libsndfile for soundfiles, Portaudio, Jack, or another
similar library for realtime audio, and so on). As long as we have our interface
correctly set up, the implementation should be straightforward².

¹ In C++, free functions are those defined outside a class, i.e. not belonging to any particular object.

The typical attributes we should be expecting for a sound output class are the
sampling rate, number of channels, processing vector size and buffer size. A buffer
memory should be allocated to accumulate the data before we send it to its destina-
tion. A handle will also be needed to refer to this destination (whether it is an open
file or a device) and we will need a counting variable, as well as a total frame count.
If we have different types of output, we should also include a mode switch.
In this particular design, the class constructor would take care of any initiali-
sation needed, opening devices or files, and setting up all the necessary elements
to stream audio out. The destructor would, among other things, close any streams
and/or devices, and do any required de-initialisation.
The basic operational method for this class would take a block of audio frames
and write it to the buffer (write()), invoking the destination write function once
that is full. The class declaration for SoundOut is shown in Listing 15.7.
Listing 15.7: Audio output class
class SoundOut {
  double m_sr;
  uint32_t m_nchnls;
  uint32_t m_vsize;
  const char *m_dest;
  uint32_t m_mode;
  uint32_t m_cnt;
  uint32_t m_framecnt;
  void *m_buffer;
  uint32_t m_bsize;
  void *m_handle;

 public:
  /** SoundOut constructor
      dest - output destination
      nchnls - number of channels
      sr - sampling rate
      vsize - vector size
      bsize - IO buffer size
  */
  SoundOut(const char *dest,
           uint32_t nchnls = 1,
           double sr = 44100.,
           uint32_t vsize = 64,
           uint32_t bsize = 1024);

  /** SoundOut destructor
   */
  ~SoundOut();

  /** Writes sig (vsize frames) to the output
      destination. Returns number of samples
      written, or an error code.
  */
  uint32_t write(const double *sig);
};

² Such an implementation is provided, for instance, in AuLib, which is discussed in Chapter 17.

Note that the interface defined in Listing 15.7 is general enough to allow for
various backend implementations, and is not dependent on any particular audio IO
library. As an example, we would use it as follows ("dac" interpreted as realtime
audio output):
int main() {
  const unsigned int size = 10000;
  SinTab tab(size);
  Osc osc(0.5, 440., tab);
  SoundOut output("dac");

  // one second of audio, written one vector at a time
  for(int i = 0; i < osc.sr; i+=osc.vsize)
    output.write(osc.process());
  return 0;
}
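The buffering behaviour described for write(), where blocks are accumulated and passed on once the buffer is full, can be sketched independently of any audio library. The class BufferedOut below is a hypothetical, mono-only stand-in: instead of a file or device, full buffers are appended to an internal vector, and the buffer size is assumed to be greater than zero.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the buffering part of an output class.
class BufferedOut {
  std::vector<double> m_buffer;
  std::vector<double> m_sent;  // stands in for the real destination
  std::size_t m_cnt;
  std::size_t m_vsize;

 public:
  BufferedOut(std::size_t vsize = 64, std::size_t bsize = 1024)
      : m_buffer(bsize), m_cnt(0), m_vsize(vsize) { }

  // copies vsize samples into the buffer, flushing it when full;
  // returns the number of samples taken from sig
  std::size_t write(const double *sig) {
    for (std::size_t i = 0; i < m_vsize; i++) {
      m_buffer[m_cnt++] = sig[i];
      if (m_cnt == m_buffer.size()) {
        // a real backend would call its own write function here
        m_sent.insert(m_sent.end(), m_buffer.begin(), m_buffer.end());
        m_cnt = 0;
      }
    }
    return m_vsize;
  }

  // read-only view of everything flushed so far
  const std::vector<double> &sent() const { return m_sent; }
};
```

With a vector size of 2 and a buffer size of 4, nothing reaches the destination after the first call to write(), and four samples are flushed together after the second.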

15.5 Conclusions

In this chapter, we have introduced the principle of linear and exponential envelope
generators, alongside several object-oriented concepts and their realisation in C++.
At this stage, we have amassed a handful of audio DSP classes, a collection that is
already starting to look like a small code library (oscillators, phasor, table readers,
envelopes, table generators, sound output). The idea of designing such a library
a little more thoroughly using established C++ standards will start to become a
central preoccupation for us in the following chapters, as we progress through the
algorithms and object-oriented programming.

Problems

15.1. Declare and implement a one-segment exponential envelope class, derived
from Line. Write a simple program demonstrating its use.

15.2. A common type of multi-segment envelope generator is the
attack-decay-sustain-release (ADSR) generator, which is a four-stage envelope using linear
curves. The first stage starts from 0 and leads to the maximum amplitude, followed
by a decay segment from the maximum to the sustain amplitude. The sustain pe-
riod holds the amplitude steady until the release is triggered. The trigger for the
release stage will make the envelope jump to release from any of the earlier stages.
In this final segment, the amplitude moves from whatever value it had at the trigger
time to 0. Design and implement an ADSR envelope class. Write a simple program
demonstrating its use.

15.3. Implement the audio output class using a backend of your choice (libsndfile,
Portaudio, Jack, standard IO).
Chapter 16
Filters

Abstract In this chapter we look at how filters are constructed, concentrating on in-
finite impulse response types, typically found in computer music applications. After
an introduction to the main signal processing aspects of filters, we explore the imple-
mentation of first-, second-, and fourth-order filters. From the programming side, we
introduce the concept of templates, which offer compile-time support for polymor-
phism. This leads into a discussion of the standard C++ library, and, in particular,
of container classes provided by it, which will be very useful in the development of
an object-oriented library for audio processing.

So far, we have been able to generate audio signals, modify their amplitude and
frequency using envelopes, and choose what kind of waveform source we want for
this. We can use frequency modulation to produce time-varying timbres [36]. It is
also possible to put together a large set of sinusoidal oscillators, each one modelling
a separate partial, to achieve a similar level of timbral manipulation. In this chapter,
we will introduce a third means of processing waveforms, through filters.
Filters are modifiers that can be used to shape the spectrum of an input sound
in terms of the amplitudes and phases of its constituent partials. Their effect on
amplitudes at various frequencies is called the amplitude response, while the various
delays they can add to signals at various frequencies are collectively called the phase
response.
Digital filters are implemented using a mix of direct and delayed signals. They
can be classified in terms of [36]:
1. Their effect on an input:
• Low pass (LP): cuts (reduces) the amplitude of components above a certain
frequency, called the cutoff frequency, and is also known as high cut. The
region above the cutoff frequency is called the stop band.
• High pass (HP): cuts the amplitudes below the cutoff frequency (also called
low cut). The stop band is therefore below this.
• Band pass (BP): passes a given band around a centre frequency. The stop-band
regions are outside this band.

© Springer Nature Switzerland AG 2019 235


V. Lazzarini, Computer Music Instruments II,
https://doi.org/10.1007/978-3-030-13712-0_16
• Band reject (BR): cuts a given band around a centre frequency.


• All pass (AP): passes all frequencies unaltered in amplitude, but affects the
phases of an input signal (that is, adding delays to it).
2. Their order:
• First order: employs delays of only one sample.
• Second order: employs delays of up to two samples.
• Higher orders: employ a longer delay or delays, of several samples.
3. Their structure:
• Feedforward: uses a combination of an input and its delayed copies; also
known as finite impulse response (FIR).
• Feedback: uses a combination of an input and delayed copies of its own output
(i.e. it is recursive), and possibly input delays as well; also known as infinite
impulse response (IIR).

While FIR filters are stable and can be designed to have attractive features such
as a linear phase response, where the same amount of delay is applied to all fre-
quencies, they are not very flexible for many musical applications. In this chapter,
we will concentrate on the implementation of IIR filters, leaving the discussion of
their feedforward counterparts to Chapter 18.

16.1 Feedback Filters

Feedback filters come in various forms, and are usually packaged in such a way
that we can control and modify their actions using standard parameters such as
frequencies and bandwidths. We will tend to define them in terms of first or second-
order sections, but higher orders can be achieved by serial connections. Each filter
design will have different characteristics and applications. We will start by looking
at the simplest of them, the low or high-pass tone controls, then move on to second-
order designs and complete the discussion with fourth-order resonant filters.

16.1.1 First-Order Tone Filters

These are filters with very smooth and gentle amplitude response curves. They will
tend to shape an input sound in a very light way, and can be likened to tone controls
in consumer hi-fi equipment. Their structure combines an input signal with a one-
sample delay (hence they are first-order filters) of their output. We can define these
filters with the following equation:

y(t) = ax(t) − by(t − 1) (16.1)


where x(t) and y(t) are the filter input and output signals, respectively, and y(t − 1)
is the output delayed by one sample (Fig. 16.1).

Fig. 16.1: First-order tone filter flowchart.

The filter will assume a low-pass or a high-pass amplitude response depending
on the coefficients a and b. For a low-pass filter, we have

\[
\begin{aligned}
r &= 2 - \cos\left(2\pi \frac{f}{f_s}\right) \\
b &= \sqrt{r^2 - 1} - r \\
a &= 1 + b
\end{aligned}
\tag{16.2}
\]

where f is the cutoff frequency and fs is the sampling rate. It is very easy to flip the
filter into high-pass mode by modifying b:

\[
\begin{aligned}
r &= 2 + \cos\left(2\pi \frac{f}{f_s}\right) \\
b &= r - \sqrt{r^2 - 1} \\
a &= 1 + b
\end{aligned}
\tag{16.3}
\]

These filters can be implemented with a common engine or kernel that would
look like this:
virtual const double *filter(const double *sig) {
  for (uint32_t i = 0; i < m_vframes; i++) {
    // Eq. 16.1: m_del holds the previous output y(t - 1)
    m_del = m_a * sig[i] - m_b * m_del;
    m_vector[i] = m_del;
  }
  return m_vector;
}

A processing function would then invoke this kernel following the appropriate
setting of low-pass or high-pass values for each coefficient. These do not need to be
updated at each sample (unless some sort of audio-rate modulation is required), and
in fact only need to be computed if the cutoff frequency has changed. So, we can
track its latest value and then decide whether we need to recalculate a and b, using
the following functions:
// low-pass coefficients (Eq. 16.2)
void ToneLP::update() {
  double costh = 2. - cos(2. * pi * m_freq / m_sr);
  m_b = sqrt(costh * costh - 1.) - costh;
  m_a = (1. + m_b);
}

// high-pass coefficients (Eq. 16.3)
void ToneHP::update() {
  double costh = 2. + cos(2. * pi * m_freq / m_sr);
  m_b = costh - sqrt(costh * costh - 1.);
  m_a = (1. + m_b);
}
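To see Eqs. 16.1 and 16.2 working together outside the class hierarchy, the following is a minimal stand-alone sketch. The names ToneSketch, set_lowpass() and process() are hypothetical and do not belong to the library discussed in the text; the block interface based on std::vector is also an assumption made for brevity.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-alone tone low-pass filter (illustrative names only).
struct ToneSketch {
  double a = 1., b = 0.;  // filter coefficients
  double del = 0.;        // one-sample delay, holds y(t - 1)

  // Low-pass coefficients following Eq. 16.2.
  void set_lowpass(double freq, double sr) {
    assert(sr > 0. && freq >= 0. && freq < sr / 2.);
    const double pi = std::acos(-1.);
    double r = 2. - std::cos(2. * pi * freq / sr);
    b = std::sqrt(r * r - 1.) - r;
    a = 1. + b;
  }

  // Eq. 16.1 applied to a block: y(t) = a x(t) - b y(t - 1).
  std::vector<double> process(const std::vector<double> &in) {
    std::vector<double> out(in.size());
    for (std::size_t i = 0; i < in.size(); i++)
      out[i] = del = a * in[i] - b * del;
    return out;
  }
};
```

With these coefficients the gain at 0 Hz is a/(1 + b) = 1, so a constant input settles to the same constant at the output, while the first output sample is only a fraction of it.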
Low-pass tone filters can also be used for smoothing control signals. One such
application is in the computation of RMS (root-mean-square) estimates of a signal.
The principle behind this is that the LP filter performs an averaging of an input,
which would be equivalent to the type of operation employed in the ‘mean’ part of
the RMS method. The root and square elements can be replaced by taking the absolute
value of the signal (rectification) before filtering. We could re-implement the
filter kernel to include a rectification operation through an inline function rect()
so that the modified tone LP design can be used directly as an RMS estimator:
virtual const double *filter(const double *sig) {
  for (uint32_t i = 0; i < m_vframes; i++)
    m_vector[i] = m_del = m_a * rect(sig[i]) - m_b * m_del;
  return m_vector;
}
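A self-contained version of this estimator could look like the sketch below. The struct name RmsSketch, the tick() method and the 10 Hz default smoothing cutoff are all assumptions made for illustration, and std::fabs stands in for the rect() function mentioned above.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical amplitude estimator: a rectifier followed by a tone low-pass.
struct RmsSketch {
  double a, b;      // low-pass coefficients (Eq. 16.2)
  double del = 0.;  // previous output

  // cf is the smoothing cutoff in Hz (10 Hz is an assumed default).
  explicit RmsSketch(double sr, double cf = 10.) {
    assert(sr > 0. && cf > 0. && cf < sr / 2.);
    const double pi = std::acos(-1.);
    double r = 2. - std::cos(2. * pi * cf / sr);
    b = std::sqrt(r * r - 1.) - r;
    a = 1. + b;
  }

  // One sample of the estimate: the low-pass filtered absolute value.
  double tick(double x) { return del = a * std::fabs(x) - b * del; }
};
```

Note that for a sinusoid this estimate tends to the mean absolute value, which is proportional to the RMS amplitude but not equal to it; for amplitude tracking and balancing purposes the proportionality is what matters.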

16.1.2 Second-Order Filters

First-order filters exhibit an amplitude rolloff of −6 dB per octave in their stop
band, whereas second-order filters are more selective, with a steeper rolloff of −12
dB per octave. Another feature of these filters is that it is possible to define both
band-pass and band-reject amplitude response curves (which is not possible in
first-order designs).
As we have mentioned above, these filters feature two-sample delays, and may also
include feedforward signal paths.
The simplest second-order design is called the resonator, which is defined by the
following equation:

y(t) = ax(t) − b1 y(t − 1) − b2 y(t − 2) (16.4)


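The coefficients b1 and b2 are determined from the basic filter parameters, the
centre frequency fc and bandwidth B:

\[
\begin{aligned}
R &= \exp\left(-\pi \frac{B}{f_s}\right) \\
b_1 &= -\frac{4R^2}{1 + R^2} \cos\left(2\pi \frac{f_c}{f_s}\right) \\
b_2 &= R^2
\end{aligned}
\tag{16.5}
\]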

The coefficient a is an input scaling gain that is used to keep the filter under
control. It prevents the filter amplitude from increasing out of control when the
bandwidth is very small (high resonance) [61]:

\[
a = (1 - R^2) \sin\left(\cos^{-1}\left(\frac{b_1}{2R}\right)\right) \tag{16.6}
\]
This scaling can be left out if we have another means of controlling the filter
output amplitude (e.g. via balancing, which we will discuss later in this chapter).
The filter kernel can be implemented as:
const double *Reson::filter(const double *sig) {
  double y;
  for (uint32_t i = 0; i < m_vframes; i++) {
    // Eq. 16.4: m_del[0] is y(t - 1), m_del[1] is y(t - 2)
    y = sig[i] * m_scal - m_b[0] * m_del[0]
        - m_b[1] * m_del[1];
    m_del[1] = m_del[0];
    m_vector[i] = m_del[0] = y;
  }
  return m_vector;
}
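The resonator recipe can be checked numerically with a small stand-alone sketch. The names ResonSketch, set() and tick() are hypothetical, and the code assumes a pole radius below one, R = exp(−πB/fs), which is needed for the filter to be stable.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical two-pole resonator with input scaling (Eqs. 16.4 to 16.6).
struct ResonSketch {
  double a = 1., b1 = 0., b2 = 0.;  // scaling and feedback coefficients
  double d1 = 0., d2 = 0.;          // y(t - 1) and y(t - 2)

  void set(double fc, double bw, double sr) {
    assert(sr > 0. && bw > 0. && fc > 0. && fc < sr / 2.);
    const double pi = std::acos(-1.);
    double R = std::exp(-pi * bw / sr);  // pole radius, assumed < 1
    double R2 = R * R;
    b1 = -(4. * R2 / (1. + R2)) * std::cos(2. * pi * fc / sr);
    b2 = R2;
    a = (1. - R2) * std::sin(std::acos(b1 / (2. * R)));
  }

  // One sample of Eq. 16.4.
  double tick(double x) {
    double y = a * x - b1 * d1 - b2 * d2;
    d2 = d1;
    d1 = y;
    return y;
  }
};
```

Feeding this filter a unit-amplitude sinusoid at the centre frequency should give a steady-state output whose peak is very close to one, which is the purpose of the scaling coefficient a.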
Resonators can be slightly modified by adding a two-sample delay feedforward
path (Fig. 16.2):

y(t) = ax(t) + a2 x(t − 2) − b1 y(t − 1) − b2 y(t − 2) (16.7)


In order for us to implement this filter, we can rearrange it as a pair of equations,
which allows the delays to be shared between the feedback and feedforward paths.
The first one of them is the original feedback filter, and the second implements the
feedforward section:

\[
\begin{aligned}
w(t) &= a x(t) - b_1 w(t-1) - b_2 w(t-2) \\
y(t) &= w(t) + a_2 w(t-2)
\end{aligned}
\tag{16.8}
\]

Fig. 16.2: Resonator filter flowchart.

For the feedforward coefficient a2, we have the choice of two values: −R or −1. The
rearranged filter kernel has the following inner loop:
for (uint32_t i = 0; i < m_vframes; i++) {
  // feedback section, Eq. 16.8 (first line)
  w = sig[i] * m_scal - m_b[0] * m_del[0]
      - m_b[1] * m_del[1];
  // feedforward section, Eq. 16.8 (second line)
  y = w + m_a[2] * m_del[1];
  m_del[1] = m_del[0];
  m_del[0] = w;
  m_vector[i] = y;
}
More generally, we can talk of a second-order filter section, which includes both
feedback and feedforward delays, with five coefficients, for each one of the delays
and for the direct signal. With this in place, we can determine the values of the
coefficients for different types of filter package. The second-order section equations
are:

\[
\begin{aligned}
w(t) &= x(t) - b_1 w(t-1) - b_2 w(t-2) \\
y(t) &= a_0 w(t) + a_1 w(t-1) + a_2 w(t-2)
\end{aligned}
\tag{16.9}
\]

The filter structure denoted by these equations is known as direct form II (DF
II). Alternatively, we have DF I, which uses separate delays for the feedback and
feedforward paths, in a single expression:

y(t) = a0 x(t) + a1 x(t − 1) + a2 x(t − 2) − b1 y(t − 1) − b2 y(t − 2) (16.10)

The two forms are generally equivalent (within a certain numeric range), and so
we normally implement the DF II version as it is slightly more economical. For this,
we only need to modify the alternative resonator implementation slightly by adding
one extra term in the second equation and the a0 coefficient, and removing the input
scaling:
for (uint32_t i = 0; i < m_vframes; i++) {
  // feedback section, Eq. 16.9 (first line)
  w = sig[i] - m_b[0] * m_del[0]
      - m_b[1] * m_del[1];
  // feedforward section, Eq. 16.9 (second line)
  y = m_a[0] * w + m_a[1] * m_del[0]
      + m_a[2] * m_del[1];
  m_del[1] = m_del[0];
  m_del[0] = w;
  m_vector[i] = y;
}
With this filter kernel, we can provide several filter recipes to realise different
types of curves: low-pass, high-pass, band-pass, band-reject, and all-pass. For example,
a family of designs that is used widely in music applications is given by
digital versions of the classic analogue Butterworth filters. The coefficients for the
various response curves for these are shown in Table 16.1, with
L = [tan(π f/fs)]⁻¹ and M = [tan(π B/fs)]⁻¹ [15].

Table 16.1: Butterworth filter coefficients.

coefficient  LP                    HP                    BP                     BR
a0           [1 + √2 L + L²]⁻¹     [1 + √2 L + L²]⁻¹     (1 + M)⁻¹              (1 + M)⁻¹
a1           2 a0                  −2 a0                 0                      −2 cos(2π f/fs) a0
a2           a0                    a0                    −a0                    a0
b1           2(1 − L²) a0          2(L² − 1) a0          −2M cos(2π f/fs) a0    a1
b2           [1 − √2 L + L²] a0    [1 − √2 L + L²] a0    (M − 1) a0             (1 − M) a0
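As an illustration of how one column of Table 16.1 plugs into the second-order section of Eq. 16.9, the sketch below sets up the low-pass case. BiquadSketch, butter_lowpass() and tick() are hypothetical names, and only the LP column is covered.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical direct form II second-order section (Eq. 16.9).
struct BiquadSketch {
  double a0 = 1., a1 = 0., a2 = 0.;  // feedforward coefficients
  double b1 = 0., b2 = 0.;           // feedback coefficients
  double w1 = 0., w2 = 0.;           // w(t - 1) and w(t - 2)

  // Butterworth low-pass coefficients (LP column of Table 16.1).
  void butter_lowpass(double f, double sr) {
    assert(sr > 0. && f > 0. && f < sr / 2.);
    const double pi = std::acos(-1.);
    double L = 1. / std::tan(pi * f / sr);
    a0 = 1. / (1. + std::sqrt(2.) * L + L * L);
    a1 = 2. * a0;
    a2 = a0;
    b1 = 2. * (1. - L * L) * a0;
    b2 = (1. - std::sqrt(2.) * L + L * L) * a0;
  }

  // One sample: feedback section first, then feedforward section.
  double tick(double x) {
    double w = x - b1 * w1 - b2 * w2;
    double y = a0 * w + a1 * w1 + a2 * w2;
    w2 = w1;
    w1 = w;
    return y;
  }
};
```

Two quick sanity checks follow from the coefficients: the gain at 0 Hz is (a0 + a1 + a2)/(1 + b1 + b2) = 1, and the gain at the Nyquist frequency is zero because a0 − a1 + a2 = 0.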

16.1.3 Fourth-Order Filters

A fourth-order filter will have a stop-band rolloff of −24 dB/octave. It is possible to
construct such filters by connecting two second-order sections in series. Indeed, any
higher-order filter can be achieved in this way. One typical design for fourth-order
structures is the low-pass resonating filter, which is made up of a series of four first-
order IIR sections with a feedback path connecting the output of the filter back into
its input [36]. This can be used to emulate, for instance, the classic ladder filters
employed in analogue synthesisers.
For example, the following equations define a fourth-order resonant low-pass
filter [20]:

\[
\begin{aligned}
y_1(t) &= y_1(t-1) + Vg\left[\tanh\left(\frac{x(t) - 4r\,y_4(t-1)}{V}\right) - \tanh\left(\frac{y_1(t-1)}{V}\right)\right] \\
y_2(t) &= y_2(t-1) + Vg\left[\tanh\left(\frac{y_1(t)}{V}\right) - \tanh\left(\frac{y_2(t-1)}{V}\right)\right] \\
y_3(t) &= y_3(t-1) + Vg\left[\tanh\left(\frac{y_2(t)}{V}\right) - \tanh\left(\frac{y_3(t-1)}{V}\right)\right] \\
y_4(t) &= y_4(t-1) + Vg\left[\tanh\left(\frac{y_3(t)}{V}\right) - \tanh\left(\frac{y_4(t-1)}{V}\right)\right]
\end{aligned}
\tag{16.11}
\]

where the constant V is determined by the physical characteristics of the system (it
refers to the thermal voltage of transistors), and g = 1 − exp(−2π f/fs). This filter is
computationally more complex than the previous ones we have seen, given the non-linear
distortion terms using the tanh function (which acts as a soft overload limiter).
Note that we can reduce the number of calls to this function if we cache (i.e. place
in memory) the repeated terms in Eq. 16.11. Also, to provide better stability, a first-order
averaging FIR can be placed in the feedback path between the output and input
of the filter. This filter is given by the simple equation

\[
y(t) = \frac{x(t) + x(t-1)}{2} \tag{16.12}
\]
A number of variations on this basic fourth-order design exist, for instance using
other means of non-linear distortion in the signal path, but preserving the general
structure of four first-order filters in series.
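A direct transcription of Eq. 16.11, including the caching of repeated tanh terms, might be sketched as follows. LadderSketch and its members are hypothetical names, V defaults to 1 purely for convenience, and the averaging FIR of Eq. 16.12 is left out to keep the example short.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical fourth-order resonant low-pass following Eq. 16.11.
struct LadderSketch {
  double y[4] = {0., 0., 0., 0.};   // stage outputs y1..y4
  double th[4] = {0., 0., 0., 0.};  // cached tanh(yk / V) of each stage
  double g = 0., r = 0., V = 1.;

  // f: cutoff in Hz, res: feedback amount r, v: the constant V (assumed 1).
  void set(double f, double res, double sr, double v = 1.) {
    assert(sr > 0. && f > 0. && f < sr / 2. && v > 0.);
    const double pi = std::acos(-1.);
    g = 1. - std::exp(-2. * pi * f / sr);
    r = res;
    V = v;
  }

  // One sample. Each th[k] holds tanh(yk(t - 1) / V) on entry and is
  // refreshed once the stage is updated, so tanh is called five times.
  double tick(double x) {
    double in = std::tanh((x - 4. * r * y[3]) / V);
    y[0] += V * g * (in - th[0]);
    th[0] = std::tanh(y[0] / V);
    for (int k = 1; k < 4; k++) {
      y[k] += V * g * (th[k - 1] - th[k]);
      th[k] = std::tanh(y[k] / V);
    }
    return y[3];
  }
};
```

The cache works because the term tanh(yk(t)/V) used as the input of stage k + 1 is the same value that stage k needs as its own previous-output term on the next sample.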

16.1.4 Balancing

Filters can dramatically change the amplitude of an input sound, either by reducing
or by increasing it. This can happen, for instance, if we have a bank of filters
consisting of a series connection, where each one might reduce further the amplitude of
parts of the spectrum that are already very soft. Or we might run the resonator with-
out input scaling and squeeze the bandwidth to a point where the output overloads.
For these applications (and others) we might want to have a way of balancing the
output amplitude according to a comparator signal (e.g. the pre-filter audio). This
can be easily achieved with a pair of RMS estimators, as follows:

\[
y(t) = x(t)\,\frac{\mathrm{RMS}(c(t))}{\mathrm{RMS}(x(t))} \tag{16.13}
\]

where RMS() is the estimator and c(t) a comparator feed. The only consideration
here is that we have to protect this operation from the case where the RMS amplitude
of x(t) is 0, to avoid a singularity. This can be done by a check or by adding a
very small value to the denominator. The balance operation can also be used as
an envelope follower, as it will apply the extracted time-varying amplitude of one
sound to another.
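Eq. 16.13 can be sketched with two rectify-and-smooth estimators sharing the same coefficients. BalanceSketch, tick() and the 10 Hz smoothing cutoff are assumptions for illustration, and the singularity is avoided here by adding a very small value to the denominator.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical balance operator: scales x by RMS(c) / RMS(x) (Eq. 16.13).
struct BalanceSketch {
  double a, b;             // shared low-pass coefficients (Eq. 16.2)
  double dx = 0., dc = 0.; // estimator states for x and for c

  explicit BalanceSketch(double sr, double cf = 10.) {
    assert(sr > 0. && cf > 0. && cf < sr / 2.);
    const double pi = std::acos(-1.);
    double r = 2. - std::cos(2. * pi * cf / sr);
    b = std::sqrt(r * r - 1.) - r;
    a = 1. + b;
  }

  // x is the signal to be balanced, c is the comparator.
  double tick(double x, double c) {
    dx = a * std::fabs(x) - b * dx;
    dc = a * std::fabs(c) - b * dc;
    // tiny offset protects against division by zero when x is silent
    return x * dc / (dx + std::numeric_limits<double>::epsilon());
  }
};
```

Passing the pre-filter signal as c and the filter output as x brings the output back to the level of the input; passing an unrelated sound as c makes the operator behave as an envelope follower.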

16.2 Templates

Now we turn back to C++ to look at some new features that will be useful in our
programming of DSP operations. In order to facilitate further code reuse, in the spirit
of object-oriented programming, the language allows us to create families of entities
(types or functions) from a single prescription. This is done through templates, as
follows:
1. A template is defined using the keyword template followed by a parameter
list and the template body:

template< parameter-list > definition ;

The parameter list can be made of types (classes, etc.) and non-types (e.g. constant
values). The former are declared by the keywords typename or class and the
latter by the type of the value (which can be, for instance, an integral, pointer, or
reference type).
template<typename T> class X { T var; };
2. A template is instantiated by passing arguments to match each one of the template
parameters:

template-name< argument-list > name;

X<int> a;

Templates are commonly used to define classes that are similar in structure but
depend on different types. For example, we could define a class that will hold an
array of an arbitrary type. For this, we need a type and a non-type parameter, to
define the array basic cell and how many of them we want:
template<typename T, uint64_t N> class MyArray {
  T data[N];
 public:
  MyArray(T init) {
    for (uint64_t i = 0; i < N; i++)
      data[i] = init;
  }
  const T &operator[](uint64_t n) const { return data[n]; }
  T &operator[](uint64_t n) { return data[n]; }
};
This template provides the internal data storage (private data) with a construc-
tor/initialiser, and the [] operator to access each member in the usual way. Note
that there are two operators defined: one for read access (which returns a const T
reference and is marked const, telling the compiler it does not modify the object it
belongs to) and another for writing (which returns a reference to a memory location
whose contents can be modified):
int main() {
  MyArray<int,10> f(0);
  f[0] = 1;
  std::cout << f[0] << "\n";
  return 0;
}
Note that we can also use any user-defined type, such as, for instance, one of the
classes we designed earlier on:
MyArray<Osci,2> fm(0.5, 440, sine);
...
fm[1].process(amp, fm[0].process(ndx));
It is also possible to define function templates, which can be applied to arbitrary
types, generating a family of functions:
template <typename T>
void message(T a) {
std::cout << "Message: " << a << std::endl;
}
Function templates need to be instantiated before use. This can be done explicitly,
e.g.
template void message<const char*>(const char*);
or, in many cases, this can be done implicitly at the time the function is called:
message("hello");
where the type is inferred directly from the argument, and therefore we do not need
to supply it. It is very common for function templates to be instantiated implicitly.

16.2.1 Templates in the Standard C++ Library

The standard C++ library provides an extensive collection of templates for all sorts
of applications (which also includes the previously-seen iostream classes). These
can be accessed by including the relevant headers and using the namespace qualifier
std. As part of this library, we have a number of container template classes that can
be used very conveniently to create dynamically-allocated objects of arbitrary types.
The advantage of using these is that we will eliminate the need to manage memory
directly, by delegating this task to the container class. Objects of these types will
take care of allocating and deallocating memory automatically as they go in and out
of scope.
Using standard library containers allows us to avoid the need for new and
delete, and, as a consequence, we will not in general be required to define de-
structors for our classes. This is particularly important as there are some complex
issues associated with these, which we have avoided discussing so far. It is gener-
ally accepted that the presence of a destructor performing a non-trivial task (such
as freeing resources) also requires the explicit definition of other special member
functions. These are: a copy constructor (which allows classes to be copied prop-
erly, Sect. 14.5.1); a copy assignment operator (which allows classes to be assigned,
Sect. 15.3); and a move constructor and assignment operator (both of which optimise
copy/assignment operations¹).
We have already noted in Sect. 15.3 that if a class holds external resources, we
would need to define an assignment operator for it to handle these properly. In fact,
the explicit definition of any of these five methods requires that the full complement
should be given [63], since it implies that their compiler-generated versions are not
suitable for the class design. By taking them out of our class definition, we can safely
ignore this issue, which will reduce the complexity of the code structures we will be
using. The added benefit is that we can concentrate more fully on the algorithms.
The fundamental container template we need to know about is std::vector
(whose definition is found in <vector>). This is a wrapper around a dynami-
cally allocated array (very much in the direction of the array template example, but
more complete and flexible). It should perform nearly as efficiently as an ordinary
dynamically-allocated C array, so we will be able to replace all our audio data vec-
tors with it. The template takes a type argument and the class constructor a size
(which can be a variable), or an initialiser list (inside braces { }):
#include <vector>
...
std::vector<int> data(size);
For instance, from now on, we should declare
class Proc {
 protected:
  ...
  std::vector<double> m_vector;
 public:
  Proc(..., uint32_t vframes, ...) : ...,
    m_vector(vframes), ... { ... }
  ...
};
and forget about the destructor, as we will not need it any more. The vector object
will take care of all the allocation and deallocation of resources behind the scenes.

¹ The move operation was introduced in C++11 to allow better performance when copying
non-trivial classes. It is beyond the scope of this text to discuss it. Readers are referred
to [63] for a complete description.
The vector class has a number of important methods:

• operator=: assigns a vector to another vector.
• assign(): replaces the contents of the vector with new values.
• at(): accesses specified element with bounds checking.
• operator[]: accesses specified element without bounds checking.
• front(): first element.
• back(): last element.
• data(): returns a pointer to the data vector.
• size(): returns the number of elements.
• clear(): clears the vector.
• resize(): resizes the vector (which may involve reallocation).

Once we have a vector defined this way, we can treat it more or less like any array.
This is because it includes a square-bracket operator (operator[]()), which
allows us access through the usual array index syntax. Therefore we do not need to
modify much of the code we have been using so far in order to use a vector object:
it is almost a drop-in replacement.
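The practical effect of delegating storage to std::vector can be seen in a small sketch. BufferSketch is a hypothetical class, not part of the library: it declares no destructor, copy or move members, yet it can be copied and assigned safely because the compiler-generated versions simply copy the vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical signal holder with no user-defined special member functions.
class BufferSketch {
  std::vector<double> m_vector;  // owns and releases its own memory

 public:
  explicit BufferSketch(std::size_t vframes, double init = 0.)
      : m_vector(vframes, init) {}

  // array-style access, checked only in debug builds
  double &operator[](std::size_t n) {
    assert(n < m_vector.size());
    return m_vector[n];
  }
  const double &operator[](std::size_t n) const {
    assert(n < m_vector.size());
    return m_vector[n];
  }

  std::size_t size() const { return m_vector.size(); }
};
```

Copying an object of this class yields an independent buffer: writing to the copy leaves the original untouched, and assignment resizes the destination as required.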
Vectors also provide iterator members, which can be used to traverse the con-
tainer. The following methods return iterators to an object:
iterator begin();
const_iterator begin() const;
and
iterator end();
const_iterator end() const;
An iterator can be used to walk through the array and access it through derefer-
encing:
int main() {
  std::vector<int> v{1,2,3,4,5,6,7,8,9,10};
  for (std::vector<int>::iterator i = v.begin();
       i != v.end(); i++)
    std::cout << *i << "\n";
  return 0;
}
As you can see, the iterators in this case wrap up pointers to the underlying data
type, pointing to the beginning and end of the vector. For some applications, these
are not necessary (we could just as well use a counting variable and the vector size),
but for others, they will be useful. For instance, to copy memory, we will now use
std::copy:
std::copy(src.begin(), src.end(), dest.begin());
and, to fill in an array, we have
std::fill(dest.begin(), dest.end(), value);
Both operations are defined in the header file <algorithm>, which also contains
a number of other useful functions.
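The two calls can be wrapped in small helpers for audio blocks, as in the sketch below. The function names copy_block() and clear_block() are hypothetical; only std::copy and std::fill come from the standard library.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: copy src into dest, which must be at least as long.
inline void copy_block(const std::vector<double> &src,
                       std::vector<double> &dest) {
  assert(dest.size() >= src.size());
  std::copy(src.begin(), src.end(), dest.begin());
}

// Hypothetical helper: set every sample of a block to zero.
inline void clear_block(std::vector<double> &dest) {
  std::fill(dest.begin(), dest.end(), 0.);
}
```

Note that std::copy does not resize the destination, so it has to be large enough before the call, which is what the assertion checks.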

16.2.2 Range-Based Loops

For containers of this kind, which include iterators, C++ provides a variation on the
for loop syntax called a range-based for. This is defined as:

for ( range-declaration : range-expression ) body

The range declaration provides a suitable local variable, which will be set to
the various elements of the range expression. This needs to be a suitable object
that provides begin() and end() methods, such as one from a vector class, or
alternatively an array (but not one that is dynamically allocated, since the compiler
needs to know the range of objects to iterate over). For example,
int main(){
std::vector<int> v{1,2,3,4,5,6,7,8,9,10};
for(int i : v) std::cout << i << "\n";
return 0;
}

16.3 Conclusions

In this chapter, we have introduced the idea of filters and shown how they can be
implemented, with a number of examples of different types. These DSP operations
are central to sound synthesis and processing, so it is very important to be able
to understand how they work and how they are programmed. Following this, we
looked at further ideas from C++ that will enhance our capacity to program in an
object-oriented way. Templates can be very useful for a number of applications.
In particular, they feature very strongly in the standard C++ library. From this, we
explored a very useful container, the vector class, and demonstrated how it can be
used to simplify memory management for our classes. From now on, we can avoid
having to allocate memory directly and can use standard library containers to provide
support for this. At this point, we should be looking at completely refactoring
our existing code library to take advantage of these new ideas and provide robust
support for DSP programs. This task will be taken up in the next chapter.

Problems

16.1. Draw a flowchart for second-order DF I and DF II filters (Eqs. 16.10 and 16.9)
and for a fourth-order low pass filter (Eq. 16.11), using the same format as in Figs.
16.1 and 16.2.

16.2. Implement the ToneHP, RMS, and ToneLP classes using the snippets of code
provided. Make two of them derive from the other.

16.3. Reimplement the oscillator classes using std::vector. Write a program to
test these.
Chapter 17
AuLib

Abstract This chapter explores the design of a class library for computer music in-
strument development. We review some of the principles outlined in earlier chapters
and explore the object-oriented principles that are relevant to the implementation of
this library. A tour of the existing classes is offered, alongside a fully worked-out
application example.

At this stage, we are just about ready to develop a library of classes for audio
processing and synthesis, which we will call AuLib [35]. In this chapter, we will
look at a design that will take advantage of the ideas sketched in the preceding
discussions of object-oriented programming and signal processing components. We
will take some time to reflect on the ideas exposed earlier and summarise them to
provide the principles for the layout of this class library.
Similar work has been explored in earlier projects with cognate aims. The Sound
Object (SndObj) library [38] was released in 1998 as one of the first generally avail-
able free and open-source general-purpose C++ class libraries for audio processing.
The original SndObj code was mostly based on pre-standard C++ [62], evolving
later to include other developments from C++98 and C++03. Another early C++
toolkit was the Synthesis Toolkit (STK) [11], which was mostly oriented to synthesis
with physical models. The SndObj library included not only signal processing
classes but also support for cross-platform realtime audio and MIDI, before this was
provided by dedicated libraries such as JACK, PortAudio, and PortMidi. The project
aimed to encompass all the general-purpose audio use cases [32] in the time and
frequency domains [64]. In the decades following these early projects, many
other C++ object-oriented audio projects were developed, as the language became
firmly established as the pre-eminent platform for mid-to-low level sound and music
computing.
For the reasons already implied in previous chapters, the OOP paradigm is very
useful as the fundamental model for a DSP library design. Function-only libraries,
such as the one explored in [31], are useful insofar as they expose the algorithms
in a simple form, in which they can be studied and played with. However, such an
approach is not robust enough to be used more generally.

© Springer Nature Switzerland AG 2019 249


V. Lazzarini, Computer Music Instruments II,
https://doi.org/10.1007/978-3-030-13712-0_17

With the advent of C++11 [23], the language has reached a stable state, support-
ing a flexible approach to audio programming. In comparison to the C language,
C++ is still a system of large proportions, lacking some of the qualities of small
languages, such as lightness, simplicity, and, in the case of C, ubiquitous portabil-
ity (although it is present on a great number of platforms, it is not as universally
supported by devices). C++ has been described as having “the power, the elegance,
and the simplicity of a hand grenade”1 . These caveats notwithstanding, we should
be able to proceed with this language as our mainstay language for musical signal
processing.
As mentioned above, the previous chapters in Part II of this book provide a good
background to the development of AuLib. The main motivation is to create a simple,
lightweight platform for the study of, and research into, audio DSP algorithms,
taking advantage of the newer, more established, C++ standards. This will also give
us the added benefit of developing efficient code that can be easily packaged and
deployed in general-purpose applications. After all, we have to ensure that the pro-
grams we write can be realistically applied.
Within the scope of this book, the fundamental aim is to provide wrappers for
algorithms, where the audio processing code can be easily accessed, studied, and
modified. By using a thin interface layer, we will attempt to support easy connec-
tivity of objects, as well as a class hierarchy that emphasises common components
and code reuse. The library code attempts to adhere strictly to C++14 [25] standards
and best practice, as it aims to provide an example of robust software design.
In this chapter, we will present the library design within the context of audio
programming systems in general. We will outline the decisions taken in its devel-
opment, some of which are a reinforcement of ideas we have already rehearsed
in earlier chapters. Following this, we will take a tour of the library and its cur-
rent constituent classes. We will explore it from the perspective of signal gen-
eration, processing, and input and output. The source code is available from the
repository at https://github.com/aulib/aulib. A developer’s reference is provided at
http://aulib.github.io. Full programming examples at the end of this chapter and in
Appendix A illustrate some typical applications.

17.1 Object-Oriented Audio Systems

It can be claimed that some form of object-oriented programming has been present
in sound and music computing since the very beginning [34, 52, 55]. One way to
look at the pioneering work by Max Mathews in MUSIC III [41, 42] and IV [43] is to
say that he was designing incipient object-oriented systems for music programming.
In those systems, it is possible to liken the concept of instruments and their instances
to classes and objects. These principles have remained and evolved in some shape
or form in all of their direct successors, MUSIC V [44], MUSIC 11 [65], cmusic
[48], MUSIC 4C [4], and Csound [39], as well as in their indirect successors such
as Pure Data [54].

1 Quote attributed to Kenneth C. Dyke, 5 April 1997.
It is interesting to note that even programming systems that support other
paradigms, such as functional programming in FAUST [50], have backends writ-
ten with, or are compiled to, OOP-based languages. This confirms much of what
we have been stating in previous chapters: that object-orientation provides ways of
modelling audio components in a very robust manner. This provides some confi-
dence for us to embark on the design of a class library, whose details are outlined in
the next section.

17.2 Library Design

The library design borrows from a number of sources which have shown the best
practice in the implementation of audio programming code. One of the guiding as-
pects was to allow a good deal of flexibility in the construction of classes, instead
of mandating the presence of specific components via an abstract base class with
a number of empty virtual methods. Instead, processing methods may or may not
exist in a derived class, depending on what they are supposed to implement. They
can be given any name, although the established informal nomenclature is to call
them process().
The principal base class in the library is AudioBase. This is subclassed to im-
plement all the different audio-handling objects, from synthesis/processing to signal
buffers, function tables, delay lines, and audio IO. The layout of AudioBase is in-
formed by a number of basic design decisions that underpin the principles of AuLib.

17.2.1 Stateful versus Stateless Representations

One of the basic motivations for AuLib is to place self-standing algorithms, previ-
ously implemented as free functions, within an object that allows the safe-keeping
of internal states. Let’s explore this idea by reviewing the simple example of a sinusoidal
oscillator that we explored in Chapter 13. As we have noted, its algorithm
is basically described by:

\[
s(t) = a(t) \sin\left(2\pi \int f(t)\,dt\right) \tag{17.1}
\]

We have also seen that a C implementation of such a function would have to take
account of the sample-by-sample phase values that are produced by the integration
of the time-varying frequency f (t). Typically, in a sane implementation, the current
value of the phase would be kept externally to the function, and modified as a side-
effect (Listing 17.1).
Listing 17.1: C function implementing Eq. 17.1.


double sineosc(double a, double f, double *ph,
               double sr) {
  double s = a * sin(*ph);
  *ph += twopi * f / sr;  // phase is updated as a side-effect
  return s;
}
where twopi is an externally-defined constant, set to 2π .
While this is entirely appropriate to demonstrate and expose the oscillator algo-
rithm for study, it is clearly not robust enough to be incorporated into a library. Quite
rightly, users would expect to be able to use such functions to implement multiple
oscillators, in banks, or for amplitude or frequency modulation. In this context, a
programmer could inadvertently supply a single phase address to a series of calls
to such functions when implementing a bank of oscillators and would clearly fail
to get the intended result. While it could work when carefully employed, such a
stateless presentation of the algorithm is clearly incomplete.
While there are ways of describing a sine wave oscillator in a stateless or purely
functional fashion, once we are committed to defining the computation in a stateful
form, we need to provide a means to keep an account of the current state. Clearly, a
self-contained oscillator will need to maintain the last computed value of the phase,
as the algorithm contains an integration. For this, we can wrap the whole algorithm
in a class that models its state and the means to get an output sample.
A minimal C++ class implementing such an oscillator, similar to some of those
discussed in Chapter 13, is shown in Listing 17.2.

Listing 17.2: C++ class implementing Eq. 17.1.


struct SineOsc {
  double m_ph;  // current phase
  double m_sr;  // sampling rate

  SineOsc(double ph, double sr) : m_ph(ph), m_sr(sr) {}

  double process(double a, double f) {
    double s = a * sin(m_ph);
    m_ph += twopi * f / m_sr;
    return s;
  }
};
With an object-oriented implementation, the stateful description of the algorithm
is complete and provides enough robustness for use in a variety of contexts. Like-
wise, if we look across the various types of DSP operations that a library would
hope to implement, we will see all sorts of state variables involved. This provides
enough motivation for the wrapping of such algorithms in C++ classes.
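As an illustration of this robustness, the following sketch builds a bank of oscillators from the class in Listing 17.2. The helper harmonic_bank() is hypothetical and the definition of twopi is an assumption. Each object in the vector holds its own phase, so the bank cannot be wired incorrectly by sharing state.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

const double twopi = 2. * std::acos(-1.);

// class from Listing 17.2
struct SineOsc {
  double m_ph;
  double m_sr;

  SineOsc(double ph, double sr) : m_ph(ph), m_sr(sr) {}

  double process(double a, double f) {
    double s = a * std::sin(m_ph);
    m_ph += twopi * f / m_sr;
    return s;
  }
};

// hypothetical helper: sum of nharm harmonically-related oscillators,
// each scaled by 1/nharm
std::vector<double> harmonic_bank(double f0, std::size_t nharm, double sr,
                                  std::size_t nsamples) {
  std::vector<SineOsc> oscs(nharm, SineOsc(0., sr));
  std::vector<double> out(nsamples, 0.);
  for (auto &s : out)
    for (std::size_t k = 0; k < nharm; k++)
      s += oscs[k].process(1. / nharm, f0 * (k + 1));
  return out;
}
```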

17.2.2 Abstraction and Encapsulation

In fact, by clearly describing an algorithm as having a state and a means of com-
puting its output, we are abstracting the DSP object as a specific data type. This
encapsulates all the kinds of operations we would expect to be able to apply to
such an object. What are the things we would like any DSP algorithm to contain? It
would be useful for instance for it to hold its output so that we only need to compute
it once. Basic attributes such as the sampling rate and the frame size (number of
channels in an interleaved signal) would also be essential.
Additionally, we have noted before, in Chapter 13, that processing should not be
limited to frame-by-frame computation (as in the minimal example of the oscilla-
tor in Listing 17.2). It has been firmly established that this is not the best practice
for efficient audio computation [13]. Therefore, as we have already become used to
seeing, a block of frames, which may vary in size, is generated for each call of a
processing method. A means of registering whether the object is in an error state
would also be useful for program diagnostics. In this formulation, a class that mod-
els a generic audio DSP object would contain the following attributes (Listing 17.3):

Listing 17.3: Attributes of the audio DSP base class

class AudioBase {
protected:
uint32_t m_nchnls;// no of channels
uint32_t m_vframes;// vector size
std::vector<double> m_vector;
double m_sr;// sampling rate
uint32_t m_error;// error code
...
};
These are protected so that no unintended modification is allowed. This class is
for all practical purposes a wrapper around an audio vector (of double-precision
floating-point samples). Methods for basic manipulation are also added: scale, offset,
modulation, mixing, and sample access are provided through overloaded operators.
Methods are also provided for setting and getting samples in the vector (single-channel
samples, full blocks, etc.), for modifying the vector size, and for getting the values
of the object attributes. It is important to take good care in the design of the base
class, as this will pay good dividends as the library is developed.
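To make the idea of a class that wraps an audio vector more concrete, here is a much-reduced sketch. The class MiniBase and all of its members are hypothetical stand-ins written for this illustration; they are not the AudioBase declarations, which should be consulted in the library source.

```cpp
#include <cstdint>
#include <vector>

// hypothetical, reduced stand-in for an audio vector base class
class MiniBase {
protected:
  uint32_t m_vframes;           // vector size
  std::vector<double> m_vector; // audio vector
  double m_sr;                  // sampling rate

public:
  explicit MiniBase(uint32_t vframes = 64, double sr = 44100.)
      : m_vframes(vframes), m_vector(vframes, 0.), m_sr(sr) {}

  // scale the vector by a constant
  MiniBase &operator*=(double g) {
    for (auto &s : m_vector) s *= g;
    return *this;
  }

  // add an offset to the vector
  MiniBase &operator+=(double offs) {
    for (auto &s : m_vector) s += offs;
    return *this;
  }

  // mix in another object; nothing is done if the sizes differ
  MiniBase &operator+=(const MiniBase &obj) {
    if (obj.m_vframes == m_vframes)
      for (uint32_t i = 0; i < m_vframes; i++)
        m_vector[i] += obj.m_vector[i];
    return *this;
  }

  // read-only sample access
  double operator[](uint32_t n) const { return m_vector[n]; }

  // set a sample, returning the value previously stored there
  double set(double s, uint32_t n) {
    double old = m_vector[n];
    m_vector[n] = s;
    return old;
  }

  uint32_t vframes() const { return m_vframes; }
  double sr() const { return m_sr; }
};
```

With such an interface, an expression like b += (a *= 2.) mixes a scaled copy of one vector into another without exposing the underlying storage.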

17.2.3 Code Reuse

Since we have embraced, for good reasons, the object-oriented approach, it is very
useful to take advantage of inheritance, as well as composition. For this reason,
the class hierarchy has been designed from the most general to the most specific,
although overall the tree is not very deep (six levels at most). As an example, the
ResonZ class shows how the reuse of code can be employed. In Fig. 17.1, we see
that it is subclassed from a series of parents.

Fig. 17.1: The ResonZ class and its parents.

At the top level, Iir implements the basic second-order IIR filter engine in direct
form II (Eq. 17.2), with externally-defined coefficients.

$$
\begin{aligned}
w(t) &= x(t) - b_1 w(t-1) - b_2 w(t-2) \\
y(t) &= a_0 w(t) + a_1 w(t-1) + a_2 w(t-2)
\end{aligned}
\tag{17.2}
$$

The LowP class holds a frequency parameter to calculate Butterworth low-pass
coefficients; BandP adds a bandwidth attribute and re-implements the calculation
of coefficients for a Butterworth band-pass configuration; ResonR re-implements
the coefficient computation for a resonator with an extra zero at R; and ResonZ
just sets the a2 coefficient to −1, otherwise using the coefficient update code from
its parent.
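The pattern can be sketched with a small, self-contained hierarchy. The classes Df2, TwoPole and TwoPoleZ below are hypothetical and are not the AuLib filter classes: the base implements the direct form II update of Eq. 17.2, the first subclass only adds a standard two-pole resonator coefficient calculation (pole radius R = exp(−π·bw/sr)), and the second only changes a2. No output scaling is applied in this sketch.

```cpp
#include <cmath>

// hypothetical second-order section in direct form II (Eq. 17.2)
class Df2 {
protected:
  double m_a[3]; // feedforward coefficients a0, a1, a2
  double m_b[2]; // feedback coefficients b1, b2
  double m_w[2]; // delayed values w(t-1), w(t-2)

public:
  Df2() : m_a{1., 0., 0.}, m_b{0., 0.}, m_w{0., 0.} {}

  double process(double x) {
    double w = x - m_b[0] * m_w[0] - m_b[1] * m_w[1];
    double y = m_a[0] * w + m_a[1] * m_w[0] + m_a[2] * m_w[1];
    m_w[1] = m_w[0];
    m_w[0] = w;
    return y;
  }
};

// two-pole resonator: only the coefficient calculation is added
class TwoPole : public Df2 {
public:
  TwoPole(double f, double bw, double sr) {
    const double pi = std::acos(-1.);
    double r = std::exp(-pi * bw / sr); // pole radius
    m_b[0] = -2. * r * std::cos(2. * pi * f / sr);
    m_b[1] = r * r;
  }
};

// variant with zeros at z = 1 and z = -1: only a2 is changed
class TwoPoleZ : public TwoPole {
public:
  TwoPoleZ(double f, double bw, double sr) : TwoPole(f, bw, sr) {
    m_a[2] = -1.;
  }
};
```

All three classes share the single process() method defined in the base, so a correction to the filter update would only need to be made once.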
This shows an example of how each subclass represents a small modification of
its parent, with most of its code reused. Another benefit is that if a modification
needs to be made (e.g. a bug fix), it does not need to be reproduced at several places
(which opens the door to introducing small errors at these different locations).
Code reuse through composition is also employed throughout the library. For
example, the Delay class (see Sect. 18.2) holds an AudioBase object that im-
plements its delay line, using the inlined access methods provided in that class. The
Balance class, which implements envelope following and signal amplitude bal-
ancing, is made up of two Rms objects that are used to measure the RMS amplitude
of input signals. Rms itself is a specialisation of a first-order low-pass filter class. In
another example, the TableSet class, which is a utility class for the band-limited
oscillator class BlOsc, is made up of a vector of FourierTable objects contain-
ing waveform tables.

17.2.4 Connectivity

Some special attention needs to be given to the ways in which objects can be easily
connected with each other forming high-level signal processing graphs. It is also
important to consider how library objects can interact with code from other libraries
(both in C and in C++). There are two major ways in which we could connect objects
together in a graph:

1. Through raw pointers to data: these are presented in the form of const double*
arguments to allow signals from other libraries and non-AuLib sources to be in-
serted as inputs to processes. The processing methods, in turn, also return a
const double* to the object vector so that it can be sent to other destinations. This type of
connectivity is unsafe from a C++ perspective, as it requires the programmer to
carefully observe and match the vector boundaries, although it is commonplace
within a C-language context.
2. Through object references: processing methods also allow connections to and
from const AudioBase& variables, which provides more safety since vec-
tor boundaries are checked before access. They are the preferred way to pass
signals from one library object to another. For convenience, classes overload
operator() as a shortcut for processing methods using object references. This
allows a function-like composition of operations, as in, for instance,
out(obj1(obj2(in())));

While there is no mandatory way in which this is enforced in derived classes,
an informal convention is to provide two processing methods as part of the pub-
lic interface, in addition to the overloaded function operator. One of them would
deal with data pointers (producing and/or consuming arrays) and the other would
use object references as input and/or output. This is shown in Listing 17.4, where an
AudioBase-derived class is laid out. These public methods delegate to a private
virtual DSP method, which does the actual processing for the object and may be
overridden in a derived class. In general, the AuLib design follows best practice of
avoiding virtual methods in the public interface. The only exception to this is in the
definition of signal arithmetic operators (in AudioBase), where implementation
simplicity is the main concern. Appendix A discusses further details on deriving
new classes from AudioBase.
Listing 17.4: Processing methods and their connectivity in an AudioBase-derived
class.
class Proc : public AudioBase {
virtual const double *dsp(const double *sig);
public:
const double *process(const double *sig) {
return dsp(sig);
}
const Proc &process(const AudioBase &obj) {
if(obj.vframes() == m_vframes &&
obj.nchnls() == m_nchnls) {
process(obj.vector());
} else m_error = AULIB_ERROR;
return *this;
}
const Proc &operator()(const AudioBase &obj) {
return process(obj);
}
...
};
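The convention in Listing 17.4 can be exercised in a self-contained sketch. The classes Sig and Gain below are hypothetical stand-ins, not AuLib classes: Sig holds a vector, and Gain follows the same layout as Proc, with a private virtual dsp() method behind non-virtual public methods and a function-call operator.

```cpp
#include <cstdint>
#include <vector>

// hypothetical base class holding an audio vector
class Sig {
protected:
  std::vector<double> m_vector;

public:
  explicit Sig(uint32_t vframes = 64) : m_vector(vframes, 0.) {}
  virtual ~Sig() = default;
  uint32_t vframes() const {
    return static_cast<uint32_t>(m_vector.size());
  }
  const double *vector() const { return m_vector.data(); }
  double operator[](uint32_t n) const { return m_vector[n]; }
  void fill(double v) { m_vector.assign(m_vector.size(), v); }
};

// hypothetical processor: non-virtual public interface, private virtual DSP
class Gain : public Sig {
  double m_g;
  virtual const double *dsp(const double *sig) {
    for (uint32_t i = 0; i < vframes(); i++) m_vector[i] = sig[i] * m_g;
    return vector();
  }

public:
  explicit Gain(double g, uint32_t vframes = 64) : Sig(vframes), m_g(g) {}

  // pointer-based connection: no bounds can be checked here
  const double *process(const double *sig) { return dsp(sig); }

  // reference-based connection: vector sizes are checked first
  const Gain &process(const Sig &obj) {
    if (obj.vframes() == vframes()) process(obj.vector());
    return *this;
  }

  const Gain &operator()(const Sig &obj) { return process(obj); }
};
```

Given Sig in and two Gain objects a and b of the same vector size, the expression b(a(in)) runs a and then b, since each call returns a reference that is accepted as the input of the next.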
No blocking operations (and/or resource allocation) should take place in a pro-
cessing method. This is also the case for all inline vector manipulation methods
(operators, etc.) provided by the base class, which are all realtime safe (cf. Sect.
15.3). It is particularly important for the library design to be defensive with regard
to these aspects, so that developers are not led into inadvertently writing code that
may turn out not to be good for realtime use.

17.3 A Tour of the Library

Most of the library classes sit in the main AudioBase tree. Figures 17.2 and
17.3 show some of these, in terms of generators, input and output, function tables,
and processor classes. Many of these originated through refactoring of the code
discussed in earlier chapters.
The library classes can be loosely categorised as follows: processing (signal gen-
erators and processors, i.e. they implement process() methods); function tables,
holding mostly constant buffers; and input/output, which allow some form of audio
IO through read() or write() methods. In addition to these, a number of spe-
cialised classes have been designed for high-level control of signal processing and
Fig. 17.2: AuLib class hierarchy: generators, input and output, and function tables.

synthesis. Most of these classes take full advantage of the facilities and design of
the AudioBase class.

17.3.1 Signal Generators

The signal generators in AuLib include standard table-lookup oscillators, sampled-
sound and band-limited waveform oscillators, a phase generator, table readers, and
envelope generators. The SamplePlayer class takes a buffer/function table con-
Fig. 17.3: AuLib class hierarchy: processors.

taining recorded samples and plays it back with pitch and amplitude control either
in a loop or as a single-shot performance. It can handle multichannel sample tables,
producing multichannel output, and uses linear interpolation for table lookup.
The library supports a number of function table classes (derived from a basic
model given by FuncTable), which hold waveforms, envelopes, or signal sam-
ples. These can be read by oscillators or by table lookup objects, whose indices can
be derived from any signal. A phase generator connected to a table reader imple-
ments an oscillator algorithm.
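The last point can be sketched as follows. Phasor, TableRead and table_osc() are hypothetical names for this illustration and do not reproduce the AuLib classes; the table reader here uses a truncating lookup on a sine table and expects an index in the range [0,1).

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// hypothetical phase generator: produces a ramp in the range [0,1)
struct Phasor {
  double m_ph, m_sr;
  explicit Phasor(double sr, double ph = 0.) : m_ph(ph), m_sr(sr) {}
  double process(double f) {
    double ph = m_ph;
    m_ph += f / m_sr;
    m_ph -= std::floor(m_ph); // wrap around
    return ph;
  }
};

// hypothetical table reader: truncating lookup from a normalised index
struct TableRead {
  std::vector<double> m_table;
  explicit TableRead(std::size_t size) : m_table(size) {
    const double twopi = 2. * std::acos(-1.);
    for (std::size_t n = 0; n < size; n++)
      m_table[n] = std::sin(twopi * n / size);
  }
  double process(double index) const {
    std::size_t n = static_cast<std::size_t>(index * m_table.size());
    return m_table[n % m_table.size()];
  }
};

// oscillator = phase generator connected to a table reader
double table_osc(Phasor &ph, const TableRead &tab, double a, double f) {
  return a * tab.process(ph.process(f));
}
```

Because the index can come from any signal, the same table reader could equally be driven by something other than a phase ramp, for instance for waveshaping.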
The BlOsc class implements band-limited waveform synthesis using waveta-
bles stored in a TableSet object. This contains a set of band-limited tables that
are selected according to the desired fundamental frequency. Currently, TableSet
supports classic waveforms (such as sawtooth, square and triangle) constructed us-
ing FourierTable objects. However, the mechanism can be expanded to handle
generalised band-limited waveforms.
The library contains single-segment linear and exponential signal generators,
which can be triggered and reset. Extending these, a generalised multi-segment plus
release envelope class Envel is provided. It uses a utility class, Segments, that is
used to set up a segment list that can be shared among several envelopes (and also
used for envelope tables). A pre-packaged four-segment envelope ADSR is derived
from it as a convenient way to create simple envelopes. The release segment in these
classes is triggered by a specific method (release()), which makes the envelope
jump immediately to that stage.

17.3.2 Signal Processors

AuLib includes a basic set of signal processing classes. Seven types of second-order
and two first-order (low- and high-pass) filters are present, alongside root-mean-
square detection and signal balancing.
As we will see in the next chapter, the Delay class implements fixed or vari-
able delays (depending on the choice of overloaded processing functions), with or
without feedback. It can implement comb filters, flangers, vibrato and chorus ef-
fects. Derived from it, we have a high-order all-pass filter and a general-purpose
finite impulse response filter (implementing direct convolution, which is discussed
in Chapter 18). Delay objects can be tapped by Tap (truncating) or Tapi (inter-
polating) processors.
Some signal-processing utilities are present. A channel extractor, Chn, takes an
interleaved multichannel input and outputs a requested channel. This is needed to
allow access to single channels for objects that are designed to manipulate mono
signals only. A signal bus is provided by SigBus, which can be used as a mixing
buffer with scaling and offset. Completing these, there is an equal-power panning
class, Pan, that produces a stereo output from a mono input signal.
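The two utility operations can be sketched as free functions. These are illustrative only: extract_channel() and pan() are hypothetical names, channels are numbered from 1, and the sine/cosine law used here is one common equal-power formulation, which may differ from the one in the library.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// extract channel chn (numbered from 1) from an interleaved signal
std::vector<double> extract_channel(const std::vector<double> &in,
                                    std::size_t nchnls, std::size_t chn) {
  std::vector<double> out(in.size() / nchnls);
  for (std::size_t i = 0; i < out.size(); i++)
    out[i] = in[i * nchnls + chn - 1];
  return out;
}

// equal-power panning of a mono signal to interleaved stereo,
// with pos from 0 (left) to 1 (right)
std::vector<double> pan(const std::vector<double> &in, double pos) {
  const double halfpi = std::acos(0.);
  double left = std::cos(pos * halfpi);
  double right = std::sin(pos * halfpi);
  std::vector<double> out(in.size() * 2);
  for (std::size_t i = 0; i < in.size(); i++) {
    out[2 * i] = in[i] * left;
    out[2 * i + 1] = in[i] * right;
  }
  return out;
}
```

With this law the squared gains of the two channels always add up to 1, so the total power stays constant as the source moves across the stereo field.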
In addition to these time-domain processing classes, AuLib provides support for
streaming spectral processing using the short-time Fourier transform and its deriva-
tive, the phase vocoder. Stateless free functions for complex and real-input discrete
Fourier transform (using a radix-2 algorithm) are implemented from first principles.
Chapter 19 provides a detailed discussion of these techniques as implemented in
AuLib.

17.3.3 Audio Input and Output

A basic audio IO facility is provided as part of the library, through the SoundIn
and SoundOut classes. This is to allow programs to be written without the need
to access external libraries directly, rather than to provide a complete cross-platform
IO solution. The interface is fairly agnostic as far as its implementation is concerned.
Currently, it provides a frontend to libsndfile [40], for soundfile IO; Portaudio [5],
for realtime device IO; and std::iostream for standard text IO. It is imple-
mented asynchronously, and it is capable of low-latency audio (at least as far as the
underlying service allows it).
Users of the library do not actually depend on these two IO classes. For instance,
applications could place the processing classes directly in an audio system callback
(e.g. through Jack), without the use of any AuLib IO object. Equally, a processing
graph based on library objects can be incorporated into a variety of settings, such as
embedded hardware, mobile devices, etc.

17.4 Synthesis and Processing Control

AuLib includes support for controlling sound synthesis and processing at a higher
level. This is provided by the following classes:
• AuLib::Note: this class provides support for composing signal processing
graphs. It can be subclassed to provide a container object that will model a note
on an instrument, with a well-defined control interface.
• AuLib::Instrument: this is a template class that takes in a Note-derived
class to create an instrument based on it. This class is responsible for instantiating
and controlling note objects.
• AuLib::MidiIn: this is a MIDI controller class that takes in instruments, lis-
tens for MIDI input, and dispatches control data to them. This class currently
uses Portmidi, but as with audio IO, the backend implementation can be changed
to a different MIDI library if needed.
• AuLib::ScorePlayer: this is a score controller, also taking in instruments and
dispatching the control data from an AuLib::Score object to them.
These classes are built to take advantage of the framework given by AudioBase,
so that data can be passed seamlessly from control to processing objects, whose out-
put can be tapped very flexibly.

17.5 An AuLib Instrument

To complete the discussion, we present a full programming example demonstrating
some AuLib classes within a C++ OOP design. For this, we choose to implement
a signal processing instrument that will take an input with any number of channels,
apply a feedback delay effect and produce a stereo output with the input sources
spread evenly between the two channels.
The structure of this instrument, for a single channel, is shown in a flowchart
in Fig. 17.4. Multiple channels will share the SoundIn, SigBus and SoundOut
objects, but will feature separate Chn, Delay, and Pan objects. The program takes
in input and output names (source and destination), plus the delay time and feedback
gain as arguments:
$ delay <src> <dest> <delay time> <feedback>
The key signal processing object in this program is provided by the Delay class,
which is discussed in Sect. 18.2 in the next chapter. It takes an input and puts it
through a delay line, feeding its output back into it, scaled by a gain. Depending
on the delay time, the effect can consist of a series of echoes (long delays) or of a
resonating filter (short delays). The feedback gain needs to be less than 1, otherwise
the output will grow out of control.
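The behaviour of the feedback path can be checked with a minimal sketch, which is independent of the library. FeedbackDelay is a hypothetical single-sample class: an impulse fed into it returns as a series of echoes spaced by the delay length, each scaled by a further factor of the feedback gain g, which is why the output only dies away if the gain is less than 1 in magnitude.

```cpp
#include <cstddef>
#include <vector>

// hypothetical feedback delay: the delayed output is scaled by the
// feedback gain and added to the input before being written back
struct FeedbackDelay {
  std::vector<double> m_buf; // delay line
  std::size_t m_pos;         // current read/write position
  double m_fdb;              // feedback gain

  FeedbackDelay(std::size_t nsamples, double fdb)
      : m_buf(nsamples > 0 ? nsamples : 1, 0.), m_pos(0), m_fdb(fdb) {}

  double process(double x) {
    double y = m_buf[m_pos];
    m_buf[m_pos] = x + y * m_fdb;
    m_pos = m_pos != m_buf.size() - 1 ? m_pos + 1 : 0;
    return y;
  }
};
```

For a delay of four samples and g = 0.5, a unit impulse produces outputs of 1, 0.5 and 0.25 at samples 4, 8 and 12, with zeros in between.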

[Flowchart: SoundIn → Chn → Delay → (+) → Pan → SigBus → SoundOut]

Fig. 17.4: Signal flowchart for a single input channel in the delay program.

The complete program is shown in Listing 17.5. It uses an AuLib::SoundIn
object to access its input (line 29), whose source may be a soundfile, the default
soundcard ("adc"), or the standard input ("stdin"). The choice of input is taken
as the first argument of the program, and if the object cannot be constructed without
errors, the program exits. This input determines the number of channels that will be
used in the instrument. Since this is dependent on the source, we will dynamically
create vectors of objects to process each channel of input. This demonstrates yet
another use for the std::vector container class of the C++ Standard Library.
For each channel we will require a channel reader, a delay, and a panner object.
The vectors holding these are constructed in lines 35–42. Note that delay times and
feedback gain are taken from the third and fourth arguments in the command line.
In order to prevent any blow-up, we coerce the feedback gain to be a non-negative
number less than 1.
Finally, a single AuLib::SigBus object is employed to accumulate the out-
puts of each separate channel (line 44) and feed the AuLib::SoundOut output
(line 46), which also takes its destination from the command line (second argument),
with similar options to the input (soundcard, soundfile, or standard output). In order
to facilitate the processing of each channel, we create an ordered list of channels
as an integer vector, which can then be used in a range-based for loop that iterates
over this list. The std::iota function is used to fill the vector with the channel
numbers.
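The channel list idiom can be tried in isolation. In this small sketch, channel_list() and visit() are hypothetical helpers: the first fills a vector with 0, 1, ..., nchnls − 1 using std::iota, and the second iterates over it with a range-based for loop, adding up the channel numbers counted from 1.

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// build the ordered list of channel indices 0, 1, ..., nchnls - 1
std::vector<uint32_t> channel_list(uint32_t nchnls) {
  std::vector<uint32_t> channels(nchnls);
  std::iota(channels.begin(), channels.end(), 0);
  return channels;
}

// example of use: visit each channel in a range-based for loop
// and sum the channel numbers, counted from 1
uint32_t visit(uint32_t nchnls) {
  uint32_t sum = 0;
  for (uint32_t channel : channel_list(nchnls))
    sum += channel + 1;
  return sum;
}
```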

Listing 17.5: Delay example program.


1 #include <Chn.h>
2 #include <Delay.h>
3 #include <Pan.h>
4 #include <SigBus.h>
5 #include <SoundIn.h>
6 #include <SoundOut.h>
7 #include <iostream>
8 #include <numeric>
9 #include <vector>
10 #include <cstdlib>
11 #include <atomic>
12 #include <cmath>
13 #include <csignal>
14
15 using namespace AuLib;
16 using namespace std;
17
18 // handle ctrl-c
19 static atomic_bool running(true);
20 void signal_handler(int signal) {
21 running = false;
22 cout << "\nexiting...\n";
23 }
24
25 int main(int argc, const char **argv) {
26
27 if (argc > 4) {
28 // audio input
29 SoundIn input(argv[1]);
30 if(input.error() != AULIB_NOERROR) {
31 cout << "error opening input\n";
32 return -1;
33 }
34 // input channels
35 vector<Chn> chn(input.nchnls());
36 // delay lines
37 double fdb = fabs(atof(argv[4]));
38 vector<Delay> delay(input.nchnls(),
39 Delay(atof(argv[3]), fdb < 1.0 ? fdb : 0.99,
40 def_vframes, input.sr()));
41 // stereo panning
42 vector<Pan> pan(input.nchnls());
43 // mixing bus
44 SigBus mix(1./input.nchnls(), 0., false, 2);
45 // audio output
46 SoundOut output(argv[2], 2, def_vframes,
47 input.sr());
48 if(output.error() != AULIB_NOERROR) {
49 cout << "error opening output\n";
50 return -1;
51 }
52 uint64_t end = input.dur() + 5*output.sr(), t = 0;
53 // list of channels
54 vector<uint32_t> channels(input.nchnls());
55 iota(channels.begin(), channels.end(), 0);
56 signal(SIGINT, signal_handler);
57
58 cout << Info::version();
59
60 while (t < end && running) {
61 input();
62 for(uint32_t channel : channels) {
63 chn[channel](input, channel + 1);
64 delay[channel](chn[channel]);
65 pan[channel](delay[channel] += chn[channel],
66 (1 + channel)*input.nchnls()/2.);
67 mix(pan[channel]);
68 }
69 t = output(mix);
70 mix.clear();
71 }
72 return 0;
73 } else
74 cout << "usage: " << argv[0]
75 << " <source> <dest> <delay> <feedback>\n";
76 return 1;
77 }

The processing loop, lines 60–71, processes one vector of audio at a time and
continues processing until the input ends or a ctrl-c (SIGINT) signal is sent
to the program (for which a signal handler is registered in line 56). We employ
the function-call operators of each object to access their DSP operations (e.g.
input(), delay[channel]()), and each channel is processed in the inner
range-based for loop. We take advantage of the overloaded operator+= (in line
65) as a means of adding dry and wet-effect signals in the pan processing input.
Note that the mix object is called to accumulate the outputs of each channel and is
then cleared after use.

17.6 Conclusions

This chapter has described the design of a simple, lightweight audio DSP library in
C++, based on the principles developed in earlier chapters. The main motivation is
to provide a platform to develop and collect algorithms for the study, teaching and
research in audio programming. The library classes are effectively thin wrappers
that envelop succinct and efficient implementations of DSP operations. The code
has been designed to be robust enough for general-purpose deployment in audio
processing applications. The next chapters in the book will employ the library while
exploring specific DSP algorithms. A general reference for the library is given in
Appendix A. The library developer’s manual can be found at https://aulib.github.io,
and the source code repository URL is https://github.com/aulib/aulib.

Problems

17.1. Write a version of the MIDI synthesiser program presented in Chapter 12 using
AuLib classes and including an envelope.

17.2. Derive a class from AuLib::AudioBase to implement a fourth-order low-
pass filter as described in Chapter 16 and [20]. Write a simple test program to
demonstrate it.
Chapter 18
Delay Line Processing

Abstract In this chapter we explore the concept of delay lines and their applications,
as found in computer music instruments. Following an introduction to the main as-
pects of delay line programming, we explore the implementation of fixed, variable
and multitap delays. The convolution operation and finite impulse response filters
are discussed as an extreme example of tapping a delay line at multiple points. From
the programming perspective, we introduce the principles of lambda functions and
closures, which provide elegant means of implementing certain types of computa-
tion.

Delay lines are employed in a significant number of audio signal processing ap-
plications. Given that the methods we have been exploring so far are intrinsically
linked to the timing of the samples in a waveform, this is not surprising. In fact,
we can group all the techniques and algorithms we have discussed up to now under
the time-domain designation. In this chapter, we will expand our understanding of
what it is to apply delays to an input signal, as well as of the results of manipulating
both the amounts of delay (delay times) and the mix of direct and delayed signals.
A whole category of related effects can be derived from these principles.
In general, we can distinguish between two groups of delay effects, according
to whether they are based on fixed or variable delay times. In the first case, we
have primarily the IIR filter equations we have already studied in Chapter 16, which
employ mostly very short delays of one to a few samples, and the echo and rever-
beration effects that we will see in the present chapter. These, instead, are based on
much longer delays, which can be up to several seconds long. The variable-delay
effects also often employ delay times that will be modulated within a wide range,
from zero to several milliseconds. These take advantage of side effects of varying
the delay time that can have pitch and timbre modification consequences.
Regardless of the type of delay effect, if it requires longer delays, it should use a
basic algorithm to apply the expected time delay to the signal. This is called a circu-
lar buffer and the one to be employed here is based on a variation on the principles
we have already encountered in Chapter 12, Sect. 12.4. While there the require-
ments were to have a queue to keep two threads synchronised, here we only want

© Springer Nature Switzerland AG 2019 265


V. Lazzarini, Computer Music Instruments II,
https://doi.org/10.1007/978-3-030-13712-0_18

a sequence of samples to be held in memory for a certain period of time, while the
expected delay time elapses. It turns out that the most efficient way to do this for a
large number of samples is also to employ a circular buffer.

18.1 Circular Buffers

Any signal delay is only meaningful if it has a time reference to which it is going
to be compared. For instance, if we are writing audio samples to disk or to memory,
then the time delay is just an offset in relation to a given frame position. If we are
sending the data in realtime to a DAC, then, while the delay is still just a sample
offset, it can also be conceived as diverting the signal to a memory location, from
which we will read some time later. This is in fact the most general solution to
implementing delays, regardless of the destination of the audio signal (realtime or
not). So what we need is a queue where samples go in at one end and come out
at the other end after a certain number of sampling periods. This is what we call a
FIFO queue (Fig. 18.1).

in → [ queue ] → out
Fig. 18.1: FIFO queue.

Naturally, we might think that such a queue should be implemented as follows:


out = fifo[fifo_size-1];
for(int i = fifo_size-1; i > 0; i--)
fifo[i] = fifo[i-1];
fifo[0] = in;
In other words, at each sampling period, we pop the head of the queue, and then
move, one by one, each sample in the queue one position forward. Then we push
in a new sample to the first position that is now vacant. While this is correct from
the point of view of the inputs and outputs and the FIFO principles, it is hardly very
efficient from a computational perspective. For each sample processed, we have to
move N numbers around (N being the delay time in samples). It is optimal only for
a few samples, and not practical for delays of several milliseconds.
The main observation that we can draw from looking at the FIFO layout is that
we only need to care about pushing one sample in and popping another out from a
buffer. This means we only need to replace one position in memory, if we accept that
it can be read in a circular fashion. What we do then is to move the nominal head/tail
position of the queue along the memory block, rather than move its contents. This is
shown in Fig. 18.2. It only works if we are able to wrap around the end of the buffer,
but we have been doing this all the time in table-lookup oscillators, which employ
the same reading principle, without the overwriting part.
Interestingly, with a simple modification of the naive FIFO code above, we can
achieve an efficient circular-buffer implementation:
out = fifo[rp];
fifo[rp] = in;
rp = rp != fifo_size - 1 ? rp + 1 : 0;
Note that, as if by magic, the processing loop has been disposed of. That is right:
with this algorithm, to impose a delay of N samples we only need three opera-
tions: pop, push, and update. It has constant time complexity as opposed to being
dependent on the delay size. All we needed to do is to keep track of the replace-
ment position. In the simplest cases, we only need to keep track of this position, as
the writing operation will always be preceded by the reading operation in the same
buffer location. However, in the case of variable delays, we will need to keep track
of the reading and writing indices separately.
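The equivalence of the two approaches can be verified with a small sketch that wraps each code fragment in a hypothetical struct (ShiftDelay and CircularDelay are names chosen for this illustration). Both assume a buffer size greater than zero and impose a delay equal to the buffer size in samples.

```cpp
#include <cstddef>
#include <vector>

// naive FIFO: every stored sample is moved at each call, O(N) per sample
struct ShiftDelay {
  std::vector<double> fifo;
  explicit ShiftDelay(std::size_t n) : fifo(n, 0.) {}
  double process(double in) {
    double out = fifo[fifo.size() - 1];
    for (std::size_t i = fifo.size() - 1; i > 0; i--)
      fifo[i] = fifo[i - 1];
    fifo[0] = in;
    return out;
  }
};

// circular buffer: pop, push and update, constant cost per sample
struct CircularDelay {
  std::vector<double> fifo;
  std::size_t rp; // replacement position
  explicit CircularDelay(std::size_t n) : fifo(n, 0.), rp(0) {}
  double process(double in) {
    double out = fifo[rp];
    fifo[rp] = in;
    rp = rp != fifo.size() - 1 ? rp + 1 : 0;
    return out;
  }
};
```

Fed with the same input, the two objects produce identical output sequences; only the amount of work per sample differs.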

[Diagram: in → circular buffer → out]
Fig. 18.2: Delay line circular buffer.

18.2 Fixed-Delay Effects

Fixed-delay effects employ the simplest cases of circular-buffer queues. The delay
time is determined by the size of the buffer, and therefore we will allocate memory
according to the requested delay. This can be simply computed as the product of the
sampling rate and the delay time. Once this is done, as part of the object initialisa-
tion, we will not change it (after all, it is fixed). The effect is given by the operation
of reading and writing to the circular buffer, as shown above, with the following
order of operations:
• The output is read from the buffer at the current position.
• The input is written to the buffer at the same location.
• The position index is incremented, circularly (modulo the delay size).
The following processing function implements these steps for a delay-line class (see footnote 1):
const double *AuLib::Delay::dsp(const double *sig) {
for (uint32_t i = 0; i < m_vframes; i++) {
m_vector[i] = m_delay.set(sig[i], m_pos);
m_pos = m_pos == m_delay.vframes() - 1 ?
0 : m_pos + 1;
}
return vector();
}
where the member variable m_pos is responsible for keeping the current index.
The delay line is an AuLib::AudioBase object (see Listing 18.1), and the
AudioBase::set() method is used to set the delay line sample at a given posi-
tion, returning the old sample stored in it (steps 1 and 2 above).
AuLib::Delay is the base class for all delay-based objects. It encapsulates a
delay line, which is implemented as an AuLib::AudioBase object, as shown in
Listing 18.1. As with other classes in the library, it was designed in such a way that
it could be specialised for the various different types of delay effects we will explore
in this chapter, maximising code reuse. For this reason, you will see in the public
interface some prototypes for variable delay lines, which are also implemented by
this class and will be discussed later.

Listing 18.1: The AuLib::Delay class.


/** Fixed or variable delay line with optional feedback
(Delay, Comb filter, Flanger)
*/
class Delay : public AudioBase {

virtual const double *dsp(const double *sig,


double dt);
virtual const double *dsp(const double *sig,
const double *dt);
virtual const double *dsp(const double *sig);

protected:
double m_fdb;

// Footnote 1: From now on, examples will be given directly from code in AuLib;
// please refer to it for the complete classes.
uint32_t m_pos;
AudioBase m_delay;

public:
/** Delay constructor \n\n
dtime - delay time \n
fdb - feedback gain \n
vframes - vector size \n
sr - sampling rate
*/
Delay(double dtime, double fdb,
uint32_t vframes = def_vframes,
double sr = def_sr)
: AudioBase(1, vframes, sr), m_fdb(fdb), m_pos(0),
m_delay(1, 1, sr) {
m_delay.resize_exact(dtime >= 0. ? dtime * sr : 1);
}

/** delay a signal sig for a fixed time


*/
const double *process(const double *sig) {
return dsp(sig);
}

/** delay a signal for dt seconds


*/
const double *process(const double *sig, double dt) {
return dsp(sig, dt);
}

/** delay a signal for dt seconds and with optional


feedback fdb
*/
const double *process(const double *sig, double dt,
double fdb) {
m_fdb = fdb;
return dsp(sig, dt);
}

/** delay a signal for delay time taken from the


signal dt
*/
const double *process(const double *sig,
const double *dt) {
return dsp(sig, dt);
}

/** delay a signal for delay time taken from


the signal dt and with optional feedback fdb
*/
const double *process(const double *sig,
const double *dt,
double fdb) {
m_fdb = fdb;
return dsp(sig, dt);
}

/** delay a signal in obj for a fixed time


*/
const Delay &process(const AudioBase &obj) {
if (obj.vframes() == m_vframes &&
obj.nchnls() == m_nchnls) {
process(obj.vector());
} else
m_error = AULIB_ERROR;
return *this;
}

/** delay a signal in obj, optionally for dt seconds.


*/
const Delay &process(const AudioBase &obj,
double dt) {
if (obj.vframes() == m_vframes &&
obj.nchnls() == m_nchnls) {
if (dt < 0)
process(obj.vector());
else
process(obj.vector(), dt);
} else
m_error = AULIB_ERROR;
return *this;
}

/** delay a signal in obj, optionally for dt seconds


and with feedback fdb.
*/
const Delay &process(const AudioBase &obj,
double dt, double fdb) {
m_fdb = fdb;
return process(obj, dt);
}

/** delay a signal in obj for dt sec with variable


delay time sig
*/
const Delay &process(const AudioBase &obj,
const AudioBase &dt) {
if (obj.vframes() == m_vframes &&
obj.nchnls() == m_nchnls &&
dt.vframes() == m_vframes &&
dt.nchnls() == m_nchnls) {
process(obj.vector(), dt.vector());
} else
m_error = AULIB_ERROR;
return *this;
}

/** delay a signal in obj for dt sec with variable
delay time sig and with optional feedback
fdb.
*/
const Delay &process(const AudioBase &obj,
const AudioBase &dt,
double fdb) {
m_fdb = fdb;
return process(obj, dt);
}

/** operator(a,b,c) convenience method


*/
const Delay &operator()(const AudioBase &a,
const AudioBase &b,
double c) {
return process(a, b, c);
}

/** operator(a,b) convenience method


*/
const Delay &operator()(const AudioBase &a,
const AudioBase &b) {
return process(a, b);
}

/** operator(a,b,c) convenience method


*/
const Delay &operator()(const AudioBase &a,
const double b, double c) {
return process(a, b, c);
}

/** operator(a,b) convenience method


*/
const Delay &operator()(const AudioBase &a,
double b) {
return process(a, b);
}

/** operator(a) convenience method


*/
const Delay &operator()(const AudioBase &a) {
return process(a);
}

/** get the current write position


*/
uint32_t pos() const { return m_pos; }

/** get a reference to the delay
line.
*/
const AudioBase &delayline() const { return m_delay; }
};
The Delay class and its subclass AllPass implement two typical algorithms
for fixed-delay reverberation and echo: comb and all-pass filters. They can be com-
bined together in different arrangements to construct artificial reverberation effects.

18.2.1 Comb Filters

Comb filters are very similar to the straight fixed-delay algorithm as implemented
in Sect. 18.2. The only difference is that they include a scaled feedback signal path
from the output back to the delay line input (Fig. 18.3). The amount of feedback g
determines how much of the output gets recirculated.
Fig. 18.3: Comb filter (the delay output is scaled by the gain and added to the
input before it enters the delay line).

Comb filters are named after their characteristic amplitude response, which resembles
an upside-down comb (with teeth sticking upwards). This causes the effect
of filtering out some bands (at the troughs of the amplitude response), while enhancing
others (at the peaks). The peaks are spaced evenly in frequency, at multiples of
1/τ Hz, where τ is the delay (or loop) time. For more details on the signal processing
characteristics of this processor, see [39]. The g parameter determines the total
reverb time r of the comb filter (the time taken for its output to decay by 60 dB,
a factor of 1/1000) for a given delay line length:

$$g = \left(\frac{1}{1000}\right)^{\tau/r} \qquad (18.1)$$
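As a quick check of Eq. 18.1, the following sketch computes the feedback gain from a loop time and a desired reverb time. The function name `comb_gain` is only illustrative and is not part of AuLib.

```cpp
#include <cmath>

// Hypothetical helper (not part of AuLib): feedback gain g for a comb
// filter with loop time tau (in seconds), so that the output decays by
// 60 dB (a factor of 1/1000) after rvt seconds, as in Eq. 18.1.
inline double comb_gain(double tau, double rvt) {
  return std::pow(0.001, tau / rvt);
}
```

For instance, a 50 ms loop and a 2 s reverb time give a gain of roughly 0.84; after 40 trips around the loop (2 s), the signal has been scaled by g to the power of 40, which is 1/1000.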
A simple modification of the Delay DSP method presented earlier implements
a comb filter, where g is defined by the m_fdb member variable:
const double *AuLib::Delay::dsp(const double *sig) {
for (uint32_t i = 0; i < m_vframes; i++) {
m_vector[i] = m_delay.set(sig[i] +
m_delay[m_pos] * m_fdb, m_pos);
m_pos = m_pos == m_delay.vframes() - 1 ?
0. : m_pos + 1;
}
return vector();
}
In AuLib, rather than deriving a trivial class to implement a comb filter, the
Delay class itself uses this code in place of the original process with no
feedback. The plain delay can easily be recovered by setting the
feedback parameter to zero. An example of the use of the Delay class was given in
Listing 17.5.

18.2.2 All-Pass Filters

All-pass filters, unlike comb filters, feature a flat amplitude response, and therefore
they do not impart a strong timbral colouration to their input, although they are prone
to ringing at abrupt transitions (e.g. when a signal is turned on or off suddenly) [15].
To implement this, they require both a feedforward and a feedback signal path in
addition to the delayed signal, as shown in Fig. 18.4.
Fig. 18.4: All-pass filter (a delay with a feedback path scaled by gain, plus a
feedforward path scaled by -gain that is mixed with the delay output).

Given these differences, a minimal derived class can be created to model this
component. Note that since it implements solely fixed-delay processing, the DSP
methods related to variable delay time (those with variable delay parameters) delegate
to the fixed-delay method, ignoring any changes in delay time. The delay time can only
be set when an object is constructed.
Listing 18.2: The AuLib::AllPass class.
/** All-pass filter
*/
class AllPass : public Delay {

virtual const double *dsp(const double *sig,
double dt) { return dsp(sig); }
virtual const double *dsp(const double *sig,
const double *dt) {
return dsp(sig);
}
virtual const double *dsp(const double *sig);

public:
/** AllPass constructor \n\n
dtime - delay time \n
fdb - feedback gain \n
vframes - vector size \n
sr - sampling rate
*/
AllPass(double dtime, double fdb,
uint32_t vframes = def_vframes,
double sr = def_sr)
: Delay(dtime, fdb, vframes, sr){};
};
The only method that needs to be implemented in this class is AllPass::dsp():
const double *AuLib::AllPass::dsp(const double *sig) {
double y;
for (uint32_t i = 0; i < m_vframes; i++) {
y = sig[i] + m_fdb * m_delay[m_pos];
m_vector[i] = m_delay.set(y, m_pos) - m_fdb * y;
m_pos = m_pos == m_delay.vframes() - 1 ?
0. : m_pos + 1;
}
return vector();
}
As discussed in [39], comb and all-pass filters may be connected together to
construct an artificial reverberator. A standard way of doing this is to connect a
set of comb filters in parallel, whose mixed output feeds a set of all-pass filters
in series. The basic layout employs four comb filters, whose delay times are unrelated
to each other (to even out the timbral colouration), followed by two all-pass filters
with short delay times (of the order of a few milliseconds) and reverb times. The
function of these is to provide early echoes and help thicken the reverberation,
whereas the comb filters are responsible for the diffuse tail end of the effect.
Therefore the total reverb time is defined by the comb filter g parameter².

18.3 Variable Delay Lines

The AuLib code discussed so far takes advantage of the fact that the delay line
size will not change once an object is constructed. This allows the code to be
simplified: a single index is needed for both reading and writing, and the delay
time is given by the size of the buffer. Also in this case, the delay time in seconds
is rounded to the nearest integer length in samples. While this is not exact, it is
not problematic in most cases. If a precise fractional-sample delay were required,
then we would need to apply one of the methods described in the literature [30]. The
need for a more precise delay line lookup also arises when we are implementing
variable delays.
For a number of applications where some sort of pitch modification is the tar-
get, linked or not to a delay effect, a modification of the delay algorithm admitting
a change in delay time during performance is required. For this, we will need to
decouple the reading and writing operations and use two separate indices. The writ-
ing into the delay line will always proceed in single-step increments, as we would
expect, but the reading position may jump by more than one position, move back-
wards, or stay fixed. This is because the actual delay time will now be calculated on
the basis of the difference between the reader and writer positions. So the read index
will be a certain number of samples behind the writer, up to a maximum which is
defined by the buffer size.
² See also Problem 18.1.

If the requested delay time in seconds translates into a fractional number of samples,
we must be careful not to always round it to an integral value. This, in many
applications, will result in poor quality audio. For this reason, we should apply
interpolation algorithms such as those discussed in Chapter 14. For these in particular,
we will need to be careful when the interpolation needs to occur at the end of the
delay line, since in this case we will need to look back at the first sample or samples
in order to apply the algorithm correctly. In the case of cubic interpolation, we also
need to be careful when the reader index is at the start of the delay line. All of these
cases follow from experiences we have had with interpolating oscillators.
The simplest applications of variable delays, however, may not require any in-
terpolation at all. For delays that are only changing very slowly, we can implement
truncated lookup. For instance, the Delay::dsp() method taking a scalar value
argument for the delay time does this:
const double *AuLib::Delay::dsp(const double *sig,
double dt) {
uint32_t ds = dt * m_sr;
int32_t rp;
if (ds > m_delay.vframes())
ds = m_delay.vframes();
for (uint32_t i = 0; i < m_vframes; i++) {
rp = m_pos - ds;
if (rp < 0)
rp += m_delay.vframes();
m_vector[i] = m_delay[rp];
m_delay[m_pos] = sig[i] + m_vector[i] * m_fdb;
m_pos = m_pos == m_delay.vframes() - 1 ?
0. : m_pos + 1;
}
return vector();
}
Nevertheless, in general we will need some sort of interpolation. This is provided
by the Delay::dsp() method that takes a signal vector instead to control the
delay time:
const double *AuLib::Delay::dsp(const double *sig,
const double *dt) {
double rp, ds, a, b, frac;
uint32_t irp;
for (uint32_t i = 0; i < m_vframes; i++) {
ds = dt[i] < 0. ? 0 : dt[i] * m_sr;
if (ds > m_delay.vframes())
ds = m_delay.vframes();
rp = m_pos - ds;
if (rp < 0)
rp += m_delay.vframes();
irp = (uint32_t)rp;
frac = rp - irp;
a = m_delay[irp];
// wrap around to the start after the last position
if (++irp == m_delay.vframes())
irp = 0;
b = m_delay[irp];
m_vector[i] = a + frac * (b - a);
m_delay[m_pos] = sig[i] + m_vector[i] * m_fdb;
m_pos = m_pos == m_delay.vframes() - 1 ?
0. : m_pos + 1;
}
return vector();
}
The following effects are typical applications of variable delays:

• Flanger: the flanger is a variable-delay comb filter, whose delay time varies
from close to 0 to only a few milliseconds. The amount of gain feedback
will determine the quality of the effect, by making the peaks of the comb more
prominent. As noted before, the delay (or loop) time determines the spacing of
the comb peaks, and therefore varying it will cause a filter sweep effect.
• Vibrato: vibrato is implemented by a straight variable delay with no feedback,
which is modulated by a periodic source such as a low-frequency oscillator
(LFO)³. By varying the delay time a little more than in the flanger effect, we
will cause a pitch modulation effect. This is because of the difference in the delay
reading and writing rates: if the delay is decreasing, then the read index is
proceeding at a faster rate than the writing one (their difference decreases), and we
have a rise in pitch; on the other hand, if the delay is increasing, the reading rate
has to be slower, causing a drop in pitch. If the modulating source is made up of
linear segments (e.g. a triangular wave), the result will be an alternation of two
(or more) fixed pitch transpositions. In the case of non-linear sources (e.g. a sine
wave), we will hear a smooth variation of pitch within a range that is determined
by the amount of modulation applied.
• Chorus: the chorus effect tries to model two things: (a) the slightly asynchronous
nature of multiple instruments playing together (time delays); and (b) a slight
detuning that also takes place. The first is achieved by the delay time effect, and
the second by a pitch modulation effect (a slow vibrato). The chorus effect is
created by modulating a signal and mixing the result with its input (generally
no feedback is used). The delay times are slowly modulated so as to create fine
detuning effects and some delay asynchrony.
• Doppler: the Doppler effect is constructed by applying a change in delay time.
Since distance can be equated with a given time delay, by varying this we can
simulate a change in source sound position. Associated with this, a change in
amplitude is also needed to make the effect realistic.
• Pitch shifter: as we have noted, the differences in reading and writing rates caused
by delay time changes will create a pitch shifting effect. If these differences are
constant, then a constant pitch shift will take place. However, with a finite delay
buffer at some point one of the indices will overtake the other, causing the delay
to go from maximum to minimum or the other way round. At this point there is
a waveform discontinuity, and a click is heard. In order to remove this, we will
need to fade the reading in and out in synchrony with this discontinuous position,
using a periodic envelope (or windowing). With one reading ‘head’, this will add
an amplitude modulation effect. With two or more of these, offset by a certain
amount, this modulation artefact may be minimised.

³ Effectively, this is just an ordinary oscillator with sub-audio fundamental frequencies.
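The vibrato case in the list above can be sketched as a standalone function. This is not AuLib code: the function name and parameters are assumptions made for the example, and it uses a sine LFO with linear interpolation.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Vibrato sketch: a delay line with no feedback, read with linear
// interpolation at a delay time modulated by a sine LFO.
// base and depth are in seconds (base >= depth >= 0), rate is the
// LFO frequency in Hz and sr is the sampling rate.
std::vector<double> vibrato(const std::vector<double> &in, double sr,
                            double base, double depth, double rate) {
  const double twopi = 2 * std::acos(-1.0);
  const std::size_t N =
      static_cast<std::size_t>((base + depth) * sr) + 2;
  std::vector<double> line(N, 0.0), out(in.size());
  std::size_t wp = 0;  // write position
  for (std::size_t n = 0; n < in.size(); n++) {
    line[wp] = in[n];  // write first, so a zero delay reads this sample
    double ds = (base + depth * std::sin(twopi * rate * n / sr)) * sr;
    double rp = static_cast<double>(wp) - ds;  // read position
    if (rp < 0)
      rp += static_cast<double>(N);
    std::size_t i0 = static_cast<std::size_t>(rp);
    double frac = rp - static_cast<double>(i0);
    i0 %= N;                        // guard against rounding up to N
    std::size_t i1 = (i0 + 1) % N;  // wrap around at the end
    out[n] = line[i0] + frac * (line[i1] - line[i0]);
    wp = (wp + 1) % N;
  }
  return out;
}
```

With a depth of zero this reduces to a fixed delay of base seconds. A flanger or chorus would follow the same structure, with shorter or longer delay times, feedback, and a mix with the input signal.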

18.4 Multiple Taps

The example of the pitch shifter effect suggests an interesting approach, the use of
more than one reading position, also called multiple taps into a delay line. In that
particular application, it is used to smooth the modulation effects caused by the en-
velope, but in others it can be used for multiple delays or echoes. To take advantage
of this, the AuLib library provides two classes, Tap and Tapi that implement taps
into an existing delay line object. The latter is an interpolating tap, and the former
truncates the readout position. Here is the class interface for Tap:

Listing 18.3: The AuLib::Tap class.


/** Creates a tap for a Delay object
truncating readout.
*/
class Tap : public AudioBase {

virtual const Tap &dsp(const Delay &obj,
double time);
virtual const double *dsp(const Delay &obj,
const double *time) {
return dsp(obj, time[0]).vector();
}

public:
/** Tap constructor \n\n
vframes - vector size \n
sr - sampling rate
*/
Tap(uint32_t vframes = def_vframes,
double sr = def_sr)
: AudioBase(1, vframes, sr){};

/** tap a delay object at time secs


*/
const Tap &process(const Delay &obj, double time){
return dsp(obj, time);
}

/** tap a delay object according to time signal in


secs
*/
const double *process(const Delay &obj,
const double *time) {
return dsp(obj, time);
}

/** tap a delay object according to time signal from
obj in secs
*/
const Tap &process(const Delay &del,
const AudioBase &obj) {
if (obj.vframes() == m_vframes
&& obj.nchnls() == m_nchnls) {
dsp(del, obj.vector());
} else
m_error = AULIB_ERROR;
return *this;
}

/** operator () convenience method


*/
const Tap &operator()(const Delay &del,
const AudioBase &b) {
return process(del, b);
}

/** operator () convenience method


*/
const Tap &operator()(const Delay &del, double b) {
return process(del, b);
}
};
}
The Tapi class reuses much of the mechanism in Listing 18.3, as it is a derived
class that only needs to reimplement the DSP methods.
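The idea of one writer and several readers can be illustrated with a standalone sketch. It does not use the AuLib Tap class; the function `multitap` and its (delay, gain) pairs are assumptions made for the example, and the readout is truncating.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Multitap delay sketch: each tap is a (delay in samples, gain) pair.
// Tap delays must be smaller than the delay line size.
std::vector<double>
multitap(const std::vector<double> &in, std::size_t size,
         const std::vector<std::pair<std::size_t, double>> &taps) {
  std::vector<double> line(size, 0.0), out(in.size(), 0.0);
  std::size_t wp = 0;
  for (std::size_t n = 0; n < in.size(); n++) {
    line[wp] = in[n];  // single writer
    for (const auto &tap : taps) {
      // one reader per tap, tap.first samples behind the writer
      std::size_t rp = (wp + size - tap.first) % size;
      out[n] += tap.second * line[rp];
    }
    wp = (wp + 1) % size;
  }
  return out;
}
```

Feeding an impulse through taps at 2 and 5 samples produces two echoes, each scaled by its own gain.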

18.4.1 Convolution

The extreme case of multitap delays is where we have one tap at each buffer position.
If we place a gain multiplier at each output, we are in effect implementing directly a
signal processing operation called convolution, which also gives the general layout
of a finite impulse response (FIR) filter. Such filters are the result of the sum of N
delayed and scaled samples of a waveform; the sequence of scaling multipliers (or
coefficients) is itself another signal called the impulse response (IR). This is also
equivalent to the actual output of the delay line if we were to place a single impulse
at its input. After N samples, the output would be zero (hence the FIR denomination,
see Chapter 16).
The convolution algorithm is shown schematically in Fig. 18.5, where an input
signal (x[t]) is placed into a delay line and each tap output is multiplied by a coef-
ficient (the ir[] array), and mixed to yield the output y[t] (the time t is in samples).
Therefore for this algorithm, we need a delay line that is tapped at each point, and a
table of coefficients making up the impulse response.
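The sum of delayed and scaled samples described above can be written directly as a function operating on whole vectors. This is a minimal sketch, independent of AuLib; `convolve` is an illustrative name.

```cpp
#include <cstddef>
#include <vector>

// Direct convolution: y[t] is the sum over n of ir[n] * x[t - n].
// The output has x.size() + ir.size() - 1 samples.
std::vector<double> convolve(const std::vector<double> &x,
                             const std::vector<double> &ir) {
  if (x.empty() || ir.empty())
    return {};
  std::vector<double> y(x.size() + ir.size() - 1, 0.0);
  for (std::size_t t = 0; t < y.size(); t++)
    for (std::size_t n = 0; n < ir.size(); n++)
      if (t >= n && t - n < x.size())  // samples outside x are zero
        y[t] += ir[n] * x[t - n];
  return y;
}
```

For example, convolving {1, 2, 3} with the impulse response {1, 1} yields {1, 3, 5, 3}: each output sample is the sum of the current and previous input samples.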

Fig. 18.5: Delay line convolution (the taps x[t], x[t-1], ..., x[t-(N-1)] are scaled
by ir[0], ir[1], ..., ir[N-1] and summed to give y[t]).


An AuLib class derived from Delay models the direct convolution processor
(Listing 18.4). It takes in a function table object holding the impulse response, op-
tionally truncating its size (i.e. using only a portion of the impulse response).

Listing 18.4: The AuLib::Fir class.


/** This class implements a direct convolution
engine using an impulse response defined in
a function table.
*/
class Fir : public Delay {

virtual const double *dsp(const double *sig,
double dt) {
return dsp(sig);
}
virtual const double *dsp(const double *sig,
const double *dt) {
return dsp(sig);
}
virtual const double *dsp(const double *sig);

protected:
const double *m_ir;
uint32_t m_ir_nchnls;
uint32_t m_chn;

public:
/** Fir constructor \n\n
ir - impulse response \n
chn - selected channel from IR \n
len - if > 0, set the FIR length \n
vframes - vector size \n
sr - sampling rate
*/
Fir(const FuncTable &ir, uint32_t chn = 0,
uint32_t len = 0,
uint32_t vframes = def_vframes,
double sr = def_sr)
: Delay((len > 0 && len <= ir.vframes() ?
len : ir.vframes()) / sr, 0, vframes, sr),
m_ir(ir.vector()), m_ir_nchnls(ir.nchnls()),
m_chn(chn < m_ir_nchnls ? chn :
m_ir_nchnls - 1) { };
};

To implement the DSP method, we only need to modify the original delay line
algorithm so that we can get a delay out of every sample and apply the coefficient
to it. For this, we need to source the data from a function table (holding the m_ir
array). Another modification is that we can have an IR with multiple channels
(interleaved), in which case we will select a specific channel (m_chn) from it to apply
the convolution to the input signal:
const double *AuLib::Fir::dsp(const double *sig) {
double out = 0;
uint32_t N = m_delay.vframes();
uint32_t nchnls = m_ir_nchnls;
for (uint32_t i = 0; i < m_vframes; i++) {
m_delay[m_pos] = sig[i];
m_pos = m_pos != N - 1 ? m_pos + 1 : 0;
for (uint32_t j = 0, rp = m_pos; j < N; j++) {
// frame N - 1 - j of the interleaved IR, channel m_chn
out += m_delay[rp] * m_ir[(N - 1 - j) * nchnls + m_chn];
rp = rp != N - 1 ? rp + 1 : 0;
}
m_vector[i] = out;
out = 0.;
}
return vector();
}
The direct convolution algorithm is computationally expensive, requiring N
multiplications and additions for each output sample. It is only practical for
very short delay lengths. For anything longer, we will need to employ a spectral
domain version, which can take advantage of the fast Fourier transform (FFT)
algorithm [36], as we will see in Chapter 19.

18.5 Lambda Functions

Now we turn to a programming topic that might be useful for the implementation
of these and other processes discussed in this book. In the C++11 ISO standard
a number of new features were introduced. Among these, we have the concept of
lambda functions, which are anonymous functions used in contexts where a small,
temporary function is called for. This concept originated in the mathematics of the
lambda calculus [9] and is commonly found in functional-style languages [1]. As-
sociated with it, we have the principle of a closure, which is an environment, a set of
variables etc., to which the function has access (it becomes part of its scope). C++
allows us to construct lambdas with a well-defined capture list that will define the
closure.

The lambda syntax is as follows:

[ capture-list ] ( parameter-list ) -> return-type { body }

where capture-list is a comma-separated list of the variables that are captured as part
of the closure. This can be empty (no captures), or it can name specific variables.
Captures can be by copy or by reference (using &); a lambda inside an object can
capture its members through its this pointer. The return type can be omitted in
most cases if it is clear implicitly what the function return type is, in which case the
syntax simplifies to:

[ capture-list ] ( parameter-list ) { body }

Lambda functions are useful when processing elements in a vector (or a list). For
example, to change the gain of a signal vector, we can use a combination of iterators
and a lambda, with the std::transform function from the standard library:
std::vector<double> audio(vecsize);
int gain;
...
std::transform(audio.begin(),audio.end(),audio.begin(),
[gain](double s) { return s*gain; });
This particular example will multiply every element of audio by gain and
return it. The elements will be transformed in place. A similar operation can be ap-
plied to any AudioBase-derived object in the AuLib library, because these objects
have iterators defined for them, which allows us to act on the audio vector directly.
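A complete version of the gain example, wrapped in functions so that it can be compiled on its own, might look like the sketch below. The function names are hypothetical; the second one shows a by-reference capture, which lets the lambda update a variable outside it.

```cpp
#include <algorithm>
#include <vector>

// Scale a signal vector in place with std::transform and a lambda
// that captures gain by copy.
void apply_gain(std::vector<double> &audio, double gain) {
  std::transform(audio.begin(), audio.end(), audio.begin(),
                 [gain](double s) { return s * gain; });
}

// The same scaling with std::for_each; peak is captured by reference
// so that the lambda can keep track of the largest absolute value.
double apply_gain_peak(std::vector<double> &audio, double gain) {
  double peak = 0.0;
  std::for_each(audio.begin(), audio.end(),
                [gain, &peak](double &s) {
                  s *= gain;
                  peak = std::max(peak, s < 0 ? -s : s);
                });
  return peak;
}
```

Capturing by copy is the safer default; a reference capture is only valid while the captured variable is still alive.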

18.5.1 Auto Types

A lambda expression has a type that is unique and unnamed (it is actually a
temporary object of the type ClosureType). If the lambda has an empty capture list,
it is possible to assign this expression to a C-style function pointer containing
the same type declaration as the resulting function:
int (*add)(int, int) =
[](int a, int b) { return a + b; };
std::cout << add(2,1) << std::endl;
However, to simplify this and other use cases, the C++11 standard introduced the
auto type specifier, which allows variables to have their types deduced from their
initialisers. For example,
auto add = [](int a, int b) { return a + b; };
The auto specifier can be used for other variable types, whenever it is possible
to deduce from the context what the type is. It cannot be used, however, to name
a function parameter type. For these, we need to be clear what the type is. In this
case, the exact function type is used, which can sometimes look complicated (e.g.
int(*)(int,int) in the example above). Thankfully, in these cases we can also
use the standard library utility class template std::function, with the function
arguments and return type as the template parameter, which looks much simpler and
C++-like: std::function<int(int,int)>.
Consider this example, a classic functional-style operation, where we take a func-
tion in as an argument to another function and return a new function with fewer
parameters:
auto curry = [](std::function<int(int,int)> f, int b)
{ return [f,b](int a){ return f(a,b); }; };
auto add1 = curry(add,1);
auto add2 = curry(add,2);
std::cout << add2(add1(1)) << std::endl;
The first line defines a lambda with two parameters: a function of two int vari-
ables returning an int, and an int. This function returns another lambda, of one
int parameter. Note that we capture two variables from the enclosing environment,
by copy: f and b. This allows us to access and use their values inside the lambda,
as we have seen before. Alternatively, we could have created a closure over the two
variables by using the default copy capture syntax, [=], instead of naming the two
individual items:
auto curry = [](std::function<int(int,int)> f, int b)
{ return [=](int a){ return f(a,b); }; };
Given that the enclosing environment has only the two variables we need, it prob-
ably makes sense to use this notation. The curry function allows us to get two dif-
ferent one-parameter functions from a two-parameter function. This is an example
of partial function application.
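The same idea carries over to audio code. The sketch below uses an ordinary function instead of a lambda for the outer level, with std::function naming both the parameter and the return type; `bind_second` is a hypothetical name chosen for this example.

```cpp
#include <functional>

// Fix the second argument of a two-parameter function, returning a
// one-parameter function. The returned lambda is a closure over
// copies of f and b.
std::function<double(double)>
bind_second(std::function<double(double, double)> f, double b) {
  return [f, b](double a) { return f(a, b); };
}
```

Given a lambda such as `[](double s, double g) { return s * g; }`, calling `bind_second` with it and 0.5 yields a one-parameter function that halves its input, which could then be passed to std::transform as in the earlier gain example.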

18.6 Conclusions

In this chapter, we have looked at delay line processing, including both fixed and
variable delays. Example DSP methods from the AuLib library were used to illus-
trate how these are usually implemented. The concept of convolution and its direct
application through a tapped delay line was also explored. In the next chapter, we
will re-examine the convolution algorithm to provide a very efficient implementa-
tion for it. To complement the discussion from a programming point of view, we
have introduced lambda functions and their application in C++.

Problems

18.1. Create a Schroeder reverb using four comb filters and two all-pass filters (see
[15] for an outline).

18.2. Design a comb filter class that includes high-frequency losses and apply it in
a second version of the above program.

18.3. Create a flanging effect with the following characteristics:


(a) Sinusoidal LFO with frequency control.
(b) Depth-of-modulation control.
(c) Feedback gain control.
(d) Dry + wet mix control.
Chapter 19
Frequency-Domain Processing

Abstract This chapter is dedicated to the topic of spectral processing, in different
applications. We introduce the main concepts related to frequency analysis, such as
the discrete Fourier transform, and provide an implementation-based exploration of
the typical fast Fourier transform algorithm. Applications in fast convolution and
streaming spectral processing are discussed, with a number of programming exam-
ples.

Processing audio in the frequency domain involves the transformation of waveform
data into a different representation. As we have seen, a digital waveform can be
described as a discrete-time (and discrete-amplitude) encoding of continuous func-
tions that express the audio signals we want to manipulate. These are functions of
time, and therefore we are in this case working in the time domain. If, instead, we
take a specific point in time and look at the audio signal for its frequency content,
we need to transform our functions of time into functions of frequency, yielding the
spectrum of a waveform, which is a frequency-domain representation.
What is the representation of this frequency content that we are seeking to ma-
nipulate? Generally speaking, this could be expressed in more than one way, but the
most practical form is to model a waveform as a set of sinusoidal components. In
this case, just as we have conceived a sound in the time domain as a sequence of
samples indicating the amplitude at various time points, we can also think of it as
a set of amplitudes of sinusoids of different frequencies. In the time domain, this
sound is realised by samples being played one after the other at a certain rate (the
sampling frequency), whereas in the frequency domain, this happens through sum-
ming (mixing) all the component sinusoids scaled by their respective amplitudes. In
other words, a digital waveform is a sequence of amplitude samples; its spectrum is
also a sequence of amplitude samples, but, this time, each represents the weight of
a sinusoid of a given frequency in the mix.
While this is the general interpretation we should always have in mind, there
are a number of important details we should note. We will look at these one by
one, using a mostly non-mathematical approach. The signal processing aspects of
this are detailed in [36] (Chapter 7), which can be taken as a companion text to
this chapter. Nevertheless, it should be possible to gain a good understanding of the
subject from a programming perspective, which is the way we will proceed here.
Where a mathematical formula might be a succinct way of describing a process we
are about to implement, we will use it.

© Springer Nature Switzerland AG 2019
V. Lazzarini, Computer Music Instruments II,
https://doi.org/10.1007/978-3-030-13712-0_19

19.1 Fundamental Principles

The basic idea of what we will be programming in this chapter has already been
outlined completely: we will take a digital waveform, comprising N samples of a
function of time (i.e. the sound) and transform it into a spectrum made up of N
samples of a function of frequency. These samples are the weights of a set of N
sinusoids of different frequencies that, when mixed together, can recompose the
original waveform. That being conceptually straightforward, we can move on to the
details and principles that underpin it.
Let’s first think about how we can completely model a sinusoid of any kind. Sinu-
soid is the generic name we give to sine and cosine waves. This actually just refers
to a particular wave shape, which can be produced by either function. Therefore
there are three parameters that absolutely define such a wave, and can be used to
distinguish individual instances:
1. Frequency, or how many cycles it completes per second (Hz),
2. Amplitude, which can be measured by its greatest absolute value,
3. Phase, the waveform starting point, relative to a given reference (e.g. the cosine
wave phase).
Once we have these three parameters, the sinusoid is perfectly defined. The sam-
pled spectrum, which is a function of frequency modelling a digital waveform (as a
mix of sinusoids), is based on this representation, where:
• frequencies are determined by the sample index,
• amplitudes and phases are defined by the (2-dimensional) sample value.
It is possible to have an incomplete spectral representation using either the ampli-
tude or the phase, which is useful in some applications. In particular, the amplitude
spectrum is a common frequency-domain representation for the purposes of identi-
fying the distribution of energy in a waveform (see Fig. 19.1).
Just as in a digital waveform each sample index refers to a time point, in a sam-
pled spectrum it indicates a frequency point. If in the time domain we divide the
index by the sampling rate fs to get the sample time in seconds, in the frequency
domain we will divide the index by the number of samples in the waveform and
scale it by the sampling rate.
For example, let’s say we have a 1-second digital waveform (N = fs): in this case
each spectral sample defines frequency points that are 1 Hz apart: 0, 1, 2, ... up to
fs/2,¹ which is half of the sampled spectrum. The other half refers to frequencies that
are negative: 1 − fs/2, 2 − fs/2, ..., up to −1 Hz. Each one of these frequency points
represents a single sinusoid; together, this mix of N sinusoids composes the original
waveform.

¹ This point, index N/2, refers to both fs/2 and −fs/2 Hz.
The plots in Fig. 19.1 demonstrate this: we have a waveform with N = 100, and
its amplitude spectrum showing sinusoidal components at 1, 5, 11, 16, and 17 Hz, as
well as −17, −16, −11, −5 and −1 Hz (we are using in this case fs = N). The left
half of the plot refers to the positive-frequency components, whereas the right side
displays the negative side of the spectrum. It is possible to notice that in this case,
the negative and positive spectra are mirror-like images of each other: the positive-
frequency indices are k, from 0 to N/2 and the negative-frequency ones are N − k,
from N/2 to N − 1. Therefore, the five components will show up at 1, 5, 11, 16,
and 17, and also at N − 1, N − 5, N − 11, N − 16, and N − 17. A single component
always shows up on both sides. This will always be the case for audio waveforms
(as discussed below).
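The mapping from spectral index to frequency described above can be put into a small function. This is only a sketch; `bin_frequency` is an illustrative name, and index N/2 is reported here as the positive frequency fs/2.

```cpp
#include <cstddef>

// Frequency in Hz for spectral sample k of an N-point spectrum
// (0 <= k < N). Indices above N/2 map to negative frequencies.
double bin_frequency(std::size_t k, std::size_t N, double fs) {
  if (k <= N / 2)
    return static_cast<double>(k) * fs / static_cast<double>(N);
  return -static_cast<double>(N - k) * fs / static_cast<double>(N);
}
```

With N = 100 and fs = 100, index 17 maps to 17 Hz and index 83 (N − 17) to −17 Hz, matching the two sides of the spectrum in Fig. 19.1.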

Fig. 19.1: A sampled waveform and its amplitude spectrum (top panel: waveform;
bottom panel: amplitude spectrum; both plotted against sample index, 0 to 100).

19.1.1 Complex Numbers

In order to represent both the amplitude and the phase of a sinusoid, each spectral
sample must be a two-dimensional number. While time-domain samples are unidimensional,
each just a single real number representing the amplitude at a time point,
a frequency-domain sample has to pack two numbers together so that it can represent
amplitude and phase independently. This is done through a complex number,
which can be thought of as a pair of real numbers in separate dimensions.
Complex numbers have two representations (Fig. 19.2):

1. Rectangular, which can be interpreted geometrically as coordinates measured on
two number lines at 90 degrees to each other. The complex pair is a projection of
the number on these two lines, conventionally known as the real and imaginary
parts of the number. The usual mathematical notation adds a j to the imaginary
part of the number to distinguish it from its real part, e.g. a + jb.
2. Polar, which can be interpreted geometrically as a line starting from the intersec-
tion of the real and imaginary lines (called the origin) to the point representing
the complex number. This line makes an angle with the real axis. The two parts of
the number are the line length (magnitude) and angle. For the sinusoids in ques-
tion, these parts correspond directly to their amplitude and phase, respectively.
Therefore one complex spectral sample can determine the frequency (through its
index), amplitude, and phase of a component of a waveform.
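In C++, both representations are available through the standard std::complex class template. The sketch below wraps the relevant standard library calls in functions with illustrative names.

```cpp
#include <complex>

// Amplitude (magnitude) and phase (in radians) of a spectral sample
// held in rectangular form.
double amplitude(std::complex<double> z) { return std::abs(z); }
double phase(std::complex<double> z) { return std::arg(z); }

// Back to rectangular form from amplitude and phase.
std::complex<double> from_polar(double amp, double pha) {
  return std::polar(amp, pha);
}
```

For z = 0.5 + 0.5j, the amplitude is the square root of 0.5 (about 0.707) and the phase is π/4 radians (45 degrees).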

19.1.2 Spectral Analysis

From a digital waveform, we can determine its spectrum by performing a discrete
Fourier transform (DFT) [8]. This is an operation that can be defined by the
following formula [33]:

\[ X(k) = \frac{1}{N} \sum_{t=0}^{N-1} x(t) \left[ \cos\left(2\pi k \frac{t}{f_s}\right) - j \sin\left(2\pi k \frac{t}{f_s}\right) \right] \tag{19.1} \]

where x(t) is the waveform, N is its size in samples, X(k) is its spectrum, t is the
time index, and k is the frequency index. To obtain the full spectrum, we increment
k from 0 to N − 1. In this case, we call N the transform size.
As we can see, the operation involves the sample-by-sample product of the wave-
form with a complex sinusoid (cos(ω) − j sin(ω)), which is made up of a cosine for
its real part and a sine for its imaginary part (with a negative sign). This yields
N complex numbers, which are all summed together and scaled by 1/N to give the
spectral sample at frequency point k. The principle behind this is that the complex
sinusoid works as a detector: if a sinusoidal component exists in the vicinity of
its frequency, the complex sinusoid will pick it up, yielding a non-zero sample. The
spectrum X(k) can be used to recompose the original waveform by using the inverse
formula (the inverse DFT, or IDFT):
\[ x(t) = \sum_{k=0}^{N-1} X(k) \left[ \cos\left(2\pi k \frac{t}{f_s}\right) + j \sin\left(2\pi k \frac{t}{f_s}\right) \right] \tag{19.2} \]

[Figure: the complex plane, with real (horizontal) and imaginary (vertical, j) axes, showing the point z = a + jb, its magnitude amp(z) as the length of the line from the origin, and its angle pha(z).]

Fig. 19.2: Geometrical interpretation of the complex number z = a + jb = 0.5 + 0.5 j.

where we vary the time t from 0 to N − 1 to get all the waveform samples. Each
complex amplitude sample X(k) scales a complex sinusoid at frequency k/fs. Half
of these frequencies are in the positive spectrum (from 0 to fs/2 Hz), and half in the
negative side (from −fs/2 to close to 0 Hz). The waveform x(t) is a mix of these
components.
The spectrum given by Eq. 19.1 comes out in rectangular form, as real and imag-
inary pairs. To get the amplitude and phase, we need to convert this to polar form
using the following expressions, for an arbitrary complex number z = a + jb:

\[ A(z) = \sqrt{a^2 + b^2} \tag{19.3} \]

\[ \phi(z) = \arctan\left(\frac{b}{a}\right) \tag{19.4} \]

The first of these is also known as the absolute value, modulus, or magnitude of
the complex number, and can be calculated with the C++ abs() function, as we
will see later. The second is also known as the argument, and can be computed with
arg() (or also atan2()). From the amplitude A(z) and phase φ (z), we can get
the rectangular form a + jb with

\[ a = A(z) \cos(\phi(z)) \tag{19.5} \]

and

\[ b = A(z) \sin(\phi(z)) \tag{19.6} \]


Together with the DFT and IDFT, these expressions are the fundamental tools of
spectral analysis and resynthesis, and we will be employing them in various situa-
tions later in this chapter.
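
As a minimal sketch of Eqs. 19.3 to 19.6, the two conversions can be written with the standard std::complex functions. The Polar struct and the names to_polar() and to_rect() are our own illustrative choices here, and are not part of AuLib.

```cpp
#include <cmath>
#include <complex>

// Polar parts of a complex number z = a + jb.
struct Polar {
  double amp; // magnitude A(z), Eq. 19.3
  double pha; // phase (argument) in radians, Eq. 19.4
};

// Rectangular to polar. std::arg() is based on atan2(b, a), so the
// angle is placed in the correct quadrant even when a is zero or negative.
Polar to_polar(std::complex<double> z) {
  return {std::abs(z), std::arg(z)};
}

// Polar back to rectangular (Eqs. 19.5 and 19.6).
std::complex<double> to_rect(Polar p) {
  return {p.amp * std::cos(p.pha), p.amp * std::sin(p.pha)};
}
```

For the number in Fig. 19.2, z = 0.5 + 0.5j, to_polar() gives a magnitude of about 0.707 and an angle of π/4 radians, and to_rect() recovers the original pair.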
An important assumption that is built into the DFT is that the waveform that is
subject to the operation is periodic, with the period T = N. In other words, the anal-
ysis breaks down the input as if it were made up of harmonics of fs/N Hz, which
might be appropriate in some cases, but not ideal in others. It is always important to
bear this in mind when attempting to understand the result of the DFT.
Another way to interpret this is that each frequency point is in fact a band, channel,
or bin, centred at a given multiple of fs/N Hz. When the transform size N does not
span a whole number of input periods T, there will be some smearing in the analysis results,
and a single sinusoidal partial will be detected in more than one bin. Since this is
more likely to be the case in practical applications of the DFT, this is a better way
of reading the analysis results.
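
The smearing can be observed with a small experiment. The sketch below is only an illustration under stated assumptions: it writes the DFT kernel in the normalised form 2πkt/N (so that bin k is centred at k fs/N Hz), and the helper names amplitudes(), sine(), and bins_above() are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Amplitudes of bins 0 to N/2 from a straight DFT with 1/N scaling.
// Assumption: the kernel is written as 2*pi*k*t/N.
std::vector<double> amplitudes(const std::vector<double> &x) {
  assert(!x.empty());
  const double twopi = 2. * std::acos(-1.);
  const std::size_t N = x.size();
  std::vector<double> amps(N / 2 + 1);
  for (std::size_t k = 0; k <= N / 2; k++) {
    std::complex<double> X(0., 0.);
    for (std::size_t t = 0; t < N; t++) {
      const double w = twopi * k * t / N;
      X += x[t] * std::complex<double>(std::cos(w), -std::sin(w));
    }
    amps[k] = std::abs(X) / N;
  }
  return amps;
}

// A sine wave completing 'cycles' periods over N samples.
std::vector<double> sine(std::size_t N, double cycles) {
  const double twopi = 2. * std::acos(-1.);
  std::vector<double> x(N);
  for (std::size_t t = 0; t < N; t++)
    x[t] = std::sin(twopi * cycles * t / N);
  return x;
}

// Number of bins whose amplitude exceeds a threshold.
std::size_t bins_above(const std::vector<double> &amps, double thresh) {
  std::size_t count = 0;
  for (double a : amps)
    if (a > thresh) count++;
  return count;
}
```

With N = 64, sine(64, 4.) is detected in bin 4 alone (with an amplitude of 0.5, the other half being on the negative side), whereas sine(64, 4.5) falls between bins 4 and 5 and is picked up by several neighbouring bins.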
One final point worthy of note is that audio waveforms are unidimensional, and
therefore can be represented by functions of real numbers. The spectrum of an audio
waveform, as we have seen, is represented by a function of complex numbers. When
we recompose the waveform from the spectrum, the result is again unidimensional,
as the imaginary part in the results of Eq. 19.2 gets cancelled out. So we can go from
real waveform to complex spectrum, and vice versa.
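
This round trip can be checked numerically. The following sketch is an illustration only: slow_transform() and roundtrip_error() are hypothetical names, the kernel is assumed to be written as 2πkt/N, and the 1/N scaling is placed in the forward direction as in Eq. 19.1.

```cpp
#include <algorithm>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cvec = std::vector<std::complex<double>>;

// Straight DFT (dir = -1, scaled by 1/N) or IDFT (dir = +1, unscaled).
// Assumption: the kernel is written as 2*pi*k*t/N.
cvec slow_transform(const cvec &in, int dir) {
  const double twopi = 2. * std::acos(-1.);
  const std::size_t N = in.size();
  cvec out(N);
  for (std::size_t k = 0; k < N; k++) {
    std::complex<double> sum(0., 0.);
    for (std::size_t t = 0; t < N; t++)
      sum += in[t] * std::polar(1., dir * twopi * k * t / N);
    out[k] = dir < 0 ? sum / double(N) : sum;
  }
  return out;
}

// Analyse and resynthesise a real waveform, returning the largest
// deviation from the input (any leftover imaginary part is counted too).
double roundtrip_error(const std::vector<double> &x) {
  cvec c(x.begin(), x.end()); // real parts = x, imaginary parts = 0
  cvec y = slow_transform(slow_transform(c, -1), +1);
  double err = 0.;
  for (std::size_t t = 0; t < x.size(); t++)
    err = std::max({err, std::abs(y[t].real() - x[t]),
                    std::abs(y[t].imag())});
  return err;
}
```

For any real-valued input, roundtrip_error() should return a value at the level of floating-point rounding, showing both that the waveform is recovered and that the imaginary parts cancel out.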
An important characteristic of the complex spectrum of real functions is that it
displays a certain symmetry²: the amplitudes are symmetric (mirrored) about 0 Hz,
and the phases anti-symmetric. An example of this was shown in Fig. 19.1, for the
amplitude spectrum only. This means that we can always infer the negative spectrum
from its positive side in this case, and therefore we can work solely with positive
frequencies. This will allow some simplifications and efficiencies when we come to
program the DFT.

² This is called a Hermitian symmetry.

19.2 The Fast Fourier Transform

The DFT in Eq. 19.1 can be written out in C++ using one loop nested inside another.
The inner loop realises the sum of products, and the outer one increments the
frequency index k. The following function implements a DFT of a real-valued input,
excluding the 1/N scaling:
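#include <complex>
#include <cmath>
using namespace std::complex_literals;
extern double TWOPI; // 2*pi, defined elsewhere

void dft(double *x, std::complex<double> *X,
         int N, float sr) {
  double w;
  // outer loop: one spectral sample per frequency index k
  for (int k = 0; k < N; k++) {
    X[k].real(0.);
    X[k].imag(0.);
    // inner loop: sum of products with the complex sinusoid
    for (int t = 0; t < N; t++) {
      w = TWOPI * k * t / sr;
      // cos(w) - j sin(w), as in Eq. 19.1
      X[k] += x[t] * (cos(w) - 1i * sin(w));
    }
  }
}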
We should note the std::complex<T> type, which holds a complex pair of
type T. This class is made up of two contiguously-stored numbers, and a number
of methods to manipulate them. It is possible to cast it to an array of two; a vector
of complex numbers can also be reinterpreted as a real-valued vector of twice its
length. In C++14, the constant 1i is a complex type (with real part = 0), and there-
fore we can use it to make sin(w) the imaginary part of a complex sinusoid. It
allows us to translate the complex product in Eq. 19.1 more or less directly. This
constant is available in the namespace std::complex_literals.
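
The two properties just mentioned can be tried out in isolation. This is a minimal sketch, with as_reals() and complex_sinusoid() as hypothetical helper names of our own.

```cpp
#include <cmath>
#include <complex>
#include <vector>

using namespace std::complex_literals;

// View complex data as interleaved doubles: re0, im0, re1, im1, ...
// The C++ standard guarantees this layout for std::complex<double>.
double *as_reals(std::vector<std::complex<double>> &c) {
  return reinterpret_cast<double *>(c.data());
}

// The complex sinusoid cos(w) - j sin(w), built with the 1i literal.
std::complex<double> complex_sinusoid(double w) {
  return std::cos(w) - 1i * std::sin(w);
}
```

A vector holding the two numbers 1 + 2j and 3 + 4j is then seen through as_reals() as the four doubles 1, 2, 3, 4, and writing through the pointer changes the complex data directly.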
While the DFT implementation is quite compact, it is of considerable computa-
tional complexity, especially if N is large. For each of the N output samples we have
to compute N multiplications and additions. If N is a highly composite number, such
as a power of two, many of the exact same operations are repeated in the process. It
is possible to factor them to allow for an efficient implementation of the DFT. This
is called the fast Fourier transform (FFT).
We will now examine the implementation of the most common type of FFT, the
one where we use power-of-two transform sizes, called the radix-2 FFT. While the
mathematical details of its derivation are beyond the scope of this book (but explored
in the companion text [36]), we will give an outline of the algorithm and explore a
reference implementation in C++.
The first step in working out the details of the FFT is to take a very general
approach, assuming that the input is a complex function, rather than considering
the specific case of audio signals, where we are dealing with real-valued data. We
can apply it to our case simply by placing the waveform samples in the real part
of each complex number, and zeros in the imaginary part. However, in a second
stage, we will actually be able to take advantage of real-valued input to simplify the
computation even further.
So for a complex-to-complex DFT, we can proceed to break it down into a fun-
damental operation, which is applied iteratively to the input to yield the transform.
The outline of the process is as follows. For a transform of size N, to implement a
radix-2 FFT, we start with setting a counting variable M = 2 and

1. Divide the N input (complex) samples into one or more sets containing M sam-
ples each.
2. Mark each half of each set alternately as even and odd.
3. Apply a pair of equations to transform each data set (of size M) in place (i.e.
replace the original data with the results), merging the even and odd sides to
produce the result.
4. Multiply M by 2.
5. If M > N, stop; otherwise continue from 1.

The core of this process is in step 3, where we apply the following expressions
[36]:

\[ \begin{aligned} X_M(k) &= E(k) + \omega^{-k} O(k) \\ X_M(k + M/2) &= E(k) - \omega^{-k} O(k) \end{aligned} \tag{19.7} \]

with 0 ≤ k < M/2, where E(k) and O(k) are the even and odd input data, and X_M(k) is where
we will store the result. The ω^{−k} factor is a complex sinusoid (which stems from
the complex product in the straight DFT formula), also known as the twiddle factor.
These two equations arise from the factorisation of the sums and products in the
DFT formula. They represent the fundamental computation needed to calculate the
transform, which will be applied repeatedly to the input data in several stages.
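
Before looking at the iterative code, it may help to see Eq. 19.7 applied in its most literal form. The sketch below is a recursive formulation, which is not the in-place scheme developed in this section: it allocates temporary vectors at every level and leaves out the 1/N scaling. The names fft_recursive(), dft_reference(), and max_difference() are hypothetical, and the straight DFT is written with the kernel 2πkt/N.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cvec = std::vector<std::complex<double>>;
const double pi = std::acos(-1.);

// Recursive radix-2 FFT (unscaled); the size must be a power of two.
cvec fft_recursive(const cvec &x) {
  const std::size_t M = x.size();
  assert(M > 0 && (M & (M - 1)) == 0);
  if (M == 1) return x; // a single point is its own transform
  cvec even(M / 2), odd(M / 2);
  for (std::size_t i = 0; i < M / 2; i++) {
    even[i] = x[2 * i];
    odd[i] = x[2 * i + 1];
  }
  const cvec E = fft_recursive(even), O = fft_recursive(odd);
  cvec X(M);
  for (std::size_t k = 0; k < M / 2; k++) {
    // twiddle factor applied to the odd side
    const std::complex<double> wO = std::polar(1., -2. * pi * k / M) * O[k];
    X[k] = E[k] + wO;         // first line of Eq. 19.7
    X[k + M / 2] = E[k] - wO; // second line of Eq. 19.7
  }
  return X;
}

// Straight (unscaled) DFT used as a reference.
cvec dft_reference(const cvec &x) {
  const std::size_t N = x.size();
  cvec X(N);
  for (std::size_t k = 0; k < N; k++)
    for (std::size_t t = 0; t < N; t++)
      X[k] += x[t] * std::polar(1., -2. * pi * k * t / N);
  return X;
}

// Largest absolute difference between two equal-sized spectra.
double max_difference(const cvec &a, const cvec &b) {
  assert(a.size() == b.size());
  double d = 0.;
  for (std::size_t i = 0; i < a.size(); i++)
    d = std::max(d, std::abs(a[i] - b[i]));
  return d;
}
```

For a power-of-two input, the two functions should agree to within rounding error, which gives a simple way of checking any FFT code against the defining formula.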
In an implementation of the FFT, this step represents the innermost loop, which
iterates over the N-sample array and applies the transform to sets of M samples:
for (uint32_t m = 0; m < n; m++) {
  M = n * 2;
  // apply Eq. 19.7 to every set of M samples
  for (uint32_t k = m; k < N; k += M) {
    i = k + n;
    even = s[k];
    odd = w * s[i];
    s[k] = even + odd;
    s[i] = even - odd;
  }
  // update the twiddle factor
  w *= wp;
}
In this code, n represents M/2, and k is the index k, which is offset by M each
time, so that the transform covers all of the N-sample input in blocks of M samples.
The variables even and odd hold the even and odd inputs to the expression, which
are taken from the s array (the input data). The results are put back in place, ready
for the next iteration.
The twiddle factor represented by the variable w needs to be updated every time
k is incremented by raising it to higher powers (ω^{−k}). Starting from 1, we use
cos(−2π/M) + j sin(−2π/M) to update it through a complex product (w *= wp).
This is done in the code for every start offset of the variable k. In the base case
of M = 2, this never happens, because we are transforming pairs of numbers. For
longer transforms, ω is incremented so that the sinusoid completes half a cycle
over the M/2 samples.
The complete three-level iteration code involved in the FFT is shown below:
for (uint32_t n = 1; n < N; n *= 2) {
  // twiddle increment for this stage (M = 2n)
  o = -pi/n;
  wp.real(cos(o)), wp.imag(sin(o));
  w = 1.;
  for (uint32_t m = 0; m < n; m++) {
    for (uint32_t k = m; k < N; k += n * 2) {
      i = k + n;
      even = s[k];
      odd = w * s[i];
      s[k] = even + odd;
      s[i] = even - odd;
    }
    w *= wp;
  }
}
This is the basic radix-2 FFT algorithm, which is almost as compact as the
straight DFT, but has a lower computational complexity. A straight DFT of size N
requires roughly N² operations, whereas the FFT only needs N log₂ N. This becomes
quite significant when N is large.
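
To put rough figures on this, the two counts can be compared directly. The functions dft_ops() and fft_ops() below are hypothetical helpers that simply evaluate N² and N log₂ N, ignoring constant factors.

```cpp
#include <cassert>
#include <cstdint>

// Rough operation count for a straight DFT of size N.
std::uint64_t dft_ops(std::uint64_t N) { return N * N; }

// Rough operation count for a radix-2 FFT of size N (a power of two).
std::uint64_t fft_ops(std::uint64_t N) {
  assert(N > 0 && (N & (N - 1)) == 0);
  std::uint64_t stages = 0;
  for (std::uint64_t n = 1; n < N; n *= 2)
    stages++; // one stage per doubling: log2(N) in total
  return N * stages;
}
```

For N = 1024, this gives 1,048,576 operations for the straight DFT against 10,240 for the FFT, a ratio of about a hundred to one.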
There are a few other requirements to complete a full reference FFT implementa-
tion. The first of these is to do with the order of data points at the end of the process.
Because the FFT operates by merging the even and odd data points of the input, in
successive iterations, we will need to reorder the input so that the output frequency
bins are in the correct sequence. That is, we will need to move the samples of the
input waveform around to place them in even and odd pairs, otherwise the output
spectrum will be scrambled. Alternatively, we can reorder the output.
A diagram of the FFT (for N = 8) including the reordering of the input is shown
in Fig. 19.3. We can see how the merging performed in step 3 affects the order
of the data in each successive iteration. The required reordering can be described
recursively as splitting the sequence into even and odd points. Starting with M = N,
1. Divide an M-sample block into two sets of M/2,
2. Move every odd sample to the right-hand side and every even sample to the left-
hand side,
3. If all N samples have been processed, continue; otherwise start from 1. with the
next block of M samples,
4. Divide M by 2,
5. If M = 2, stop; otherwise start from 1. with the first block of M samples.

x(0)   x(1)   x(2)   x(3)   x(4)   x(5)   x(6)   x(7)
                        ↓ re-order
x(0)   x(4)   x(2)   x(6)   x(1)   x(5)   x(3)   x(7)
E(0)   O(0)   E(0)   O(0)   E(0)   O(0)   E(0)   O(0)
    ↓             ↓             ↓             ↓
X2(0)  X2(4)  X2(2)  X2(6)  X2(1)  X2(5)  X2(3)  X2(7)
E(0)   E(1)   O(0)   O(1)   E(0)   E(1)   O(0)   O(1)
           ↓                           ↓
X4(0)  X4(2)  X4(4)  X4(6)  X4(1)  X4(3)  X4(5)  X4(7)
E(0)   E(1)   E(2)   E(3)   O(0)   O(1)   O(2)   O(3)
                        ↓
X8(0)  X8(1)  X8(2)  X8(3)  X8(4)  X8(5)  X8(6)  X8(7)

Fig. 19.3: Diagram of operations for FFT with N = 8, adapted from [36].

For instance, let’s take an array of eight points {0, 1, 2, 3, 4, 5, 6, 7}. The first step
is to divide it into two sets and separate the even and odd points: {0, 2, 4, 6} and
{1, 3, 5, 7}. Then we take each of these and split it into two sets, separating the even
and odd points again: {0, 4}, {2, 6}, {1, 5}, and {3, 7}. Since we now have pairs, we
have finished the reordering: {0, 4, 2, 6, 1, 5, 3, 7}.
However, this is an expensive way of reordering, since we may have to iterate
several times over the data. There is a pattern in the reordering, which we can ob-
serve to simplify things. Some data points stay put, so we could ignore them, and
others get swapped. If we observe the bit-pattern of each sample index, we will know
which ones have to be moved. By reversing these bit-patterns we can determine the
swaps, for instance for N = 8, we have the following indices (0–7):
000 001 010 011 100 101 110 111
which reversed become
000 100 010 110 001 101 011 111
allowing the data to be swapped accordingly: 1 & 4, 3 & 6, the rest staying in
place. We should note that this results in the same reordering as in the worked-out
example: {0, 1, 2, 3, 4, 5, 6, 7} becomes {0, 4, 2, 6, 1, 5, 3, 7}, where only two pairs of
points were actually swapped. If we observe this, it is possible to write a function
that only acts on the required points, in one single iteration over the data array:
static void
reorder(std::vector<std::complex<double>> &s) {
  uint32_t N = s.size();
  uint32_t j = 0, m;
  for (uint32_t i = 0; i < N; i++) {
    // j is the bit-reversed counterpart of i: swap each pair once
    if (j > i) {
      std::swap(s[i], s[j]);
    }
    // increment j as a bit-reversed counter
    m = N / 2;
    while (m >= 2 && j >= m) {
      j -= m;
      m /= 2;
    }
    j += m;
  }
}
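
The bit-reversal rule itself can also be written out explicitly, which is useful for checking the reordering. In this sketch, bit_reverse() and reversed_indices() are hypothetical names; the size N is assumed to be a power of two.

```cpp
#include <cstdint>
#include <vector>

// Reverse the lowest 'bits' bits of the index i.
std::uint32_t bit_reverse(std::uint32_t i, std::uint32_t bits) {
  std::uint32_t r = 0;
  for (std::uint32_t b = 0; b < bits; b++) {
    r = (r << 1) | (i & 1); // move the lowest bit of i into r
    i >>= 1;
  }
  return r;
}

// Bit-reversed index table for a power-of-two size N.
std::vector<std::uint32_t> reversed_indices(std::uint32_t N) {
  std::uint32_t bits = 0;
  while ((1u << bits) < N)
    bits++; // number of bits needed to index N points
  std::vector<std::uint32_t> idx(N);
  for (std::uint32_t i = 0; i < N; i++)
    idx[i] = bit_reverse(i, bits);
  return idx;
}
```

For N = 8, reversed_indices() gives {0, 4, 2, 6, 1, 5, 3, 7}, the same ordering obtained in the worked-out example.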
Finally, to complete the FFT function, we should also note that the same code can
be employed for the inverse transform, with a simple change of sign in the twiddle
phase calculation. Therefore we can have a single function to perform the DFT and
IDFT very efficiently. The code for this, which is part of AuLib, is shown in Listing
19.1. It calculates an in-place transform, that is, the output data replaces the input.

Listing 19.1: A reference C++ implementation of the radix-2 FFT in AuLib


void transform(std::vector<std::complex<double>> &s,
               bool dir) {
  uint32_t N = s.size();
  std::complex<double> wp, w, even, odd;
  double o;
  uint32_t i;
  reorder(s);
  for (uint32_t n = 1; n < N; n *= 2) {
    // the sign of the twiddle phase selects the DFT or the IDFT
    o = dir == forward ? -pi / n : pi / n;
    wp.real(cos(o)), wp.imag(sin(o));
    w = 1.;
    for (uint32_t m = 0; m < n; m++) {
      for (uint32_t k = m; k < N; k += n * 2) {
        i = k + n;
        even = s[k];
        odd = w * s[i];
        s[k] = even + odd;
        s[i] = even - odd;
      }
      w *= wp;
    }
  }
  // the 1/N scaling is applied in the forward transform only
  if (dir == forward)
    for (uint32_t n = 0; n < N; n++)
      s[n] /= N;
}

19.2.1 Real-to-Complex and Complex-to-Real Transforms

Finally, as we have discussed earlier on, we will be dealing with real-valued input
data, and therefore we can take advantage of this to produce spectral data represent-
ing only the non-negative frequencies. This would allow us to use a half-size FFT
to compute the spectrum, thus saving some more computation in the process. The
approach is as follows:
1. We will reinterpret the real input data as if it were a half-size complex array. The
even samples will be taken as the real parts and the odd as the imaginary parts of
these complex numbers.
2. We can then apply the half-size complex FFT from Listing 19.1.
3. We convert the output data into a form that contains only the non-negative spec-
trum. We treat the points referring to 0 Hz and fs/2 separately, as these will always
have zero imaginary parts. They are derived from the complex data in the first
point of the output array.
4. The rest of the points are converted in a loop starting at 1, covering the rest of the
data. This operation combines the samples from the two halves of the spectrum
to produce the resulting non-negative frequency points. The mathematical details
of this conversion can be found in [36].
The non-negative spectrum contains N/2 + 1 bins, that is all positive frequency
points including fs/2, plus 0 Hz. It means that we could potentially pack the output
into the same memory space of the input, since a complex number occupies twice
the size of a real sample. With N inputs there is space for N/2 outputs. How can we
store the extra point? There are two ways:
1. The packed format: since 0 Hz and the Nyquist frequency (fs/2) points are purely
real (zero imaginary), we can place them together in the first array position in
place of a single complex number.
2. The extra point: we provide an extra position at the end, and place the Nyquist
sample there. This means that the imaginary parts of the first and last points of
the complex array will be zero.
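
The relationship between the two layouts can be sketched as a pair of conversion functions. The names unpack() and pack() are hypothetical and are not taken from AuLib; they only move the Nyquist value between the two positions described above.

```cpp
#include <cassert>
#include <complex>
#include <vector>

using cvec = std::vector<std::complex<double>>;

// Packed layout (N/2 points) to extra-point layout (N/2 + 1 points).
cvec unpack(const cvec &packed) {
  assert(!packed.empty());
  cvec c(packed);
  c.push_back({packed[0].imag(), 0.}); // Nyquist gets its own position
  c[0].imag(0.);                       // 0 Hz is purely real
  return c;
}

// Extra-point layout back to the packed layout.
cvec pack(const cvec &full) {
  assert(full.size() >= 2);
  cvec c(full.begin(), full.end() - 1);
  c[0].imag(full.back().real()); // Nyquist stored next to 0 Hz
  return c;
}
```

Converting a packed spectrum with unpack() and then pack() returns the original data, since no information is lost in either direction.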
We can accommodate the two formats in the real-input FFT by adding an optional
boolean argument that indicates whether or not a packed format is used (true by de-
fault). If an extra point is used, we will also expect the input array to contain enough
memory for it. The function implementing the real-to-complex FFT is shown in Listing
19.2. It takes two required parameters: a complex vector and a double pointer. If the
two refer to the same memory, the transform happens in place. Otherwise, we will
expect the pointer to refer to an array of N samples that will be used for input. The
output will always be placed in the complex-data vector. The line
double *s = reinterpret_cast<double *>(c.data());
tells the compiler to reinterpret the vector data as a double array, so that if we
need to, we can copy in the real input data from a different location to reinterpret it
as complex and perform the FFT in place. This real-to-complex FFT function uses
the complex FFT as implemented in Listing 19.1 to perform the spectral analysis.

Listing 19.2: The real-to-complex FFT from AuLib.


void transform(std::vector<std::complex<double>> &c,
               double *r, bool pckd) {
  using namespace std::complex_literals;
  uint32_t N = c.size() - (pckd ? 0 : 1);
  std::complex<double> wp, w = 1., even, odd;
  double o, zro, nyq;
  double *s = reinterpret_cast<double *>(c.data());
  // copy the input if the transform is not in place
  if (s != r)
    std::copy(r, r + 2 * N, s);
  if (!pckd)
    c.resize(N);
  // half-size complex FFT
  transform(c, forward);
  // 0 Hz and Nyquist points, kept together in the first position
  zro = c[0].real() + c[0].imag();
  nyq = c[0].real() - c[0].imag();
  c[0].real(zro * .5), c[0].imag(nyq * .5);
  o = -pi / N;
  wp.real(cos(o)), wp.imag(sin(o));
  w *= wp;
  // convert the remaining points; the middle point (N/2) is
  // included, as in the inverse transform of Listing 19.3
  for (uint32_t i = 1, j = 0; i < N / 2 + 1; i++) {
    j = N - i;
    even = .5 * (c[i] + conj(c[j]));
    odd = .5i * (conj(c[j]) - c[i]);
    c[i] = even + w * odd;
    c[j] = conj(even - w * odd);
    w *= wp;
  }
  // extra-point format: move the Nyquist sample to the end
  if (!pckd) {
    c.resize(N + 1);
    c[N].real(c[0].imag());
    c[0].imag(0.);
    c[N].imag(0.);
  }
}
The inverse operation, complex-to-real, applies the same steps, but now in re-
verse (Listing 19.3). The data is converted back from a single-sided spectrum into
the original complex-FFT data, and then we use an inverse transform to obtain the
output. This is reinterpreted again as a real-valued sequence. Similarly, the function
takes in two required arguments, now in reverse order: a double pointer to the out-
put data location, and a complex vector containing the input. In-place transforms
are also possible if the two parameters refer to the same memory location.

Listing 19.3: The complex-to-real inverse FFT from AuLib.


void transform(double *r,
               std::vector<std::complex<double>> &c,
               bool pckd) {
  using namespace std::complex_literals;
  uint32_t N = c.size() - (pckd ? 0 : 1);
  std::complex<double> wp, w = 1., even, odd;
  double o, zro, nyq;
  double *s = reinterpret_cast<double *>(c.data());
  // recover the first point of the complex-FFT data
  if (pckd)
    zro = c[0].real() * 2., nyq = c[0].imag() * 2.;
  else
    zro = c[0].real() * 2., nyq = c[N].real() * 2.;
  c[0].real(zro + nyq), c[0].imag(zro - nyq);
  o = pi / N;
  wp.real(cos(o)), wp.imag(sin(o));
  w *= wp;
  int j;
  // convert the single-sided spectrum back to complex-FFT data
  for (uint32_t i = 1; i < N / 2 + 1; i++) {
    j = N - i;
    even = .5 * (c[i] + conj(c[j]));
    odd = .5i * (c[i] - conj(c[j]));
    c[i] = even + w * odd;
    c[j] = conj(even - w * odd);
    w *= wp;
  }
  if (!pckd)
    c.resize(N);
  // half-size inverse complex FFT
  transform(c, inverse);
  // copy the output if the transform is not in place
  if (s != r)
    std::copy(s, s + 2 * N, r);
  if (!pckd)
    c.resize(N + 1);
}
An example program is shown in Listing 19.4. It generates a test waveform with
three partials (1, 5, 13), then applies the real-to-complex DFT, as implemented
using the radix-2 FFT algorithm, and prints the amplitude spectrum. The trans-
form is performed in place, reusing the input memory, and we input the data as
a double array, which is reinterpreted as a std::complex<double> vec-
tor in the FFT function. The format of the spectral data is packed: that is, the 0 Hz
and Nyquist bins are stored as the first complex pair. We print out the amplitudes,
which are computed using the abs() function with a complex input.

Listing 19.4: Real-to-complex transform example.


#include <AuLib/fft.h>
#include <cmath>
#include <iostream>
#include <vector>
#include <complex>
#include <iomanip>

using namespace AuLib;

const int N = 32;

int main() {
  // complex vector with N/2 bins (packed format)
  std::vector<std::complex<double>> cdata(N/2);
  // reinterpret it as double array
  double *rdata =
    reinterpret_cast<double *>(cdata.data());
  // generate a N-sample waveform
  for(int n=0; n < N; n++)
    rdata[n] = sin((twopi*n)/N) + 0.5*sin(5*(twopi*n)/N)
               + 0.25*sin(13*(twopi*n)/N);
  // apply the real-to-complex DFT
  fft::transform(cdata, rdata);

  // set printing precision to 2 decimal positions
  std::cout << std::fixed;
  std::cout << std::setprecision(2);
  // print amplitude spectrum
  for(auto s : cdata)
    std::cout << abs(s) << std::endl;

  return 0;
}
Piping the output of this code to a plotting program yields the graph in Fig.
19.4, where we can see the amplitudes of the three components in the waveform (1,
0.5, and 0.25). The FFT functions developed here will be the basic tools for all the
processes explored in the remaining sections of this chapter.


Fig. 19.4: A plot of a magnitude spectrum as produced by the program in Listing 19.4.

19.3 Fast Convolution

A very useful application of the FFT is to implement fast convolution. As we have
seen in Chapter 18, computing the output of a long FIR filter is very processing-heavy,
and may not be achievable in realtime. An alternative is to apply the operation
in the frequency domain. This is possible because the convolution of two waveforms
is equivalent to the multiplication of their spectra (Fig. 19.5). Therefore, if we have
a fast method of spectral analysis, we can perform this operation in a single pass in
the frequency domain.

signal → DFT ↘
               × → IDFT → output
IR     → DFT ↗

Fig. 19.5: Fast convolution: the product of two spectra is equivalent to the convolution of their
corresponding waveforms.

The convolution of a sequence of L samples by another of M samples results in
an output that is L + M − 1 samples long. This is because of the delay operation
involved in convolution. So, if we want to produce the convolution in the spectral
domain, we will need to use a transform size of at least L + M − 1 samples. Since
we are using the FFT, then this also needs to be a power of two. How can we work
under these restrictions? We can simply select our transform size according to these
requirements (a power of two no smaller than L + M − 1) and then zero-pad the
inputs to that length. That is, we pack the data with zeros to make up the correct
size. At the output, the first L + M − 1 samples are the result of the convolution.
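
The size selection and zero-padding just described can be sketched as two small helpers. The names fft_size() and zero_pad() are hypothetical and are used here for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Smallest power of two no smaller than L + M - 1.
std::size_t fft_size(std::size_t L, std::size_t M) {
  assert(L > 0 && M > 0);
  const std::size_t need = L + M - 1;
  std::size_t size = 1;
  while (size < need)
    size *= 2;
  return size;
}

// Copy of x extended with zeros up to the given size.
std::vector<double> zero_pad(const std::vector<double> &x,
                             std::size_t size) {
  assert(size >= x.size());
  std::vector<double> y(x);
  y.resize(size, 0.);
  return y;
}
```

For two inputs of 32 samples each, the convolution is 63 samples long and fft_size(32, 32) gives 64. With lengths of 100 and 30, the output needs 129 samples, so the transform size jumps to 256.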
With this in mind, we can test this idea in a program. This is what we will do:

1. Produce a test waveform and impulse response (IR), each of size N.
2. Create input arrays that can hold 2N − 1 samples and are set to a power-of-two
size.
3. Fill these with the signal and IR, padding with zeros.
4. Take their individual DFTs.
5. Multiply their spectra, sample by sample. We can store the result in one of the
existing arrays.
6. Take the inverse DFT of the result, which is the convolution output.

The two test signals that we will use are of the simplest kind: a single cycle of
a sine wave, and an IR consisting of one unit sample at N/4. The convolution of
these two inputs should show a sine wave delayed by N/4 samples. The full test
program is shown in Listing 19.5. Since the FFT implements the 1/N normalisation
in the forward transform, we need to scale the impulse response by its size (N),
and therefore the unit sample is set to that value. In some implementations, the
normalisation is placed in the inverse transform instead, in which case this scaling
is not needed.
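
The expected result can be worked out independently with a direct, time-domain convolution, which is a useful reference when testing a fast implementation. This is a sketch under our own naming (convolve(), sine_cycle(), and unit_impulse() are hypothetical), and here the impulse has a value of 1, since no transform scaling is involved.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Direct convolution: the output has x.size() + h.size() - 1 samples.
std::vector<double> convolve(const std::vector<double> &x,
                             const std::vector<double> &h) {
  assert(!x.empty() && !h.empty());
  std::vector<double> y(x.size() + h.size() - 1, 0.);
  for (std::size_t i = 0; i < x.size(); i++)
    for (std::size_t j = 0; j < h.size(); j++)
      y[i + j] += x[i] * h[j];
  return y;
}

// A single cycle of a sine wave over N samples.
std::vector<double> sine_cycle(std::size_t N) {
  const double twopi = 2. * std::acos(-1.);
  std::vector<double> x(N);
  for (std::size_t n = 0; n < N; n++)
    x[n] = std::sin(twopi * n / N);
  return x;
}

// N samples of silence with a unit sample at position pos.
std::vector<double> unit_impulse(std::size_t N, std::size_t pos) {
  assert(pos < N);
  std::vector<double> h(N, 0.);
  h[pos] = 1.;
  return h;
}
```

With N = 32, convolve(sine_cycle(32), unit_impulse(32, 8)) produces 63 samples: eight zeros followed by the sine wave cycle, with zeros again to the end.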
Note that in this particular application it suits us to use a non-packed format, with
the 0 Hz and Nyquist bins in separate complex array positions. This is because we
want to use a compact C++ std::transform algorithm to multiply each point.
The packed format would have meant a need to treat the first position differently
from the rest, which is less convenient in this case. A plot of the program output is
shown in Fig. 19.6, where we can see the sine wave delayed by N/4 (8) samples.

Listing 19.5: Fast convolution example.


#include <AuLib/fft.h>
#include <algorithm>
#include <cmath>
#include <iostream>
#include <vector>
#include <complex>
#include <iomanip>

using namespace AuLib;

const int N = 32;

int main() {
  // FFT size: 2N (N bins)
  std::vector<std::complex<double>> ir(N+1);
  std::vector<std::complex<double>> sig(N+1);
  // input arrays
  double *irdata =
    reinterpret_cast<double *>(ir.data());
  double *sigdata =
    reinterpret_cast<double *>(sig.data());

  // generate an N-sample sine wave
  for(int n=0; n < N; n++)
    sigdata[n] = sin((twopi*n)/N);

  // zero-pad to 2N size
  std::fill(sigdata+N, sigdata+2*N, 0.);
  std::fill(irdata, irdata+2*N, 0.0);
  // impulse response: single sample at N/4, scaled by N
  irdata[N/4] = N;

  // apply the real-to-complex DFT (not packed)
  fft::transform(ir, irdata, !fft::packed);
  fft::transform(sig, sigdata, !fft::packed);

  // complex multiplication (in-place)
  std::transform(sig.begin(), sig.end(),
                 ir.begin(), sig.begin(),
                 [](std::complex<double> sig,
                    std::complex<double> ir) {
                   return sig*ir; });

  // complex-to-real IDFT
  fft::transform(sigdata, sig, !fft::packed);

  // set printing precision to 3 decimal positions
  std::cout << std::fixed;
  std::cout << std::setprecision(3);
  // print convolution result
  for(int n=0; n < 2*N; n++)
    std::cout << sigdata[n] << std::endl;

  return 0;
}
While this test program is a very good proof of concept for fast convolution, it
skirts around some practical problems that we are faced with when implementing
it in real-life scenarios, especially in realtime applications. First of all, it is very
unlikely that we will have the whole signal ready for input. In offline processing,
that might be the case, but then the FFT sizes involved may be extremely large if we
have minutes-long sounds to work with. In more general cases, it is not practical to
use a single-batch operation such as this one. Instead, we can use partitions of the
input signal and apply the convolution to these, and then recompose the signal using
the output. This is known as partitioned convolution.

Fig. 19.6: A plot of the convolution output from the test program in Listing 19.5.

There are two ways to go about this, both involving creating a partition of the
input signal whose size is determined by the impulse response (and set to a power
of two), applying the same process as outlined in the test program, and then gluing
the individual output blocks together to form the final result. For this, we need to
overlap the data: we can overlap the output or the input. The former is known as the
overlap-add algorithm (OLA) and the latter as the overlap-save (OLS) algorithm.

19.3.1 Overlap Add

Of the two options we have, the OLA approach is the simplest conceptually. It is
based on the principle that an N-size partition will result in a 2N − 1-sized output,
and that the partition blocks are spaced N samples apart. So, we can just extract N
samples from the input, pad them to the DFT length, apply the convolution, and then
place the result at N-sample boundaries, overlapping with the final N − 1 samples
of the previous partition (Fig. 19.7). The partition size N is set to the next power
of two no less than the impulse response size.
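
The bookkeeping involved in OLA can be sketched as follows. To keep the example self-contained, each block is convolved directly in the time domain by a hypothetical convolve() function, which stands in for the zero-padding, DFT, spectral product, and IDFT of the fast method; overlap_add() is also a name of our own.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using vec = std::vector<double>;

// Direct convolution, standing in for the FFT-based block product.
vec convolve(const vec &x, const vec &h) {
  assert(!x.empty() && !h.empty());
  vec y(x.size() + h.size() - 1, 0.);
  for (std::size_t i = 0; i < x.size(); i++)
    for (std::size_t j = 0; j < h.size(); j++)
      y[i + j] += x[i] * h[j];
  return y;
}

// Overlap-add: convolve x with h in partitions of N input samples.
vec overlap_add(const vec &x, const vec &h, std::size_t N) {
  assert(N > 0 && !x.empty() && !h.empty());
  vec y(x.size() + h.size() - 1, 0.);
  for (std::size_t pos = 0; pos < x.size(); pos += N) {
    // extract up to N samples from the input
    const std::size_t len = std::min(N, x.size() - pos);
    const vec block(x.begin() + pos, x.begin() + pos + len);
    // each block yields len + h.size() - 1 output samples
    const vec out = convolve(block, h);
    // place the result at the partition boundary, adding it to the
    // tail left over from the previous partitions
    for (std::size_t i = 0; i < out.size(); i++)
      y[pos + i] += out[i];
  }
  return y;
}
```

The result should match a single convolution of the whole input, whatever the partition size used.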

  A | B | C | D              (input partitions)
  ↓
  A | zeros                  (zero-padded partition)
  ↓
  convolution
  ↓
  A ......
      B ......
          C ......
              D ......       (overlapping output blocks)

Fig. 19.7: OLA method: each partition is zero-padded. The convolution is applied and the result is
overlap-added to form the output.

19.3.2 Overlap Save

OLS is potentially more efficient because it avoids the need for an output overlap
by taking an overlapping transform instead. This is possible because of the fact
that the DFT has a circular (periodic) property: it implies that its input is periodic
over the transform size. So, instead of padding the input signal with zeros for the
second half of the DFT block, it just fills it with input data without any padding.
We normally save the final N samples of the previous block to be placed in the first
half of the current 2N input frame. At the start of the process, the first N samples
are zero because there is no previous input block. The convolution is applied with a
zero-padded IR as before.
Only the N final samples are kept, the rest of the block discarded (because it
is redundant) and the result is just made up of a sequence of these blocks with no
overlaps (Fig. 19.8). The input partition boundaries are still at N samples, so what
we are in effect doing is taking 2N − 1 samples from the input waveform, but only
keeping N of them. The overlapping happens at the input, and it is caused by the
circular property of the DFT³. The partition size (N) is set in the same way as in
the overlap-add case.
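
A corresponding sketch for OLS is given below. It uses a time-domain circular convolution, circular(), as a stand-in for the product of two 2N-point DFTs, since the two are equivalent; both function names are hypothetical, and the impulse response is assumed to be no longer than N samples.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using vec = std::vector<double>;

// Circular convolution of two equal-sized frames, the time-domain
// equivalent of multiplying their DFTs.
vec circular(const vec &a, const vec &b) {
  assert(a.size() == b.size());
  const std::size_t S = a.size();
  vec y(S, 0.);
  for (std::size_t i = 0; i < S; i++)
    for (std::size_t j = 0; j < S; j++)
      y[(i + j) % S] += a[i] * b[j];
  return y;
}

// Overlap-save: convolve x with h (h.size() <= N), N samples at a time.
vec overlap_save(const vec &x, const vec &h, std::size_t N) {
  assert(N > 0 && h.size() <= N);
  vec ir(2 * N, 0.); // zero-padded impulse response
  std::copy(h.begin(), h.end(), ir.begin());
  vec frame(2 * N, 0.); // [saved block | current block]
  vec y;
  for (std::size_t pos = 0; pos < x.size(); pos += N) {
    // move the previous block to the first half (zeros at the start)
    std::copy(frame.begin() + N, frame.end(), frame.begin());
    // fill the second half with new input (zeros past the end of x)
    for (std::size_t i = 0; i < N; i++)
      frame[N + i] = pos + i < x.size() ? x[pos + i] : 0.;
    const vec out = circular(frame, ir);
    // keep only the final N samples, discarding the rest
    y.insert(y.end(), out.begin() + N, out.end());
  }
  return y;
}
```

Note that this sketch returns a whole number of N-sample blocks and does not flush the tail of the convolution after the input ends. For the input {1, 2, 3, 4, 5, 6, 7, 8}, an impulse response of {1, 1, 1}, and N = 4, it produces the running sums {1, 3, 6, 9, 12, 15, 18, 21}.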

³ This is due to the fact that the DFT models a signal as if it were a periodic waveform DFT-size
samples long.

  A | B | C | D              (input partitions)
  ↓
  (saved) | A                (previous block followed by current block)
  ↓
  convolution
  ↓
  discard | A                (only the final N samples are kept)
  ↓
  A | B | C | D              (output blocks, no overlap)

Fig. 19.8: OLS method: each partition is made up of the current N samples, preceded by the
previous block, which has been saved from before. The convolution is applied and the result is just
placed in the output.

19.3.3 Multiple Partitions

One difficulty with fast convolution is that in order to perform spectral analysis, we
need to wait until all data has been input into the DFT analysis frame. This effectively
places a fixed latency between the input and the output in a realtime scenario.
Since the transform size depends on the IR length, long filters will insert a noticeable
delay in the signal path. This would be the case for convolution reverb effects.
For disk rendering, this is not a problem as we can read ahead and compensate for
the latency, but for realtime input, it can be problematic.
A practical solution to minimise this problem is to create multiple partitions of
the impulse response, instead of a single one. This has the effect of reducing the
transform size, on one hand, but also increasing the computational demand, on the
other, as we are not able to take full advantage of the FFT. In practical terms, this in-
volves more iterations in the multiplication of the frequency-domain data, as we will
need to implement a spectral delay line to compute the convolution of the multiple
partitions (Fig. 19.9).
Here is an outline of what we need to do:
1. Split the impulse response into M partitions of N samples (and transform each
partition into spectral data, 2N bins)
2. Take the input data, using either the OLA or the OLS method, and keep filling
the convolution buffer.
3. When the buffer is full, perform the DFT and place the result in a delay line
containing M spectral blocks at the current write position.
4. Move the write position one partition ahead in the delay line (circularly).
5. Multiply each IR partition by a corresponding input block and sum the products
together.
6. Take the inverse DFT of this sum and place it in the output buffer.
7. Use either OLA or OLS to recompose the output.
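The outline above can be sketched in a self-contained form. This is only an illustration under stated assumptions, not the AuLib code: a naive O(n²) DFT stands in for the FFT, the OLA method is used, and the names dft() and pconv() are hypothetical. The step numbers in the comments refer to the outline.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cvec = std::vector<std::complex<double>>;

// naive O(n^2) DFT standing in for an FFT (assumption for brevity);
// dir = -1 is the forward transform, dir = +1 the inverse (scaled by 1/n)
cvec dft(const cvec &x, int dir) {
  const double twopi = 2. * std::acos(-1.);
  const std::size_t n = x.size();
  cvec y(n);
  for (std::size_t k = 0; k < n; k++) {
    std::complex<double> sum(0., 0.);
    for (std::size_t t = 0; t < n; t++)
      sum += x[t] *
             std::polar(1., dir * twopi * double((k * t) % n) / double(n));
    y[k] = dir > 0 ? sum / double(n) : sum;
  }
  return y;
}

// uniformly partitioned OLA convolution of x with h, partition size N > 0
std::vector<double> pconv(const std::vector<double> &x,
                          const std::vector<double> &h, std::size_t N) {
  const std::size_t M = (h.size() + N - 1) / N; // number of partitions
  // step 1: each IR partition, zero-padded to 2N, kept as spectral data
  std::vector<cvec> parts;
  for (std::size_t m = 0; m < M; m++) {
    cvec p(2 * N);
    for (std::size_t n = 0; n < N && m * N + n < h.size(); n++)
      p[n] = h[m * N + n];
    parts.push_back(dft(p, -1));
  }
  std::vector<cvec> del(M, cvec(2 * N)); // spectral delay line
  // enough blocks to consume the input and flush the IR tail
  const std::size_t nblocks = (x.size() + N - 1) / N + M;
  std::vector<double> y((nblocks + 1) * N, 0.);
  std::size_t wp = 0; // delay line write position
  for (std::size_t b = 0; b < nblocks; b++) {
    // steps 2-3: zero-padded input block, transformed into the delay line
    cvec in(2 * N);
    for (std::size_t n = 0; n < N && b * N + n < x.size(); n++)
      in[n] = x[b * N + n];
    del[wp] = dft(in, -1);
    // step 5: partition m multiplies the block written m steps ago
    cvec mix(2 * N);
    for (std::size_t m = 0; m < M; m++) {
      const cvec &d = del[(wp + M - m) % M];
      for (std::size_t k = 0; k < 2 * N; k++)
        mix[k] += d[k] * parts[m][k];
    }
    // step 4: move the write position ahead, circularly
    wp = (wp + 1) % M;
    // steps 6-7: inverse DFT, then overlap-add into the output
    const cvec out = dft(mix, +1);
    for (std::size_t n = 0; n < 2 * N; n++)
      y[b * N + n] += out[n].real();
  }
  y.resize(x.size() + h.size() - 1);
  return y;
}
```

For any partition size, the result should match a direct time-domain convolution of the same input and impulse response up to rounding error, which is a convenient way to check an implementation of this kind.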
As we can see from this, it is effectively a mix of the direct convolution and the
fast convolution methods. If the partition size is set to 1, then the DFT becomes an
identity operation and we simply have a sample-by-sample delay line process. If
it is set to match the impulse response size, we have a single partition and fast
convolution. Anything in between combines the two approaches, to balance out latency
and computational efficiency.

[Figure: the input block s(t) and its delayed versions s(t−N), s(t−2N), s(t−3N), ...
each pass through a DFT; each spectrum is multiplied by the DFT of the corresponding
IR partition (ir0, ir1, ir2, ir3, ..., irM−1); the products are summed and an IDFT
produces the output y.]
Fig. 19.9: Partitioned convolution: each input block is placed in a delay line. The output is a sum
of all products of the input and IR spectra, to which an IDFT is applied.

We can now examine the implementation of partitioned convolution in AuLib.
This is found in the AuLib::PConv class, which implements both OLA and OLS,
in single or multiple partitions (the actual algorithm used can be selected by the
constructor). The IR data is provided by a function table, and kept in the spectral
domain. The input data is taken in, transformed, and then the convolution with the
IR is applied to it. Listing 19.6 shows the partitioned convolution code, which is
common to both OLA and OLS. This code is responsible for steps 3–6 in the outline
of the algorithm.

Listing 19.6: Partitioned convolution core.


void AuLib::PConv::convolution() {
  // transform it and store in the delay line
  fft::transform(m_del[m_p], m_in.data(), !fft::packed);
  // clear the spectral mix buffer
  std::fill(m_mix.begin(), m_mix.end(), 0.);
  // increment the delay write position
  m_p = m_p == m_nparts - 1 ? 0 : m_p + 1;
  auto del = m_del.begin() + m_p;
  // do the spectral products and mix
  // m_part contains all IR partitions
  // m_del is the input delay line
  for (auto part = m_part.rbegin(); part != m_part.rend();
       part++, del++) {
    if (del == m_del.end())
      del = m_del.begin();
    auto dsamp = del->begin();
    auto psamp = part->begin();
    // product & sum
    for (auto &mix : m_mix)
      mix += *dsamp++ * *psamp++;
  }
  // inverse transform into output buffer
  fft::transform(m_out.data(), m_mix, !fft::packed);
}
With this core convolution code, we can implement the process using either OLA
or OLS. Listing 19.7 shows the PConv::ola() method, which implements OLA
as described in Sect. 19.3.1. The code is more or less self-explanatory, but has com-
ments that highlight the key steps of the process. The zero-padding of the input
happens as a consequence of never writing to the second half of the buffer.

Listing 19.7: OLA partitioned convolution method


const double *AuLib::PConv::ola(const double *sig) {
  for (uint32_t n = 0; n < m_vframes; n++) {
    // data from sig feeds the conv input buffer
    m_in[m_count] = sig[n];
    // data from the conv output is
    // overlap-added into signal vector
    m_vector[n] = m_out[m_count] + m_saved[m_count];
    m_saved[m_count] = m_out[m_count + m_psize];
    // if we have enough data in input buffer
    if (++m_count == m_psize) {
      convolution();
      m_count = 0;
    }
  }
  // return the object signal vector pointer
  return vector();
}
Likewise, the OLS implementation follows similar principles, but replaces the
method of input and output (Listing 19.8). We can see that the main difference is
that we will not zero-pad the input; instead we will have it filled completely with
input data. In order to stream the data properly, this implementation writes new data
to the second half of the buffer, and, after the process is complete, saves this back
into the first half for next time (this is the save aspect of the method). The result
always consists of the second half of the output buffer, with no overlaps. As we can
see, the actual convolution process is the same in both approaches.

Listing 19.8: OLS partitioned convolution method


const double *AuLib::PConv::ols(const double *sig) {
  for (uint32_t n = 0; n < m_vframes; n++) {
    // data from sig feeds the conv input buffer
    // always in the second half of the buffer
    m_in[m_psize + m_count] = sig[n];
    // we output only the second half of the
    // conv output buffer
    m_vector[n] = m_out[m_psize + m_count];
    if (++m_count == m_psize) {
      convolution();
      // save the second half of the previous input
      // back in the first half.
      std::copy(m_in.begin() + m_psize, m_in.end(),
                m_in.begin());
      m_count = 0;
    }
  }
  return vector();
}

19.3.4 Convolution Reverb

One of the typical applications of fast convolution is in the implementation of
reverberation. Convolution reverb uses IRs recorded in different locations to create
very realistic room effects. As we have noted above, processing live input requires
that we minimise the input–output latency imparted by the fast convolution process.
In fact, it is possible to reduce it to zero by employing a scheme with non-uniform
partition sizes [19].
The principle is to divide the IR into sections, where the front ones have small
partition sizes, and those towards the end have longer sizes. For instance, we could
divide an arbitrary-length IR into three sections: early, using direct convolution; mid,
using a small partition; and tail, with a long partition, as illustrated in Fig. 19.10. In
order for the sizes to fit together, we start with a small power of two for the early
section (e.g. 32 or 64), which then becomes the partition size for the middle section.
The sizes of this section and the first section determine the partition size of the final
stretch (a power of two, of course).

[Figure: the IR divided along its length into three consecutive sections, labelled early,
mid, and tail.]

Fig. 19.10: Non-uniform partitioned convolution, splitting the IR into three sections.

So, for example, we can divide the impulse response into 32, 992, and L − 1024
samples, with L as the IR size. The partition sizes are then, respectively, 1, 32 and
1024. We use a different convolution for each section, feeding them with the cor-
rectly offset portions of the impulse response: the first 32 samples; the samples 32
to 1023; and the samples from 1024 to L − 1. We then run the three processors and
add their output together.
The following code shows an example of this principle. It is a reverb class made
by composing three AuLib classes: Fir (direct convolution), PConv (partitioned
convolution), and SampleTable (a function table used to hold an IR taken from
a file). The class is designed to be able to take an AuLib AudioBase object as in-
put (that is, an arbitrary processing object) and apply a non-uniform multi-partition
convolution. The complete code for the Reverb class is shown in Listing 19.9.

Listing 19.9: Convolution reverb class.


#include <AuLib/Fir.h>
#include <AuLib/PConv.h>
#include <AuLib/SampleTable.h>
using namespace AuLib;

class Reverb {
  // direct convolution frames
  const uint32_t dfrms = 32;
  // tail partition size
  const uint32_t part = 1024;
  // impulse response
  SampleTable m_ir;
  // direct convolution
  Fir m_early;
  // middle section fast conv
  PConv m_mid;
  // tail section fast conv
  PConv m_tail;

public:
  // parameter is impulse response filename
  Reverb(const char *impulse)
      : m_ir(impulse, 1), m_early(m_ir, 0, dfrms),
        // psize: dfrms, begin: dfrms, end: part
        m_mid(m_ir, dfrms, 0, dfrms, part),
        // psize: part, begin: part (to the end of table)
        m_tail(m_ir, part, 0, part) {}

  const AudioBase &operator()(const AudioBase &in, double g) {
    m_tail(in);
    m_mid(in);
    // mix the three outputs
    m_tail += m_mid += m_early(in);
    // scale reverb
    m_tail *= g;
    // mix direct signal
    return m_tail += in;
  }
};
This reverb class can be used with any AuLib input object. Since its processing
method outputs a reference to an AudioBase object, it can be used as an ordinary
AuLib object, and can avail of the facilities in the library base class.

19.4 Streaming Spectral Processing

We might have noticed, even though this has not been explicitly stated, that the DFT
works like a snapshot of the frequency content of a waveform. It tells us something
about the N samples that went in, and assumes that these are a single period of a
wave that repeats forever. While we were able to put this to good use in implement-
ing fast convolution, at some point we might want to try to do more with it. The
most interesting types of audio signals have time-varying characteristics (frequency
glides, changes in amplitude and brightness, etc.). Therefore we need to try to cap-
ture this in the spectral analysis so that we can manipulate waveforms in useful
ways.
The key to this is an extension of the DFT to take account of time: the short-time
Fourier transform (STFT). At one extreme, this is represented by taking individual
analyses of the signal at one-sample spacings. More commonly, however, we can
take these at larger intervals of, for instance, N/4 or N/8 samples. The STFT allows
us to process input signals as streams of data, producing frames of DFT bins at
every analysis period. Such spectral streams are then characterised by the following
parameters:

• DFT frame size (N): how many bins we are holding, which affects the width
of each bin, and therefore the frequency resolution of the analysis. The analysis
splits the positive spectrum into N/2 evenly-spaced bins (plus the extra bin at
the Nyquist frequency), centred at multiples of fs/N Hz. The more bins, the finer
the analysis will be frequency-wise. However, since we are taking more samples, this will
also imply a longer time averaging of the data, which implies a less accurate
analysis time-wise. Common sizes for streaming spectral analysis are 1024 and
2048 samples (with bin bandwidths of 46.9 and 23.4 Hz, respectively, at fs =
48,000 Hz).
• Hopsize: how many samples of the input are in between each frame, affecting the
quality of the resolution of the analysis, and also determining the stream analysis
frame rate. Smaller intervals will also be more computationally demanding.
• Window size: when we extract samples from a waveform to make up an analysis
input frame, we are windowing the signal at a point. The size of this window is
generally the same as N, but can be different in some cases.
• Window shape: likewise, the windowing can simply extract the samples, in which
case we are using a rectangular window, or apply an envelope of some other
shape. This is used to smooth the transitions at the edges of the frame, providing
a better analysis and reducing the artefacts that occur when we apply the inverse
analysis to produce the output waveform.
• Data format: the format of the spectral frame data (e.g. polar, rectangular).
• Frame count: an index that defines the frame time in a stream, in much the same
way that a waveform frame index identifies a time point for that sample frame.
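The numerical relationships between these parameters are simple to compute. The sketch below uses a hypothetical SpecStream struct (not an AuLib type) to derive the bin width, the analysis frame rate, the number of bins in the positive spectrum, and the number of overlapping frames from the sampling rate, frame size, and hopsize.

```cpp
#include <cstdint>

// hypothetical description of a spectral stream (not part of AuLib)
struct SpecStream {
  double fs;       // sampling rate (Hz)
  std::uint32_t N; // DFT frame size (samples)
  std::uint32_t H; // hopsize (samples)
};

// width of each analysis bin (Hz)
double bin_width(const SpecStream &s) { return s.fs / s.N; }

// number of spectral frames produced per second
double frame_rate(const SpecStream &s) { return s.fs / s.H; }

// bins in the positive spectrum, including 0 Hz and Nyquist
std::uint32_t num_bins(const SpecStream &s) { return s.N / 2 + 1; }

// number of frames that overlap at any given sample
std::uint32_t overlaps(const SpecStream &s) { return s.N / s.H; }
```

For instance, with fs = 48,000 Hz, N = 1024, and H = 256, the bin width is 46.875 Hz, 187.5 frames are produced per second, and four frames overlap.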
The STFT can be implemented as a sequence of DFT analyses spaced at hopsize
samples, but we have to address two additional issues that arise as part of this pro-
cess. The first of these is windowing. When we extract samples of an input stream,
it is likely that we will cut the waveform in awkward positions, creating abrupt dis-
continuities at the edges of the window. This leads to a certain amount of smearing
in the analysis, but, more critically, it prevents us from manipulating the spectral
data more generally, since the recomposed waveform blocks will not fit together
smoothly.
A solution to this problem is to employ an envelope, that is, a window shape
that will fade out to zero at its edges. Three examples of these are shown in Fig.
19.11, of which the third type (Hanning, also known as the Von Hann window) is
the most commonly used. Applying this to the data will prevent more significant
problems with waveform discontinuities. This also has the side effect of broadening
the analysis bandwidth to encompass not only a single bin, but also some of its
neighbours. The net effect is of a certain band overlap in the spectrum; we may
liken it to a band-pass filter bank whose bandwidths are wider than their spacing.

[Figure: three panels, Bartlett, Kaiser, and Hanning, each plotting the window amplitude
(0.0 to 1.0) against the sample index (0 to 60).]

Fig. 19.11: Three window shapes (N = 64): Bartlett (triangle), Kaiser, and Hanning (inverted raised
cosine).
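As an illustration of the third shape, the following sketch generates a Hanning window as an inverted raised cosine and applies it to a frame. It is an assumption-laden example rather than the library code: the function names hann() and apply_window() are hypothetical, and the periodic form (dividing by N rather than N − 1) is used.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hanning window of N points, periodic form (assumed here):
// w[n] = 0.5 - 0.5 cos(2 pi n / N)
std::vector<double> hann(std::size_t N) {
  const double twopi = 2. * std::acos(-1.);
  std::vector<double> w(N);
  for (std::size_t n = 0; n < N; n++)
    w[n] = 0.5 - 0.5 * std::cos(twopi * n / N);
  return w;
}

// multiply a frame by the window, sample by sample
// (the window is assumed to be at least as long as the frame)
std::vector<double> apply_window(const std::vector<double> &frame,
                                 const std::vector<double> &w) {
  std::vector<double> out(frame.size());
  for (std::size_t n = 0; n < frame.size(); n++)
    out[n] = frame[n] * w[n];
  return out;
}
```

The window starts at zero, rises to one at the centre of the frame, and falls back towards zero, so the windowed frame fades in and out smoothly at its edges.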

The second issue is that since the spacing between the analysis time points is less
than a full frame (N), the phases in each analysis bin will contain an offset. This
will be relative to the time position modulo the frame size. If we do not intend to
manipulate the phases of the spectral data, we may ignore this, as this offset will not
matter. However, if the process we are applying affects the phases as well, then we
need to address this issue.
The simplest way to fix the phase offset is to rotate the samples of the waveform
relative to their time position modulo the frame size. For example, if the hopsize is
set to N/4, then successive frames will have the following rotations: 0, 1/4, 2/4, 3/4,
0, 1/4, . . . . To implement this, samples are moved around circularly in the frame,
before the actual DFT is applied. Alternatively, the offset can be corrected in the
phase of each bin, but that is probably less efficient (and more awkward), since the
rotation can be implemented as part of the data input process.
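A minimal sketch of this circular rotation is given next. The helper names rotation() and rotate_in() are hypothetical; the first computes the offset for a frame starting at a given absolute time position, and the second places the samples circularly in a new frame before the DFT would be applied.

```cpp
#include <cstddef>
#include <vector>

// rotation offset for a frame whose first sample is at absolute time t
std::size_t rotation(std::size_t t, std::size_t N) { return t % N; }

// circular shift: sample n of the frame is written at (n + offset) mod N
std::vector<double> rotate_in(const std::vector<double> &frame,
                              std::size_t offset) {
  const std::size_t N = frame.size();
  std::vector<double> r(N);
  for (std::size_t n = 0; n < N; n++)
    r[(n + offset) % N] = frame[n];
  return r;
}
```

With N = 8 and H = 2, for example, successive frames starting at times 0, 2, 4, 6, 8 get the offsets 0, 2, 4, 6, 0, which are the rotations 0, 1/4, 2/4, 3/4, 0 of the frame size.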
With these ideas in place, we can introduce a streaming spectral analysis/syn-
thesis framework, which we will see implemented in AuLib by the Stft and
SpecBase classes. Similar principles are implemented in Csound, as we will see
when we discuss frequency-domain plugins in Chapter 20. The main pieces are an
analysis (STFT) processor, which will produce the spectral-domain stream, and a
synthesis (Inverse STFT) processor, which will recompose this stream into a time-
domain signal. In between these two, a number of processes may be applied to the
data (Fig. 19.12).

input → STFT → . . . (spectral stream) . . . → ISTFT → output

Fig. 19.12: Streaming spectral processing framework flowchart.

Since we have a well-defined format (based on the stream parameters listed
above), this is an open framework for spectral processing. Any process consuming
and producing data in conformance with it can be slotted in between the
analysis and the synthesis.

19.4.1 Spectral Analysis

As we have outlined above, a spectral stream is created by producing a sequence of
DFT frames from an input signal. Assuming the hopsize H is less than the frame
size N, which is the most common case, the analyses will involve a certain input
overlap. That is, each frame will contain some samples in common with previous
ones. The extreme case is where H = 1, and N − 1 samples overlap. Most commonly,
H = N/D, where D, called the decimation, is 4, 8, or 16, which is also the number
of overlapping frames.
The overlap in the analysis means that as we input data, each waveform sample
will feed D frames (Fig. 19.13). In AuLib, the analysis is implemented by the Stft
class in the fft::forward mode. In this class, a two-dimensional array with D
rows and N columns is used to hold the overlapped input samples. The following
line in the analysis code feeds the input samples into D frames:
m_framebufs[j][m_pos[j]++] = sig[i];
where j is the frame index (0 to D − 1), m_pos[j] holds the sample count for
the respective frame, and sig[i] is the sample from the input signal feeding the
analysis. For each sample, we loop D times, placing it in each frame.
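
[Figure: the input s(t) feeding four overlapping frames, numbered 0 to 3, whose write
positions are offset by 0, 3H, 2H, and H, respectively.]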

Fig. 19.13: Overlapping frames in streaming spectral analysis, with D = 4.
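The overlapped input scheme can be sketched as a small class. This is an illustration only, not the Stft code: the class name OverlapInput is hypothetical, and the initial write positions are assumed to be staggered as the figure suggests (0, (D − 1)H, ..., H), so that one frame becomes full at every H samples.

```cpp
#include <cstddef>
#include <vector>

// hypothetical overlapped input buffer: D frames of N samples each
class OverlapInput {
  std::size_t m_N, m_H, m_D;
  std::vector<std::vector<double>> m_framebufs;
  std::vector<std::size_t> m_pos;

public:
  OverlapInput(std::size_t N, std::size_t D)
      : m_N(N), m_H(N / D), m_D(D),
        m_framebufs(D, std::vector<double>(N, 0.)), m_pos(D) {
    // staggered write positions (assumed): 0, (D-1)H, ..., 2H, H
    for (std::size_t j = 0; j < D; j++)
      m_pos[j] = ((D - j) % D) * m_H;
  }

  // feed one sample into all D frames; returns the index of the
  // frame that has just been filled, or -1 if none was
  int feed(double s) {
    int full = -1;
    for (std::size_t j = 0; j < m_D; j++) {
      m_framebufs[j][m_pos[j]++] = s;
      if (m_pos[j] == m_N) {
        m_pos[j] = 0;
        full = static_cast<int>(j);
      }
    }
    return full;
  }

  // read access to frame j, e.g. to window and transform it
  const std::vector<double> &frame(std::size_t j) const {
    return m_framebufs[j];
  }
};
```

A caller would check the value returned by feed() after each sample and, when a frame index is returned, window and transform that frame before the next sample overwrites its first position.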

Once the sample count for a frame reaches N, we can proceed to transform and
output new spectral data. This will happen at every H samples of input. At that point,
we window and copy the frame into the DFT input buffer and perform the transform
in place. The Stft class takes as input a reference to a table object which will
contain the window. This is set to have the same number of points as the analysis
frame, and is accessed via the m_win member variable.
As noted above, in order to correct the implicit phase offset that arises from over-
lapping analyses, the copying of the data has to take into account the required input
rotation. This is calculated according to the frame time modulo N, and is a periodic
circular shift: 0, H, 2H, . . . , (D − 1)H, which can be easily set by multiplying H by
the frame index j. We can then input the data into a reinterpreted double array as
follows:
uint32_t offset = j * m_H;
double *r = reinterpret_cast<double *>(m_cdata.data());
for (uint32_t n = 0; n < m_N; n++)
  r[(n + offset) % m_N] = m_framebufs[j][n] * m_win[n];
fft::transform(m_cdata, r);
where m_cdata is a complex vector. Note that we are assuming a packed FFT
format with the 0 Hz and the Nyquist points forming the first pair in the array.
There are two options for the format of the bin data output by the object:
fft::rectang or fft::polar. If the former is chosen, then the data is simply
copied into the output buffer. Otherwise, it is converted into amplitude and phase
pairs before the copying.
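The conversion between the two formats can be illustrated on a packed frame stored as a plain vector of doubles, with the 0 Hz and Nyquist values in the first two positions. The function names to_polar() and to_rect() are hypothetical; the conversions themselves use std::abs, std::arg, and std::polar from the standard library.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// packed rectangular frame: [0 Hz, Nyquist, re1, im1, re2, im2, ...]
// returns the polar frame:   [0 Hz, Nyquist, a1, ph1, a2, ph2, ...]
std::vector<double> to_polar(const std::vector<double> &rect) {
  std::vector<double> pol(rect); // 0 Hz and Nyquist are copied unchanged
  for (std::size_t n = 2; n + 1 < rect.size(); n += 2) {
    const std::complex<double> bin(rect[n], rect[n + 1]);
    pol[n] = std::abs(bin);     // amplitude
    pol[n + 1] = std::arg(bin); // phase, in the -pi to pi range
  }
  return pol;
}

// the reverse conversion, from amplitude and phase to real and imaginary
std::vector<double> to_rect(const std::vector<double> &pol) {
  std::vector<double> rect(pol);
  for (std::size_t n = 2; n + 1 < pol.size(); n += 2) {
    const std::complex<double> bin = std::polar(pol[n], pol[n + 1]);
    rect[n] = bin.real();
    rect[n + 1] = bin.imag();
  }
  return rect;
}
```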

19.4.2 Resynthesis

At the resynthesis stage, we will attempt to retrace the steps performed during the
analysis stage. The time domain signal will be a mix of D overlapping frames; there-
fore, for each sample we must loop D times, picking up the samples from each frame
and accumulating them, with the line
m_vector[i] += m_framebufs[j][m_pos[j]++];
which does the reverse of the analysis, where we extracted samples from the input
into the D frames.
At every H samples, we will take new data from the input, and then, depending
on the numeric format, we will either convert it to real + imaginary pairs, or just
copy the data into the DFT buffer. We then perform the inverse transform in place,
and feed the current frame. We will have to take care to reverse-rotate it and apply
a window. Finally, we overlap-add all the transformed frames. The following code
takes care of all of these steps:
uint32_t offset = j * m_H;
double *r = reinterpret_cast<double *>(m_cdata.data());
if (m_repr == fft::polar) {
  m_cdata[0].real(sig[0]), m_cdata[0].imag(sig[1]);
  for (uint32_t n = 2, k = 1; n < m_N; n += 2, k++) {
    m_cdata[k] = std::polar(sig[n], sig[n + 1]);
  }
} else
  std::copy(sig, sig + m_N, r);
fft::transform(r, m_cdata);
for (uint32_t n = 0; n < m_N; n++) {
  m_framebufs[j][n] = r[(n + offset) % m_N] * m_win[n];
}
The STFT transforms in the forward and inverse direction can be combined into
one single function, which will act on an input signal that is either a vector of time-
domain samples or a spectral frame (with complex numbers stored as pairs). The
output vector will be also of one or the other form. The transform method of the
AuLib::Stft class is shown in Listing 19.10.

Listing 19.10: The short-time Fourier transform code in AuLib.


const double *AuLib::Stft::transform(const double *sig,
                                     uint32_t vframes) {
  for (uint32_t i = 0; i < vframes; i++) {
    if (m_dir == fft::inverse)
      m_vector[i] = 0.;
    for (uint32_t j = 0; j < m_D; j++) {
      if (m_dir == fft::forward) {
        m_framebufs[j][m_pos[j]++] = sig[i];
        if (m_pos[j] == m_N) {
          uint32_t offset = j * m_H;
          double *r = reinterpret_cast<double *>(m_cdata.data());
          for (uint32_t n = 0; n < m_N; n++) {
            r[(n + offset) % m_N] = m_framebufs[j][n] * m_win[n];
          }
          fft::transform(m_cdata, r);
          if (m_repr == fft::polar) {
            m_vector[0] = m_cdata[0].real(),
            m_vector[1] = m_cdata[0].imag();
            for (uint32_t n = 2, k = 1; n < m_N; n += 2, k++) {
              m_vector[n] = std::abs(m_cdata[k]);
              m_vector[n + 1] = std::arg(m_cdata[k]);
            }
          } else
            std::copy(r, r + m_N, m_vector.begin());
          m_pos[j] = 0;
          m_framecount++;
        }
      } else {
        m_vector[i] += m_framebufs[j][m_pos[j]++];
        if (m_pos[j] == m_N) {
          uint32_t offset = j * m_H;
          double *r = reinterpret_cast<double *>(m_cdata.data());
          if (m_repr == fft::polar) {
            m_cdata[0].real(sig[0]), m_cdata[0].imag(sig[1]);
            for (uint32_t n = 2, k = 1; n < m_N; n += 2, k++) {
              m_cdata[k] = std::polar(sig[n], sig[n + 1]);
            }
          } else
            std::copy(sig, sig + m_N, r);
          fft::transform(r, m_cdata);
          for (uint32_t n = 0; n < m_N; n++) {
            m_framebufs[j][n] = r[(n + offset) % m_N] * m_win[n];
          }
          m_pos[j] = 0;
          m_framecount++;
        }
      }
    }
  }
  return vector();
}
An example of the typical usage of the Stft class in streaming processing is
shown in Listing 19.11. This example demonstrates the analysis/synthesis opera-
tions with no processing; any spectral manipulation would be placed in between
these two steps. The process() method calls transform() to do the analysis
or synthesis as shown in Listing 19.10.

Listing 19.11: Streaming spectral processing example program.


#include <Oscili.h>
#include <SoundOut.h>
#include <Stft.h>
#include <Wintabs.h>

using namespace AuLib;

int main(int argc, const char **argv) {
  Oscil sig;
  Hann win;
  Stft ana(win, fft::forward), syn(win, fft::inverse);
  SoundOut output(argv[1]);
  // DSP loop
  for (int i = 0; i < def_sr * 2; i += def_vframes) {
    sig.process(0.5, 440.);
    ana.process(sig);
    syn.process(ana);
    output.write(syn);
  }
  return 0;
}
The Stft class overrides the AudioBase signal arithmetic operators (see Ap-
pendix A) so that they can be applied to spectral data. It also contains other methods
that allow access to vector samples as complex numbers.

19.4.3 Spectral Manipulation

As we have noted before, any process that manipulates bin data may be applied to
the stream between the analysis and synthesis stages. For this, the most straightfor-
ward way to start manipulating the spectrum is to use a polar representation, which
conveniently separates the amplitudes detected at each bin from the phases. This
allows us to apply different types of filtering to the data by modifying only the am-
plitudes. We can think of the analysis as a bank of band-pass filters, spaced linearly
in fs/N-wide bins. To make a low-pass filter, we can draw an amplitude curve that
cuts off the higher bins, and apply it by multiplying the amplitudes of each frame.
The filter can be time-varying: each frame may have a unique curve applied to it.
Another effect could be created by using two spectral streams, multiplying their
amplitudes together and using the phases of one or the other. This would work as a
kind of cross-synthesis where one sound would be used as a ‘filter’ to modify the
other. A bin-by-bin ‘noise gate’ is another possibility. Using a spectral amplitude
mask (which could be derived from an input signal), cancel out all bins whose am-
plitudes fall below the mask amplitudes. A ‘spectral trace’ effect can also be created
by keeping only the n loudest bins and zeroing the amplitudes of the others. Many
different types of manipulation can be applied to the amplitudes alone.
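Some of these amplitude-only manipulations can be sketched as free functions acting on polar frames held as plain vectors (0 Hz, Nyquist, then amplitude and phase pairs). The names apply_curve(), cross(), and gate() are hypothetical; an actual implementation would normally be a class derived from the library spectral base class.

```cpp
#include <cstddef>
#include <vector>

// polar frame layout: [0 Hz, Nyquist, a1, ph1, a2, ph2, ...]

// filtering: multiply amplitudes by a curve with N/2 + 1 gains
// (curve[0] for 0 Hz, curve[N/2] for Nyquist); phases are untouched
void apply_curve(std::vector<double> &frame,
                 const std::vector<double> &curve) {
  const std::size_t half = frame.size() / 2; // N/2
  frame[0] *= curve[0];
  frame[1] *= curve[half];
  for (std::size_t k = 1; k < half; k++)
    frame[2 * k] *= curve[k];
}

// cross-synthesis: amplitudes of a multiplied by those of b,
// keeping the phases of a (frames of equal size assumed)
void cross(std::vector<double> &a, const std::vector<double> &b) {
  for (std::size_t n = 2; n + 1 < a.size(); n += 2)
    a[n] *= b[n];
}

// bin-by-bin noise gate: zero the bins whose amplitudes fall below
// the amplitudes of a mask frame in the same format
void gate(std::vector<double> &frame, const std::vector<double> &mask) {
  for (std::size_t n = 2; n + 1 < frame.size(); n += 2)
    if (frame[n] < mask[n])
      frame[n] = 0.;
}
```

A low-pass filter is then just a curve of ones up to the cutoff bin and zeros above it, and the curve can be changed from frame to frame to make the filter time-varying.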
Classes implementing these effects can be derived from SpecBase, which has
the fundamental supports in place for streaming spectral processing. For instance, it
includes a ready() method, which can be used to check if the input data is ready to
be processed. Called at every time-domain frame block, e.g. inside the DSP loop in
Listing 19.11, it updates an internal count and returns true when it is time to produce
a new spectral frame at the output. This allows seamless integration of spectral and
time-domain processing. For instance, a spectral processing object spec could be
slotted in between the analysis and synthesis, as in
ana.process(sig);
spec.process(ana);
syn.process(spec);
in which case it would call SpecBase::ready() to determine if it needed to act
on its input.
To process the spectral phases, however, we will need to transform them into a
more useful format. By taking the inter-frame phase deltas at every bin, we can
calculate their instantaneous frequencies (IFs), which are more flexible to manipulate.
In fact, from these deltas we can in some cases estimate the actual partial
frequencies quite accurately. By applying some conversion to the raw phase differences, we can
obtain these in Hz. This will then allow us to hold frames of amplitude–frequency
pairs, opening up further possibilities for manipulation. This is known as phase
vocoder [16] analysis, synthesis, and processing.
In AuLib, we derive a Pvoc phase vocoder class from Stft, as the STFT is a
more general form of streaming spectral analysis. Taking a frame in polar format,
we
1. Take the difference, bin by bin, of the current and the previous frame phases (Δ_k).
2. Store the current phases for next time.
3. Unwrap the phase difference to bring it into the −π to π range.
4. Apply a conversion based on the bin centre frequency (k fs/N, where k is the bin
index), and a scaling by a constant defined by the ratio of fs and the hopsize
H times 2π (this is to do with how much the phase is supposed to increment
between frames):

$$\mathrm{IF}(k) = k\,\frac{f_s}{N} + \Delta_k\,\frac{f_s}{2\pi H} \qquad (19.8)$$

5. Store the IF, replacing the phase.
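The conversion in Eq. 19.8 and its inverse can be written as small helper functions. This is a sketch with hypothetical names (wrap(), phase_to_freq(), freq_to_phase()), acting on a single bin rather than on a whole frame.

```cpp
#include <cmath>

// bring a phase difference into the -pi to pi range (step 3)
double wrap(double delta) {
  const double pi = std::acos(-1.);
  while (delta >= pi)
    delta -= 2. * pi;
  while (delta < -pi)
    delta += 2. * pi;
  return delta;
}

// Eq. 19.8: instantaneous frequency in Hz for bin k, from the
// inter-frame phase difference delta (in radians)
double phase_to_freq(double delta, unsigned k, double fs, unsigned N,
                     unsigned H) {
  const double pi = std::acos(-1.);
  return k * fs / N + wrap(delta) * fs / (2. * pi * H);
}

// the inverse conversion: phase difference (radians) from a frequency in Hz
double freq_to_phase(double freq, unsigned k, double fs, unsigned N,
                     unsigned H) {
  const double pi = std::acos(-1.);
  return (freq - k * fs / N) * (2. * pi * H) / fs;
}
```

A phase difference of zero returns the bin centre frequency, for example 468.75 Hz for k = 10 with fs = 48,000 Hz and N = 1024; non-zero differences move the estimate above or below the centre.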
Processing only needs to happen once there is a new frame at the input, produced
by the STFT every H time-domain frames. This is the norm for any streaming spec-
tral manipulation function: data only needs to be acted on at these time intervals.
For this purpose, we can keep track of the frame count and only process data if this
has been incremented.
To reconstitute the bin phases, we can do the process in reverse: apply the inverse
conversion equation, and then accumulate the results to produce a running phase for
each bin. Since instantaneous frequencies are phase deltas, the phases are obtained
by adding up all the instantaneous frequencies of successive frames. The code pro-
viding a forward and inverse phase vocoder transform is shown in Listing 19.12.

Listing 19.12: Phase vocoder code in AuLib.


const double *AuLib::Pvoc::transform(const double *sig,
                                     uint32_t vframes) {
  // delta and conversion constants
  double delta, c = m_sr / m_N, d = m_sr / (twopi * m_H);
  uint32_t fmcnt = m_framecount;

  if (m_dir == fft::forward) {
    // STFT forward transform
    Stft::transform(sig, vframes);
    // if we need to produce a new frame,
    // m_framecount was updated in Stft::transform()
    if (m_framecount > fmcnt)
      m_done = false;
    if (!m_done) {
      // for each bin (except 0, Nyq)
      for (uint32_t i = 2, j = 1; i < m_N; i += 2, j++) {
        // take the inter-frame delta
        delta = m_vector[i + 1] - m_sbuf[j];
        // save the current phase
        m_sbuf[j] = m_vector[i + 1];
        // unwrap the delta
        if (delta >= pi)
          delta -= twopi;
        if (delta < -pi)
          delta += twopi;
        // apply the conversion into IF in Hz
        m_vector[i + 1] = j * c + delta * d;
      }
      m_done = true;
    }
  } else {
    // inverse transform
    if (!m_done) {
      // for each bin
      for (uint32_t i = 2, j = 1; i < m_N; i += 2, j++) {
        // store the current amplitude
        m_sbuf[i] = sig[i];
        // recover the delta from IF in Hz
        delta = (sig[i + 1] - c * j) / d;
        // accumulate the delta
        m_sbuf[i + 1] += delta;
      }
      m_done = true;
    }
    // apply inverse STFT
    Stft::transform(m_sbuf.data(), vframes);
    // check that we need to produce a new
    // frame next time.
    if (m_framecount > fmcnt)
      m_done = false;
  }
  return vector();
}
This process provides a new format for spectral streams, one that holds pairs of
amplitudes and frequencies for each bin, in addition to rectangular and polar data
frames (Fig. 19.14).
With spectral data in amplitude–frequency format, all manner of manipulations
are possible. We can, for instance, shift the pitch of a stream by scaling all frequencies
and then moving them (along with their amplitudes) to a new bin whose centre
frequency is close to the new frequency. We can create spectral morphing effects
by interpolating two streams. We can place the stream in a delay line and read it
at different rates, time-scaling the stream (stretching or compressing it). Many different
effects are possible, given the malleable nature of the format. To implement
these, we can derive from SpecBase, as this class has the basic infrastructure for
amplitude–frequency processing.

                            bin 1    bin 2
rectangular     0Hz  Nyq    re im    re im   . . .
polar           0Hz  Nyq    a  ph    a  ph   . . .
phase vocoder   0Hz  Nyq    a  fr    a  fr   . . .

Fig. 19.14: The different spectral frame formats and their data.

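The pitch-shifting idea described above can be sketched as follows. This is a simplified illustration with a hypothetical function name, pitch_shift(), operating on a plain vector in the phase vocoder format; when two input bins land on the same output bin, the louder one is kept, which is one possible choice among several.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// frame layout: [0 Hz, Nyquist, a1, f1, a2, f2, ...], frequencies in Hz;
// ratio > 0 is the pitch scaling factor (2 is an octave up)
std::vector<double> pitch_shift(const std::vector<double> &in, double ratio) {
  const std::size_t N = in.size();
  std::vector<double> out(N, 0.);
  for (std::size_t k = 1; 2 * k + 1 < N; k++) {
    // destination bin: the one whose centre is closest to the new frequency
    const std::size_t newk = static_cast<std::size_t>(std::lround(k * ratio));
    if (newk < 1 || 2 * newk + 1 >= N)
      continue; // the shifted bin falls outside the frame
    // on collisions, keep the louder component (a design assumption)
    if (in[2 * k] >= out[2 * newk]) {
      out[2 * newk] = in[2 * k];             // amplitude moves with the bin
      out[2 * newk + 1] = in[2 * k + 1] * ratio; // frequency is scaled
    }
  }
  return out;
}
```

Bins that receive no data are left with zero amplitude, so they do not contribute to the resynthesis.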
Finally, this format allows us to resynthesise the data with an oscillator bank
instead of doing the inverse STFT operation. In this way, we can treat each bin as a
control stream containing the amplitudes and frequencies of a sine wave generator.
The output is a sum of all sine waves, up to one per bin, in a process called additive
synthesis. Furthermore, we may process frame data, finding the amplitude peaks
that correspond to waveform partials. By connecting these in successive frames, we
can create control tracks that will model these partials. This process would lead us
from N bins to M tracks (M < N). This has the effect of reducing the number of
oscillators needed, and effectively modelling the input as a mix of sinusoidal tracks
[45]. It is also possible to extract, separately, the more noisy and transient sound
components that resist representation as sinusoids [51, 57].
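A bare-bones oscillator bank for this kind of resynthesis is sketched next. It is a simplification under stated assumptions: the class name OscBank is hypothetical, one sine oscillator is used per bin, and amplitudes and frequencies are held constant over each hop, whereas a practical implementation would interpolate them between frames.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// hypothetical oscillator bank driven by amplitude-frequency frames
class OscBank {
  double m_fs;
  std::vector<double> m_phs; // running phase of each oscillator (radians)

public:
  OscBank(std::size_t nbins, double fs) : m_fs(fs), m_phs(nbins, 0.) {}

  // frame layout: [0 Hz, Nyquist, a1, f1, a2, f2, ...];
  // returns hop samples, the sum of all sine waves
  std::vector<double> process(const std::vector<double> &frame,
                              std::size_t hop) {
    const double twopi = 2. * std::acos(-1.);
    std::vector<double> out(hop, 0.);
    for (std::size_t k = 1; k < m_phs.size() && 2 * k + 1 < frame.size();
         k++) {
      const double amp = frame[2 * k];
      const double incr = twopi * frame[2 * k + 1] / m_fs;
      for (std::size_t n = 0; n < hop; n++) {
        out[n] += amp * std::sin(m_phs[k]);
        // the phase carries over from one frame to the next
        m_phs[k] = std::fmod(m_phs[k] + incr, twopi);
      }
    }
    return out;
  }
};
```

Since each oscillator keeps its own running phase, frames can be supplied at any rate, which is what makes time-scaling straightforward in this form of resynthesis.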

19.5 Conclusions

Frequency-domain audio effects are a very important component of the toolbox for
music signal processing. This chapter has introduced the fundamental aspects of
spectral manipulation: the DFT, the FFT, fast convolution, and streaming spectral
processing. These were explored from the perspective of object-oriented program-
ming and their implementation in AuLib. The DSP details of these operations are
described in more detail in the companion text [36], which the reader may refer to
also for ideas for frequency-domain processes that can be applied to streaming data.
In fact, in the next chapter, as part of discussing plugins in general, we will see how
streaming spectral processing can be implemented in another context and another
music computing framework.

Problems

19.1. Create a reverberation program to demonstrate the Reverb class (Listing
19.9).

19.2. Derive a class from AuLib::SpecBase to implement a low-pass spectral-processing
filter of your design, with time-varying parameters. Write a test program
to demonstrate it (using AuLib::Stft objects for analysis and synthesis). Make
sure the processing only occurs when a new frame is available for input.
Chapter 20
Plugins

Abstract In this chapter we introduce computer music instrument components as
plugins to larger systems. As a practical example, we take a look at a C++ framework
designed to facilitate the implementation of plugin opcodes for Csound. The
chapter explores each aspect of component development with reference to the principles
already introduced in earlier chapters, but now applied to plugins. Examples
of signal generation, processing, and spectral manipulation are provided.

One of the typical ways in which developers can supply new algorithms to ex-
tend working audio processing systems and digital audio workstations is through
a plugin mechanism. Plugins are in most cases built as dynamically-loaded mod-
ules, which implement some key functionality through a well-defined interface. The
most common languages in which these interfaces are defined are C and C++. The
OOP paradigm is particularly useful in this context, and we will observe that many
systems adopt it to provide a model for plugins. In this chapter, we will study the de-
velopment of plugins in C++ for Csound [39]. A basic understanding of this system
is assumed, and users may refer to the companion text [36] for a general introduc-
tion to it. However, many of the principles explored here can also be adopted more
widely in other systems.

20.1 Plugins in Csound

The Csound [39] sound and music computing system is composed of a set of pro-
cessing units known as opcodes or unit generators [34], which are central to it. In
order to allow for extensions to the system, Csound provides a plugin mechanism to
load new opcodes from user-supplied libraries. Plugins can be written in C or C++,
as well as using other programming languages. Csound has a very comprehensive
C API for opcode development, which provides low-level access to the underlying
audio engine to support it.

© Springer Nature Switzerland AG 2019 325


V. Lazzarini, Computer Music Instruments II,
https://doi.org/10.1007/978-3-030-13712-0_20

While the C interface follows an OOP design, it does not provide significant
support for some of the principles we have been developing in previous chapters,
particularly in terms of code reuse and benefitting from existing object-oriented
algorithms and libraries. As we have been learning, C++ in its more modern incar-
nations [63] supports these concepts very well, and, as an extension to C, can be
made to use the underlying API in a seamless way.
For this purpose, Csound now includes a lightweight C++ framework designed
to facilitate the programming of Csound plugins, the Csound Plugin Opcode Frame-
work (CPOF) [37]. This is effectively an OOP C++ layer that attempts to be thin,
simple, and complete, handling internal Csound resources in a safe way. It takes ad-
vantage of the same mechanisms provided by the underlying C interface. CPOF is
composed of the framework proper, that is, base classes that are specialised to make
new opcodes, and a collection of support classes that wrap the underlying C API in
a convenient way.

20.2 Framework Design

Csound plugins need to conform to a number of constraints, which will limit the
types of C++ construct we will be able to employ. The following list outlines some
of these conditions:

1. The Csound engine is written in C. It includes mechanisms to instantiate opcodes
and call opcode functions, which will work with C++ opcodes as long as these
obey some rules of linkage and data structures.
2. Owing to condition 1, we are not able to employ the dynamic binding that is
needed for virtual methods. All binding needs to be done at compile time.
3. Certain functions will have to be defined as static so that they can be regis-
tered with Csound.
4. Three different functions for signal processing are normally implemented in plu-
gins, which are set to be called by the engine at different action times. This makes
up the required method space for a plugin.
5. An opcode is derived from a basic model, which in C is given by the data struc-
ture OPDS. In CPOF, this becomes the fundamental base class to the opcode
framework.

Since we cannot make use of virtual functions but would still want to take ad-
vantage of inheritance for code reuse, we could have opted to use a design approach
called the curiously recurring template pattern (CRTP) [2]. This mimics the virtual-
function mechanism at compile time, at the cost of some loss in code readability and
a somewhat complex class structure.
However, as it turns out, there is no need for this, the reason being that, in the
narrow scope of Csound plugins, opcode classes are never user-instantiated, and
therefore it is much simpler to provide the correct function override. As the engine
is responsible for object instantiation and method calls, it is easy to define at compile
time which functions should be invoked. We can do this just by providing methods
that hide rather than override the base class ones.
We should remember, from Chapter 14, that if we do not mark a member function
as virtual, any appearances of the same in derived objects will hide the base class one
(instead of overriding it). However, given the particular application scenario here,
there is no difference between these two cases. We can have a plugin base class from
which we will inherit to create the actual opcodes. This class is derived from OPDS,
providing some extra members that may be useful to all of its subclasses. It will also
provide stub methods for the three required processing functions, which can then be
specialised in the actual opcode classes.
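
To make the hiding mechanism more concrete, we can look at a minimal sketch that is independent of Csound. All of the names in it (OpBase, Gain, opfunc, init_tramp, kperf_tramp, run_once) are hypothetical and are not part of CPOF; the sketch only assumes that the engine stores plain C function pointers and an untyped pointer to the object. Since the concrete class is known when the function templates are instantiated, the correct method is selected at compile time, with no virtual functions involved.

```cpp
#include <cassert>

// Hypothetical stand-in for a plugin base class: non-virtual stubs only.
struct OpBase {
  int init() { return 0; }   // stub, does nothing
  int kperf() { return 0; }  // stub, does nothing
};

// A derived class that hides kperf() and inherits the init() stub.
struct Gain : OpBase {
  double in = 0.5, g = 2.0, out = 0.0;
  int kperf() {  // hides OpBase::kperf()
    out = in * g;
    return 0;
  }
};

// Callback type as a C engine might store it: untyped object pointer.
typedef int (*opfunc)(void *);

// The type T is fixed at instantiation, so the call is bound at compile time.
template <typename T> int init_tramp(void *p) {
  assert(p != nullptr);
  return static_cast<T *>(p)->init();
}
template <typename T> int kperf_tramp(void *p) {
  assert(p != nullptr);
  return static_cast<T *>(p)->kperf();
}

// Simulated engine call: one init pass followed by one perf pass.
inline int run_once(void *obj, opfunc init, opfunc perf) {
  int res = init(obj);
  return res == 0 ? perf(obj) : res;
}
```

Passing init_tramp<Gain> and kperf_tramp<Gain> to run_once() invokes the inherited stub for init() and the hiding method for kperf(), which is the behaviour we rely on in the plugin classes.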

20.2.1 The Base Classes

The opcode framework actually consists of two template base classes, from which
opcode classes derive and instantiate. CPOF classes, enumerations, and functions
are declared in the namespace csnd, in the header file plugin.h.
The fundamental template base class, used in the majority of cases, is called
Plugin. In order to create a new opcode, we write our class as its subclass, and
instantiate the template by passing the numbers of outputs and inputs needed. For
example, a minimal, non-operational, opcode with one input and one output can be
declared as
#include <plugin.h>
struct Opcd : csnd::Plugin<1,1> { };
This class is perfectly correct from the viewpoint of the framework, although it
does not define any specialised processing code, therefore when the opcode is used,
nothing happens. In this case, the base class methods (stubs) are used, but they are
simply placeholders and have no processing code. Therefore, to implement any type
of action, we will need to define one or more of the following processing methods:
init(), kperf(), and/or aperf().
The base class also includes the following member variables:

• outargs: a Param object holding output arguments.
• inargs: input arguments (Param).
• csound: a pointer to the Csound engine object.
• offset: the starting position of an audio vector (for audio opcodes only).
• nsmps: the size of an audio vector (also for audio opcodes only).

The second base class in the framework is FPlugin. This is derived from
Plugin, adding an extra member variable:

• framecount: a member to hold a running count of spectral frames.

This is used in streaming spectral-processing opcodes (see Sect. 19.4). Figure
20.1 illustrates the CPOF base classes and opcode classes that can be derived from
them.

[Figure: class hierarchy. Plugin derives from OPDS; FPlugin and Opcode derive from Plugin; SpecOp derives from FPlugin.]
Fig. 20.1: CPOF base classes (solid) and opcode subclasses (dashed).

Opcode inputs and outputs (inargs and outargs) are passed via the Param
class,
template <uint32_t N> class Param {
MYFLT *ptrs[N];
...
};
which wraps the different types used by Csound (scalars, vectors, strings, spectral
signals, arrays). This class provides a number of convenience methods that can be used to
get or set the data in the desired form, as we will see in the examples below.
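
As an illustration of the two access styles used in the examples, the following sketch shows a simplified argument holder. It is not the CPOF class: ArgPtrs and its bind() method are hypothetical names, and MYFLT is assumed here to be double. The subscript operator gives a reference to a scalar value, while the function-call operator gives the raw pointer needed for vectors.

```cpp
#include <cassert>
#include <cstddef>

typedef double MYFLT;  // assumption: double-precision build

// Hypothetical, simplified argument holder (not the CPOF class).
template <std::size_t N> class ArgPtrs {
  MYFLT *ptrs[N];

public:
  // Bind argument n to the memory that holds its value(s).
  void bind(std::size_t n, MYFLT *p) {
    assert(n < N && p != nullptr);
    ptrs[n] = p;
  }
  // Scalar access: reference to the first (or only) value of argument n.
  MYFLT &operator[](std::size_t n) { return *ptrs[n]; }
  // Vector access: raw pointer to the data of argument n.
  MYFLT *operator()(std::size_t n) { return ptrs[n]; }
};
```

With this arrangement, an expression such as args[0] reads or writes a control value in place, and args(1) can be handed to any code that expects a pointer to a block of samples.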

20.2.2 Deriving Opcode Classes

As noted above, deriving a class from Plugin requires that we provide processing
methods to be called by the engine when the opcode is run. Csound has two basic
action times (or passes) for opcodes, when these methods are invoked:
1. Initialisation time (i-time): processing happens once at every new opcode instan-
tiation, or at specified re-initialisation steps. For example, when an instrument
containing an opcode starts, an init-time pass is carried out (once).
2. Performance time (perf-time): following an init-time pass, opcodes are called
in a loop (called the k-cycle loop) and their processing functions are invoked
periodically.

Code for the init-time pass should be placed in the init() function of the op-
code class. The perf-time pass can be of two types:
1. Scalar: the opcode consumes/produces single values. This is called control or
k-rate processing.
2. Vectorial: the opcode consumes/produces blocks of audio samples. This is called
audio or a-rate processing.

These two modes of processing are implemented by the two other class meth-
ods, kperf() and aperf(), respectively. Opcodes can operate at i-time, k-rate,
and/or a-rate processing. If their inputs and outputs are only i-time variables, then
they only need to implement the init() method. If they take k-(scalar) or a-(vec-
torial) type variables, then they may need to implement the other functions, ac-
cordingly. In these cases, whenever an initialisation step is needed, we will have to
implement the i-time method also. When registering opcodes, as discussed in Sect.
20.2.3, we define what action times the opcode class has been designed for.
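
The relationship between the action times can be sketched with a small simulation that does not depend on Csound. The names Counter and run_instance are hypothetical, and the block size parameter is named ksmps after the Csound term for the number of samples in a k-cycle. In a real opcode instance only the perf-time function selected at registration is run; both are called here simply to compare how much data each one handles per cycle.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical opcode that counts how often each pass is run.
struct Counter {
  uint32_t inits = 0, kcycles = 0, samples = 0;
  int init() { ++inits; return 0; }    // once per instance
  int kperf() { ++kcycles; return 0; } // one scalar per k-cycle
  int aperf(uint32_t nsmps) {          // one block per k-cycle
    samples += nsmps;
    return 0;
  }
};

// Simulates one instance: a single init-time pass followed by 'cycles'
// iterations of the k-cycle loop, each covering 'ksmps' samples.
template <typename T>
void run_instance(T &op, uint32_t cycles, uint32_t ksmps) {
  assert(ksmps > 0);
  op.init();
  for (uint32_t n = 0; n < cycles; n++) {
    op.kperf();
    op.aperf(ksmps);
  }
}
```

Running ten cycles with a block size of 64 gives one init-time call, ten control-rate calls, and 640 audio samples.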
The following examples demonstrate the derivation of plugin classes for each of
these opcode types (i, k or a).

i-time opcodes

For init-time opcodes, all we need to do is provide an implementation of the
init() method:
struct Simplei : csnd::Plugin<1,1> {
int init() {
outargs[0] = inargs[0];
return OK;
}
};
In this simple example, we just copy the input arguments to the output once, at
init-time. Each scalar input type can be accessed using array indexing. All numeric
argument data is real, and declared as MYFLT, the internal floating-point type used
by Csound.

k-rate opcodes

For opcodes running only at k-rate (with no i-time operation), all we need to do is
provide an implementation of the kperf() method:
struct Simplek : csnd::Plugin<1,1> {
int kperf() {
outargs[0] = inargs[0];
return OK;
}
};
Similarly, in this simple example, we just copy the input arguments to the output
in each k-period.

a-rate opcodes

For opcodes running only at a-rate (with no i-time operation), all we need to do is
provide an implementation of the aperf() method:
struct Simplea : csnd::Plugin<1,1> {
int aperf() {
// input and output both start at offset, so the signal is not shifted
std::copy(inargs(0)+offset,inargs(0)+nsmps,
outargs(0)+offset);
return OK;
}
};
In a Plugin-derived opcode, the number of samples in a vector is always given
by the nsmps member variable, and its starting position by offset. Since audio
arguments are nsmps-size vectors, we can get these as MYFLT pointers, using the
overloaded operator() for the inargs and outargs objects, which takes the
opcode argument number as a parameter.
Alternatively, we could use the AudioSig class that wraps these raw pointers,
facilitating an OOP approach to handling audio data. This class provides iterators,
as well as subscript access. Objects are constructed by passing the current plugin
pointer (this; see Sect. 14.5.3) and the raw parameter pointer, for example
csnd::AudioSig in(this,inargs(0));
csnd::AudioSig out(this,outargs(0));
std::copy(in.begin(), in.end(), out.begin());
Note that this class encapsulates an audio signal vector completely. Therefore, we
do not need to refer directly to attributes such as the number of samples or the offset.
We will see in some of the following examples how this can be very helpful when
designing an opcode to process audio.
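
The idea of wrapping the raw pointer together with the offset and vector size can be sketched as follows. This is not the CPOF AudioSig class: SigView and copy_active are hypothetical names, MYFLT is assumed to be double, and the active region is taken to be the samples from offset up to (but not including) nsmps, as described above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

typedef double MYFLT;  // assumption: double-precision build

// Hypothetical view over the active part of an audio vector:
// samples [offset, nsmps) of the buffer starting at sig.
class SigView {
  MYFLT *sig;
  uint32_t offset, nsmps;

public:
  SigView(MYFLT *s, uint32_t off, uint32_t n) : sig(s), offset(off), nsmps(n) {
    assert(s != nullptr && off <= n);
  }
  MYFLT *begin() { return sig + offset; }
  MYFLT *end() { return sig + nsmps; }
};

// Copies the active region of in to the same region of out.
inline void copy_active(MYFLT *in, MYFLT *out, uint32_t offset,
                        uint32_t nsmps) {
  SigView vin(in, offset, nsmps), vout(out, offset, nsmps);
  std::copy(vin.begin(), vin.end(), vout.begin());
}
```

Because both views start at the same offset, samples keep their position in the block, and anything before the offset in the output is left untouched.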

20.2.3 Registering Opcodes with Csound

Once we have written our opcode classes, we need to inform Csound about their
existence, so that they can be listed and employed in user code. For this we have the
function template plugin():
template <typename T>
int plugin(Csound *csound, const char *name,
const char *oargs, const char *iargs,
uint32_t thrd, uint32_t flags = 0)
Its parameters are:
• csound: a pointer to the Csound object to which we want to register our op-
code.
• name: the opcode name as it will be used in Csound code.
• oargs: a string containing the opcode output types, one identifier per argument
• iargs: a string containing the opcode input types, one identifier per argument
• thrd: a code to tell Csound when the opcode should be active.
• flags: multithread flags (generally 0 unless the opcode accesses global re-
sources).
For opcode type identifiers, the most common types are a (audio), k (control),
i (i-time), S (string), and f (fsig). For the thread argument, we have the following
options, depending on the class processing methods which we want to run in a given
opcode:
• thread::i: indicates init().
• thread::k: indicates kperf().
• thread::ik: indicates init() and kperf().
• thread::a: indicates aperf().
• thread::ia: indicates init() and aperf().
We instantiate and call these template functions inside the plugin dynamic library
entry-point function on_load(). This function needs to be implemented only once¹
in each opcode library.
¹ The header file modload.h, where on_load() is declared, contains three boilerplate calls to
Csound module C functions, required for Csound to load plugins properly. For this reason, each
plugin library should also include this header only once, otherwise duplicate symbols will cause
linking errors.
For example,
#include <modload.h>
void csnd::on_load(Csound *csound){
csnd::plugin<Simplei>(csound, "simple", "i", "i",
csnd::thread::i);
csnd::plugin<Simplek>(csound, "simple", "k", "k",
csnd::thread::k);
csnd::plugin<Simplea>(csound, "simple", "a", "a",
csnd::thread::a);
}
will register the simple polymorphic opcode, which can be used with i-time, k-rate
and a-rate variables. In each instantiation of the plugin registration template,
the class name is passed as an argument to it, followed by the function call. If the
class defines two specific static members, otypes and itypes, to hold the types
for output and input arguments, declared as
struct MyPlug : csnd::Plugin<1,2> {
static constexpr char const *otypes = "k";
static constexpr char const *itypes = "ki";
...
};
then we can use a simpler overload of the plugin registration function:
template <typename T>
int plugin(Csound *csound, const char *name,
uint32_t thread, uint32_t flags = 0)
For some classes, this may be a very convenient way to define the argument types.
For other cases, where opcode polymorphism may be involved, we may reuse the
same class for different argument types, in which case it is not desirable to define
these statically in a class.
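
The two forms of registration, and the way one opcode name can be shared by several classes, can be sketched with a toy registry. Everything in the sketch namespace is hypothetical: the Registry and Entry types stand in for the engine's opcode list, the numeric values of the action-time codes are assumptions, and the template parameter T is only used here to read the static type strings (in the real framework it also selects the processing functions).

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

namespace sketch {
// Assumed action-time codes, modelled as bit flags.
enum thread : uint32_t { i = 1, k = 2, ik = 3, a = 4, ia = 5 };

struct Entry {
  std::string name, oargs, iargs;
  uint32_t thrd;
};

struct Registry {
  std::vector<Entry> entries;
  // Look up by name and argument types: this is what allows several
  // classes to share one opcode name.
  const Entry *find(const std::string &name, const std::string &oargs,
                    const std::string &iargs) const {
    for (const Entry &e : entries)
      if (e.name == name && e.oargs == oargs && e.iargs == iargs) return &e;
    return nullptr;
  }
};

// Full form: argument types given at the call site.
template <typename T>
int plugin(Registry &r, const char *name, const char *oargs,
           const char *iargs, uint32_t thrd) {
  assert(name != nullptr && oargs != nullptr && iargs != nullptr);
  r.entries.push_back(Entry{name, oargs, iargs, thrd});
  return 0;
}

// Short form: argument types taken from static members of T.
template <typename T>
int plugin(Registry &r, const char *name, uint32_t thrd) {
  return plugin<T>(r, name, T::otypes, T::itypes, thrd);
}

struct MyPlug {
  static constexpr char const *otypes = "k";
  static constexpr char const *itypes = "ki";
};
struct Simplek {};
struct Simplea {};
}  // namespace sketch
```

Registering Simplek and Simplea under the same name with different type strings produces two distinct entries, whereas MyPlug can use the short form because its types are fixed in the class.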

20.3 The Csound Engine Object

Opcodes are run by an engine that is encapsulated by the Csound class. They all
hold a pointer to this, called csound, which is needed for some of the operations
involving parameters, and for some utility methods (such as console messaging, MIDI
data access, and FFT operations). The following are some useful public methods in
the Csound class:

• init_error(): takes a string message and signals an initialisation error.
• perf_error(): takes a string message and an instrument instance, and signals
a performance error.
• warning(): warning messages.
• message(): information messages.
• sr(): returns the engine sampling rate.
• _0dbfs(): returns the maximum amplitude reference.
• _A4(): returns the A4 pitch reference.
• nchnls(): returns the number of output channels for the engine.
• nchnls_i(): similarly, for input channel numbers.
• current_time_samples(): the current engine time in samples.
• current_time_seconds(): the current engine time in seconds.
• midi_channel(): the MIDI channel assigned to this instrument.
• midi_note_num(): the MIDI note number (if the instrument was instantiated
with a MIDI NOTE ON).
• midi_note_vel(): similarly, for velocity.
• midi_chn_aftertouch(), midi_chn_polytouch(),
midi_chn_ctl(), midi_chn_pitchbend(): the MIDI data for this channel.
• midi_chn_list(): list of active notes for this channel.
• fft_setup(), rfft(), fft(): FFT operations.
In addition to these, the Csound class also holds a deinit method registration
function template for Plugin objects to use:
template <typename T> void plugin_deinit(T *p);
This is only needed if the plugin has allocated extra resources using mechanisms
that require deallocation (see Sect. 20.4.6). It is not employed in most cases, as we
will see below. To use it, the plugin needs to implement a deinit() method and
then call the plugin_deinit() method passing itself (through its this pointer)
in its own init() function:
csound->plugin_deinit(this);

20.4 Opcode Programming

In this section, we will look at key aspects of opcode programming, exploring the
various supports for typical application requirements, such as memory allocation,
function table access, use of external resources, and multithreading. As part of this,
we will also discuss the manipulation of several different Csound variable types in
a number of programming examples. In order to keep the examples meaningful, we
will re-implement some basic processing components already explored in earlier
chapters.

20.4.1 Delay Line

As a first example of audio signal processing, we implement here a simple delay line
[31] opcode, whose delay time is set at i-time, providing a slap-back echo effect
as discussed in Chapter 18. This will require us to allocate memory for the delay
buffer, which will need some special attention, as the mechanism to do this needs to
conform to certain conditions.
In order to be efficient, and also to prevent leaks and undefined behaviour, we
need to leave all memory allocation to Csound and refrain from using C++ allocators
or standard library containers that use dynamic allocation behind the scenes (e.g.
std::vector). If we follow these rules, our code will work as intended and cause
no problems for users.
This requires us to use the AuxAlloc mechanism implemented in the Csound
engine to manage memory dynamically. To access it, CPOF provides a wrapper
template class (which is not too dissimilar to std::vector) for us to allocate and
use as much memory as we need. This functionality is given by the AuxMem class,
which has the following methods and members:

• allocate(): allocates memory (if required).
• operator[]: array-subscript access to the allocated memory.
• data(): returns a pointer to the data.
• len(): returns the length of the vector.
• begin(), cbegin() and end(), cend(): return iterators to the beginning
and end of data.
• iterator and const_iterator: iterator types for this class.

With this in hand, we can create a class that implements a delay effect as de-
scribed in Sect. 18.2. The opcode is called by Csound with the following (functional)
syntax:
asig = delayline(ain, idel)
with ain as the input signal, and idel as the init-time delay time. The code is
outlined in Listing 20.1.

Listing 20.1: The delayline opcode.


struct DelayLine : csnd::Plugin<1,2> {
csnd::AuxMem<MYFLT> delay;
csnd::AuxMem<MYFLT>::iterator iter;

int init() {
delay.allocate(csound, csound->sr()*inargs[1]);
iter = delay.begin();
return OK;
}

int aperf() {
csnd::AudioSig in(this, inargs(0));
csnd::AudioSig out(this, outargs(0));

std::transform(in.begin(),in.end(), out.begin(),
[this](MYFLT s){
MYFLT o = *iter;
*iter = s;
if(++iter == delay.end())
iter = delay.begin();
return o;});
return OK;
}
};
In this example, we use an AuxMem iterator to access the delay vector. It is
equally possible to access each element with an array-style subscript. Since the ex-
tra memory allocated by this class is managed by Csound, we do not need to be
concerned about disposing of it. To register this opcode, we use
csnd::plugin<DelayLine>(csound,"delayline", "a", "ai",
csnd::thread::ia);
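
The circular-buffer logic of this opcode can be tried outside Csound with the following sketch. SketchDelay is a hypothetical class; it uses std::vector only because it runs as a standalone program, whereas inside an opcode the buffer would be an AuxMem object, as discussed above. The delay length is given directly in samples rather than being derived from the sampling rate.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical standalone delay line (std::vector stands in for AuxMem).
struct SketchDelay {
  std::vector<double> buf;
  std::size_t pos;

  explicit SketchDelay(std::size_t len) : buf(len, 0.0), pos(0) {
    assert(len > 0);
  }

  // Process a block: read the oldest sample, overwrite it with the
  // new input, then advance and wrap the position.
  void process(const std::vector<double> &in, std::vector<double> &out) {
    out.resize(in.size());
    for (std::size_t n = 0; n < in.size(); n++) {
      double s = in[n];  // read first, in case in and out are the same vector
      out[n] = buf[pos];
      buf[pos] = s;
      if (++pos == buf.size()) pos = 0;
    }
  }
};
```

Feeding an impulse through a three-sample delay places the impulse at index 3 of the output, which is a quick way to check that the wrap-around is correct.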

20.4.2 Table-Lookup Oscillator

The next example explores the principles of table lookup introduced in Chapter 13
by implementing a simple truncating oscillator. Access to Csound function tables is
also facilitated by a thin wrapper class that allows us to treat it as a vector object.
This is provided by the Table class, which has the following members:
• init(): initialises a table object from an opcode argument pointer.
• operator[]: array-subscript access to the function table.
• data(): returns a pointer to the function table data.
• len(): returns the length of the table (excluding guard point).
• begin(), cbegin() and end(), cend(): return iterators to the beginning
and end of the function table.
• iterator and const_iterator: iterator types for this class.
With the support of this class, we are able to implement any type of table lookup.
A typical application example is found in oscillators, as discussed in Chapter 13.
The Csound syntax for such an opcode is
asig = oscillator(kamp, kcps, itab)
where kamp and kcps are k-rate (scalar, control) signals for the amplitude and the
fundamental frequency in Hz, and itab is the i-time function table number. The
opcode class is laid out in Listing 20.2.

Listing 20.2: The Oscillator class.


struct Oscillator : csnd::Plugin<1,3> {
static constexpr char const *otypes = "a";
static constexpr char const *itypes = "kki";
csnd::Table tab;
double scl;
double x;
int init() {
tab.init(csound,inargs(2));
scl = tab.len()/csound->sr();
x = 0;
return OK;
}

int aperf() {
csnd::AudioSig out(this, outargs(0));
MYFLT amp = inargs[0];
MYFLT si = inargs[1] * scl;

for(auto &s : out) {
s = amp * tab[(uint32_t)x];
x += si;
while (x < 0) x += tab.len();
while (x >= tab.len()) x -= tab.len();
}
return OK;
}
};
The table object is initialised by passing the relevant argument pointer to it (obtained
here through the inargs operator()). Note also that, since we need to manipulate the phase index
very precisely, it is hard to use iterators in this case without making the code very
awkward. Therefore we employ straightforward array subscripting. The opcode is
registered by
csnd::plugin<Oscillator>(csound, "oscillator",
csnd::thread::ia);
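
The phase-increment and wrap-around logic can be checked with a standalone sketch that uses an ordinary vector as the function table. SketchOsc is a hypothetical class, not part of CPOF; the table contents and sampling rate are supplied by the caller, and the lookup truncates the phase index as in the opcode above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical standalone truncating oscillator over a plain vector table.
struct SketchOsc {
  std::vector<double> tab;
  double scl;  // table length divided by the sampling rate
  double x;    // phase index, in table positions

  SketchOsc(const std::vector<double> &t, double sr)
      : tab(t), scl(t.size() / sr), x(0.0) {
    assert(!t.empty() && sr > 0.0);
  }

  // Fill out with samples at amplitude amp and frequency cps (Hz).
  void process(std::vector<double> &out, double amp, double cps) {
    const double len = static_cast<double>(tab.size());
    const double si = cps * scl;  // sampling increment
    for (double &s : out) {
      s = amp * tab[static_cast<std::size_t>(x)];  // truncated lookup
      x += si;
      while (x < 0) x += len;     // wrap for negative frequencies
      while (x >= len) x -= len;  // wrap at the end of the table
    }
  }
};
```

With a four-point table, a sampling rate of 8 Hz and a frequency of 2 Hz, the increment is exactly one table position per sample, so the output simply cycles through the table.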

20.4.3 Text Processing

Text in Csound is manipulated via S(string)-type variables. Such objects are held in
a STRINGDAT data structure,
typedef struct {
char *data;
int size;
} STRINGDAT;
which contains a data member that holds the actual string and a size member with
the allocated memory size. There are no classes to wrap strings, but translated access
to string arguments is provided through the Param object str_data() member
function. This takes an argument index (similarly to data()) and returns a refer-
ence to the string variable, as demonstrated in this example:
struct Tprint : csnd::Plugin<0,1> {
static constexpr char const *otypes = "";
static constexpr char const *itypes = "S";
int init() {
char *s = inargs.str_data(0).data;
csound->message(s);
return OK;
}
};
This opcode will print the string to the console. Note that we have no output
arguments, and so we set the first template parameter to 0. We register it using
csnd::plugin<Tprint>(csound, "tprint", "", "S",
csnd::thread::i);

20.4.4 Spectral Processing

As we have noted, for streaming spectral processing opcodes, we have a different
base class with the extra facilities needed for their operation (FPlugin). Spectral
data in Csound is carried by f-type variables (fsigs). These are held internally in a
PVSDAT C-language data structure.
To facilitate their manipulation, CPOF provides the Fsig class, derived from
PVSDAT. While fsigs can carry different types of spectral data, the most com-
mon format is the phase vocoder frame, composed of amplitude–frequency bins
representing equally-spaced frequency bands, as discussed in Chapter 19. The fsig
type encapsulates the spectral-processing signal parameters described in Sect. 19.4.
Csound also includes a special sliding mode, where the hopsize is set to 1 and frames
are produced at the audio rate.
To access phase vocoder bins, a container interface is provided by pv_frame
(or spv_frame, for the sliding mode)². This holds a series of pv_bin (spv_bin
for the sliding mode)³ objects, which have the following methods:
• amp(): returns the bin amplitude.
• freq(): returns the bin frequency.
• amp(float a): sets the bin amplitude to a.
• freq(float f): sets the bin frequency to f.
• operator*(pv_bin f): multiplies the amplitude of a pvs bin by f.amp.
• operator*(MYFLT f): multiplies the bin amplitude by f.
• operator*=(): in-place (compound assignment) versions of the above.

² pv_frame is a convenience typedef for Pvframe<pv_bin>, whereas spv_frame is
Pvframe<spv_bin>.
³ pv_bin is Pvbin<float> and spv_bin is Pvbin<MYFLT>.

The pv_bin class can also be translated into an std::complex<float> ob-
ject if needed. This class is also fully compatible with the C complex type and an ob-
ject obj can be cast into a float array consisting of two items (or a float pointer) us-
ing reinterpret_cast<float(&)[2]>(obj) or reinterpret_cast
<float*>(&obj). The Fsig class has the following methods:

• init(): initialisation from individual parameters or from an existing fsig. Also
allocates frame memory as needed.
• dft_size(), hop_size(), win_size(), win_type() and nbins(),
returning the PV data parameters.
• count(): get and set the fsig framecount.
• isSliding(): checks for sliding mode.
• fsig_format(): returns the fsig data format (fsig_format::pvs, ::polar,
::complex, or ::tracks).

The pv_frame (or spv_frame) class contains the following methods:

• operator[]: array-subscript access to the spectral frame.
• data(): returns a pointer to the spectral frame data.
• len(): returns the length of the frame.
• begin(), cbegin() and end(), cend(): return iterators to the beginning
and end of the data frame.
• iterator and const_iterator: iterator types for this class.

Spectral-processing opcodes run nominally at k-rate but internally use an update
rate based on the analysis hopsize. For this to work, a frame count is kept and
checked to make sure that we only process the input when new data is available.
As an example, the class in Listing 20.3 implements a simple gain scaler for fsig
variables.

Listing 20.3: The PVGain class.


struct PVGain : csnd::FPlugin<1, 2> {
static constexpr char const *otypes = "f";
static constexpr char const *itypes = "fk";

int init() {
if(inargs.fsig_data(0).isSliding()){
const char *s = "sliding not supported";
return csound->init_error(s);
}
if(inargs.fsig_data(0).fsig_format()
!= csnd::fsig_format::pvs &&
inargs.fsig_data(0).fsig_format()
!= csnd::fsig_format::polar){
const char *s = "format not supported";
return csound->init_error(s);
}
csnd::Fsig &fout = outargs.fsig_data(0);
fout.init(csound, inargs.fsig_data(0));
framecount = 0;
return OK;
}

int kperf() {
csnd::pv_frame &fin = inargs.fsig_data(0);
csnd::pv_frame &fout = outargs.fsig_data(0);

if(framecount < fin.count()) {
std::transform(fin.begin(), fin.end(),
fout.begin(),
[this](csnd::pv_bin f){
return f *= inargs[1];
});
framecount = fout.count(fin.count());
}
return OK;
}
};
Note that, as with strings, there is a dedicated method in the arguments object that
returns a reference to an Fsig class (which can also be assigned to a pv_frame
reference). This is used to initialise the output object at i-time and then to obtain
the input and output variable Csound processing data. The framecount member
is provided by the base class, as well as the format check methods. This opcode is
registered using
csnd::plugin<PVGain>(csound, "pvg", csnd::thread::ik);
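
The frame-count check is the key to running a spectral process at the analysis rate from within a k-rate function. The sketch below isolates that logic in plain C++. Bin, Frames and GainSketch are hypothetical, simplified stand-ins for the CPOF types, and the extra processed counter exists only so that we can observe how often the bins are actually scaled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins for a spectral signal and its bins.
struct Bin {
  float amp, freq;
};
struct Frames {
  std::vector<Bin> bins;
  uint32_t count;  // number of frames produced so far
};

struct GainSketch {
  uint32_t framecount = 0;  // frames consumed so far
  uint32_t processed = 0;   // how many times the bins were scaled

  // Called once per control period; does work only for a new input frame.
  void kperf(const Frames &fin, Frames &fout, float gain) {
    if (framecount < fin.count) {
      fout.bins.resize(fin.bins.size());
      for (std::size_t n = 0; n < fin.bins.size(); n++) {
        fout.bins[n].amp = fin.bins[n].amp * gain;
        fout.bins[n].freq = fin.bins[n].freq;
      }
      framecount = fout.count = fin.count;  // mark the frame as consumed
      ++processed;
    }
  }
};
```

Calling kperf() repeatedly while the input count stays the same does no further work, so the cost of the process follows the hopsize rather than the control rate.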

20.4.5 Array Processing

The array type in Csound is defined by the C data structure ARRAYDAT. In order to
facilitate access to arguments of this type, CPOF provides a wrapper class, Vector.
This is derived from ARRAYDAT, and includes the following members:

• init(): initialises an output variable.
• operator[]: array-subscript access to the vector data.
• data(): returns a pointer to the vector data.
• len(): returns the length of the vector.
• begin(), cbegin() and end(), cend(): return iterators to the beginning
and end of the vector.
• iterator and const_iterator: iterator types for this class.
• data_array(): returns a pointer to the vector data.
The inargs and outargs objects in the Plugin class have a template
method that can be used to get a Vector class reference. A trivial example is
shown below:
struct SimpleArray : csnd::Plugin<1, 1>{
int init() {
csnd::Vector<MYFLT> &out =
outargs.vector_data<MYFLT>(0);
csnd::Vector<MYFLT> &in =
inargs.vector_data<MYFLT>(0);
out.init(csound, in.len());
return OK;
}

int kperf() {
csnd::Vector<MYFLT> &out =
outargs.vector_data<MYFLT>(0);
csnd::Vector<MYFLT> &in =
inargs.vector_data<MYFLT>(0);
std::copy(in.begin(), in.end(), out.begin());
return OK;
}
};
Note that output arrays need to be initialised to a given length, which is done by
the Vector::init() method. This opcode is registered using the following line:
csnd::plugin<SimpleArray>(csound,
"simple", "k[]", "k[]",
csnd::thread::ik);
Since MYFLT arrays are the most commonly used in Csound, CPOF provides a
myfltvec definition that instantiates the template for that variable type. Together
with the Param::myfltvec_data() method, it simplifies access to opcode
arguments:
csnd::myfltvec &out = outargs.myfltvec_data(0);
The Vector class only wraps one-dimensional arrays. For more than one di-
mension, the ARRAYDAT structure needs to be used directly.

20.4.6 External Resources

Opcode classes can, in general, be composed of member variables of any type, built
in or user defined. However, we have to remember that opcodes are allocated and
instantiated by C code, which does not know anything about classes. A member
variable of class type will therefore not be constructed when the memory for the
opcode is first allocated, and we need to arrange for its constructor to be
invoked explicitly, if required. The mechanism to do this is the C++ placement new
operator:

new ( placement-parameters ) constructor ( constructor-parameters )

where the placement parameters required to be passed usually simply consist of a
pointer to the existing pre-allocated memory. The placement new does not allocate
memory, but uses the already existing space. So, this is a practical solution for cases
where the allocation happens in C, as is the situation with Csound. This allows
Csound to use C++ classes that require construction, such as ones that will allocate
and use external resources (e.g. memory not allocated by Csound).
While in most circumstances the advice is to avoid the use of external libraries
that dynamically allocate resources outside of the control of the Csound engine, this
might impose too narrow constraints on opcode developers. The solution chosen was
to provide a clear mechanism for the management of external resources. To facilitate
this, a template function is available to construct any member objects as needed,
hiding the placement new and furnishing a standard means of calling constructors:
template <typename T, typename ... Types>
T *constr(T* p, Types ... args){
return new(p) T(args ...);
}
For instance, if we have in our opcode a member variable of type A called obj,
we can construct it by placing the following line in the plugin init() method:
csnd::constr(&obj,10,10.f);
where the arguments are the variable address, followed by any class constructor
parameters.
Equally, if a class allocates any resources (which we will assume is the case un-
less documented otherwise), we are required to invoke its destructor explicitly. This
is done through calling csnd::destr(&obj) in a deinit() method, which is
the corresponding template function to access the class destructor. It is important not
to miss this step, as that could lead to memory leaks and undefined behaviour.
As an example, we will look at using standard C++ library classes in opcodes.
Many such objects will require explicit constructor and destructor calls through the
mechanism outlined above. The class in Listing 20.4 implements an opcode that
generates signals based on a Gaussian distribution, using std::normal_distribution,
defined in the random header. This opcode is overloaded for audio and control
signals (the actual processing function will be selected on the basis of its output
type or via type annotation). Its general form is
xsig = gaussian:x(imean, idev, iseed)
where x stands for the output type (a, k), and the i-type parameters are the mean,
standard deviation, and seed, in that order.

Listing 20.4: Gaussian opcode class.


struct Gaussian : csnd::Plugin<1, 3> {
std::normal_distribution<MYFLT> norm;
std::mt19937 gen;

int init() {
csnd::constr(&norm, inargs[0], inargs[1]);
csnd::constr(&gen, inargs[2]);
csound->plugin_deinit(this);
return OK;
}

int deinit() {
csnd::destr(&norm);
csnd::destr(&gen);
return OK;
}

int kperf() {
outargs[0] = norm(gen);
return OK;
}

int aperf() {
csnd::AudioSig out(this, outargs(0));
for (auto &sample : out)
sample = norm(gen);
return OK;
}
};
As we can see, the use of external resources requires us to construct the object in
the opcode init() method, and use an explicitly-defined deinit() callback to
free them.
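The generator at the core of this opcode can also be exercised without Csound. The sketch below is an illustration only: GaussianGen and block_mean() are hypothetical names, ordinary constructors replace the constr()/destr() calls, and double is used in place of MYFLT. It shows the same pairing of std::normal_distribution with std::mt19937, producing either one value (as in kperf()) or a whole block (as in aperf()).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// GaussianGen: hypothetical stand-in for the opcode state
class GaussianGen {
  std::normal_distribution<double> m_norm;
  std::mt19937 m_gen;

public:
  GaussianGen(double mean, double dev, std::uint32_t seed)
      : m_norm(mean, dev), m_gen(seed) {
    // std::normal_distribution expects a positive standard deviation
    assert(dev > 0.);
  }

  // one value per call, as in the control-rate case
  double tick() { return m_norm(m_gen); }

  // fills a whole block, as in the audio-rate case
  std::vector<double> block(std::size_t n) {
    std::vector<double> out(n);
    for (auto &sample : out)
      sample = m_norm(m_gen);
    return out;
  }
};

// arithmetic mean of a block, used to sanity-check the distribution
double block_mean(const std::vector<double> &v) {
  double sum = 0.;
  for (double x : v)
    sum += x;
  return v.empty() ? 0. : sum / v.size();
}
```

Two generators built with the same seed produce identical blocks, which is why the opcode exposes the seed as a parameter; over a long block the mean settles close to the requested value.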
20.4.7 Multithreading Opcodes

A C-language interface for multithreading is provided by the Csound API. This is
implemented via pthreads [22] on POSIX systems, or other native threading libraries
on non-POSIX platforms. A support class is provided to allow object-oriented access
to the underlying C interface: the abstract Thread class. This is designed to be
subclassed and instantiated to encapsulate a separate thread. The entry point is given
by a run() method that needs to be implemented in the derived class. Thread also
provides join() and get_thread() methods for joining a thread and getting
its handle. Opcodes that require an asynchronous operation can take advantage of
this class to spawn a new thread to work in parallel with the main processing.

20.5 Conclusions

This chapter has described CPOF and its fundamental characteristics. We have
looked at how the base classes are constructed, how to derive from them, and how
to register new opcodes in the system. The framework is designed to support the
modern C++ idioms discussed in this book, and adopts the C++11 standard. All of
the code examples discussed in this chapter are provided in opcodes.cpp, found
in the examples/plugin directory of the Csound source codebase4 . CPOF is
part of Csound and is distributed alongside its public headers.
To build a plugin opcode library, we require a C++ compiler supporting the
C++11 [23] standard (-std=c++11), and the Csound public headers. CPOF has no
link-time dependencies. The opcodes should be built as a dynamic/shared module
(.so on Linux and .dylib on MacOS). For example, on MacOS, the following
command is used:
$ c++ -dynamiclib -o opcode.dylib opcode.cpp \
-std=c++11 \
-I /Library/Frameworks/CsoundLib64.framework/Headers
On other systems, a similar command line can be used, with adjustments to the
header path and the dynamic/shared library link flag (-shared in some compilers).

Problems

20.1. Write a ring modulation opcode with two audio inputs and one output.

20.2. Write a version of your spectral processing low-pass filter AuLib class (Prob-
lem 19.2) as a Csound opcode.

4 https://github.com/csound/csound
Appendix
Appendix A
AuLib Reference

In this appendix we provide a general reference to the AuLib code. We also discuss
some details of its operation that have been left out of the main body of the book.

A.1 Library-Wide Definitions

A number of library-wide constants and free functions are provided in the AuLib.h
header, in the AuLib namespace.

Versioning: these constants and functions are defined in the namespace AuLib::Info
and return information on version and copyright:
const uint32_t major_version
const uint32_t minor_version
static inline const std::string version()
static inline const std::string copyright()
DSP: these constants control some key signal processing attributes of the library.
They include default values for vector, buffer and table sizes; sampling rates; num-
ber of channels; FFT size; decimation; π and 2π ; the minimum value for double
precision floats and the maximum value for 8-byte unsigned integers; and the −120
dB full-scale constant.
const uint32_t def_vframes
const uint32_t def_bframes
const uint32_t def_tframes
const double def_sr
const double def_kr
const uint32_t def_nchnls
const uint32_t def_fftsize
const uint32_t def_decim
const double pi

const double twopi
const double db_min
const uint64_t ui64_max
const double m120dBfs
The npow2() function is also defined to return the next power of two not greater
than its argument.

MIDI: defined in the namespace midi, these define message status codes:
const uint32_t note_on
const uint32_t note_off
const uint32_t ctrl_msg
const uint32_t aftouch
const uint32_t poly_aftouch
const uint32_t prg_msg
const uint32_t pitchbend
enum error_codes
const std::string aulib_error[]
In addition to these, fft.h defines a number of spectral-processing related
constants and free functions in the AuLib::fft namespace:
const bool forward
const bool inverse
const bool polar
const bool rectang
const bool packed
void transform(std::vector<std::complex<double>> &data,
bool dir)
void transform(std::vector<std::complex<double>> &out,
double *in, bool pckd = packed)
void transform(double *out,
std::vector<std::complex<double>> &in,
bool pckd = packed)

A.2 AudioBase

AuLib::AudioBase is the DSP base class for AuLib. It defines a generic object
with no particular processing functions (apart from basic signal arithmetic).

Protected members:

• uint32_t m_nchnls: number of interleaved audio channels.
• uint32_t m_vframes: number of frames in the audio vector.
• std::vector<double> m_vector: the audio vector containing space for
at least the number of channels times the number of frames in the audio vector.
• double m_sr: sampling rate.
• uint32_t m_error: error state indicator.
Constructor:
AudioBase(uint32_t nchnls = def_nchnls,
uint32_t vframes = def_vframes,
double sr = def_sr)

• nchnls: number of channels.
• vframes: number of frames in the vector. This is set to the next power of two no
greater than the requested number of frames. Objects requiring arbitrary vector
sizes should explicitly resize the vector using the resize_exact() method.
• sr: sampling rate.
Attributes: methods for setting or getting object attributes. The vector frame size
is normally set to the next power of two not greater than the requested value, unless
resize_exact() is used. The vector is always cleared on resizing:
uint32_t vframes(uint32_t frames)
uint32_t resize_exact(uint32_t frames)
uint32_t vframes() const
uint32_t nchnls() const
double sr() const
uint32_t error() const
virtual const char *error_message() const
Frame access: these methods provide individual or block read or write access to
frames in the object vector:
const AudioBase &set(const AudioBase &obj)
const AudioBase &set(const double *sig)
const double *set(double v)
double set(double v, uint32_t p)
const double *vector() const
double vector(uint32_t frndx, uint32_t chn) const
Iterators: standard iterators are defined for this class, providing access to the
signal vector:
typedef std::vector<double>::iterator iterator
typedef std::vector<double>::const_iterator
const_iterator
iterator begin()
iterator end()
const_iterator cbegin() const
const_iterator cend() const

Overloaded operators:
• Signal arithmetic: these scale and offset the vector frames by scalars or vectors
(raw pointers or object references). They can be redefined in derived classes if
necessary:
virtual const AudioBase &operator*=(double scal)
virtual const AudioBase
&operator*=(const double *sig)
virtual const AudioBase
&operator*=(const AudioBase &obj)
virtual const AudioBase &operator+=(double offs)
virtual const AudioBase
&operator+=(const double *sig)
virtual const AudioBase
&operator+=(const AudioBase &obj)
• Vector access, array-like access to the object signal vector:
double &operator[](uint32_t ndx)
const double &operator[](uint32_t ndx) const
• Streams, providing stream IO for the object signal vector:
friend std::ostream &operator<<(std::ostream &os,
const AudioBase &obj)
friend std::istream &operator>>(std::istream &is,
AudioBase &obj)
• Conversion: these are conversion operators into a vector reference or raw pointer.
They allow the object to be cast as one of these types:
operator const std::vector<double> &() const
explicit operator const double *() const
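To make the shape of this interface concrete without depending on the library, here is a much-reduced sketch. MiniBase is a hypothetical class, not AudioBase itself: it keeps only a vector, a few attributes, set(), scalar and object arithmetic, array-like access, and iterators, and it omits the sampling rate, error state, power-of-two sizing, and stream operators.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// MiniBase: hypothetical, much-reduced stand-in for AudioBase
class MiniBase {
protected:
  std::uint32_t m_nchnls;
  std::uint32_t m_vframes;
  std::vector<double> m_vector;

public:
  typedef std::vector<double>::iterator iterator;

  MiniBase(std::uint32_t nchnls = 1, std::uint32_t vframes = 64)
      : m_nchnls(nchnls), m_vframes(vframes),
        m_vector(nchnls * vframes, 0.) {}

  std::uint32_t vframes() const { return m_vframes; }
  std::uint32_t nchnls() const { return m_nchnls; }
  const double *vector() const { return m_vector.data(); }

  // fills the whole vector with a constant
  const double *set(double v) {
    for (auto &s : m_vector)
      s = v;
    return vector();
  }

  // scaling and offsetting by scalars
  const MiniBase &operator*=(double scal) {
    for (auto &s : m_vector)
      s *= scal;
    return *this;
  }
  const MiniBase &operator+=(double offs) {
    for (auto &s : m_vector)
      s += offs;
    return *this;
  }

  // sample-by-sample sum with another object of the same size
  const MiniBase &operator+=(const MiniBase &obj) {
    assert(obj.m_vector.size() == m_vector.size());
    for (std::size_t n = 0; n < m_vector.size(); n++)
      m_vector[n] += obj.m_vector[n];
    return *this;
  }

  // array-like access and iterators over the signal vector
  double &operator[](std::uint32_t ndx) { return m_vector[ndx]; }
  const double &operator[](std::uint32_t ndx) const { return m_vector[ndx]; }
  iterator begin() { return m_vector.begin(); }
  iterator end() { return m_vector.end(); }
};

// applies 0.5 * 4 + 1 to every sample, then sums over the vector
double scale_offset_sum(std::uint32_t vframes) {
  MiniBase sig(1, vframes);
  sig.set(0.5);
  sig *= 4.;
  sig += 1.;
  double sum = 0.;
  for (auto s : sig)
    sum += s;
  return sum;
}
```

Because begin() and end() are provided, an object can be used directly in a range-based for loop, which is how the audio-rate loops elsewhere in the book are written.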

A.3 Deriving New Classes

The AuLib library is designed to be easily extended. There is significant freedom
for developers in this process, as there are no strict prescriptions on the form or
signature of processing (or other) methods. Usually we inherit from the AudioBase
class to allow easy integration with existing objects, and to avail of basic audio
processing facilities provided there. The majority of the library classes do this.
Supplying a processing method that consumes a vector of samples and another
that reads from an AudioBase reference is the minimum necessary to allow full
integration with existing classes. The recommended approach is to separate inter-
face from the implementation, thus placing the vector-processing code in a private
method and providing a means of overriding it. The interface then delegates to this
as appropriate. The example in Listing A.1 is a skeleton of an AudioBase-derived
class that demonstrates these ideas.

Listing A.1: An AudioBase-derived class.


#include <AuLib/AudioBase.h>

namespace AuLib {

class NewClass : public AudioBase {

// this is the main DSP method
// takes in a const pointer with the input
// frames and returns
// a const pointer to the vector data
virtual const double *dsp(const double *sig);

protected:
// a processing parameter
double m_par;

public:
// constructor takes a default parameter value
// as well as the basic AudioBase attributes
NewClass(double param = .5,
uint32_t nchnls = def_nchnls,
uint32_t vframes = def_vframes,
double sr = def_sr)
: AudioBase(nchnls, vframes, sr),
m_par(param){};

// basic interface to DSP process
const double *process(const double *sig) {
return dsp(sig);
}

// overload allowing for a parameter update
const double *process(const double *sig,
double par) {
m_par = par;
return dsp(sig);
}

// overload taking an AudioBase object reference
const NewClass &process(const AudioBase &obj) {
if (obj.vframes() == m_vframes &&
obj.nchnls() == m_nchnls) {
dsp(obj.vector());
} else
m_error = AULIB_ERROR;
return *this;
}

// overload taking an AudioBase object reference
// and a parameter update
const NewClass &process(const AudioBase &obj,
double par) {
m_par = par;
return process(obj);
}

// function-like operator
const NewClass &operator()(const AudioBase &obj) {
return process(obj);
}

// function-like operator with two parameters
const NewClass &operator()(const AudioBase &obj,
double par) {
return process(obj, par);
}
};
}
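The skeleton leaves dsp() undefined. As a self-contained illustration of the same pattern (a private virtual dsp(), public process() overloads, and function-like operators), the sketch below fills it in with a simple gain. Both Base and Gain are hypothetical names; Base is a cut-down stand-in for AudioBase with a single channel and no error state, so the error branch of the listing is reduced to a silent no-op here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Base: hypothetical minimal stand-in for AudioBase (mono, no error state)
class Base {
protected:
  std::uint32_t m_vframes;
  std::vector<double> m_vector;

public:
  explicit Base(std::uint32_t vframes = 64)
      : m_vframes(vframes), m_vector(vframes, 0.) {}
  virtual ~Base() {}
  std::uint32_t vframes() const { return m_vframes; }
  const double *vector() const { return m_vector.data(); }
  const double *set(double v) {
    for (auto &s : m_vector)
      s = v;
    return vector();
  }
};

// Gain: implementation in a private virtual method, interface in process()
class Gain : public Base {
  // the DSP method: scales each input sample into the object vector
  virtual const double *dsp(const double *sig) {
    assert(sig != nullptr);
    for (std::uint32_t n = 0; n < m_vframes; n++)
      m_vector[n] = sig[n] * m_gain;
    return vector();
  }

protected:
  double m_gain;

public:
  Gain(double gain = .5, std::uint32_t vframes = 64)
      : Base(vframes), m_gain(gain) {}

  // raw-pointer interface
  const double *process(const double *sig) { return dsp(sig); }

  // object interface: only processes inputs of matching size
  const Gain &process(const Base &obj) {
    if (obj.vframes() == m_vframes)
      dsp(obj.vector());
    return *this;
  }

  // object interface with a parameter update
  const Gain &process(const Base &obj, double gain) {
    m_gain = gain;
    return process(obj);
  }

  // function-like operators delegating to process()
  const Gain &operator()(const Base &obj) { return process(obj); }
  const Gain &operator()(const Base &obj, double gain) {
    return process(obj, gain);
  }
};

// runs a constant signal through Gain and returns the first output sample
double first_sample(double in, double gain) {
  Base src(16);
  src.set(in);
  Gain g(1., 16);
  return g(src, gain).vector()[0];
}
```

A derived class that needs a different algorithm only has to override dsp(); every public overload keeps working unchanged, which is the point of separating the two.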

A.4 Audio DSP Classes

The following is a list of the existing audio DSP classes in the library, arranged in
their different function categories.

Signal processors:

• Balance: balancing of input against a comparator, envelope following.
• Chn: channel extractor.
• Delay: delay line with feedback (comb filter).
• AllPass: high-order all-pass filter.
• Fir: direct convolution, finite impulse response filter.
• Iir: generic second-order infinite impulse response section.
• LowP: second-order low-pass filter.
• HighP: second-order high-pass filter.
• BandP: second-order band-pass filter.
• BandR: second-order band-reject filter.
• ResonR: resonator with added second-order feedforward section.
• ResonZ: variation on ResonR.
• Reson: standard resonator.
• Pan: panning of a mono input.
• PConv: fast partitioned convolution.
• Tap: delay line tap (truncating read).
• Tapi: delay line tap with linear interpolation.
• ToneLP: first-order low-pass filter.
• ToneHP: first-order high-pass filter.
• Rms: root mean square computation.

Signal generators:

• Envel: general-purpose multi-segment envelope.
• Adsr: attack–decay–sustain–release envelope.
• Line: line generator.
• Expon: exponential curve generator.
• Oscil: truncating oscillator.
• Oscili: linear-interpolation oscillator.
• Oscilic: cubic-interpolation oscillator.
• BlOsc: band-limited oscillator.
• SawOsc: sawtooth oscillator.
• TriOsc: triangle oscillator.
• SqOsc: square oscillator.
• SamplePlayer: general-purpose sampling oscillator.
• Phasor: phase generator.
• TableRead: table lookup.
• TableReadi: linearly interpolated lookup.
• TableReadic: cubic interpolation lookup.

Streaming spectral processing:

• Stft: short-time Fourier transform (forward or inverse).
• Pvoc: phase vocoder analysis or synthesis.
• SpecBase: base class for streaming spectral processing.

Function tables:
• FuncTable: general-purpose function table.
• EnvelTable: multi-segment envelope table.
• FourierTable: Fourier series.
• HammingTable: Hamming window.
• HannTable: Hanning window.
• SawTable: sawtooth wave.
• TriTable: triangle wave.
• SqTable: square wave.
• SampleTable: generic sampled-sound table.

Input and output:

• SoundIn: multichannel audio input.
• SoundOut: multichannel audio output.

A.5 Control Classes

A key aspect of sound and music computing software is how to manage, at a higher
level, the instantiation of signal processing graphs. The simplest approach is to
hard-code these in the program, which works well if we do not need to make significant
modifications to the graph during operation. A more flexible way is to compose
audio processing objects into containers and provide means to instantiate and control
these. That is an important element of any audio processing engine that might be
constructed with AuLib.
We can take advantage of object-oriented programming devices in C++ to im-
plement similar functionality. In AuLib, a basic mechanism for instrument compo-
sition and instantiation is provided by two classes: AuLib::Instrument and
AuLib::Note.

Note: AuLib::Note is a base class that is specialised to contain a signal
processing graph of AuLib objects.
/** Note class: \n
Base class for modelling synthesiser notes
*/
class Note : public AudioBase {

/** specialise this to contain your
signal processing objects
*/
virtual const Note &dsp() { return *this; }

/** specialise this to handle any
note onset processing (e.g. envelope resets etc)
*/
virtual void on_note(){};

/** specialise this to handle any
note termination processing
(e.g. envelope releases etc)
*/
virtual void off_note(){};

/** specialise this to handle any incoming msg
*/
virtual void on_msg(uint32_t msg,
const std::vector<double> &data,
uint64_t tstamp){};

...
};
In line with the rest of the library, it contains a dsp() method that is to be
overridden in derived classes. This will be responsible for executing the graph and
producing an output. Since Note is derived from AudioBase, it contains the basic
functionality for audio processing like all other objects in the library.
Note also has three fundamental virtual methods used for controlling the signal
processing graph:
1. on_note(): called when the graph is supposed to start executing.
2. off_note(): called to stop execution.
3. on_msg(): called on an arbitrary message that is sent to the graph.
The derived class needs to implement these to allow for instrument control. The
Note class contains basic attributes such as num, vel, channel, and timestamp.
The first two can be thought of as independent parameters, even though they are
named after the MIDI protocol note number and note velocity. The channel is an
instrument identifier, which can be used to filter control data sent to an instance.
The Note public interface provides the basic functionality to control the pro-
cessing graph:
Note(int32_t chn = -1, uint32_t nchnls = def_nchnls,
uint32_t vframes = def_vframes,
double sr = def_sr)
const Note &process()
bool is_on() const
uint64_t time_stamp() const
bool note_on(int32_t chn, double num, double vel,
uint64_t tstamp)
bool note_off(int32_t chn, double num, double vel)
bool note_off()
void ctrl_msg(int32_t chn, uint32_t msg,
const std::vector<double> &data,
uint64_t tstamp)
void set_chn(uint32_t chn)
Instrument: once a Note-derived class is created, we can pass it to an Instrument
object, which will be able to instantiate and control it. This is a template that takes
a Note type, and any number of argument types that are defined in the Note
constructor, as template parameters. The constructor will take as parameters the
maximum number of note instances, and the channel they should respond to.
An Instrument object can be passed to players defined by AudioBase-derived
classes. Two existing types of players are:
• MidiIn: realtime MIDI input, to which one or more Instrument objects can
be passed.
• ScorePlayer: a Score object player, which plays one or more Instrument
objects.
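The relationship between a note type and the instrument that instantiates it can be sketched independently of the library. The code below is an assumption-laden illustration, not AuLib's implementation: MiniNote and MiniInstrument are hypothetical names, no audio is produced, and the allocation policy (first idle note wins, no voice stealing) is chosen here only for simplicity. What it does share with the description above is the template shape: the note type plus the types of any extra note constructor arguments, and a constructor taking the maximum number of notes and the channel.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// MiniNote: hypothetical, audio-free stand-in for a Note-derived class
class MiniNote {
  std::int32_t m_chn;
  double m_num;
  bool m_on;

public:
  explicit MiniNote(std::int32_t chn) : m_chn(chn), m_num(0.), m_on(false) {}
  bool is_on() const { return m_on; }

  // accepts the event only if idle and addressed to this channel
  bool note_on(std::int32_t chn, double num) {
    if (chn != m_chn || m_on)
      return false;
    m_num = num;
    m_on = true;
    return true;
  }

  // releases the note only if it is the one currently sounding
  bool note_off(std::int32_t chn, double num) {
    if (chn != m_chn || !m_on || num != m_num)
      return false;
    m_on = false;
    return true;
  }
};

// MiniInstrument: hypothetical container parameterised on the note type and
// on the types of any extra note constructor arguments
template <typename T, typename... Targs>
class MiniInstrument {
  std::vector<T> m_notes;

public:
  MiniInstrument(std::uint32_t nnotes, std::int32_t chn, Targs... args)
      : m_notes(nnotes, T(chn, args...)) {
    assert(nnotes > 0);
  }

  // hands the event to the first idle note; false if none accepts it
  bool note_on(std::int32_t chn, double num) {
    for (auto &n : m_notes)
      if (n.note_on(chn, num))
        return true;
    return false;
  }

  // releases the first sounding note that matches channel and number
  bool note_off(std::int32_t chn, double num) {
    for (auto &n : m_notes)
      if (n.note_off(chn, num))
        return true;
    return false;
  }

  // number of notes currently sounding
  std::uint32_t active() const {
    std::uint32_t cnt = 0;
    for (const auto &n : m_notes)
      if (n.is_on())
        cnt++;
    return cnt;
  }
};
```

With MiniInstrument<MiniNote> synth(2, 0), two note-on events on channel 0 are accepted, a third is refused because the polyphony is exhausted, and events on any other channel are ignored.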

A.5.1 MIDI Synth Example

A simple example showing the use of the MidiIn player is discussed here. A simi-
lar approach can be employed with other types of instrument player. The first step is
to define our note model as a Note-derived class (Listing A.2). We have quite a lot
of freedom to do this; the only requirement is to provide overrides for the dsp(),
on_note(), off_note(), and on_msg() methods. These will be called by
instruments following controls originating from players.
The dsp() method is where we place our synthesis graph, which is fairly sim-
ple: an envelope controlling the amplitude of an oscillator. We use the set()
method from AudioBase to fill the output vector of this note. The other meth-
ods take in the control data and set the relevant class members.
Listing A.2: Note-derived class modelling a sine wave synth note.
class SineSyn : public Note {

// DSP override
virtual const SineSyn &dsp() {
if (!m_env.is_finished())
set(m_osc(m_env(), m_cps * m_bend));
else
clear();
return *this;
}

// note off processing


virtual void off_note() { m_env.release(); }

// note on processing
virtual void on_note() {
m_amp = m_vel / 128.;
m_cps = 440. * pow(2., (m_num - 69.) / 12.);
m_env.reset(m_amp * 0.2, m_ctl[m_atn] + 0.001,
m_ctl[m_dcn] + 0.001,
m_ctl[m_ssn] * m_amp,
m_ctl[m_rln] + 0.001);
}

// msg processing
virtual void on_msg(uint32_t msg,
const std::vector<double> &data,
uint64_t tstamp) {

// pitchbend;
if (msg == midi::pitchbend) {
int32_t bnd = (int32_t)data[1];
bnd = (bnd << 7) | (int32_t)data[0];
double amnt = (bnd - 8192.) / 16384.;
m_bend = std::pow(2., (4. * amnt) / 12.);
}
// ctrls: att, dec, sus, rel
else if (msg == midi::ctrl_msg) {
uint32_t num = (uint32_t)data[0];
m_ctl[num] = data[1] / 128.;
}
};

protected:
// control list
uint32_t m_atn, m_dcn, m_ssn, m_rln;
std::map<uint32_t, double> m_ctl;
double m_bend;
double m_cps;
double m_amp;

// signal processing objects


Adsr m_env;
Oscili m_osc;

public:
typedef std::array<int, 4> ctl_list;

SineSyn(int32_t chn, SineSyn::ctl_list lst)


: Note(chn), m_atn(lst[0]), m_dcn(lst[1]),
m_ssn(lst[2]), m_rln(lst[3]),
m_ctl({{m_atn, 0.01},
{m_dcn, 0.01},
{m_ssn, 0.25},
{m_rln, 0.01}}),
m_bend(1.),
m_env(0., m_ctl[m_atn],
m_ctl[m_dcn],
m_ctl[m_ssn],
m_ctl[m_rln]),
m_osc() {
m_env.release();
};
};
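The two numeric mappings used in on_note() and on_msg() above can be checked in isolation. The sketch below lifts them into free functions; the names note_to_cps() and bend_ratio(), and the range parameter, are introduced here for illustration and are not part of AuLib. The arithmetic is the same as in the listing: equal-tempered tuning with note 69 at 440 Hz, and a 14-bit bend value centred on 8192 that spans four semitones in total (two up, two down).

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// MIDI note number to frequency in Hz: equal temperament, note 69 = 440 Hz
double note_to_cps(double num) {
  return 440. * std::pow(2., (num - 69.) / 12.);
}

// 14-bit pitch bend, given as two 7-bit data bytes, to a frequency ratio;
// range is the total bend span in semitones (4 in the listing, i.e. +/- 2)
double bend_ratio(std::uint32_t lsb, std::uint32_t msb, double range = 4.) {
  assert(lsb < 128 && msb < 128);
  // the second data byte carries the most significant 7 bits
  std::int32_t bnd = static_cast<std::int32_t>((msb << 7) | lsb);
  // centre value 8192 maps to 0; the result lies in [-0.5, 0.5)
  double amnt = (bnd - 8192.) / 16384.;
  return std::pow(2., (range * amnt) / 12.);
}
```

A centred wheel (lsb 0, msb 64) gives a ratio of exactly 1, so the oscillator frequency m_cps * m_bend is unchanged; the lowest wheel position lowers the pitch by two semitones.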
With this Note-based class defined, we can derive a second one from it, which
will simply vary the type of oscillator used (a sawtooth instead of a sine). It reuses
most of the code, including all of the control overrides defined in its parent class.

Listing A.3: Note-derived class modelling a sawtooth wave synth note.


// sawtooth note
class SawSyn : public SineSyn {
SawOsc m_saw;

// DSP override
virtual const SawSyn &dsp() {
if (!m_env.is_finished())
set(m_saw(m_env(), m_cps * m_bend));
else
clear();
return *this;
}

public:
SawSyn(int chn, SineSyn::ctl_list lst) :
SineSyn(chn, lst){};
};
The main program in Listing A.4 demonstrates how these Note classes are used
by instruments. In lines 19–23, we see two Instrument objects created, using
the two different types of note. They are set to respond to channels 0 and 1, which
are going to be mapped directly from MIDI channels 0 and 1. Each has 8-note
polyphony, and a list of control numbers is passed as extra parameters to the Note
objects. These are the control numbers for the envelope parameters of these notes.
The next few code lines, 25–28, create a Reverb object, as well as audio output
and the MidiIn player. The synthesis code is a single statement:
out(reverb(midi.listen(sinsynth, sawsynth),
midi.ctlval(-1, 91)));
where the listen method of midi takes two instruments, dispatches any MIDI
messages to them, and collects the audio output. This is sent into the reverb ob-
ject, which processes it, using the MIDI control 91 value as the effect amount. The
reverb output is then sent to the audio card.

Listing A.4: Main program using the two classes in Listings A.2 and A.3, as well as
the Reverb class defined in Listing 19.9.
1 // handle ctrl-c
2 static std::atomic_bool running(true);
3 void signal_handler(int signal) {
4 running = false;
5 std::cout << "\nexiting...\n";
6 }
7
8 int main(int argc, const char *argv[]) {
9 if(argc < 2) {
10 std::cout << "usage: " << argv[0] << " <ir_file>\n";
11 return 1;
12 }
13
14 int dev;
15
16 // control numbers used: 71 - att, 74 - dec,
17 // 84 - sus, 07 - rel
18 // Sinewave Synthesizer - channel 0 (MIDI 1), 8 voices
19 Instrument<SineSyn, SineSyn::ctl_list>
20 sinsynth(8, 0, {{71, 74, 84, 7}});
21 // Sawtooth Synthesizer - channel 1 (MIDI 2), 8 voices
22 Instrument<SawSyn, SineSyn::ctl_list>
23 sawsynth(8, 1, {{71, 74, 84, 7}});
24
25 Reverb reverb(argv[1]);
26
27 SoundOut out("dac", 1, 128);
28 MidiIn midi;
29 std::signal(SIGINT, signal_handler);
30
31 std::cout << "Available MIDI inputs:\n";
32 for (auto &devs : midi.device_list())
33 std::cout << devs << std::endl;
34 std::cout << "choose a device: ";
35 std::cin >> dev;
36
37 if (midi.open(dev) == AULIB_NOERROR) {
38 std::cout << "running... (use ctrl-c to close)\n";
39 // listen to midi on behalf of sinsynth & sawsynth
40 while (running)
41 out(reverb(midi.listen(sinsynth, sawsynth),
42 midi.ctlval(-1, 91)));
43 } else
44 std::cout << "error opening device...\n";
45 std::cout << "...finished \n";
46 return 0;
47 }

A.6 Other Classes

A small number of classes are found outside the main AudioBase class hierarchy:
• Score: this class models a basic numeric score, containing a set of events, and
is used by ScorePlayer.
• Event: a single score event.
• Score::Cmd: a score command.
• Segment: a curve segment for envelope generators or tables.
• TableSet: a set of wave tables used by BlOsc.

A.7 Building AuLib

Programs using the AuLib library can be built in a variety of ways. If specific indi-
vidual classes are used, it is possible to add the relevant implementation source files
to the build. If the library is used extensively, it is simpler to link to the pre-compiled
library. For this it is necessary to first build the library binary.
To get the latest sources, we use git (https://git-scm.org). With this installed, we use
$ git clone https://github.com/aulib/aulib
$ cd aulib
The AuLib sources include a CMake (https://cmake.org) script that can be used to build
and install the library. It requires the cmake program to be installed. On MacOS, this
comes as a graphical application, but can also be installed as a command-line program.
With this in place, these are the steps to build and install the library:
1. The preferred way to build the library is to do this away from the source tree. For
this we can create, at the top-level source directory, a new directory to hold the
build, and change to it:
$ mkdir build
$ cd build
2. CMake is then run from the build directory, by passing the top-level source di-
rectory (..).
$ cmake ..
The cmake command allows us to define where we want to install the library.
The default is in /usr/local, but we can use the option CMAKE_INSTALL_PREFIX=
to change it. For instance, if we want to install in the user directory,
we use
$ cmake .. -DCMAKE_INSTALL_PREFIX=$HOME
CMake will identify the system and installed toolchain, reporting problems if
any components are not installed. Two optional dependencies are Portaudio and
Portmidi: if these are installed then support will be provided for realtime audio
and MIDI in the library.
3. Once CMake has configured the build successfully, we can call make to create
the library. As we have already seen, make is a software build and maintenance
utility provided by the OS:
$ make
4. To install it, we run
$ make install
This installs the library, the headers and example programs in the requested lo-
cation under the ./lib, ./include/AuLib, and ./bin directories. The
library can then be used like any other in the system.
References

1. Abelson, H., Sussman, G.J.: Structure and Interpretation of Computer Programs, 2nd edn.
MIT Press, Cambridge, MA (1996)
2. Abrahams, D., Gurtovoy, A.: C++ Template Metaprogramming: Concepts, Tools, and Tech-
niques from Boost and Beyond (C++ in Depth Series). Addison-Wesley Professional (2004)
3. Apple Inc.: OS X server: Advanced administration (2014). URL http://help.apple.com/
advancedserveradmin/mac/4.0/
4. Beauchamp, J.: Introduction to MUSIC 4C. School of Music, University of Illinois at Urbana-
Champaign (1996)
5. Bencina, R.: Portaudio API, v.19 (2016). URL http://portaudio.com/docs/v19-doxydocs/
6. Blaauw, G.A., Brooks Jr., F.P.: Computer Architecture: Concepts and Evolution, 1st edn.
Addison-Wesley, Boston, MA (1997)
7. Boulanger, R. (ed.): The Csound Book. MIT Press, Cambridge, MA (2000)
8. Bracewell, R.: The Fourier Transform and Its Applications. Electrical Engineering Series.
McGraw-Hill, New York (2000)
9. Church, A.: The Calculi of Lambda Conversion. AM-6, Annals of Mathematics Studies.
Princeton University Press, Princeton, NJ (1985)
10. Cohen, D.: On holy wars and a plea for peace. IEEE Computer 14(10), 48–54 (1981)
11. Cook, P., Scavone, G.: The Synthesis Toolkit (STK). In: Proceedings of the ICMC 99, vol. III,
pp. 164–166. Berlin (1999)
12. Dannenberg, R.: Portmidi API, v.2.2 (2016). URL http://portmedia.sourceforge.net/portmidi/
doxygen/
13. Dannenberg, R.B., Thompson, N.: Real-time software synthesis on superscalar architectures.
Computer Music Journal 21(3), 83–94 (1997). URL http://www.jstor.org/stable/3681016
14. Davis, P.: The JACK audio connection kit (2003). URL http://lac.linuxaudio.org/2003/zkm/
slides/paul davis-jack/title.html
15. Dodge, C., Jerse, T.A.: Computer Music: Synthesis, Composition and Performance, 2nd edn.
Schirmer, New York (1997)
16. Dolson, M.: The phase vocoder: A tutorial. Computer Music Journal 10(4), 14–27 (1986).
URL http://www.jstor.org/stable/3680093
17. Forsyth, R.: Pascal at Work and Play: An Introduction to Computer Programming in Pascal.
Springer (1982)
18. Fourier, J.B.: Théorie analytique de la chaleur. Chez Firmin Didot, Père et fils, Paris (1822)
19. Gardner, W.G.: Efficient convolution without input-output delay. Journal of the Audio Engi-
neering Society 43(3), 127–136 (1995)
20. Huovilainen, A.: Non-linear digital implementation of the Moog ladder filter. In: Proceedings
of the 7th International Conference on Digital Audio Effects (DAFx-04), pp. 61–64. Naples,
Italy (2004)
21. IEEE: Standard for floating-point arithmetic. IEEE Std 754-2008 pp. 1–70 (2008)

22. IEEE/Open Group: The Open Group base specifications, issue 7. IEEE Std 1003.1-2008
(2016). URL http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/pthread.h.html
23. ISO/IEC: International standard ISO/IEC 14882:2011: Information technology: Programming
language C++. ISO Standards pp. 1–1314 (2011). URL https://www.iso.org/standard/50372.
html
24. ISO/IEC: ISO international standard ISO/IEC 9899:2011: Information technology: Program-
ming language C. ISO Standards pp. 1–683 (2011). URL https://www.iso.org/standard/57853.
html
25. ISO/IEC: International standard ISO/IEC 14882:2014: Information technology: Programming
language C++. ISO Standards (2014). URL https://www.iso.org/standard/64029.html
26. ISO/IEC/IEEE: International standard ISO/IEC 9945:2009: Information technology: Portable
operating system interface (POSIX) base specifications, issue 7. ISO Standards pp. 1–3718
(2009). URL https://www.iso.org/standard/50516.html
27. Kernighan, B.W., Pike, R.: The UNIX Programming Environment. Prentice Hall Professional
Technical Reference (1984)
28. Kernighan, B.W., Ritchie, D.M.: The C Programming Language, 2nd edn. Prentice Hall Pro-
fessional Technical Reference (1988)
29. Knuth, D.: The Art of Computer Programming 1: Fundamental Algorithms, 3rd edn. Addison-
Wesley, Menlo Park, CA (1997)
30. Laakso, T.I., Välimäki, V., Karjalainen, M., Laine, U.K.: Splitting the unit delay — Tools for
fractional delay filter design. IEEE Signal Processing Mag. 13(1), 30–60 (1996)
31. Lazzarini, V.: Time-domain signal processing. In: R. Boulanger, V. Lazzarini (eds.) The Audio
Programming Book, pp. 463–512. MIT Press, Cambridge, MA (2010)
32. Lazzarini, V.: The SndObj sound object library. Organised Sound 1(5), 35–49 (2000)
33. Lazzarini, V.: Spectral audio programming basics: The DFT, the FFT, and convolution. In:
R. Boulanger, V. Lazzarini (eds.) The Audio Programming Book, pp. 521–538. MIT Press,
Cambridge, MA (2010)
34. Lazzarini, V.: The development of computer music programming systems. Journal of New
Music Research 42(1), 97–110 (2013)
35. Lazzarini, V.: AuLib documentation, v.1.0 beta (2017). URL http://vlazzarini.github.io/aulib/
36. Lazzarini, V.: Computer Music Instruments: Foundations, Design and Development. Springer
(2017)
37. Lazzarini, V.: Supporting an object-oriented approach to unit generator development: The
Csound plugin opcode framework. Applied Sciences 7(10) (2017)
38. Lazzarini, V., Accorsi, F.: Designing a sound object library. In: Proceedings of the XVIII
Brazilian Computer Society Conference, vol. III, pp. 95–104. Belo Horizonte (1998)
39. Lazzarini, V., ffitch, J., Yi, S., Heintz, J., Brandtsegg, Ø., McCurdy, I.: Csound: A Sound and
Music Computing System. Springer Verlag (2016)
40. Lopo, E.C.: Libsndfile API, v.1.0.27 (2016). URL http://www.mega-nerd.com/libsndfile/api.
html
41. Mathews, M.: An acoustical compiler for music and psychological stimuli. Bell System
Technical Journal 40(3), 677–694 (1961)
42. Mathews, M.: The digital computer as a musical instrument. Science 142(3592), 553–557
(1963)
43. Mathews, M., Miller, J.E.: MUSIC IV Programmer’s Manual. Bell Telephone Laboratories,
Murray Hill, N.J. (1964)
44. Mathews, M., Miller, J.E., Moore, F.R., Pierce, J.R.: The Technology of Computer Music.
MIT Press, Cambridge, MA (1969)
45. McAulay, R., Quatieri, T.: Speech analysis/synthesis based on a sinusoidal representation.
IEEE Transactions on Acoustics, Speech and Signal Processing 34(4), 744–754 (1986)
46. McElhearn, K.: The Mac OS X Command Line: Unix Under the Hood. Wiley, New York, NY
(2004)
47. MIDI Manufacturers Association: MIDI 1.0 specification (1983). URL http://www.midi.org
48. Moore, F.R.: Elements of Computer Music. Prentice-Hall, Inc., Upper Saddle River, NJ (1990)
49. Nyquist, H.: Certain topics in telegraph transmission theory. Transactions of the AIEE 47,
617–644 (1928)
50. Orlarey, Y., Fober, D., Letz, S.: Faust: An efficient functional approach to DSP programming.
In: G. Assayag, A. Gerszo (eds.) New Computational Paradigms for Computer Music, pp.
1–33. Edition Delatour (2009)
51. Pampin, J.: ATS: A system for sound analysis transformation and synthesis based on a
sinusoidal plus critical-band noise model and psychoacoustics. In: Proceedings of the
International Computer Music Conference, pp. 402–405. Miami, FL (2004)
52. Park, T.: An interview with Max Mathews. Computer Music Journal 33(3), 9–22 (2009)
53. Pate, S., Bosch, F.V.D.: UNIX Filesystems: Evolution, Design and Implementation. Wiley,
New York, NY (2003)
54. Puckette, M.: The Theory and Technique of Computer Music. World Scientific Publ., New
York (2007)
55. Roads, C., Mathews, M.: Interview with Max Mathews. Computer Music Journal 4(4), 15–22
(1980)
56. Bôcher, M.: Introduction to the theory of Fourier’s series. Annals of Mathematics 7(3), 81–152
(1906)
57. Serra, X., Smith, J.: Spectral modeling synthesis: A sound analysis/synthesis system based on a
deterministic plus stochastic decomposition. Computer Music Journal 14(4), 12–24 (1990)
58. Shannon, C.E.: Communication in the presence of noise. Proceedings of the Institute of Radio
Engineers 37(1), 10–21 (1949)
59. Shotts Jr., W.E.: The Linux Command Line: A Complete Introduction. No Starch Press, San
Francisco, CA (2012)
60. Silberschatz, A., Galvin, P.B., Gagne, G.: Operating System Concepts, 8th edn. Wiley, New
York, NY (2008)
61. Steiglitz, K.: A Digital Signal Processing Primer, with Applications to Digital Audio and
Computer Music. Addison-Wesley Longman, Redwood City, CA (1996)
62. Stroustrup, B.: The C++ Programming Language, 2nd edn. Addison-Wesley (1991)
63. Stroustrup, B.: The C++ Programming Language, 4th edn. Addison-Wesley (2013)
64. Timoney, J., Lazzarini, V., Lysaght, T.: New SndObj classes for sinusoidal modelling. In:
Proceedings of the 5th Int. Conference on Digital Audio Effects (DAFx-02), pp. 217–221.
University of the Federal Armed Forces, Hamburg, Germany (2002)
65. Vercoe, B.: MUSIC 11 Reference Manual. Studio for Experimental Music, MIT (1981)
66. Widrow, B., Kollár, I.: Quantization Noise: Roundoff Error in Digital Computation, Signal
Processing, Control, and Communications. Cambridge University Press, Cambridge, UK
(2008)
Index

#define, 25, 71 AuLib::Chn, 259, 262


#include, 13, 75 AuLib::Delay, 255, 259, 262, 268, 273, 276
%=, 47
%, 26 AuLib::Envel, 259
&&, 40 AuLib::Fir, 281, 311
&, 8, 34, 58, 90 AuLib::FourierTable, 255, 259
*=, 47 AuLib::FuncTable, 258
*, 26, 58 AuLib::Instrument, 260, 355
++, 46 AuLib::LowP, 254
+=, 47 AuLib::MidiIn, 260, 356
+, 26 AuLib::Note, 260, 354
--, 47 AuLib::PConv::convolution(), 308
-=, 47 AuLib::PConv, 311
-framework, 161 AuLib::Pan, 262
-ljack, 145, 151, 178 AuLib::Pvoc::transform(), 321
-lm, 78 AuLib::ResonZ, 254
-lportaudio, 142 AuLib::Rms, 255
-lportmidi, 168 AuLib::SampleTable, 311
-lsndfile, 128 AuLib::ScorePlayer, 260, 356
-o, 15 AuLib::Score, 260
-, 26 AuLib::Segments, 259
/=, 47 AuLib::SigBus, 259, 262
/, 26 AuLib::SoundIn, 259, 261
<<, 92 AuLib::SoundOut, 259, 262
<=, 39 AuLib::Stft::transform(), 317
<, 39 AuLib::Tapi, 259, 278
==, 39 AuLib::Tap, 259, 278
=, 24 AuLib::fft::transform(), 297, 299, 300
>=, 39
>>, 92 AudioGetCurrentHostTime(), 160
>, 39 CoreFoundation.h, 161
AuLib.h, 347 CoreMidi.h, 160
AuLib::AllPass, 274 EOF, 34, 50, 80, 103, 106, 110
AuLib::AudioBase, 253, 255, 256, 348 FILE, 106
AuLib::Balance, 255 HostTime.h, 160
AuLib::BandP, 254 JackProcessCallback(), 148
AuLib::BlOsc, 255, 258 JackProcessCallback, 150


MIDIGetDestination(), 161 continue, 48


MIDIPacketListAdd(), 161 csnd::Param<M>, 328
MIDIPacketListInit(), 161 csnd::Plugin<N,M>, 327
MIDISend(), 161 csnd::constr(), 341
PATH, 15 cstdint, 227
PaDeviceInfo, 134 cstdio, 188
PaStreamParameters, 135 ctl-c, 145, 264
Pa_CloseStream(), 139 ctl-d, 50, 103
Pa_GetDefaultInputDevice(), 134 default, 43
Pa_GetDefaultOutputDevice(), 134 delete, 191
Pa_GetDeviceCount(), 134 do - while, 46
Pa_GetErrorText(), 133 double, 23
Pa_GetStreamTime(), 137 else if, 41
Pa_Initialize(), 133 else, 41
Pa_OpenStream(), 135 enum, 89
Pa_ReadStream(), 137 extern, 74
Pa_StartStream(), 136 false, 216
Pa_StopStream(), 139 feof(), 109
Pa_StreamCallback(), 137 ferror(), 109
Pa_Terminate(), 139 fft.h, 348
Pa_WriteStream(), 137 fgetc(), 107
PmDeviceInfo, 164 fgets(), 107
Pm_Close(), 166 float, 23
Pm_CountDevices(), 164 fopen(), 106
Pm_Event, 169 for, 47
Pm_GetDeviceInfo(), 164 fprintf(), 107
Pm_Initialize(), 164 fputc(), 107
Pm_Message(), 166 fputs(), 107
Pm_OpenInput(), 168 fread(), 108
Pm_OpenOutput(), 165 free(), 97
Pm_Poll(), 168
Pm_Read(), 168
Pm_Terminate(), 166 friend, 224, 230
Pm_WriteShort(), 166 fscanf(), 107
Pt_Start(), 164 fseek(), 109
SF_INFO, 122 fsetpos(), 109
SIGINT, 145, 264 ftell(), 109
^, 90 fwrite(), 108
_Atomic, 176 getchar(), 35
_Bool, 40 if, 40
atof(), 82 inline, 72, 207, 209
atoi(), 82 int, 22
atomic_fetch_add(), 177 iostream, 229
atomic_fetch_sub(), 177 jack.h, 146
atomic_load(), 176 jack_activate(), 148, 174
auto, 24, 69, 283 jack_client_close(), 149, 174
bool, 216 jack_client_open(), 146, 174
break, 43, 48 jack_connect(), 148, 174
calloc(), 96 jack_deactivate(), 149, 174
case, 43 jack_get_sample_rate(), 147
char, 22, 23 jack_midi_event_get(), 175
class, 224 jack_midi_event_t, 175
const char, 64 jack_port_get_buffer(), 148
const, 26, 193 jack_port_register(), 147, 174
jack_set_process_callback(), 148, 174 std::cin, 229
std::complex<T>, 293
long, 22 std::complex_literals, 293
main(), 13, 16, 81 std::copy, 247
malloc(), 96 std::cout, 229
math.h, 78, 185 std::iota, 262
memcpy(), 97 std::normal_distribution, 341
memmove(), 97 std::vector, 245, 261
memset(), 97 stdatomic.h, 176
namespace, 225 stderr, 107
new, 191 stdint.h, 22
open(), 105 stdin, 49, 107
operator(), 255 stdio.h, 13, 14, 105, 110
operator+=, 264 stdlib.h, 82, 95
operator<<, 229 stdout, 49, 107
operator>>, 229 std, 245
operator[], 244 strcat(), 64
operator, 228 strcpy(), 64
perror(), 109 strdup(), 96
portaudio.h, 133 string.h, 64, 97
portmidi.h, 164 strlen(), 64
porttime.h, 164 strncat(), 64
printf(), 15, 16, 31 strncpy(), 64
private, 223 strtod(), 82
protected, 224 strtof(), 82
public, 224 strtol(), 82
putchar(), 35 struct, 85, 188
puts(), 35 switch, 43
random, 341 template, 243
read(), 105 tmpfile(), 110
realloc(), 96 true, 216
reinterpret_cast<T>(), 299 typedef, 86
remove(), 110 typename, 243
rename(), 110 ungetc(), 107
return, 14, 67, 68 union, 89
rewind(), 109 unistd.h, 105
scanf(), 34 unsigned, 22
sf_open(), 122 usleep(), 139
sf_read_type(), 124 va_arg(), 72
sf_readf_type(), 124 va_end(), 72
sf_seek(), 125 va_list, 72
sf_write_type(), 124 va_start(), 72
sf_writef_type(), 124 virtual, 205
short, 22 void*, 96
signal(), 264 void, 68
sin(), 78, 185 while, 45
size_t, 28 write(), 105
sizeof(), 28 ~, 90
sndfile.h, 122
snprintf(), 65 abstraction, 253
sprintf(), 65 access control, 222
sscanf(), 83 address operator, 34, 58
static, 24, 69, 74, 188 all pass, 236, 273
std::cerr, 229 ALSA, 133, 145
amplitude, 288 character set, 12


analogue-digital converter, 131 comments, 15
AND, 40 entry point, 16
bitwise, 90 function, 13
Android, 4 ISO, 9
angle, 290 keywords, 12
API, 131 math library, 78
argument, 68 standard library, 14, 77
main() function, 81 structures, 85
data structures, 87 C++ language, 182
translation, 82 auto type, 283
variable list, 72 closure, 282
arithmetic iterator, 246
operators, 26 memory allocator, 191
order, 28 namespaces, 224
pointers, 60 placement new, 341
array, 55, 96 range-based for, 247
for loop, 56 references, 210
declaration, 55 standard library, 229, 244
dynamic, 97 structures, 188
index, 56 template, 243
initialisation, 56, 57 calc, 36
pointers, 60 callback, 77
two-dimensional, 57, 61 capture, 283
ASCII, 25, 33, 52 cc, 15, 36, 128, 142, 151, 161, 168, 178
ASCII character set, 5 channels, 118
assignment, 24 character, 19, 23
atomic, 170, 176 chorus, 277
attack, 222 circular buffer, 176, 265, 266
AudioUnit, 131 class, 188, 224
access control, 222
balance, 242 constructor, 189, 191
band pass, 235 member function, 188
base class, 202 client-server, 145
bash, 6 comb, 272
big endian, 21, 121 command, 6
binary encoding, 20 c++, 188, 195, 209
bit, 20 cat, 7
order, 20 cc, 10, 128
bit reverse, 296 cd, 6
bitwise cp, 7
AND, 90 echo, 7
NOT, 90 gnuplot, 51
OR, 90 kill, 8
shift, 92 killall, 8
XOR, 90 ls, 7
blocking, 137 make, 209
Boolean expression, 40 man, 9, 77
branching, 40 mkdir, 7
buffer, 79, 126, 132 mv, 7
byte, 20 pipe, 50
order, 20, 121 ps, 8
pwd, 6
C language rm, 7
rmdir, 7 variable, 275


running in background, 8 dereference, 58
standard IO redirection, 49 derived class, 202
compiler, 9, 10 destructor, 191, 245
compiling, 10, 15 DFT, 290, 293
CoreMIDI program, 161 digital signal, 49, 115
Jack program, 151, 178 basic operations, 119
libsndfile program, 128 normalised range, 125
portaudio program, 142 range, 117
Portmidi program, 168 realtime, 131
complex number, 290 digital-analogue converter, 80, 131
complex-to-real FFT, 300 direct form I, 240
composition, 255 direct form II, 240
conditional execution, 40 Doppler effect, 277
conditional expression, 39 dynamic array, 97
conditional operator, 42 dynamic memory allocation, 95
constant, 25
constructor, 189 encapsulation, 253
copy, 212, 245 envelope, 219
move, 245 environment variable, 7
convolution, 280 PATH, 7, 11
convolution reverb, 310 exclusive mode, 106
copy exponential envelope, 221
assignment operator, 229, 245
constructor, 212, 245 factorial, 73
Coreaudio, 131, 133 fall through, 44
CoreMIDI, 160 fast convolution, 302
counting variable, 46 feedback, 236, 272
CPOF, 326 FFT, 282, 293
cps, 88, 170 data re-order, 295
cross synthesis, 320 inverse, 297
CRTP, 326 real input, 298
Csound, 93, 111, 325 FIFO, 176, 266
engine object, 332 file
library, 11
data race, 176 long listing, 7
data structure, 85 plain text, 5
function arguments, 87 removing, 110
function members, 88 renaming, 110
member access, 86 source code, 10
member dereference, 88 temporary, 110
pointers, 87 types, 5
data type, 19 file system, 4
cast, 27 functions, 110
defining, 85, 188 finite impulse response filter, 236, 280
effect on operators, 27 first-order filter, 236
size, 28 flanger, 277
decay, 222 floating-point, 19, 22, 25
decrement, 47 floor, 192
delay, 265 format string, 31
Csound opcode, 333 conversion specifier, 32
fixed, 267 formatting codes, 33
multitap, 278 length modifier, 33
program, 260 pattern matching, 35
Fourier series, 195 callbacks, 148


fourth-order filter, 241 closing clients, 149
frame, 118 connecting ports, 148
free function, 230 MIDI, 174
frequency, 288 opening clients, 146
frequency domain, 287 registering ports, 147
frequency glide, 222 sampling frequency, 147
fsig, 337 starting a server, 154
function
arguments, 68 kernel, 3
call semantics, 69 ksig, 329
callbacks, 77
declaration, 70 lambda function, 282
definition, 67 latency, 131, 132, 260
inline, 72 library, 11, 14
optional parameters, 191 libsndfile, 122, 230
overloading, 190 major formats, 123
pointers, 75 opening files, 122
pointers in data structures, 88 reading and writing, 124
recursion, 73 subtypes, 123
variable argument list, 72 linear envelope, 220, 225
virtual, 205 linked list, 99
function table, 193, 209 linker, 10
flags, 128
gain, 119 Linux, 4, 9
gnuplot, 51 literal, 25
little endian, 21, 52, 121
HAL, 131 logical expression, 40
Hermitian spectrum, 292 loop, 45
hexadecimal number, 25, 156 range-based, 247
high pass, 235 low pass, 235
higher-order filter, 236
home directory, 4 MacOS, 4, 9
Hz, 88, 170 frameworks, 159
macro, 25
IDFT, 290 arguments, 71
imaginary part, 290 magnitude, 290
impulse response, 280 method, 188
increment, 46 microsecond, 139, 151
indirection, 58 MIDI
infinite impulse response filter, 236 channel, 158
inheritance, 202 CoreMIDI, 159
instantaneous frequency, 186, 320 messages, 156
integer, 19, 22 Portmidi, 163
integration, 185 programming, 158
interleaving, 118 protocol, 155
interpolation status byte, 158
cubic, 201 MIDI generator, 161, 167
linear, 200 MIDI synthesiser, 169
interpreter, 10 mixing, 119
iOS, 4 modular programming, 74
modulo, 27
Jack Connection Kit, 132, 133, 145, 231
activating clients, 148 namespace, 225
negative spectrum, 289, 292 increment, 61


non-blocking, 138 initialisation, 58
non-realtime safe, 141 syntax, 59
note number to Hz, 88, 170 polar, 290
polymorphism, 205
object, 224 polynomial, 200, 201
octal number, 25 linear, 220
one’s complement, 90 Portaudio, 132, 230
opcode asynchronous, 137
a-rate, 330 closing devices, 139
argument type, 331 initialising, 133
arguments, 328 listing devices, 134
array processing, 339 opening devices, 135
building, 343 stream time, 137
init-time, 329 synchronous operation, 136
k-rate, 329 Portmidi, 163
multithreading, 343 closing devices, 166
perf-time, 329 initialising, 164
registering, 331 input, 168
spectral, 337 listing devices, 164
text processing, 336 opening devices, 165
operating system, 3, 9 polling input, 168
OR, 40 timers, 164
bitwise, 90 timestamp, 166
oscillators, 185, 251 writing to output, 166
Csound opcode, 335 positive spectrum, 289
overlap add, 305, 317 POSIX, 9, 343
overlap save, 305 precedence, 28
overlapped analysis, 315 preprocessor, 10
overloading printing, 31
function, 190 process, 8, 13, 14, 138, 145
operators, 228 program
compiling, 10, 15
panning, 119, 126 linking, 10
partitioned convolution, 305 running, 11, 13, 15
multiple partitions, 307
non-uniform partition size, 311 Pulseaudio, 133
period, 48
phase, 288 quantising, 116
phase offset, 314
phase vocoder, 320 ramp, 48
Csound, 337 reading
Csound processing opcode, 338 binary streams, 108
effects, 322 soundfiles, 124
inverse, 321 text streams, 107
phasor, 215 real number, 22
pitch shifter, 277 real part, 290
plot, 50 real-to-complex FFT, 299
plotting, 80 realtime preemption, 131
pointer, 58, 96 realtime safe, 138, 176, 229, 256
arithmetic, 60 rectangular, 290
array equivalence, 60 recursion, 73
declaration, 58 redirection, 49
functions, 75 refactoring, 209, 256
reference type, 210 position, 109


remainder, 27 text functions, 107
resonator, 238 streaming spectral signal
reuse, 253 analysis, 315
root directory, 5 Csound type, 337
root mean square, 238, 242 manipulation, 320
parameters, 313
sample, 52, 79, 116 resynthesis, 317
format, 123 string, 14, 23, 26, 35, 57
precision, 117 pointer, 63
sampling, 115 zero-terminated, 57
sampling frequency, 77, 116, 123, 147 structures, 85, 188
sampling increment, 186 synthesis, 48, 77, 120
sawtooth wave, 48, 52 synthesiser, 169
scaling, 119
score generation, 111 table lookup, 192, 215
second-order filter, 236, 238, 240 template, 243
seeking, 109, 125 function, 244
shell, 6 terminal, 5
shin, 15 thread, 8, 138, 170, 176, 177
sine wave, 81 time domain, 265, 287
sinusoid, 288 tobin, 52, 110
sinusoidal components, 287 todac, 139
smearing, 292 tone filter, 236
SndObj library, 249 toolchain, 9
soundfile tremolo, 141
libsndfile, 122 twiddle factor, 294
raw, 120
self-describing, 121
spectral filter, 320 unary *, 58
spectrum, 287 UNIX, 4, 9
square wave, 53
standard C library, 14, 77, 105 variable, 19
in C++, 188 automatic, 24, 69
standard C++ library, 244 global, 24
standard IO, 13, 16 local, 24
C++ classes, 229 read-only, 26
redirection, 49 scope, 24
streams, 107 vector, 193
state, 186 vibrato, 277
static members, 188
STFT, 313 whole number, 22
STK, 249 window, 278, 313
stream, 105 writing
binary functions, 108 binary streams, 108, 120
error reporting, 109 soundfiles, 124
orientation, 105 text streams, 107
