You are on page 1of 9

Part III : Audio Semantics Motivation

• What is music?
– Stylistic Aleatorics
• Music is:
is
– Factor Oracle – Sound ?
• Cognitive Model
– Organization ?
– Music as Information Source
– Listening as Communication Channel – Experience ?
– Anticipation: description and explanation
– Emotional Force and Familiarity
– Information Rate and Signal Recurrence Organization of Sounds in
• Conclusion order to create an
– References and resources
Experience.

Motivation (cont.)
• Many aspects of musical structure are Stylistic Aleatorics
impossible to define formally
• Nevertheless, music and sound exhibit a
great amount of structure and redundancy
Computational modeling of music that
– Redundancy can be measured statistically
allows reproduction of stylistic music by
• Musical cognition is guided by memory and
learning from examples
expectation
– Guessing the future based on the past

Stochastic Music
Music as a Prediction Tree
Iannis Xenakis “Musiques Formelles” (Formalized Music), • Analysis of “abracadabra”.
published in 1963: root context : “”
continuations : a (4/7), b (1/7), r (2/7).
“ Since antiquity the concepts of chance (tyche), disorder
(ataxia), disorganization were considered as the opposite and
negation of reason (logos), order (taxis), and organization a b r
context : “a” context : “r”
(systasis). [a stochastic process is] . . . an asymptotic evolution continuations : b (1/3), continuation : a (1/1).
towards a stable state, towards a kind of goal, of stochos, c (1/3), d (1/3).

whence comes the adjective <stochastic>”. b c d a

P(generate "abrac") = P(a|"")P(b|a)P(r|ab)P(a|abr)P(c|abra) = 4/7 ·


1/3 · 2/7 · 1 · 1/3.

1
lzify
Examples
http://music.ucsd.edu/~sdubnov/ThoughtsAboutMemex.htm
http://www.ircam.fr/equipes/repmus/MachineImpro/

Chick Corea original

Impro1

Impro2

J.S.Bach Ricercar impro

Factor Oracle

s = “abbbaab”

[trn,sfx] = FO(s,a)

% Factor Oracle for sequence s


% input:
% s - string of numbers in range [1,a]
% a - size of the alphabet
% output:
% trm - transition matrix (forward)
% sfx - suffix vector (backward)

[s, kend, ktrace] = FOgen(trn,sfx,n,p,k)


% Generate new sequence using a Factor Oracle
% input:
% trn - transition table
% sfx - suffix vector
% n - length of new string
% p - probability of change
% k - starting point
% output:
% s - new sequence
% kend - end point

Music as Information Source


What is style?
Typical Set:
• Emergence 1
An = { x1n : " log p ( x1 , x2 ,.., xn ) ! H ( X ) }
artist’s search for self-expression n
All sequences

• Decision making Non Typical Set


what you do if there is no rational basis for doing it P(An) 1
Typical
• Influential Set An |An |~2nH
setting up frameworks of expectations that
influence the audience and allow aesthetic
planning

2
Cog-Comm Model
Anticipation
Aesthetic perception as a communication process: • Entropy & Information • Communication Channel
Information that influences the “cognitive state” of the information receiver
H(x) H(y)

I(x,y)
Source x Channel. Receiver y
Slow brain
appraisal H(x|y) H(y|x)

Self-supervision
Source x Channel. Receiver y (emotions, etc.,…)
H(x,y) I ( x, y ) = H ( x ) ! H ( x | y )
Fast brain
processing
I ( x, y ) = H ( x ) ! H ( x | y ) y – past experience, what you heard so far
x – new material
The information paradox: H(x) – uncertainty about x
Discover more by listening more….
(you gain by learning, not get bored!)
H(x|y) – uncertainty about x when we know already y
I(x,y) – how much the past tells us about the future

“Cognitive” features Information Rate


Anticipation is proportional to the amount of information
that the past “carries” into the present
Cognitive Measure Signal Measure

Familiarity Similarity Structure,


I ( xpast , xn ) = I ( x1, x2 ,..., xn ) ! I ( x1, x2 ,..., xn!1 ) = H ( x ) ! H ( x | xn!1 )
n n 1
(Recognition, Repetition, Recurrence
Categorization) (Long term) • Increase in information with arrival of new data
• Difference between uncertainty before and after prediction
Emotional Force Measures of Predictability,
• Can be estimated from notes or audio (score or recording)
(Anticipation, Information Rate (Short
• Psychologically plausible “Inverted U function”
Implication-Realization) Term)

Information Rate (cont.)


Anticipatory
Assuming a model " we can write Listening Influential
Experience features
*
I ( x past , xn ) "< I! ( x past , xn ) > P (! ) + H (! )+ < D(! || ! > P (! )
CODING APPRAISAL
! Model Modeling
Recognition
Information rate = Description + Explanation Cost

Observations Coding
• I! ( x past , xn ) Description “within” a model (Coding Gain) Prediction Gain

• H (! ) Model uncertainty (size of model space)


(Model
*
• D (! || ! ) Distance between model and “true” Cost)
distribution

3
Coding Gain: IR Estimation IR_Analysis.m
Sound In IR vs. Emotional Force
The Angel of Death by Roger Reynolds
Feature extraction:
IR using 200 msec cepstral features vs. human judgments of
Timbral / Textural descriptors emotional force (EF)
(Cepstrum analysis of 20-600ms frames. )

X
Factorization:
Descriptor prototypes and their time evolution
(SVD analysis of the data. )

S1(t) S2(t)
... SN(t)
IR IR IR

+
SVD – Singular Value Decomposition
IR – Derived from Spectral Flatness Measure

Information Rate (cont.)


*
How to find D(" n ," )and H(" n )? The Angel of Death by Roger Reynolds
Recur_Analysis.m

Idea:
• Transform the similarity matrix D into a
!
Markov transition
! matrix
d(X i , X j )
Pij = P( j | i) =
" j
d(X i , X j )

• Approximate P(" * ) by eigenvectors of P


• Consider it as a measure of “explanation”
!
H(" n ) + D(" n ," * ) =< log P(" * ) >" n ~ log Pn (" * )
!

“Explanation” Profile vs. Familiarity


Segmentation Application
Apel & Dubnov, ICMC 04

Original Segmented

• Segmentation is done by grouping the values


of 2 largest eigenvectors into 3 clusters
• Sounds are associated to their nearest cluster
JASIST, Special Issue on Style, Sep. 2006

4
Spectral Clustering Example
The above clustering method is closely
related to spectral clustering

Shi & Malik Algorithm:


• Construct the matrices D and Z. Z = diag(" j d(X i , X j ))
• Find the second smallest generalized eigenvector of
(Z-D) i.e.
( Z " D) y = ! Zy
!
• Threshold y to get a partitioning of the graph.

Distance Matrix Second generalized eigenvector

5
The first partition Third generalized eigenvector

Fourth generalized
The second partition
eigenvector

The third partition Normalized Cut

From Shi and Malik, 2000

6
Conclusion
http://cosmal.ucsd.edu/cal/
1. Modeling Music and words, Douglas Turnbull, Luke
Barrington, and Gert Lanckriet - ISMIR 06
2. Musical Boundary Detection using Boosting, Douglas
Turnbull, Gert Lankriet, Elias Pampalk, and Masataka Goto -
Submitted to ICASSP 07

Modeling music and words [TBL06] Modeling music and words [TBL06]
Data Features Modeling
Design a statistical system that learns a relationship
Parametric Model:
between music and words. Training Data
Set of GMMs
Vocabulary

T
Applications: T
Caption Document Vectors (y)
1. Annotation: Given a audio-content of a song, we can Parameter
‘annotate’ the song with semantically meaningful words. Estimation:
song → words Audio-Feature EM Algorithm
Extraction (X)

2. Retrieval: Given a text-based query, we can ‘retrieve’


relevant songs based on the audio content of the songs. Novel Song
words → songs (annotation) Caption

Inference
The parameter for the model are learned using a
heterogeneous data set of song and song reviews. Text
Query
(retrieval)

Supervised Musical Boundary Detection [TLPG06]


Consider the structure of a song:
– Musical Segment: a song is composed of segments:
• Introduction, Bridges, Verses, Choruses, Outro

– Musical Boundary: a boundary between two musical Framework for Anticipatory Machine
segments.
• e.g., the end of a verse and the beginning of a chorus
Improvisation and Style Imitation
Arshia Cont1,2, Shlomo Dubnov1 and Gérard Assayag2

1 Center for Research in Computing and the Arts (CRCA),


University of California in San Diego (UCSD).
2 Ircam-Centre Pompidou, Paris, France.
{Cont,assayag}@ircam.fr, sdubnov@ucsd.edu

7
Framework for Anticipatory Machine Improvisation and Style Imitation [CAD06] Framework for Anticipatory Machine Improvisation and Style Imitation [CAD06]

Approach Memory Models


• AI: • Using Factor Oracles:
– Linear time and space complexity
– Interactive Reinforcement Learning with an environment
– Suffix links: give the maximum repeated suffix given the context.
– Multiple-agents with collaborative and competitive learning – Forward Transitions: give maximal length context.
– Memory-based learning
– Main schema for interactive modeling:

Framework for Anticipatory Machine Improvisation and Style Imitation [CAD06]

Long-term planning/formal shapes

Real-time Multiple Pitch Observation


using Sparse Non-Negative Constraints

Arshia Cont
IRCAM, Realtime Applications Team, Paris.
Center for Research in Computing and the Arts (CRCA),
University of California in San Diego (UCSD).

http://crca.ucsd.edu/arshia/

Realtime Multiple-pitch Observation [CONT06] Realtime Multiple-pitch Observation [CONT06]

Implementation
Sparseness Comparison

Signal Proc. Frontend

NMD with Sparseness

8
Conferences Some Web Resources
• ICMC, International Computer Music Conference • Organizations
• ISMIR, Int. Conf. on Music Information Retrieval – Computer Music Association http://www.notam02.no/icma/
– Music information retrieval research http://www.music-ir.org/
• DAFX, Int. Conf. on Digital Audio Effects
– AUDITORY list home page http://www.auditory.org/
• NIME, New Interfaces for Musical Expression – Acoustic Society of America http://www.acoustics.org/
• ICAD, Int. Conf. on Auditory Display • This
• ICASSP, Int. Conf. on Audio, Speech and Signal – http://music.ucsd.edu/~sdubnov/ComputerAudition/
Processing – http://cosmal.ucsd.edu/cal/
• ACM Multimedia
• AES, Audio Engineering Society Conferences
• ASA, Acoustical Society of America Meetings

You might also like