Automatic Generation of Tamil Lyrics For Melodies

Sobha Lalithadevi
Anna University – K B Chandrasekhar (AU-KBC) Research Centre
Proceedings of the NAACL HLT Workshop on Computational Approaches to Linguistic Creativity, pages 40–46,
Boulder, Colorado, June 2009. © 2009 Association for Computational Linguistics
of the following three labels: Kuril (short vowel, represented by K), Nedil (long vowel, represented by N) and Mei (consonant, represented by M). For example, the word thAmarai (lotus) will be represented as NKN (long vowel followed by short vowel followed by another long vowel). This representation scheme, hereinafter referred to as the KNM representation, is used throughout our system: in training, in melody analysis and as input to the sentence generation module.

3 Approach

…representation scheme that will match the melody. Since our representation of melody is based on the ABC Notation (Gonzato, 2003), which is textual, we approach this problem as labeling the ABC notation using the KNM representation scheme.
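To make the KNM scheme concrete, here is a minimal Python sketch. It is our own illustration, not the authors' code: it assumes the syllable split is already given and that the usual transliteration convention applies, with capital vowels long (Nedil), lowercase vowels short (Kuril), and 'ai'/'au' counted as long.

```python
# Hypothetical illustration of the KNM labeling scheme.
# Assumes pre-split syllables in a transliteration where capital
# vowels are long (Nedil) and lowercase vowels are short (Kuril).

def knm_label(syl):
    """Label one transliterated syllable with K, N, M, KM or NM."""
    if "ai" in syl or "au" in syl or any(v in syl for v in "AEIOU"):
        core = "N"                      # Nedil: long vowel
    elif any(v in syl for v in "aeiou"):
        core = "K"                      # Kuril: short vowel
    else:
        return "M"                      # Mei: bare consonant
    # A consonant after the vowel closes the syllable: KM or NM.
    if syl[-1].lower() not in "aeiou" and not syl.endswith(("ai", "au")):
        core += "M"
    return core

def knm_pattern(syllables):
    return "".join(knm_label(s) for s in syllables)
```

For the paper's example, knm_pattern(["thA", "ma", "rai"]) yields "NKN", matching the labeling of thAmarai given above.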
the given interval. Each Note is represented by the characters A, B, C, D, E, F and G, which are called main notes, and A#, C#, D#, F# and G#, which are called sharp notes. Thus, for any given Meter in the melody, we can find the sequence of Notes with the corresponding duration for which each Note is played in that meter.

For the purpose of generating lyrics, we need to fit one syllable to each of the notes in the melody.

4.2 Conditional Random Fields

Conditional Random Fields (CRF) (Lafferty et al., 2001) is a machine learning technique that has performed well on sequence labeling problems such as POS tagging, chunking and named entity recognition. It overcomes difficulties faced by other techniques such as Hidden Markov Models (HMM) and the Maximum Entropy Markov Model (MEMM).

Lafferty et al. (2001) define Conditional Random Fields as follows: "Let G = (V, E) be a graph such that Y = (Yv), v ∈ V, so that Y is indexed by the vertices of G. Then (X, Y) is a conditional random field in case, when conditioned on X, the random variables Yv obey the Markov property with respect to the graph: p(Yv|X, Yw, w ≠ v) = p(Yv|X, Yw, w ~ v), where w ~ v means that w and v are neighbors in G." Here X denotes a sentence and Y denotes the label sequence. The label sequence y that maximizes the likelihood probability pθ(y|x) is taken as the correct sequence when a new sentence x is tested with the CRF model θ. The likelihood probability pθ(y|x) is expressed as follows:

    pθ(y|x) ∝ exp( Σ(e∈E, k) λk fk(e, y|e, x) + Σ(v∈V, k) μk gk(v, y|v, x) )

where λk and μk are parameters of the CRF model θ, and fk and gk are the binary feature functions that we need to supply when training the CRF model. This is where we integrate the specific features of the problem into machine learning models such as CRF.

4.3 Feature Templates

There are three models that need to be learnt, viz., labeling notes with the KNM scheme, identifying word boundaries and identifying line boundaries. We present below the features used to learn each of these.

4.3.1 Learning KNM labels

In addition to the labels K, N and M, there are also other non-syllable features that need to be identified in the melody. Thus, the complete list of labels includes K, N, KM, NM, TIE, OPEN, CLOSE, PRE and BAR:

K – short vowel
N – long vowel
KM – short vowel followed by a consonant
NM – long vowel followed by consonants
TIE – presence of a tie in the meter
OPEN – opening of a tie
CLOSE – closing of a tie
PRE – note that follows a tie
BAR – end of meter

The following is the list of features considered:

• Current Note
• Previous Note + Current Note + Next Note
• Previous-to-previous Note + Previous Note + Current Note + Next Note + Next-to-next Note
• Current Note/Duration
• Previous Note/Duration + Current Note/Duration + Next Note/Duration
• Previous-to-previous Note/Duration + Previous Note/Duration + Current Note/Duration + Next Note/Duration + Next-to-next Note/Duration
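The feature templates listed above can be sketched as a small window-expansion function, in the spirit of CRF++ templates. This is a hypothetical helper: the function names and the "<pad>" boundary token are our own conventions.

```python
# Expand the note and note/duration context windows for position i.
# "<pad>" marks positions beyond the sequence boundary (our convention).

def note_features(notes, durations, i):
    def N(j):                     # note at offset j from i
        k = i + j
        return notes[k] if 0 <= k < len(notes) else "<pad>"

    def ND(j):                    # note/duration pair at offset j
        k = i + j
        return f"{notes[k]}/{durations[k]}" if 0 <= k < len(notes) else "<pad>"

    return [
        N(0),                                        # current note
        "|".join(N(j) for j in (-1, 0, 1)),          # prev + current + next
        "|".join(N(j) for j in (-2, -1, 0, 1, 2)),   # 5-note window
        ND(0),                                       # current note/duration
        "|".join(ND(j) for j in (-1, 0, 1)),
        "|".join(ND(j) for j in (-2, -1, 0, 1, 2)),
    ]
```

For notes ["B", "C", "B"] with durations ["1/2", "1/2", "1/2"], position 1 yields "C", "B|C|B", "<pad>|B|C|B|<pad>" and the corresponding note/duration variants.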
4.3.2 Word Boundary

Another important aspect in analyzing the melody is to spot potential word boundaries. While in many cases the presence of bars could indicate potential word boundaries, there are also cases where a given word can span a bar (especially due to the presence of ties). Hence, we need to explicitly train our system to identify potential word boundaries. The features used to identify word boundaries are mostly the same as those for learning the KNM labels, with the addition of two more previous notes along with their durations.

4.3.3 Sentence Boundary

As with word boundaries, we cannot assume sentence boundaries from the musical notation, and hence we also train our system to identify potential sentence boundaries. Sentence boundary identification happens after the word boundaries are identified, and hence this additional feature is used along with the above-mentioned features for sentence boundary training.

5 Sentence Generation

The goal of the Sentence Generation module is to generate a meaningful phrase that matches the input pattern given in the KNM scheme. For example, given an input pattern such as 'KMKM NKM NKN', it should generate a phrase consisting of three words, each matching its respective pattern.

5.1 Corpus selection and preprocessing

Since we are interested in generating lyrics for melody, the corpus we chose consisted mainly of poems and short stories. The only preprocessing involved was to remove any special characters (such as "( ), $ % &", etc.) from the text. From this corpus, we index all Unigrams and Bigrams of words. Each word is marked with its KNM syllable pattern and its frequency of occurrence in the corpus. The Bigram list contains only the frequency of occurrence.

5.2 Graph Construction

Given an input pattern (say 'KMKM NKM NKN'), we construct a directed graph with the list of words satisfying each pattern, as represented by Figure 2.

Figure 2. Graph Construction

The edge from word Wij (of, say, pattern KMKM) to Wrs (of, say, pattern NKM) is weighted based on the frequency values collected from the corpus and is calculated as follows:

    P(Wrs | Wij) = #(Wij followed by Wrs) / #(Wij)    (Eqn. 1)

Since the Shortest Path Algorithm picks the path with the least cost, we need to weight the edges in such a way that a higher probability sequence gets the least cost (C). Thus, we measure Cost(Wrs | Wij) as:

    Cost(Wrs | Wij) = 1 − P(Wrs | Wij)    (Eqn. 2)

By default, the cost from the START node to the first list of words and the cost from the last list of words to the END node is fixed as 1.

5.3 Preferential selection of paths

One of the shortcomings of using the Shortest Path Algorithm is that, for the given input pattern and the given corpus, the algorithm will always generate the same phrase (with the least cost).
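The graph construction and least-cost search described above can be sketched as a dynamic program over the layered word graph. This is a simplified sketch under our own naming; the toy words and counts are illustrative, not from the corpus.

```python
# Layered word graph with edge cost 1 - P(w2|w1), per Eqns. 1 and 2;
# START -> first layer and last layer -> END are fixed at cost 1.

def build_edges(layers, bigram_count, unigram_count):
    cost = {}
    for prev, nxt in zip(layers, layers[1:]):
        for w1 in prev:
            for w2 in nxt:
                p = bigram_count.get((w1, w2), 0) / max(unigram_count.get(w1, 1), 1)
                cost[(w1, w2)] = 1 - p
    return cost

def cheapest_path(layers, cost):
    # Dynamic programming over the layered DAG (one word list per pattern).
    best = {w: (1.0, ["START", w]) for w in layers[0]}
    for nxt in layers[1:]:
        best = {
            w2: min(((c + cost[(w1, w2)], path + [w2])
                     for w1, (c, path) in best.items()), key=lambda t: t[0])
            for w2 in nxt
        }
    total, path = min(best.values(), key=lambda t: t[0])
    return total + 1.0, path + ["END"]
```

Biasing the search, as in Section 5.3, then amounts to overwriting selected entries of the cost dictionary with 0.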
In addition to this problem, when the melody demands it, we need to generate rhyming words. Lastly, we need to handle the case where the corpus may not contain a phrase that matches the complete pattern. We tackle all the above issues by biasing the Shortest Path Algorithm through changes to the cost of the edges.

5.3.1 Bias initial word

In order to generate different phrases for the same pattern (say KMKM NKM NKN), we pick a random word that matches the initial pattern (KMKM) and fix the cost of the edge from START to that random word at 0. As the default cost from START to all leading words is 1, this biases the algorithm to find a phrase that starts with the random word. However, if there exists a phrase whose overall cost is still less than the one starting with the random word, the algorithm will output that same phrase. In order to avoid this, we provide multiple random words and pick the one that truly generates a unique phrase.

5.3.2 Rhyming Words

When there is a need to generate phrases that rhyme with previously generated phrases, especially at line endings, we use the same biasing technique to prefer certain words over others. The motivation to concentrate on line endings is based on our assumption that the notes in the melody would be similar for the rhyming words, and thus our representation scheme involving Mei (M) (consonants) would handle the stressed syllables. The path-finding algorithm can take as input a word and a position with which the new phrase should rhyme. In this case, we generate all the words that rhyme with the given word by using the maximum substring matching technique. That is, the word with the maximum substring common to the input word, at the word ending, is considered a rhyming word. For example, given an input word 'kOyil' (temple), the rhyming words would be 'vAyil' (gate) and 'veyil' (sun). As can be seen, both words have the suffix 'yil' in common with the input word. Thus, as earlier, the cost of the edges in the paths leading to such rhyming words is set to 0, biasing the algorithm to pick these paths. Another way would be to keep only those nodes corresponding to the rhyming words (discarding the non-rhyming words). However, when no rhyming words are present in the corpus, this approach can lead to a graph with an incomplete path. Hence we use the approach of biasing the graph paths, which picks rhyming words if present and provides a non-rhyming word if none is available.

5.3.3 Edit-Distance Matching

There can also be cases when there is no phrase that exactly matches the given input pattern sequence, though the corpus might contain individual words matching each pattern in the sequence. In this case, we relax the matches using the Edit-Distance metric. Thus, for the given pattern NKN, we also list words that match NKK, KKN, etc. Since the input patterns are deemed to fit the given melody, Edit-Distance matching can turn up words that do not match the given melody, and hence it should be used only when there are no phrases matching the input pattern. Another approach, though practically not possible, is to have a "big enough corpus" that contains at least one phrase matching each pattern.

6 Experiments

We conducted the experiments as two separate steps, one for the CRF engine and another for the Sentence Generation module.

For the CRF engine, we collected and used the tunes and lyrics of Tamil film songs, as they were easily available from the web. The tunes were converted to the ABC notation, and their lyrics were converted to the KNM representation scheme.
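The maximum substring matching used for rhymes in Section 5.3.2 can be sketched as a longest-common-word-ending search. The helper names and the minimum-overlap threshold are our assumptions, not part of the described system.

```python
# Find rhyming candidates by the longest common word ending, as in
# Section 5.3.2; their incoming edges would then be re-weighted to 0.

def common_suffix_len(a, b):
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n

def rhyming_words(word, candidates, min_overlap=2):
    scores = {w: common_suffix_len(word, w) for w in candidates if w != word}
    top = max(scores.values(), default=0)
    return sorted(w for w, n in scores.items() if n == top and n >= min_overlap)
```

For the example above, rhyming_words("kOyil", ["vAyil", "veyil", "malar"]) returns ["vAyil", "veyil"], both sharing the ending 'yil'.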
The notes from the tune and the syllables in the corresponding lyric (in their respective representation schemes) were manually mapped to each other. An example training file for the CRF engine for learning the KNM representation scheme is presented below:

Note  Duration  Label
B     ½         K
C     ½         N
B     ½         K
A     ½         KM
G     ½         K
      0         tie
[     0         open
A     ½         pre
G     ½         K
]     0         close
B     4         K

Table 1. KNM scheme learning – training file

Similarly, for word boundary identification, the same input is used but with labels corresponding to word boundaries, such as WB (word beginning) and WI (word intermediate) (Table 2):

Note  Duration  Label
B     ½         WB
C     ½         WI
B     ½         WI
A     ½         WI
G     ½         WB
      0         Tie
[     0         open
A     ½         pre
G     ½         WI
]     0         close
B     4         WI

Table 2. Word boundary learning – training file

For sentence boundary identification, the output from the word boundary identification is used, and hence it is run after word boundary identification is complete. Thus, the input to the CRF engine in this case would be like the one in Table 3, with labels corresponding to sentence boundaries, such as SB (sentence beginning) and SI (sentence intermediate):

Note  Duration  Word Boundary  Sentence Boundary
B     ½         WB             SB
C     ½         WI             SI
B     ½         WI             SI
A     ½         WI             SI
G     ½         WB             SI
      0         Tie            SI
[     0         Open           SI
A     ½         Pre            SI
G     ½         WI             SI
]     0         close          SI
B     4         WI             SB

Table 3. Sentence boundary learning – training file

For the Sentence Generation module, we used short stories, poems and Tamil lyrics across various themes such as love, appreciation of nature, patriotism, etc. From these, all the special characters were removed, and the lists of Unigram and Bigram words were collected along with their frequencies.

Based on the limited experiments performed on the trained CRF model, we observe that the feature set presented for syllable identification performs reasonably, identifying the syllables with 70% accuracy on manually tagged melodies. However, we could not objectively evaluate the word and sentence boundary identification process, as the resulting boundaries can also be considered valid boundaries. In general, word and sentence boundaries are the choice of the lyricist, and hence the results can be considered another valid way to generate a lyric.
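Training files like Tables 1 to 3 can be emitted in the whitespace-separated, label-last layout that CRF++ consumes. This is a sketch: the "-" placeholder for note-less tie rows and the exact column order are our assumptions.

```python
# Write (note, duration, label, ...) rows as a CRF++-style training file:
# one token per line, columns whitespace-separated, blank line ends a sequence.

rows = [
    ("B", "1/2", "WB",  "SB"),
    ("C", "1/2", "WI",  "SI"),
    ("-", "0",   "Tie", "SI"),   # tie marker: no note, zero duration (assumed)
    ("B", "4",   "WI",  "SB"),
]

def to_crfpp(rows):
    # Trailing blank line marks the end of the sequence.
    return "\n".join(" ".join(r) for r in rows) + "\n\n"
```

Each melody would be written as one such sequence, with the label column chosen per task (KNM label, word boundary or sentence boundary).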
Also, we feel that the number of training samples (10 melodies) supplied for training the CRF engine is too small for it to reasonably learn the nuances present in real-world lyrics.

Some of the syllable patterns identified from the tune and the corresponding sentences generated are given below:

Pattern: 'KK KK KKK NKKM KMKK'
Output: ஒரு சிறு வயது ஞாபகம் வந்தது
Translation: In small age / I recollected

As the syllable patterns get longer, we had to resort to using Edit Distance in order to find matching sentences. One such output is presented below:

Pattern: 'KMKMKM KMKM NKN NKMKM NMKKM NKN'
Output: தமழல இஙக காணலாம / எனற மைைபபடன ொொாலல
Translation: We can see here in Tamil / Proclaiming aloud

7 Limitations and Future Work

From the initial set of experiments, we see that it is possible to generate a syllable pattern that closely matches the input tune. Currently, we do not consider the identification of strong beats in the melody and expect the presence of Mei (M) to take care of stressed syllables. We also expect the same strategy to work for other South Indian languages. The current lyric generation algorithm is simplistic, in that it can generate short meaningful phrases, but generating longer phrases requires adding constraints (such as closest-matching patterns) that defeat the purpose of matching the tune. Also, the current method generates phrases that are independent of the previous phrases. This leads to lyrics that are meaningful in parts, but meaningless on the whole.

Future work can involve introducing "semantic similarity" across phrases in a lyric, thereby generating lyrics that provide a coherent meaning. Also, experiments can be conducted with corpora from different domains to generate lyrics for a given situation (such as love, death, travel, etc.). Other sentence generation strategies, such as an evolutionary algorithm (as suggested in Manurung (2004)), can also be attempted. Once a coherent, meaningful lyric is generated, further improvements can be directed towards incorporating poetic aspects in the lyric.

References

Demeter. 2001. How to Write Lyrics. http://everything2.com/index.pl?node=How to write lyrics.

Guido Gonzato. 2003. The ABCPlus Project. http://abcplus.sourceforge.net.

Hanna M. Wallach. 2004. Conditional Random Fields: An Introduction. Technical Report MS-CIS-04-21, Department of Computer and Information Science, University of Pennsylvania.

Hisar Maruli Manurung. 2004. An evolutionary approach to poetry generation. Ph.D. Thesis, University of Edinburgh.

Hugo R. Goncalo Oliveira, F. Amilcar Cardoso, and F. Camara Perreira. 2007. Tra-la-Lyrics: An approach to generate text based on rhythm. Proceedings of the Fourth International Joint Workshop on Computational Creativity, IJWCC'07, London: 47–54.

Hugo R. Goncalo Oliveira, F. Amilcar Cardoso, and F. Camara Perreira. 2007. Exploring different strategies for the automatic generation of song lyrics with Tra-la-Lyrics. Proceedings of the Portuguese Conference on Artificial Intelligence (EPIA 2007), Guimarães, Portugal: 57–68.

John Lafferty, Andrew McCallum, and Fernando Pereira. 2001. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. Proceedings of the Eighteenth International Conference on Machine Learning (ICML 2001), Williams College, Williamstown, MA, USA: 282–289.

Taku Kudo. 2005. CRF++: Yet Another CRF Toolkit. http://crfpp.sourceforge.net.

Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest. 1990. Introduction to Algorithms. Prentice Hall: 527–531.