Lyrics Word Clouds

Conference Paper · July 2016


DOI: 10.1109/IV.2016.27

All content following this page was uploaded by Michael Burch on 25 April 2019.



Lyrics Word Clouds
Michael Burch, Tobias Fluck, Julian Freund, Thomas Walzer, Uwe Kloos, and Daniel Weiskopf
VISUS, University of Stuttgart and Reutlingen University
Stuttgart and Reutlingen, Germany
Email: {michael.burch, daniel.weiskopf}@visus.uni-stuttgart.de
{tobias.fluck, julian.freund, thomas.walzer, uwe.kloos}@reutlingen-university.de

Abstract: Plenty of songs are composed, written, and released every year and recorded in a variety of databases. These databases store not only the audio but also additional data such as the artists' names, the years of release, the lengths of the songs, or the number of visits, comments, and remarks of visitors. Another important kind of data is the lyrics, i.e., the textual content, which can give insights about the topic, genre, or intention of the musicians. Getting an overview about the textual content of a song can become a tedious challenge since listening to the songs or reading the texts is a time-consuming task. To support users of song databases we propose a visualization tool that is able to generate word clouds from lyrics. Interaction techniques are incorporated in the tool to give more detailed information about the occurrence of words in a song, which helps to find insights about the genre or to compare the content very rapidly, for example. Our visualization tool is implemented as a web-based interface that stores requests, updates the local tool database based on those requests, and finally provides an interactive visualization for the user.

I. INTRODUCTION

Various songs are composed, written, and released every year, typically stored in music databases like Youtube, Last.fm, or the Free Music Archive, to mention a few. "More than one million songs are released annually (not sure if that number is just North America or global), but needless to say, there are more songs than anyone can ever listen to in a lifetime if they did nothing else. No wonder it is so difficult to make a living in the music space" has been stated by Jaime Horwitz Rodriguez [1], reflecting that visual analysis can be of great help to get an overview about the data and to make browsing and exploration tasks much faster.

Getting an overview about the textual content, i.e., the lyrics, is a challenging task, in particular for many songs at once. Reading the complete song text or even listening to the song is too slow when the goal is to get a rapid overview. To support people with such a task we designed the lyrics word clouds, an interactive visualization technique based on word clouds combined with brushing and linking between the visualization and a more text-based representation shown next to each other. This interactive visualization strategy helps to start from an overview and to proceed to a detailed representation of the lyrics as a text-based view in which already found words or insights are highlighted.

Our visualization approach is in particular useful for a number of different scenarios in which the user has a certain task in mind that he wishes to solve more rapidly than by just listening to the songs or reading the corresponding lyrics:

• Content: A song is searched for which the textual content is known. Consequently, the complete song does not have to be listened to since we are interested in a more rapid detection. If several songs have to be explored rapidly, this concept is referred to as weighted browsing (we know what to look for).
• Comparison: Comparing two or more songs can also be done by just inspecting the corresponding lyrics clouds, either in a small multiples display or by browsing through a longer list of them. Such a browsing concept can rapidly support such comparison tasks.
• Completion: Only text snippets are known and a user is interested in the songs in which these snippets occur. It can be helpful to base the search on lyrics clouds since many of them, filtered for the text snippets, can be shown rapidly to reduce the search space. When a matching song is found, the text snippets can be shown in a completed fashion.
• Combination: A selected song can be inspected by looking at the lyrics cloud or by reading the text, i.e., the lyrics. As a details-on-demand feature we can finally play the song to listen to it.
• Context: The lyrics cloud only provides an overview about the word frequencies but not about their distribution in the text. This information can be seen by a highlight function linking the lyrics to the lyrics cloud.

We designed a word cloud-based visualization acquiring data from LyricWikia [2]. The system internally and locally stores the preprocessed data in a database that is mirrored from the original data requested from the LyricWikia webpage. If a user requests a song that is not already stored in our local database, it is looked up on the web and the database is updated accordingly. This process makes requests for frequently asked songs faster since the preprocessing step was already applied.

We illustrate the usefulness of our visualization technique in an application scenario in which we ask for a specific song that is not already locally stored but has to be looked up on the LyricWikia webpage. The word cloud overview is shown as well as how the view can be combined with the original lyrics highlighted with certain user-demanded words.

II. RELATED WORK

Nowadays we are able to rapidly search for the music and the songs we like or we are specifically interested in
by looking them up at web pages like Youtube, Last.fm, the Free Music Archive, or LyricWikia, to mention a few of a really long list. Those services provide a wealth of data about the songs but they typically do not provide a good overview as a starting point for further exploration and insight detection. Those databases are designed around a list-based textual representation that allows specific individual requests, sometimes enhanced by filtering techniques based on certain keywords or song features. Visualization techniques are rarely used to display the data content of those databases.

In particular, text-based visualizations such as word clouds might be applicable to such a data scenario since they provide a good overview about the frequently used words in the lyrics, hence giving a coarse and rapid glimpse of the content of a song. There are several word cloud layouts and improvements focusing on that, like the work of Kaser and Lemire [3] who try to reduce the white space in HTML-based word clouds and provide more white space balanced versions. Much of the former work focuses on generating more compact representations giving more words the chance to be displayed, like Seifert et al. [4] who implemented a more space-filling version of a layout algorithm applying convex polygons to define boundary regions. The famous word cloud generation tool Wordle [5] is applied in modified versions in approaches like ManiWordle [6] or Rolled-out Wordles [7].

There are further word cloud techniques that make use of clustering, for example [8], [9], or of layout strategies typically known in the field of graph drawing and denoted as force-directed placements [10]. Also the semantic relationships might be taken into account when laying out the word cloud [11], [12]. The ThemeScape visualization [13], for example, is based on topographical landscapes.

Some word cloud ideas try to show relationships among words in a more direct way, which is not in the focus of our work but might be an option for future work. For example, in several methods [14], [15], [16], [17], [18] those word relations are made visible by using explicit links, tree representations, or interactive highlighting techniques. Typically, one word cloud for each text source is used, but in recent years researchers tried to find a way to combine those into a single word cloud [19], [20], [21], [22], [23]. Also time-varying word clouds became of special interest [24], [15], [25], but since we are dealing with static lyrics data we only exploit approaches for word clouds without specific focus on time dependency.

III. DATA HANDLING

In this work we primarily deal with lyrics data but treat other data sources like song writer names, lengths of the songs, year of release and so on as secondary data that can be shown on the user's demand. The lyrics data is used to generate an overview about each song to support the rapid finding of keywords, i.e., interesting words in the textual data.

A. Data Extraction and Requests

We implemented a simple client-server architecture that is in particular advantageous for a web-based tool. The client sends requests consisting of the name of the artist and the title of the song to the server (see Figure 1), which together uniquely identify the data. Next, a database request decides about the further steps concerning the parsing of the data. In a first request it is checked if the song of the requested artist is already stored locally. If the song is not already in the local database, a further request is sent to the LyricWikia server. If the song is found in the LyricWikia database, the complete HTML file is sent back and a parser is applied to extract the important data for further processing.

Fig. 1. Illustration of the basic process in the lyrics word cloud application: The client makes requests to the internal local database. If the requested song is not already stored it is further requested from the LyricWikia webpage. Then the local database is updated.

Fig. 2. The local database contains three tables for managing artist, title, and word list data.

B. Preprocessing

The parser is implemented as a PHP script running on the server that processes the input data sent by the client via POST. It is first tested if the input data leads to a result. An internal database request decides the further steps done by the parser. If the song is found in the database, the lyrics as well as the word list with the occurrence frequencies are sent back to the client in a JSON format. If the database request is without result, both parameters are sent to LyricWikia as a normal URL request via cURL. After this step the complete HTML page is
transformed into a DOM object which is finally parsed into the lyrics only.

Because the structure of the LyricWikia pages is always the same, we exploit unique character sequences enclosing the lyrics data to extract the textual data and to write it into a variable. If this variable remains empty, a response is sent to the client stating that there is no entry for the transmitted parameters. If the variable is not empty, the corresponding text is cleaned and preprocessed for further analysis and visualization steps. As a next step two variables are required: one stores the words with their occurrence frequencies in a JSON format, the other stores the original lyrics text. This subdivision into two variables allows us to integrate interaction techniques on the client side.

Once all those steps are processed, database requests are executed and it is checked if the artist is already stored in the database; if not, the artist is added. Moreover, the song title, the lyrics, and the word list are also added.

C. Local Database

We use a database with three tables (see Figure 2 for more details), i.e., the artist, the song title, and the word list are required. The table for the artist contains an identification number for each artist. Each number is generated when first writing the artist to the database, meaning the number is unique and serves as the primary key. The name of the artist is stored as well. The second table contains the song data that reflects all attributes attachable to a song such as title, lyrics text, length, year of release and the like. Again, an identification number is generated. A third table stores the word list with the occurrence frequencies for generating the word clouds.

IV. VISUALIZATION TECHNIQUE

For visually encoding the lyrics data we make use of a word cloud in a user-defined word cloud layout. This serves as an overview representation for the lyrics data with which the user can interact to get more insights about the textual data. Several interaction techniques are implemented to further support a viewer with a visual analysis.

A. Visual Design

To benefit from simple usability as a starting point for further interactions we just provide two text fields, similar to requesting information using Google (see Figure 3). The user types in a title plus an artist and the tool tries to find the corresponding data in the local database. If this request does not lead to a result, a request to the LyricWikia webpage is activated.

Fig. 3. Song requests can be done in a trivial way just as Google requests. To start the interactive visualization we just have to type in an artist and a corresponding song. The data is requested, preprocessed, and finally, visualized.

If the data for the respective request is found it is sent back, preprocessed, and a word cloud is generated. For the word cloud generation, the tool takes into account the text already preprocessed into words attached to their occurrence frequencies. The preprocessing is beneficial for interaction, i.e., the lyrics data does not have to be preprocessed again each time someone makes a new request for the same song.

The visualization shows the lyrics text on the left hand side while the word cloud is displayed on the right hand side. This gives an overview about both the actual text in its real form and a word frequency-based and color coded word cloud. The color coding can already be used as visual linking to understand both the overview about the word frequencies and where in the text those words occur.

B. Interaction Techniques

Our visualization tool supports several simple interaction techniques with which the number of displayed words can be reduced and with which views can be interactively linked.

• Song request: The most important interaction technique is the request for songs that are preprocessed first, internally stored, and then transformed into a corresponding lyrics word cloud.
• Brushing and linking: Words from the lyrics word cloud can be selected (brushing) and are then highlighted in the lyrics view (linking). This can be done for each word separately or by displaying all corresponding words in the same color coding for a visual linking.
• Filtering: Once the song data is preprocessed it can be filtered for word occurrences, substrings, and/or word lengths, for example.
• Word cloud layouts: Different word cloud layouts can be generated like spiral (default), Archimedean, or rectangular layout.
• Frequency encoding: The word occurrence frequency can be mapped to three different word size functions: logarithmic, square root, and proportional to the number of occurrences.
• Stop word elimination: Apart from generating a word cloud from the complete lyrics, users are also supported by a stop word elimination process. This typically reduces the number of displayed words that carry no meaning.
• Word highlighting: Selecting words by mouse clicks leads to highlighting them in the corresponding lyrics text in the same color coding as in the word cloud.
• Word color coding: The user can select a color coding for each of the words in the word cloud while a suitable color coding is chosen as a default setting right from the beginning.
• Song replay: Songs can be played as an extra details-on-demand feature.
• Details on demand: Hovering over a word gives details about the occurrence frequency while additional requests can give details about artists, length, album, or year of release, to mention a few.
• PNG export: The produced word cloud can be exported as a PNG image file.

V. APPLICATION EXAMPLE

In this section we show a stepwise illustration of our lyrics data extraction and visualization tool. The first step is to open a browser and type in the URL of the Lyrics Word Clouds tool. A text field is shown as demonstrated in Figure 3 where we can type in artist and song name, then press the button Generate Wordcloud.

Fig. 4. A request for ACDC and the song Highway to Hell results in the following lyrics word cloud.

The artist and the song are requested and the data is preprocessed into word occurrence frequencies. From the preprocessed data a lyrics word cloud is generated. In this specific scenario we requested the song Highway to Hell from ACDC and obtained the lyrics word cloud shown in Figure 4 with the actual lyrics text placed on the left hand side (Figure 5). In a next step the user can decide to use the same color coding
for the lyrics text as in the lyrics word cloud. This supports the visual comparison of the lyrics and the word occurrence frequencies (Figure 6).

Fig. 5. The lyrics text is shown on the left hand side while the lyrics word cloud is displayed on the right hand side.

Fig. 6. Color coding both views simultaneously supports visual comparison tasks.

By having a look at the lyrics word cloud for this song we can, for example, try to get an overview about the word occurrences, which we refer to as content. If other songs (of the same artist or of others) are requested, those can be visually compared by inspecting the word clouds, i.e., a comparison is done. If only text snippets are known, like highway and hell in this example scenario, we might be interested in seeing other words and how frequently they occur in the lyrics, i.e., a completion is done. Reading the lyrics and visually inspecting the connections between the text and the lyrics word cloud leads to a combination. Also the distributions of the word frequencies can be explored in both the word cloud and the lyrics, i.e., the context can also be observed.

VI. DISCUSSION AND LIMITATIONS

Various implementation challenges can occur while developing the lyrics word cloud visualization.

• Web technologies: We had to understand and combine several web technologies like HTML5, CSS 3, JavaScript, and SVG as well as the JavaScript libraries D3.js and jQuery.
• Data preprocessing/filtering: The data preprocessing and filtering demand fast techniques that support interactivity.
• Data storage: Since vast amounts of song data exist, mirroring the data to a local database demands an efficient storage process.
• Runtime environment on client side: The runtime environment on the client side is important to let the lyrics word cloud run efficiently.

The algorithmic and performance scalability is in particular important for achieving an interactive visualization.

• Word extraction: Efficiently extracting the words from a lyrics text and counting the occurrence frequencies has to be done once for each requested song.
• Layout generation: Layout algorithms are important for the word cloud generation. Reducing white space regions can produce space-filling lyrics word clouds.
• Word accessibility: Achieving a fast brushing and linking interaction feature demands rapid access to the words.
• Word color coding: The initial color coding of the words in the word cloud can be challenging since similar color hues can be perceptually difficult to distinguish.
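The word extraction step from the list above, together with the stop word elimination and the three frequency encodings described among the interaction techniques, can be sketched in a few lines. The following Python sketch is only an illustration and not the tool's implementation (the paper describes a PHP parser on the server and a JavaScript client). The function names `count_words` and `font_size`, the tiny stop word list, the sample text, and the pixel range are all assumptions made for this example.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word list; a real tool would use a fuller one.
STOP_WORDS = frozenset({"a", "an", "and", "in", "is", "it", "of", "on", "the", "to", "we"})


def count_words(lyrics, stop_words=frozenset()):
    """Return a Counter mapping each lower-cased word to its occurrence frequency."""
    # Words are runs of letters, optionally with inner apostrophes (e.g. "don't").
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", lyrics.lower())
    return Counter(word for word in words if word not in stop_words)


def font_size(freq, max_freq, scale="proportional", min_px=10.0, max_px=60.0):
    """Map an occurrence frequency to a font size between min_px and max_px."""
    if scale == "proportional":
        ratio = freq / max_freq
    elif scale == "sqrt":
        ratio = math.sqrt(freq) / math.sqrt(max_freq)
    elif scale == "log":
        # log1p(x) = log(1 + x), so words that occur once do not collapse to zero.
        ratio = math.log1p(freq) / math.log1p(max_freq)
    else:
        raise ValueError(f"unknown scale: {scale}")
    return min_px + ratio * (max_px - min_px)


# Made-up sample text standing in for a requested song's lyrics.
sample = "Sing it loud, sing it clear, sing the song we love to hear"
counts = count_words(sample, STOP_WORDS)
max_freq = max(counts.values())
for word, freq in counts.most_common():
    sizes = {s: round(font_size(freq, max_freq, s), 1) for s in ("proportional", "sqrt", "log")}
    print(word, freq, sizes)
```

Counting once and storing the result mirrors the paper's point that extraction has to be done only once per requested song. The square root and logarithmic mappings compress large frequency differences, so rare words remain legible next to a dominant one.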
Visual scalability is crucial to display lyrics word clouds containing many different words, but the more words are in use, the more challenging algorithmic and performance scalability become as well.

• Number of displayed words: The number of displayed words has an influence on the interpretability of the word cloud. The mental map is important for visual comparisons, which get more challenging if the clouds contain more and more words. Moreover, words may appear at different positions in different clouds, demanding a search task whose effort depends on the number of displayed words.
• Word cloud layout: The layout (spiral, Archimedean, rectangular) also has an influence on visual scalability. The user can interactively switch between several layouts.
• Complexity of datasets: The number of extracted words as well as the distribution of the word frequencies has an impact on the lyrics word cloud generation. If the difference between frequencies is too large, a logarithmic or square root function can be applied to the occurrence number.
• Interpretability of word clouds: In a word cloud we typically only show individual words; longer word sequences have to be inspected in the corresponding lyrics text. Word clouds for longer text snippets might be worth investigating to get a better overview about the semantics.

VII. CONCLUSION AND FUTURE WORK

In this paper we illustrated how lyrics data can be requested from the LyricWikia database and how it can be transformed into a lyrics word cloud. To reach this goal, the textual data has to be preprocessed first and stored in a local database to guarantee better and faster interactivity. Interaction techniques are integrated in our lyrics word cloud, for example, to visually connect the lyrics to the overview-based word cloud visualization. We illustrated the usefulness of the tool by applying it to a song that has to be requested first from LyricWikia. We described limitations and challenges concerning implementation details, algorithmic and performance issues, and visual scalability. For future work, we plan to also integrate the music notes into our visualization although this kind of data is difficult to obtain. Feedback of music database users might also be worth investigating, and more algorithmic analyses should be implemented, making the lyrics word cloud technique a visual analytics approach.

REFERENCES

[1] J. H. Rodriguez, 2016. [Online]. Available: https://www.quora.com/How-many-song-tracks-are-created-every-year
[2] “Lyricwikia web page,” 2016. [Online]. Available: http://lyrics.wikia.com/wiki/Lyrics_Wiki
[3] O. Kaser and D. Lemire, “Tag-Cloud Drawing: Algorithms for cloud visualization,” in Proceedings of Workshop on Tagging and Metadata for Social Information Organization, 2007.
[4] C. Seifert, B. Kump, W. Kienreich, G. Granitzer, and M. Granitzer, “On the beauty and usability of tag clouds,” in Proceedings of the 12th International Conference on Information Visualisation, 2008, pp. 17–25.
[5] J. Feinberg, “Wordle,” in Beautiful Visualization, J. Steele and N. Iliinsky, Eds. O’Reilly, 2010, pp. 37–58.
[6] K. Koh, B. Lee, B. Kim, and J. Seo, “ManiWordle: Providing flexible control over Wordle,” IEEE Transactions on Visualization and Computer Graphics, vol. 16, no. 6, pp. 1190–1197, 2010.
[7] H. Strobelt, M. Spicker, A. Stoffel, D. Keim, and O. Deussen, “Rolled-out Wordles: A heuristic method for overlap removal of 2D data representatives,” Computer Graphics Forum, vol. 31, no. 3, pp. 1135–1144, 2012.
[8] Y. Hassan-Montero and V. Herrero-Solana, “Improving tag-clouds as visual information retrieval interfaces,” in Proceedings of International Conference on Multidisciplinary Information Sciences and Technologies, 2006, pp. 25–28.
[9] J. Schrammel, M. Leitner, and M. Tscheligi, “Semantically structured tag clouds: an empirical evaluation of clustered presentation approaches,” in Proceedings of SIGCHI Conference on Human Factors in Computing Systems, ser. CHI ’09, 2009, pp. 2037–2040.
[10] Y.-X. Chen, R. Santamaría, A. Butz, and R. Therón, “Tagclusters: Semantic aggregation of collaborative tags beyond tagclouds,” in Proceedings of the 10th International Symposium on Smart Graphics. Springer, 2009, pp. 56–67.
[11] Y. Wu, T. Provan, F. Wei, S. Liu, and K.-L. Ma, “Semantic-preserving word clouds by seam carving,” Computer Graphics Forum, vol. 30, no. 3, pp. 741–750, 2011.
[12] F. V. Paulovich, F. M. B. Toledo, G. P. Telles, R. Minghim, and L. G. Nonato, “Semantic wordification of document collections,” Computer Graphics Forum, vol. 31, no. 3, pp. 1145–1153, 2012.
[13] K. Fujimura, S. Fujimura, T. Matsubayashi, T. Yamada, and H. Okuda, “Topigraphy: visualization for large-scale tag clouds,” in Proceedings of International Conference on World Wide Web, 2008, pp. 1087–1088.
[14] M. Stefaner, “Visual tools for the socio-semantic web,” Master thesis, University of Applied Sciences Potsdam, 2007.
[15] S. Lohmann, M. Burch, H. Schmauder, and D. Weiskopf, “Visual analysis of microblog content using time-varying co-occurrence highlighting in tag clouds,” in Proceedings of the International Working Conference on Advanced Visual Interfaces, ser. AVI ’12. ACM, 2012, pp. 753–756.
[16] F. Heimerl, S. Lohmann, S. Lange, and T. Ertl, “Word cloud explorer: Text analytics based on word clouds,” in Proceedings of the 47th Hawaii International Conference on System Sciences. IEEE, 2014, pp. 1833–1842.
[17] P. Gambette and J. Véronis, “Visualising a text with a Tree Cloud,” in Proceedings of IFCS Biennial Conference and 33rd Annual Conference of the Gesellschaft für Klassifikation e.V. Springer, 2010, pp. 561–569.
[18] M. Burch, S. Lohmann, D. Pompe, and D. Weiskopf, “Prefix tag clouds,” in Proceedings of 17th International Conference on Information Visualisation, 2013, pp. 45–50.
[19] C. Collins, F. B. Viégas, and M. Wattenberg, “Parallel Tag Clouds to explore and analyze faceted text corpora,” in Proceedings of IEEE Symposium on Visual Analytics Science and Technology, ser. VAST ’09. IEEE, 2009, pp. 91–98.
[20] R. Vuillemot, T. Clement, C. Plaisant, and A. Kumar, “What’s being said near ‘Martha’? Exploring name entities in literary text collections,” in Proceedings of IEEE Symposium on Visual Analytics Science and Technology, ser. VAST ’09. IEEE, 2009, pp. 107–114.
[21] M. Burch, S. Lohmann, F. Beck, N. Rodriguez, L. D. Silvestro, and D. Weiskopf, “Radcloud: Visualizing multiple texts with merged word clouds,” in 18th International Conference on Information Visualisation, 2014, pp. 108–113.
[22] F. B. Viegas, M. Wattenberg, F. van Ham, J. Kriss, and M. McKeon, “ManyEyes: A site for visualization at internet scale,” IEEE Transactions on Visualization and Computer Graphics, vol. 13, no. 6, pp. 1121–1128, 2007.
[23] S. Lohmann, F. Heimerl, F. Bopp, M. Burch, and T. Ertl, “ConcentriCloud: Word cloud visualization for multiple text documents,” in Proceedings of 19th International Conference on Information Visualisation, IV 2015, 2015, pp. 114–120.
[24] B. Lee, N. H. Riche, A. K. Karlson, and S. Carpendale, “SparkClouds: Visualizing trends in tag clouds,” IEEE Transactions on Visualization and Computer Graphics, vol. 16, no. 6, pp. 1182–1189, 2010.
[25] W. Cui, Y. Wu, S. Liu, F. Wei, M. X. Zhou, and H. Qu, “Context-preserving, dynamic word cloud visualization,” IEEE Computer Graphics and Applications, vol. 30, no. 6, pp. 42–53, 2010.
