Adaptive Resonance Theory in Social Media Data Clustering: Roles, Methodologies, and Applications
Lei Meng
Advanced Information and Knowledge Processing

Lei Meng
Ah-Hwee Tan
Donald C. Wunsch II

Adaptive Resonance
Theory in Social
Media Data
Clustering
Roles, Methodologies, and Applications
Advanced Information and Knowledge
Processing

Editors-in-Chief
Lakhmi C. Jain, Bournemouth University, Poole, UK, and, University of South
Australia, Adelaide, Australia
Xindong Wu, University of Vermont, USA
Information systems and intelligent knowledge processing are playing an increasing
role in business, science and technology. Recently, advanced information systems
have evolved to facilitate the co-evolution of human and information networks
within communities. These advanced information systems use various paradigms
including artificial intelligence, knowledge management, and neural science as well
as conventional information processing paradigms. The aim of this series is to
publish books on new designs and applications of advanced information and
knowledge processing paradigms in areas including but not limited to aviation,
business, security, education, engineering, health, management, and science. Books
in the series should have a strong focus on information processing—preferably
combined with, or extended by, new results from adjacent sciences. Proposals for
research monographs, reference books, coherently integrated multi-author edited
books, and handbooks will be considered for the series and each proposal will be
reviewed by the Series Editors, with additional reviews from the editorial board and
independent reviewers where appropriate. Titles published within the Advanced
Information and Knowledge Processing series are included in Thomson Reuters’
Book Citation Index and Scopus.

More information about this series at http://www.springer.com/series/4738


Lei Meng • Ah-Hwee Tan • Donald C. Wunsch II

Adaptive Resonance Theory in Social Media Data Clustering
Roles, Methodologies, and Applications

Lei Meng
NTU-UBC Research Center of Excellence in Active Living for the Elderly (LILY)
Nanyang Technological University
Singapore, Singapore

Ah-Hwee Tan
School of Computer Science and Engineering
Nanyang Technological University
Singapore, Singapore

Donald C. Wunsch II
Applied Computational Intelligence Laboratory
Missouri University of Science and Technology
Rolla, MO, USA

ISSN 1610-3947 ISSN 2197-8441 (electronic)


Advanced Information and Knowledge Processing
ISBN 978-3-030-02984-5 ISBN 978-3-030-02985-2 (eBook)
https://doi.org/10.1007/978-3-030-02985-2

Library of Congress Control Number: 2018968387

© Springer Nature Switzerland AG 2019


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, express or implied, with respect to the material contained herein or
for any errors or omissions that may have been made. The publisher remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface

Scope

Coming into the era of Web 2.0, people are involved in a connected and interactive
Cyberworld, where the emergence of social networking websites has created
numerous interactive sharing and social network-enhanced platforms for users to
upload, comment on, and share multimedia content online. This has led to a massive
number of web multimedia documents, together with their rich meta-information,
such as category information, user tags and comments, and time-location
stamps. Such interconnected but heterogeneous social media data have provided
opportunities for better understanding traditional multimedia data, such as images
and text documents. More importantly, the different types of activities and
interactions of social users have spurred the growth of artificial intelligence (AI) with
machine learning techniques, shifting the typically data-centric research on
multimedia understanding to user-centric research on social user understanding
and numerous personalized services, such as user profiling, group-based social
behavior analysis, community and social trend discovery, and various social
recommender systems based on users’ online behaviors, friendship networks,
preference inferences, etc. Additionally, recent advances in mobile devices tend to link
people with both the cyber and physical worlds, introducing a new topic of
online-offline analysis into the current form of social network analytics. All these
changes pose new questions and open challenges, and increase the need for new
forms of machine learning techniques.
Clustering is an important approach to the analysis and mining of social media
data to fulfill the aforementioned tasks. However, in contrast to traditional multimedia
data, information from social media data is typically massive, diverse,
heterogeneous, and noisy. These characteristics of social media data raise new challenges
for existing clustering techniques, including the scalability for big data, the ability
to automatically recognize data clusters, the strategies to effectively integrate data
from heterogeneous resources, and the robustness to noisy features and ill-featured
patterns. Besides, online learning capability becomes a necessity when
analyzing social media streams and capturing the evolving characteristics of social
networks and the underlying information. Moreover, social media data often cover a
diverse range of topics, while users typically have their own preferences for topics
hidden in the large amount of social media data, which makes incorporating user
preferences into the clustering process important for producing personalized results.
Mindful of these opportunities and challenges for clustering algorithms, this book
aims to systematically introduce the frontiers of modern social
media analytics and to present a class of clustering techniques based on adaptive
resonance theory (ART) for the fast and robust clustering of large-scale social
media data. With applications in a range of social media mining tasks, this book
demonstrates that these algorithms can handle one or more of the aforementioned
challenges with characteristics such as linear time complexity to scale up for big
data, online learning capability, automatic parameter adaptation, robustness to noisy
information, heterogeneous information fusion, and the ability to incorporate user
preferences.

Content

This book has two parts: Theories (Part I) and Applications (Part II). Part I includes
three chapters on background and algorithms, where
• Chapter 1: introduces the characteristics of social media data, the roles and
challenges of clustering in social media analytics, and the authors’ approaches
based on the adaptive resonance theory (ART) to the aforementioned
challenges.
• Chapter 2: offers a literature review on typical types of clustering algorithms
(potentially) applicable to social media analytics, and the key branches of
clustering-based social media mining tasks.
• Chapter 3: is the cornerstone of this book, which proves the clustering mech-
anism of ART and illustrates a class of clustering algorithms based on ART that
handles the characteristics of different types of social media data for clustering.
In contrast, Part II provides real-world case studies on the major directions of
social media analytics using the ART-based solutions, where
• Chapter 4: investigates clustering the surrounding text (title, description,
comments, etc.) of user-posted images for personalized web image organization.
• Chapter 5: explores clustering composite socially-enriched multimedia data, of
which each data item is (in part) described with different types of data, such as
images, surrounding text, and user comments.
• Chapter 6: presents a study on detecting user groups on social networks, where
the users with shared interests are discovered using their online posts and
behaviors, such as likes, sharing, and re-posting.
• Chapter 7: depicts a clustering-based approach to indexing and retrieving
multimodal data in an online manner, with an application for building a mul-
timodal e-commerce product search engine.
• Chapter 8: provides the conclusion for this book.

Audience

This book provides an up-to-date introduction to state-of-the-art clustering
techniques and the associated modern applications of social media analytics. It also
presents a class of clustering algorithms based on adaptive resonance theory
(ART) to address the challenges in social media data clustering.
The social Web is growing in popularity and providing new forms of
communication, so this book is expected to serve as a starting tutorial for
researchers who are interested in clustering, ART, and social media mining, an
extensible research basis for further exploration, and a place to find practical
solutions to real-world applications in social media analytics.
This book will benefit readers from the following aspects:
1. Up-to-date Cutting-edge Research: This book summarizes state-of-the-art
innovative research on clustering and social media analytics in the 2010s,
published in top-tier and reputable conferences and journals across areas of
machine learning, data mining, and multimedia. The content of the book is
therefore valuable to fresh PhD students and researchers in the aforementioned
areas.
2. Fundamental Breakthrough in ART: Adaptive resonance theory (ART) has
been widely explored in both academic and industrial engineering applications,
with its fundamental papers cited over 13k times. Initiatives presented in this
book on the discovery and theoretical demonstration of the learning mechanism
of ART for clustering will attract researchers and practitioners working with
ART in related areas, such as computer science, cognitive science, and
neuroscience.
3. Extensible Research Basis: This book illustrates trajectories on how to develop
ART-based clustering algorithms for handling different social media clustering
challenges, in terms of motivation, methodology, theoretical foundations, and
their associations. It will help readers fully understand the research intentions of
this book and form a basis for researchers to follow and provide their own
contributions.
4. Practical Technical Solutions: Driven by real-world challenges, this book
illustrates ART-based algorithms using real-world applications with experimental
demonstration. Readers will systematically learn step-by-step procedures
to tackle real-world problems in social media data clustering, in terms of
algorithm design, implementation tradeoffs, and engineering considerations.
Therefore, this book will be of interest to researchers and practitioners
searching for technical solutions for quick research and project setup.

Singapore          Lei Meng
Singapore          Ah-Hwee Tan
Rolla, USA         Donald C. Wunsch II
Acknowledgments

This research is supported in part by the National Research Foundation, Prime Minister’s
Office, Singapore under its IDM Futures Funding Initiative and administered by the
Interactive and Digital Media Programme Office; the Ministry of Education Academic
Research Fund (MOE AcRF), Singapore, the DSO National Laboratories, Singapore
under research grant numbers DSOCL11258 and DSOCL16006; and the National
Research Foundation, Prime Minister’s Office, Singapore under its IRC@Singapore
Funding Initiative.
Partial support for this research is also received from the Missouri University of
Science and Technology Intelligent Systems Center, the Mary K. Finley Missouri
Endowment, the Lifelong Learning Machines program from DARPA/Microsystems
Technology Office, and the Army Research Laboratory (ARL); and it was
accomplished under Cooperative Agreement Number W911NF-18-2-0260. The
views and conclusions contained in this document are those of the authors and
should not be interpreted as representing the official policies, either expressed or
implied, of the Army Research Laboratory or the U.S. Government. The U.S.
Government is authorized to reproduce and distribute reprints for Government
purposes notwithstanding any copyright notation herein.
The authors would like to thank the NTU-UBC Research Center of Excellence in
Active Living for the Elderly (LILY), the School of Computer Science and
Engineering (SCSE), Nanyang Technological University (NTU), and the
Department of Electrical & Computer Engineering, Missouri University of Science and
Technology (Missouri S&T), for their efforts in providing an ideal environment for
research. The authors would also like to express an intellectual debt of gratitude to
many generous mentors, especially Stephen Grossberg and Gail Carpenter for the
development of Adaptive Resonance Theory and related neural network
architectures upon which this work is built.

Contents

Part I Theories
1 Introduction
1.1 Clustering in the Era of Web 2.0
1.2 Research Issues and Challenges
1.2.1 Representation of Social Media Data
1.2.2 Scalability for Big Data
1.2.3 Robustness to Noisy Features
1.2.4 Heterogeneous Information Fusion
1.2.5 Sensitivity to Input Parameters
1.2.6 Online Learning Capability
1.2.7 Incorporation of User Preferences
1.3 Approach and Methodology
1.4 Outline of the Book
References
2 Clustering and Its Extensions in the Social Media Domain
2.1 Clustering
2.1.1 K-Means Clustering
2.1.2 Hierarchical Clustering
2.1.3 Graph Theoretic Clustering
2.1.4 Latent Semantic Analysis
2.1.5 Non-Negative Matrix Factorization
2.1.6 Probabilistic Clustering
2.1.7 Genetic Clustering
2.1.8 Density-Based Clustering
2.1.9 Affinity Propagation
2.1.10 Clustering by Finding Density Peaks
2.1.11 Adaptive Resonance Theory
2.2 Semi-Supervised Clustering
2.2.1 Group Label Constraint
2.2.2 Pairwise Label Constraint
2.3 Heterogeneous Data Co-Clustering
2.3.1 Graph Theoretic Models
2.3.2 Non-Negative Matrix Factorization Models
2.3.3 Markov Random Field Model
2.3.4 Multi-view Clustering Models
2.3.5 Aggregation-Based Models
2.3.6 Fusion Adaptive Resonance Theory
2.4 Online Clustering
2.4.1 Incremental Learning Strategies
2.4.2 Online Learning Strategies
2.5 Automated Data Cluster Recognition
2.5.1 Cluster Tendency Analysis
2.5.2 Posterior Cluster Validation Approach
2.5.3 Algorithms Without a Pre-defined Number of Clusters
2.6 Social Media Mining and Related Clustering Techniques
2.6.1 Web Image Organization
2.6.2 Multimodal Social Information Fusion
2.6.3 User Community Detection in Social Networks
2.6.4 User Sentiment Analysis
2.6.5 Event Detection in Social Networks
2.6.6 Community Question Answering
2.6.7 Social Media Data Indexing and Retrieval
2.6.8 Multifaceted Recommendation in Social Networks
References
3 Adaptive Resonance Theory (ART) for Social Media Analytics
3.1 Fuzzy ART
3.1.1 Clustering Algorithm of Fuzzy ART
3.1.2 Algorithm Analysis
3.2 Geometric Interpretation of Fuzzy ART
3.2.1 Complement Coding in Fuzzy ART
3.2.2 Vigilance Region (VR)
3.2.3 Modeling Clustering Dynamics of Fuzzy ART Using VRs
3.2.4 Discussion
3.3 Vigilance Adaptation ARTs (VA-ARTs) for Automated Parameter Adaptation
3.3.1 Activation Maximization Rule
3.3.2 Confliction Minimization Rule
3.3.3 Hybrid Integration of AMR and CMR
3.3.4 Time Complexity Analysis
3.3.5 Experiments
3.4 User Preference Incorporation in Fuzzy ART
3.4.1 General Architecture
3.4.2 Geometric Interpretation
3.5 Probabilistic ART for Short Text Clustering
3.5.1 Procedures of Probabilistic ART
3.5.2 Probabilistic Learning for Prototype Modeling
3.6 Generalized Heterogeneous Fusion ART (GHF-ART) for Heterogeneous Data Co-Clustering
3.6.1 General Architecture
3.6.2 Clustering Procedures
3.6.3 Robustness Measure for Feature Modality Weighting
3.6.4 Time Complexity Analysis
3.7 Online Multimodal Co-indexing ART (OMC-ART) for Streaming Multimedia Data Indexing
3.7.1 General Procedures
3.7.2 Online Normalization of Features
3.7.3 Salient Feature Discovery for Generating Indexing Base of Data
3.7.4 Time Complexity Analysis
3.8 Discussion
References

Part II Applications
4 Personalized Web Image Organization
4.1 Introduction
4.2 Problem Statement and Formulation
4.3 Personalized Hierarchical Theme-Based Clustering (PHTC)
4.3.1 Overview
4.3.2 PF-ART for Clustering Surrounding Text
4.3.3 Semantic Hierarchy Generation
4.4 Experiments
4.4.1 Evaluation Measures
4.4.2 NUS-WIDE Dataset
4.4.3 Flickr Dataset
4.5 Discussion
References
5 Socially-Enriched Multimedia Data Co-clustering
5.1 Introduction
5.2 Problem Statement and Formulation
5.3 GHF-ART for Multimodal Data Fusion and Analysis
5.3.1 Feature Extraction
5.3.2 Similarity Measure
5.3.3 Learning Strategies for Multimodal Features
5.3.4 Self-Adaptive Parameter Tuning
5.3.5 Time Complexity Comparison
5.4 Experiments
5.4.1 NUS-WIDE Dataset
5.4.2 Corel Dataset
5.4.3 20 Newsgroups Dataset
5.5 Discussion
References
6 Community Discovery in Heterogeneous Social Networks
6.1 Introduction
6.2 Problem Statement and Formulation
6.3 GHF-ART for Clustering Heterogeneous Social Links
6.3.1 Heterogeneous Link Representation
6.3.2 Heterogeneous Link Fusion for Pattern Similarity Measure
6.3.3 Learning from Heterogeneous Links
6.3.4 Adaptive Weighting of Heterogeneous Links
6.3.5 Computational Complexity Analysis
6.4 Experiments
6.4.1 YouTube Dataset
6.4.2 BlogCatalog Dataset
6.5 Discussion
References
7 Online Multimodal Co-indexing and Retrieval of Social Media Data
7.1 Introduction
7.2 Problem Statement and Formulation
7.3 OMC-ART for Multimodal Data Co-indexing and Retrieval
7.3.1 OMC-ART for Online Co-indexing of Multimodal Data
7.3.2 Fast Ranking for Multimodal Queries
7.3.3 Computational Complexity Analysis
7.4 Experiments
7.4.1 Data Description
7.4.2 Evaluation Measures
7.4.3 Parameter Selection
7.4.4 Performance Comparison
7.4.5 Efficiency Analysis
7.5 Real-World Practice: Multimodal E-Commerce Product Search Engine
7.5.1 Architecture Overview
7.5.2 Prototype System Implementation
7.5.3 Analysis with Real-World E-Commerce Product Data
7.6 Discussion
References
8 Concluding Remarks
8.1 Summary of Book
8.2 Prospective Discussion
References
Index
Part I
Theories
Chapter 1
Introduction

Abstract The last decade has witnessed how social media in the era of Web 2.0
has reshaped the way people communicate, interact, and entertain themselves in daily
life and fostered the growth of various user-centric platforms, such as social networking,
question answering, massive open online courses (MOOC), and e-commerce
platforms. The rich user-generated multimedia data available on the web has transformed
traditional ways of understanding multimedia research and has led to numerous
emerging topics on human-centric analytics and services, such as user profiling,
social network mining, crowd behavior analysis, and personalized recommendation.
Clustering, as an important tool for mining information groups and in-group shared
characteristics, has been widely investigated for the knowledge discovery and data
mining tasks in social media analytics. However, social media data has numerous
characteristics that raise challenges for traditional clustering techniques, such as its
massive volume, diverse content, heterogeneous media sources, noisy user-generated
content, and generation in a streaming manner. This leads to the scenario where the
clustering algorithms used in the literature of social media applications are usu-
ally variants of a few traditional algorithms, such as K-means, non-negative matrix
factorization (NMF), and graph clustering. Developing a fast and robust clustering
algorithm for social media analytics is still an open problem. This chapter will give
a bird’s eye view of clustering in social media analytics, in terms of data charac-
teristics, challenges and issues, and a class of novel approaches based on adaptive
resonance theory (ART).

1.1 Clustering in the Era of Web 2.0

Social networking applications in the era of Web 2.0, such as Twitter, Flickr, and
Facebook, have transformed the World Wide Web into an interactive sharing plat-
form, i.e. the social Web, where users upload, comment, and share media content
within their social circles. Their popularity has led to an explosive growth of multi-
media documents online, together with their associated rich meta-information, such
as category, keywords, user description, comments, and time-location stamps. The
availability of such a massive amount of interconnected heterogeneous social media
© Springer Nature Switzerland AG 2019 3
L. Meng et al., Adaptive Resonance Theory in Social Media
Data Clustering, Advanced Information and Knowledge Processing,
https://doi.org/10.1007/978-3-030-02985-2_1
data, on one hand, facilitates the semantic understanding of web multimedia docu-
ments, and the mining of the associations among heterogeneous data resources. On
the other hand, social users are connected by plentiful types of interactions, which
provide novel ways to analyze and understand user behaviors and social trends in
social networks.
Clustering is a key and commonly used technique for knowledge discovery and
mining from unstructured data resources. Given a collection of data, by representing
the data objects as signal patterns using feature vectors, clustering is a process of
identifying the natural groupings of the data patterns in the feature space according
to their measured similarities, so that the data objects in the same cluster are more
similar to each other than to those in other clusters. Thus, given a social network data
repository with user records in terms of online action trajectories, friendship, images,
blogs, subscriptions, and joint activity groups, clustering techniques could be utilized
to analyze an individual type of data, such as identifying the underlying categories
of web images and discovering the hot topics from recent blogs of social users.
Moreover, the multiple types of data could be treated as a whole to discover groups
of social users that have similar social behaviors which can further benefit various
applications, such as mining characteristics of different groups of users, detecting
certain groups of users, and recommending friends and activity groups to the users
with common interests.
However, unlike traditional datasets, social media data have several distinguishing
characteristics. First, the social media data are usually large-scale and may contain
millions of data objects. Secondly, the social media data typically cover
diverse content across numerous topics. Thirdly, the social media data may involve
data from heterogeneous resources. For example, the social network data of users
may involve relational records, images, and texts. Fourthly, considering social users
are free to upload data as they wish, the social media data, especially the text data,
typically involve a lot of useless or noisy information, so that the obtained feature
vectors of the data objects may be ill-featured noisy patterns. These noisy features
provide spurious relations between data patterns and may result in irregular or even
overlapped shapes of data groups belonging to different classes in the feature space.
These aforementioned characteristics of social media data raise new challenges and
requirements for existing clustering techniques, including scalability for big data, the
ability to automatically recognize the number of clusters given in a dataset, the strate-
gies to effectively integrate the data from heterogeneous resources for clustering, and
the robustness to noisy features.
In addition to the characteristics of static data, the streaming nature of the social
web results in dynamical changes of information in social networks. This stresses
the online learning capability of clustering algorithms for analyzing social media
streams and capturing the evolving characteristics of social networks and social
user information. Beyond that, social media data often cover a diverse range of
topics, while social users, due to their individual life experience, may have their own
preferences and opinions in the organization and retrieval of information hidden in the
large amount of social media data. Therefore, exploring feasible ways to incorporate
user preferences, including both explicit user-provided information and implicit user
interest inferences, into the clustering process as one type of prior knowledge to
achieve personalized results tailored for users is necessary.
Due to the opportunities and challenges for clustering algorithms in modern social
media analytics, the development of new forms of clustering algorithms becomes a
necessity for handling social media data and diverse social media applications, which
can fulfill the following requirements:
• The developed clustering algorithms should require low computer memory and
computational cost to meet the scalability for big data;
• The developed clustering algorithms should have effective methods of identifying
the key features of patterns to alleviate the side-effect of noisy features;
• The developed clustering algorithms should be capable of effectively understand-
ing composite data objects which are represented by multiple types of data from
heterogeneous resources;
• The developed clustering algorithms should be able to automatically identify
the number of clusters among the big social media data instead of using a pre-
determined value to reduce the sensitivity to the input parameter values;
• The developed clustering algorithms should have online learning capability to
process social media data streams and dynamically evolve the cluster structures
revealing underlying information;
• The developed clustering algorithms should be able to incorporate user preferences
to generate personalized cluster structures for different social users.

1.2 Research Issues and Challenges

It is non-trivial to develop clustering algorithms that fulfill the aforementioned
requirements. This section describes in detail the six key challenges for clustering
algorithms to meet the distinctive characteristics of social media data.

1.2.1 Representation of Social Media Data

Due to the diversity of the social Web, social media data may exist in various forms,
but they usually follow four common data types, namely the relational data of social
users, uploaded images, published articles, and descriptive meta-information from
social users. Note that videos are typically processed as a set of key frames/images,
and in practice, the meta-information of videos, such as captions and comments, is
much more effective than the video content in feature representation. The
representation and issues of each type of data are described below:
1. Relational Data: The relational data illustrate the relations or similar behavior
among social users, such as friendship networks and co-subscription networks.
A data object of this type, i.e. a user, is usually represented by
a feature vector whose length equals the number of users
and whose elements are valued by the strength of the interaction between the user
and others [23]. For example, given a dataset of the friendship network of N
users, the feature vector of the i-th user can be denoted by $x_i = [x_{i,1}, \ldots, x_{i,N}]$,
where $x_{i,n} = 1$ if the n-th user is a friend of the i-th user and $x_{i,n} = 0$ otherwise.
Similarly, regarding the co-subscription network, the elements can be valued by
the number of co-subscriptions.
The representation of relational data has the problem of requiring a lot of
computer memory to construct the relational matrix of users when the number of
users is large. Additionally, the high dimensionality may also incur problems for
clustering algorithms in learning the similarities between user patterns because
of noisy features.
2. Images: The visual representation of image content is still a challenge today.
Current techniques for visual feature extraction are usually based on either
handcrafted or data-driven approaches. The handcrafted features [12, 22] are usually
a concatenation of local and/or global features, such as color histograms, edge
detection, texture orientation and scale-invariant points,
while the features produced by the data-driven approaches [17] are usually from
deep (convolutional) neural networks trained on large-scale image classification
datasets, such as the ImageNet dataset. Both kinds of features remain inadequate
to represent the images at the semantic level, a problem known as the semantic gap.
It leads to difficulties when grouping images of the same class that have very dif-
ferent appearances or when distinguishing those of different classes with similar
backgrounds.
3. Articles: The representation issue of text documents has been well studied in the
literature. Typically, articles are represented by the “Bag of Words” (BoW) model,
which is a feature vector containing all the keywords in the document collection.
The selection of keywords is usually based on the occurrence frequencies of words
or co-occurrence frequencies of groups of words, and the most commonly used
algorithm to weight the selected keywords is based on the frequencies of the key-
words in and across the documents, known as term frequency-inverse document
frequency (tf-idf). However, the web articles of social users typically have typos,
personalized words, and words that cannot reveal the semantics of the articles.
Therefore, using the BoW model for the representation of articles causes issues
of feature sparsity, high dimensionality, and noisy words.
Recent advances in word embedding (Word2vec) models [14, 19] enable the
mapping of individual words to fixed-length feature vectors. This alleviates the
problems of sparsity and high dimensionality, but still offers no solution to the tradeoff
between information loss and noisy words.
4. Meta-information: In the context of social media data, meta-information usu-
ally refers to the surrounding text of images and articles, such as titles and user
comments, which provides additional knowledge to the data objects from other
perspectives. However, the feature representation of meta-information has two
problems. First, like the short text representation problem [18], meta-information
is usually very short so that the extracted tags cannot be effectively weighted by
traditional statistical methods, such as tf-idf. Secondly, meta-information typically
involves several key tags that reveal the characteristics of the data objects
and many more noisy tags which are meaningless or even indicate incorrect
relations between the data objects. Therefore, distinguishing the key tags from
a large number of noisy tags is also a problem for the feature construction of
meta-information, which is also related to the tag ranking problem [20, 21] in the
multimedia domain.
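
The relational and bag-of-words representations described in items 1 and 3 can be sketched as follows. This is a minimal illustration rather than the implementation used in the cited works: the toy friendship list, the toy documents, and the helper names `friendship_vectors` and `tfidf_vectors` are hypothetical, and the tf-idf variant shown (raw term count multiplied by log(N/df)) is only one of several common weightings.

```python
import math
from collections import Counter

def friendship_vectors(n_users, friendships):
    """Return one binary vector per user: x[i][n] = 1 if users i and n are friends."""
    x = [[0] * n_users for _ in range(n_users)]
    for i, n in friendships:
        # Assumption: friendship is symmetric, so both directions are set.
        x[i][n] = 1
        x[n][i] = 1
    return x

def tfidf_vectors(docs):
    """Bag-of-words vectors over a shared vocabulary, weighted by tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears.
    df = Counter(w for toks in tokenized for w in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append([tf[w] * math.log(n_docs / df[w]) for w in vocab])
    return vocab, vectors

# Hypothetical toy data: 4 users, 3 short documents.
users = friendship_vectors(4, [(0, 1), (0, 2), (2, 3)])
vocab, vecs = tfidf_vectors(["cat photo cat", "dog photo", "cat dog lol"])
```

Note that `friendship_vectors` allocates an N x N matrix, which is exactly the memory problem raised in item 1, and that a word such as "lol", which appears in only one document, receives the highest idf weight even though it carries little meaning, which mirrors the noisy-word issue raised in item 3.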

1.2.2 Scalability for Big Data

Social media data usually appear in a large scale. For example, the Google search
engine has indexed billions of web documents, such as web pages and images.
Besides, each search query to the search engine may result in over ten million
results. Therefore, the clustering techniques should be able to deal with a big dataset
in a reasonable running time.
Existing clustering techniques are usually based on K-means clustering, hierarchi-
cal clustering, spectral clustering, probabilistic clustering, density-based clustering
and matrix factorization algorithms. However, most of them incur heavy mathematical
computation. For example, hierarchical clustering and spectral clustering
algorithms usually have a cubic time complexity of $O(n^3)$, and density-based
clustering algorithms usually have a quadratic time complexity of $O(n^2)$, where n is
the number of data patterns. Although K-means clustering and matrix factorization
algorithms have a linear time complexity of $O(n)$ with respect to the size of the
dataset, their computational cost also linearly increases with respect to the settings
of the number of clusters and the number of iterations.
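
To make these growth rates concrete, consider a hypothetical dataset of n = 10^6 patterns, with k = 100 clusters and t = 50 iterations for K-means. All three values are illustrative assumptions, not figures from the text, and only the dominant term of each cost is counted:

```latex
\begin{align*}
\text{hierarchical / spectral:} \quad & O(n^3) \;\rightarrow\; (10^6)^3 = 10^{18} \\
\text{density-based:} \quad & O(n^2) \;\rightarrow\; (10^6)^2 = 10^{12} \\
\text{K-means:} \quad & O(nkt) \;\rightarrow\; 10^6 \times 100 \times 50 = 5 \times 10^{9}
\end{align*}
```

Under these assumptions the linear-time method is cheaper by several orders of magnitude, but its cost still scales with the chosen number of clusters and iterations.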
Recent studies for clustering large-scale social media data, especially for social
network data [29], explore methods for simplifying the data structure to achieve
approximate results or parallel computation. However, the first approach usually
requires assumptions for the data to reduce weak relations among data patterns, and
the second approach needs one or more high-performance computers. Therefore,
developing efficient clustering algorithms or effective methods to accelerate existing
clustering algorithms is necessary for clustering social media data.

1.2.3 Robustness to Noisy Features

As mentioned in Sect. 1.2.1, social media data usually suffers from representation
issues due to the large amount of useless or noisy features, causing the produced
patterns to have a high dimensionality and irregular shapes of clusters in the high-
dimensional feature space.
Most existing clustering algorithms, as discussed in Sect. 1.2.2, do not consider the
problem of noisy features. As such, they may make incorrect correlation evaluations
of patterns when calculating the similarities between patterns or doing mathematical
mappings to investigate the characteristics of patterns.
Under such situations, the clustering algorithms for social media data are required
to be capable of learning to identify the key features of patterns in order to alleviate
the side-effect of noisy features.

1.2.4 Heterogeneous Information Fusion

The rich, but heterogeneous, social media data provide multiple descriptions of the
data objects. However, a new challenge arises for traditional clustering algorithms
on simultaneously integrating multiple, but different, types of data for clustering.
In recent years, many heterogeneous data co-clustering algorithms [4–6, 11, 30]
have been proposed to extend traditional clustering algorithms so they are capable
of evaluating the similarity of data objects in and across different types of feature
data. However, most of them perform heterogeneous data fusion by simply combining
the objective functions of individual types of features. Since the multimodal
features from different sources have their own meanings and levels of feature val-
ues, this approach actually provides different weights for different types of features
when achieving global optimization. Although some of the algorithms consider the
weighting problem of features, they usually use equal or empirical weights for dif-
ferent types of features. Therefore, developing effective weighting algorithms for the
fusion of heterogeneous features remains a challenge.
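
The implicit weighting effect described above can be seen in a small sketch. It assumes two hypothetical feature channels (a tag channel with values in [0, 1] and a "view count" channel with values in [0, 1000]) and uses a plain squared Euclidean distance as a stand-in for each channel's objective term. It is not taken from any of the cited co-clustering algorithms.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def min_max_scale(values, lo, hi):
    """Map raw values from [lo, hi] into [0, 1]."""
    return [(v - lo) / (hi - lo) for v in values]

# Two data objects (hypothetical toy values).
tags_a, tags_b = [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]   # completely different tags
views_a, views_b = [510.0], [500.0]                  # nearly identical view counts

# Simply summing the per-channel terms lets the large-valued channel dominate.
d_tags = squared_distance(tags_a, tags_b)
d_views = squared_distance(views_a, views_b)
share_raw = d_views / (d_tags + d_views)

# After scaling the view counts to [0, 1], the tag channel dominates instead.
d_views_scaled = squared_distance(min_max_scale(views_a, 0, 1000),
                                  min_max_scale(views_b, 0, 1000))
share_scaled = d_views_scaled / (d_tags + d_views_scaled)
```

In the unscaled sum, the nearly identical view counts account for almost all of the combined distance even though the tags disagree completely. Rescaling reverses the balance, which shows that the effective channel weights are set by the value ranges rather than by any deliberate choice.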

1.2.5 Sensitivity to Input Parameters

Existing clustering algorithms typically require one or more input parameters from
the user. In most cases, they require the number of clusters in the dataset. Those
pre-determined parameters may significantly affect the performance of clustering
algorithms but usually vary in terms of different datasets, making it difficult for the
users to empirically choose suitable values for them.
Although the parameter selection for clustering algorithms [13, 27, 28], especially
the number of clusters in a dataset, has been studied in a large body of literature,
existing works are usually based on experimental evaluation on the quality of clusters,
such as the intra-cluster and between-cluster distance, generated under different
parameter settings of the same clustering algorithms.
Considering that social media data are typically large-scale and involve a diverse
range of topics, it is not desirable to enumerate the values of input parameters to
identify the fittest ones, which is time consuming and may not be accurate. Therefore,
parameter selection for specific clustering algorithms is still an open problem.

1.2.6 Online Learning Capability

The large-scale and high-velocity nature of social media data, especially the streams
of user-generated content, raises the need for “online clustering”, which equips
clustering algorithms with online learning capability. It enables clustering algorithms to
perform real-time processing and learning from input data objects one at a time and
evolve the structure of data clusters without re-visiting past data.
This type of clustering algorithm is related to stream clustering [25], which has
attracted attention for over a decade. Two main branches of research have been
identified, i.e. incremental learning and online learning. Incremental learning [1–3,
7, 16] aims to process the dataset in one or several passes, one object at a time or
in small batches rather than as a whole, mainly to reduce time and memory cost,
while online learning [10, 25] is exactly the same as the aforementioned online
clustering. However, these algorithms are usually K-means or hierarchical clustering
variants requiring the specification of either the number of clusters or more than two
parameters. As illustrated in Sect. 1.2.5, this affects the robustness of these algorithms
for large-scale and noisy social media data and makes human intervention intractable.
As such, there is a need to explore novel methodologies for online clustering,
which should be able to not only do online learning, but also do online adaptation of
most parameters, making them automatically self-aware of data characteristics.

1.2.7 Incorporation of User Preferences

Clustering is an automated process of discovering groups of patterns purely based
on fixed distance evaluation metrics in the feature space. Therefore, users have no
control over the clustering results. As different users may have different preferences
for organizing the data, the discovered information sometimes may not match the
user’s requirements.
Semi-supervised clustering is an approach that incorporates user-provided
information as prior knowledge to guide the clustering process. However, existing
algorithms [8, 9, 24] typically require users to specify the relations of pairs of patterns,
such as whether two patterns should or should not be in the same cluster. Those
relations are subsequently used as constraints to enhance the clustering accuracy.
However, such user-provided knowledge is usually very implicit in the resulting
clusters.
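
The pairwise relations mentioned above are commonly called must-link and cannot-link constraints in the constrained clustering literature. The sketch below only shows how such constraints can be checked against a given clustering. The function name `violated_constraints` and the toy labels are hypothetical, and the sketch does not reproduce how the cited algorithms [8, 9, 24] use the constraints during optimization.

```python
def violated_constraints(labels, must_link, cannot_link):
    """Return the must-link and cannot-link pairs that a clustering breaks.

    labels[i] is the cluster index assigned to pattern i.
    """
    broken_must = [(i, j) for i, j in must_link if labels[i] != labels[j]]
    broken_cannot = [(i, j) for i, j in cannot_link if labels[i] == labels[j]]
    return broken_must, broken_cannot

# Hypothetical clustering of five patterns and user-provided pairs.
labels = [0, 0, 1, 1, 2]
must_link = [(0, 1), (1, 2)]      # pairs the user wants in the same cluster
cannot_link = [(2, 3), (0, 4)]    # pairs the user wants in different clusters

broken_must, broken_cannot = violated_constraints(labels, must_link, cannot_link)
```

Here the pair (1, 2) breaks a must-link constraint and the pair (2, 3) breaks a cannot-link constraint. The constraints say nothing about what each cluster should mean, which is one way to read the remark that such knowledge stays implicit in the resulting clusters.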
Therefore, different methods for receiving and incorporating the user preferences
into clustering algorithms need to be explored to guide the clustering process.
This will not only enhance the clustering performance, but also make the clustering
algorithms capable of discovering interesting clusters for the users.

1.3 Approach and Methodology

To address the concerns discussed in Sect. 1.2 for social media data clustering, this
book presents a class of solutions based on the adaptive resonance theory (ART) and
its natural extension for handling multimodal data, termed fusion adaptive resonance
theory (Fusion ART).
ART [15] is a neural theory on how a human brain captures, recognizes and memo-
rizes information about objects and events that has led to the development of a family
of clustering models. These ART-based clustering algorithms perform unsupervised
learning by modeling clusters as memory prototypes and incrementally encoding
the input patterns one at a time, through a real-time searching and matching mech-
anism. More importantly, they do not require a pre-determined number of clusters.
Instead, a user-input parameter, i.e. the vigilance parameter, controls the
degree to which an input pattern must match a selected cluster to be deemed similar. In this
way, the clusters can be automatically identified by incrementally generating new
clusters to encode novel patterns that are deemed dissimilar to existing clusters.
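
The search, match, and learn cycle described above can be sketched as follows. The sketch assumes the widely used Fuzzy ART formulation (fuzzy AND as the element-wise minimum, a choice function, a vigilance test, and complement coding of inputs), which this section does not spell out. The class name `FuzzyART` and the parameter names `rho` (vigilance), `alpha` (choice parameter), and `beta` (learning rate) are illustrative assumptions, not the book's notation.

```python
def fuzzy_and(x, w):
    """Element-wise minimum, i.e. the intersection of two feature vectors."""
    return [min(a, b) for a, b in zip(x, w)]

def complement_code(x):
    """Append 1 - x so that every coded pattern has the same L1 norm."""
    return list(x) + [1.0 - a for a in x]

class FuzzyART:
    def __init__(self, rho=0.75, alpha=0.01, beta=1.0):
        self.rho, self.alpha, self.beta = rho, alpha, beta
        self.weights = []  # one prototype (weight vector) per cluster

    def _choice(self, x, w):
        # Choice function: T = |x AND w| / (alpha + |w|)
        return sum(fuzzy_and(x, w)) / (self.alpha + sum(w))

    def learn(self, x):
        """Present one pattern with values in [0, 1]; return its cluster index."""
        x = complement_code(x)
        # Search clusters from the best to the worst choice value.
        order = sorted(range(len(self.weights)),
                       key=lambda j: self._choice(x, self.weights[j]),
                       reverse=True)
        for j in order:
            w = self.weights[j]
            overlap = fuzzy_and(x, w)
            # Vigilance test: the match |x AND w| / |x| must reach rho.
            if sum(overlap) / sum(x) >= self.rho:
                # Learning: move the prototype toward the intersection.
                self.weights[j] = [self.beta * o + (1 - self.beta) * wi
                                   for o, wi in zip(overlap, w)]
                return j
        # No existing cluster is similar enough: create a new one.
        self.weights.append(list(x))
        return len(self.weights) - 1

model = FuzzyART(rho=0.75)
patterns = [[0.1, 0.1], [0.12, 0.09], [0.9, 0.95], [0.88, 0.9]]
labels = [model.learn(p) for p in patterns]
```

With these four toy patterns the sketch creates two clusters without being told the number in advance. Raising `rho` toward 1 makes the vigilance test stricter and tends to produce more, tighter clusters.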
Fusion ART [26] extends ART from a single input feature channel to multiple ones,
and it serves as a general architecture for simultaneous learning from multi-modal
feature mappings. Besides the advantage of fast learning and low computational cost,
this approach recognizes similar clusters to the input pattern according to both the
overall similarity across all of the feature channels and the individual similarity of
each feature channel.
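
The two levels of similarity mentioned above, overall and per channel, can be sketched as two small functions. The sketch assumes Fuzzy ART style choice and match functions for each channel, combined through per-channel contribution weights. The names `gammas`, `alphas`, and `rhos`, and the toy visual and tag vectors, are illustrative assumptions rather than the book's notation or data.

```python
def fuzzy_and(x, w):
    """Element-wise minimum of two feature vectors."""
    return [min(a, b) for a, b in zip(x, w)]

def choice(pattern, cluster, gammas, alphas):
    """Overall similarity: weighted sum of the per-channel choice values."""
    return sum(g * sum(fuzzy_and(x, w)) / (a + sum(w))
               for x, w, g, a in zip(pattern, cluster, gammas, alphas))

def resonates(pattern, cluster, rhos):
    """Per-channel similarity: every channel must pass its own vigilance test.

    Assumes each input channel has at least one non-zero entry.
    """
    return all(sum(fuzzy_and(x, w)) / sum(x) >= rho
               for x, w, rho in zip(pattern, cluster, rhos))

# One input pattern with a visual channel and a tag channel (toy values).
pattern = ([0.8, 0.2], [1.0, 0.0, 1.0])
cluster_a = ([0.8, 0.2], [0.0, 1.0, 0.0])   # same visual features, different tags
cluster_b = ([0.7, 0.2], [1.0, 0.0, 1.0])   # similar visual features, same tags

gammas, alphas, rhos = [0.5, 0.5], [0.01, 0.01], [0.6, 0.6]
t_a = choice(pattern, cluster_a, gammas, alphas)
t_b = choice(pattern, cluster_b, gammas, alphas)
ok_a = resonates(pattern, cluster_a, rhos)
ok_b = resonates(pattern, cluster_b, rhos)
```

In this toy case `cluster_a` matches the input perfectly on the visual channel but fails the tag channel's vigilance test, so only `cluster_b` is an acceptable match. This is the sense in which a cluster has to be consistently similar across all channels.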
Chosen as the base models for social media analytics, the ART-based clustering
algorithms aim to address the aforementioned challenges of clustering social media
data in the following directions:
• Regarding information representation: as discussed in Sect. 1.2.1, existing
methods usually suffer from problems with high dimensionality and noisy fea-
tures. Especially for meta-information which is essentially short and noisy, there
are still no established statistical methods that can effectively discover the key tags
that reveal the semantics of the meta-information.
ART uses a weight vector to model the characteristics of the patterns in the same
cluster. The values of weight vectors are incrementally updated during the clustering
process using a learning function, which suppresses the values of noisy features
while preserving the key features; and the similarity between an input pattern and
a cluster weight vector is obtained by the intersection of their feature distributions.
The developed ART-based algorithms have the advantage of identifying the key
features of the patterns in the same cluster using the learning functions of ART
which alleviates the problem of noisy features.
In Sect. 3.5, an ART-based clustering algorithm, called Probabilistic Fusion ART
(PF-ART), is presented for handling the surrounding short text of web images.
PF-ART uses the tag presence in the surrounding text of an image to construct
the feature vector and incorporates a novel learning function which models the
weight vector of a cluster using the probabilistic distribution of tag occurrences.
As such, the similarity measure for the meta-information becomes a match of key
features between the input patterns and clusters. In this way, PF-ART resolves the
representation problem of meta-information by transforming the task of identifying
the features of meta-information in the feature construction stage to identifying
the semantics of data clusters during the learning stage.
• Regarding the scalability for big data: ART is an incremental clustering
algorithm. Thus, the feature vectors of all patterns need not be held
in computer memory at the same time when processing a very large dataset.
Besides, ART incrementally processes input patterns one at a time by performing
real-time searching and matching of suitable clusters, which ensures its linear time
complexity of O(n). Moreover, ART can converge in a few epochs and may obtain
a reasonable performance even in the first round of presentation. This leads to a small
increase in time cost with respect to the increase in the magnitude of the dataset.
Therefore, the ART-based algorithms will inherit the above advantages of ART
and be able to handle big data.
• Regarding the sensitivity to input parameters: the performance of ART mainly
depends on a single parameter, namely, the vigilance parameter, which controls the
minimum intra-cluster similarity. Determining to which degree the input pattern
can be deemed similar to the clusters, the vigilance parameter is a ratio value, thus
it is easier for users to understand and decide its value than the parameters in other
algorithms, such as the number of clusters and the distances between patterns.
Furthermore, Sect. 3.3 describes three methods for making the vigilance parameter
self-adapted for individual clusters so that the performance of the ART-based
clustering algorithms is more robust to the input parameters. In Sect. 6.4.1.3,
an empirical method for choosing a reasonable value for the vigilance parameter is
experimentally demonstrated. It tunes the value of the vigilance parameter until
the proportion of small clusters is less than 10%.
• Regarding the fusion of heterogeneous information: Fusion ART provides a
general framework for integrating multimodal features. Specifically, Fusion ART
has multiple input channels and one category space, allowing input patterns to be
represented by multiple feature vectors. As such, it interprets the multi-modal data
co-clustering task as a mapping from the multiple feature spaces to the category
space. Besides, Fusion ART employs a vigilance parameter for each input channel
so that the patterns in the same cluster should be consistently similar to each other
in every feature space.
Section 3.6 illustrates a Generalized Heterogeneous Fusion ART (GHF-ART)
that extends Fusion ART to allow different feature channels to have different
feature representation and learning functions to handle the heterogeneity in mul-
timodal data. More importantly, by incorporating an adaptive function to adjust
the weights across the feature channels in the choice function, GHF-ART offers
an effective approach to unify and synchronize multiple types of features for sim-
ilarity measure.
• Regarding the robustness to noisy features: the learning function of ART adapts
a cluster’s weight vector during the learning process by incrementally decreasing
the weight values when the accepted input patterns have lower values at the corre-
sponding features. In this way, the key features of this cluster can be identified by
suppressing the inconsistent features while preserving the key and consistent ones.
It also makes the matching between input patterns and clusters essentially measure
the matching of the shared key features. Also, with a reasonable value for the vigi-
lance parameter, ART will generate small clusters to encode the ill-featured noisy
patterns that are isolated in the feature space. Therefore, the well-formed clusters
will not be affected by the noisy patterns. By preserving those mechanisms, the
developed ART-based clustering algorithms also have a strong immunity to noisy
features.
• Regarding the online learning capability: ART favors the clustering nature of
incremental learning, where data objects are processed one at a time. The only
obstacle hindering ART from online learning is the requirement of the maximum
and minimum values of each feature for input feature normalization. Section 3.7
presents an online unsupervised learning algorithm based on GHF-ART, named
Online Multimodal Co-indexing Adaptive Resonance Theory (OMC-ART), which
allows online adaptation of the learned cluster patterns and data objects therein
to exactly what they should be when an input pattern introduces changes in the
bounding feature values. Moreover, it does not incur an increase in the overall
time complexity of GHF-ART. OMC-ART has been applied to the indexing and
retrieval of e-commerce products requiring frequent updates and has shown its
potential for building flexible multimodal search engines for the products by using
either images, keywords or a combination of both (see Chap. 7 for details).
• Regarding the incorporation of user preferences: PF-ART, as described in
Sect. 3.5, extends ART to receive three forms of user preferences:
1. Users are allowed to identify groups of data objects belonging to the same
class. Considering the incremental clustering manner of ART, PF-ART is able
to create a set of pre-defined clusters in the category space for modeling each
group of patterns before clustering. This method can be viewed as a partitioning
of the category space where the interesting regions to the users are discovered.
Those clusters will be incrementally expanded and generalized by encoding
the subsequent input patterns. In this way, users are likely to obtain interesting
groups of data objects from the clustering results.
2. Users are allowed to provide additional information, such as short sentences
and tags, to describe the data objects. Those user preferences can be modeled
as a feature vector for describing the data patterns, which can be received by
an additional feature channel in PF-ART.
3. PF-ART allows the users to tune the vigilance parameter to produce personal-
ized cluster structures of datasets. As PF-ART utilizes the vigilance parameter
to control the intra-cluster similarity of patterns, a larger value for the vigilance
parameter results in a generation of clusters with more specific semantics.
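
The normalization obstacle raised in the online learning item above can be illustrated with min-max normalization. The sketch shows only the arithmetic of re-expressing an already normalized value once a new input widens a feature's bounds. The function names are hypothetical, and the sketch makes no claim about how OMC-ART itself updates cluster weights, in particular when complement coding is involved.

```python
def update_bounds(bounds, x):
    """Widen per-feature (min, max) bounds to cover a new raw pattern."""
    return [(min(lo, v), max(hi, v)) for (lo, hi), v in zip(bounds, x)]

def normalize(x, bounds):
    """Min-max normalize a raw pattern; constant features map to 0."""
    return [(v - lo) / (hi - lo) if hi > lo else 0.0
            for v, (lo, hi) in zip(x, bounds)]

def rescale(normalized, old_bounds, new_bounds):
    """Map values normalized under old bounds to their values under new bounds."""
    out = []
    for v, (olo, ohi), (nlo, nhi) in zip(normalized, old_bounds, new_bounds):
        raw = v * (ohi - olo) + olo          # undo the old normalization
        out.append((raw - nlo) / (nhi - nlo) if nhi > nlo else 0.0)
    return out

# A value stored under the old bounds [0, 10] (toy numbers).
old_bounds = [(0.0, 10.0)]
stored = normalize([5.0], old_bounds)

# A new input with raw value 20 widens the bounds to [0, 20].
new_bounds = update_bounds(old_bounds, [20.0])
rescaled = rescale(stored, old_bounds, new_bounds)
```

The rescaled value equals what direct normalization of the raw value under the new bounds would give, so past raw data does not need to be revisited. The cost is one pass over the stored values per bound change.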

1.4 Outline of the Book

This book has two parts and eight chapters:

• Part I includes this chapter and Chaps. 2, 3, where
– This chapter introduces background on social media analytics and gives a bird’s
eye view of the roles and challenges of clustering in social media analytics and
the solutions discussed in this book, which are based on adaptive resonance theory
(ART);
– Chapter 2 presents a review on the main types of clustering algorithms and their
extensions for social media analytics and the key applications of social media
data clustering;
– Chapter 3 offers insights into the learning mechanism of ART in clustering and
describes the class of ART-based algorithms for handling social media clustering
challenges, as listed in Sect. 1.2.
• Part II includes Chaps. 4–8, where Chap. 8 concludes the book, and Chaps. 4–7
provide real-world case studies on using clustering for social media analytical
applications, including
1. Personalized clustering of short text for social image organization;
2. Heterogeneous data co-clustering for composite multimedia data;
3. Heterogeneous social network clustering for social user community discovery
and user interest mining;
4. Multimodal online clustering for multimedia streaming data indexing and
searching.

References

1. Ackermann MR, Märtens M, Raupach C, Swierkot K, Lammersen C, Sohler C (2012)
Streamkm++: a clustering algorithm for data streams. J Exp Algorithmics (JEA) 17. No 2.4
2. Ailon N, Jaiswal R, Monteleoni C (2009) Streaming k-means approximation. In: Advances in
neural information processing systems, pp 10–18
3. Barbakh W, Fyfe C (2008) Online clustering algorithms. Int J Neural Syst 18(3):185–194
4. Bekkerman R, Jeon J (2007) Multi-modal clustering for multimedia collections. In: CVPR, pp
1–8
5. Bickel S, Scheffer T (2004) Multi-view clustering. In: ICDM, pp 19–26
6. Bisson G, Grimal C (2012) Co-clustering of multi-view datasets: a parallelizable approach. In:
ICDM, pp 828–833
7. Charikar M, O’Callaghan L, Panigrahy R (2003) Better streaming algorithms for clustering
problems. In: Proceedings of the annual ACM symposium on theory of computing, pp 30–39
8. Chen Y, Dong M, Wan W (2007) Image co-clustering with multi-modality features and user
feedbacks. In: MM, pp 689–692
9. Chen Y, Rege M, Dong M, Hua J (2007) Incorporating user provided constraints into document
clustering. In: ICDM, pp 103–112
10. Chen Y, Tu L (2007) Density-based clustering for real-time stream data. In: Proceedings of the
ACM SIGKDD international conference on knowledge discovery and data mining, pp 133–142
11. Chen Y, Wang L, Dong M (2010) Non-negative matrix factorization for semisupervised het-
erogeneous data coclustering. TKDE 22(10):1459–1474
12. Chua T, Tang J, Hong R, Li H, Luo Z, Zheng Y (2009) NUS-WIDE: a real-world web image
database from national university of Singapore. In: CIVR, pp 1–9
13. Ester M, Kriegel HP, Sander J, Xu X (1996) A density-based algorithm for discovering clusters
in large spatial databases with noise. In: KDD, pp 226–231
14. Goldberg Y, Levy O (2014) Word2vec explained: deriving Mikolov et al’s negative-sampling
word-embedding method. arXiv:1402.3722
15. Grossberg S (1980) How does a brain build a cognitive code. Psychol Rev 87(1):1–51
16. Guha S, Meyerson A, Mishra N, Motwani R, O’Callaghan L (2003) Clustering data streams:
theory and practice. IEEE Trans Knowl Data Eng 15(3):515–528
17. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Pro-
ceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
18. Hu X, Sun N, Zhang C, Chua TS (2009) Exploiting internal and external semantics for the
clustering of short texts using world knowledge. In: Proceedings of ACM conference on infor-
mation and knowledge management, pp 919–928
19. Levy O, Goldberg Y (2014) Neural word embedding as implicit matrix factorization. In:
Advances in neural information processing systems, pp 2177–2185
20. Li X, Snoek CGM, Worring M (2008) Learning tag relevance by neighbor voting for social
image retrieval. In: Proceedings of ACM multimedia, pp 180–187
21. Liu D, Hua X, Yang L, Wang M, Zhang H (2009) Tag ranking. In: Proceedings of international
conference on world wide web, pp 351–360
22. Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis
60(2):91–110
23. Meng L, Tan AH (2014) Community discovery in social networks via heterogeneous link
association and fusion. In: SIAM international conference on data mining (SDM), pp 803–811
24. Shi X, Fan W, Yu PS (2010) Efficient semi-supervised spectral co-clustering with constraints.
In: ICDM, pp 532–541
25. Silva JA, Faria ER, Barros RC, Hruschka ER, de Carvalho AC, Gama J (2013) Data stream
clustering: a survey. ACM Comput Surv (CSUR) 46(1). No 13
26. Tan AH, Carpenter GA, Grossberg S (2007) Intelligence through interaction: towards a unified
theory for learning. LNCS, vol 4491. Springer, Berlin, pp 1094–1103
27. Wang L, Leckie C, Ramamohanarao K, Bezdek J (2012) Automatically determining the number
of clusters in unlabeled data sets. IEEE Trans Knowl Data Eng 21(3):335–350
28. Wang W, Zhang Y (2007) On fuzzy cluster validity indices. Fuzzy Sets Syst 158(19):2095–2117
29. Whang JJ, Sui X, Sun Y, Dhillon IS (2012) Scalable and memory-efficient clustering of
large-scale social networks. In: ICDM, pp 705–714
30. Zhou D, Burges CJC (2007) Spectral clustering and transductive learning with multiple views.
In: ICML, pp 1159–1166
Chapter 2
Clustering and Its Extensions
in the Social Media Domain

Abstract This chapter summarizes existing clustering and related approaches for
the challenges identified in Sect. 1.2 and presents the key branches of
social media mining applications where clustering holds potential. Specifically,
several important types of clustering algorithms are first illustrated, including clustering,
semi-supervised clustering, heterogeneous data co-clustering, and online clustering.
Subsequently, Sect. 2.5 reviews existing techniques that help decide
the value of the predefined number of clusters (required by most clustering
algorithms) automatically and highlights the clustering algorithms that do not require
such a parameter. This review illustrates the challenge of input parameter sensitivity of
clustering algorithms when applied to large and complex social media data.
Furthermore, Sect. 2.6 surveys several main applications of clustering algorithms to
social media mining tasks, including web image organization, multi-modal
information fusion, user community detection, user sentiment analysis, social event
detection, community question answering, social media data indexing and retrieval,
and recommender systems in social networks.

2.1 Clustering

Clustering, aimed at identifying natural groupings of a dataset, is a commonly used
technique for statistical data analysis in many fields, such as machine learning, pattern
recognition, image and text analysis, information retrieval, and social network
analysis. This section presents a literature review of the important clustering techniques for
multimedia data analysis, organized by their theoretical basis. For a systematic
understanding of the clustering taxonomy, see the earlier surveys [168, 169].

2.1.1 K-Means Clustering

K-means clustering [109] is a centroid-based partitional algorithm, which partitions
the data objects, represented by feature vectors, into k clusters. It iteratively seeks k
cluster centers in order to minimize the intra-cluster squared error, defined as
© Springer Nature Switzerland AG 2019 15
L. Meng et al., Adaptive Resonance Theory in Social Media
Data Clustering, Advanced Information and Knowledge Processing,
https://doi.org/10.1007/978-3-030-02985-2_2
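
The intra-cluster squared error mentioned in Sect. 2.1.1 is commonly taken to be the sum, over all clusters, of the squared Euclidean distances between each data object and its cluster center. The sketch below illustrates the iterative procedure under that assumption; it is not the formulation given in this chapter. The function names (`squared_distance`, `sse`, `kmeans`), the random choice of initial centers, and the toy data are all hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sse(points, centers, labels):
    """Intra-cluster squared error: each point against its assigned center."""
    return sum(squared_distance(p, centers[l]) for p, l in zip(points, labels))


def kmeans(points, k, iterations=100, seed=0):
    """Alternate between assigning points and recomputing centers."""
    rng = random.Random(seed)
    # Assumption: initial centers are k distinct data objects chosen at random.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers will not move
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return centers, labels


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    centers, labels = kmeans(data, k=2)
    print(centers, labels, sse(data, centers, labels))
```

Neither step can increase the squared error, which is why the loop stops once the assignments no longer change. The result still depends on the initial centers and on the chosen k.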
A Long Journey

At any rate the Gregorian chant flourished and was so loved that
Benedict Biscop, and other monks interested in music, came from
far-off England to learn the chant invented by St. Gregory. A long
journey! In 675 Biscop sent to Rome for singers and built
monasteries very close to a pagan temple, where the Anglo-Saxons
still worshipped the Roman Sun god, Apollo, also god of music.
These he filled with beautiful relics, paintings and stained glass
windows, Bibles and service books illuminated in gold and color,
which he brought from Rome.
Bringing things from Rome may sound easy to you, but fancy the
travel and inconvenience when there were no steamships, no
railroads, no aeroplanes, but only Roman roads, which however
marvelous, were long and wearisome by foot or by horse, or mule
and rude wagons. This shows how much the people of Britain desired
music and beauty in their church services.
Venerable Bede

About this time, there lived a man in England so loved and
respected that he was called the Venerable Bede. Although music had
no such variety, melody and richness as today, just see what the
Venerable Bede says about it: “Music is the most worthy, courteous,
pleasant, joyous and lovely of all knowledge; it makes a man
gentlemanly in his demeanor, pleasant, courteous, joyous, lovely, for
it acts upon his feelings. Music encourages us to bear the heaviest
afflictions, administers consolation in every difficulty, refreshes the
broken spirit, removes headache and cures crossness and
melancholy.”
Isn’t it remarkable for a man to have said this so long ago, when
scientists, today, have just begun to think that music may have a
power of healing ills of the mind and of the body! Truly—“there is
nothing new under the sun!”
So Bede used the plain chant of Gregory and through his influence,
spread this dignified music throughout England, and wherever a
monastery was founded, a music school was started.
The Venerable Bede writes that Ethelbert of Kent, King of Britain,
was a worshipper of Odin and Thor, Norse gods, but he married a
French Princess who was a Christian. One day, writes the Venerable
Bede, forty monks led in solemn procession by St. Augustine, passed
before the king singing a chant. After hearing this marvelous hymn,
he became a Christian and gave permission to the English to become
worshippers of Christ instead of Norse and Druid gods. This hymn
which converted Ethelbert in 597 A.D. was sung thirteen hundred
years later (1897) in the same place, Canterbury, by another group of
Benedictine monks!
At first the songs were sung unaccompanied, but later as in the
time of David, the Church allowed instruments. The lyre and the
harp were used first but the cymbals and the dulcimer, somewhat
like our zither, were considered too noisy.
The Venerable Bede called music made by instruments artificial
music, and that of the human voice, natural music. Whether at that
time the viol, the drum, the organ or the psaltery (an instrument like
the dulcimer) were used in the Church, is not known positively.
After Bede’s death, Alcuin, a monk and musician, continued his
work. He was appointed by Charlemagne, Emperor of France, to
teach music in the schools of Germany and France to spread the use
of the Gregorian chant.
A Curious Music System

In 900 A.D. an important thing happened, by which the reading
and learning of music was much simplified. A red line was drawn
straight across the page and this line represented “F” the tone on the
fourth line of the bass staff. The neumes written on this red line were
“F” and the others above or below, were of higher or lower pitch. This
worked so well, that they placed a yellow line above the red line and
this they called “C.” These two lines were the beginnings of our five
line staff, but much happened between the two-line days and the five.
At this time people did not sing in parts, known as they are to us—
soprano, alto, tenor, bass, but everybody sang the same tune, that is,
sang in unison, and when men and women or men and boys sang
together, the men’s voices sounded an octave lower than the
women’s and the boys’. Some voices have naturally a high range and
others low, and no doubt in these plain chant melodies the singers
who could not reach all the tones comfortably, dropped
unconsciously to a lower pitch, and in that way, made a second part.
Soon the composers made this melody in the medium range of the
voice a part of their pieces instead of trusting the singers to make it
up as they went along. The principal tune sung or carried by most of
the singers was given the name tenor (from the Latin teneo, to hold
or carry). We use the same word to indicate the man’s voice of high
range.
Hucbald and Organum

Hucbald (840–930), a Flemish monk, first wrote a second part,
always a fifth above or a fourth below the tenor or “subject.” (The
Latin name for the subject is cantus firmus—fixed song.) Hucbald
probably used the fifth and fourth because they were perfect
intervals, and all others except the octave, were imperfect. There
were often four parts including the cantus firmus, for two parts were
doubled. This succession of fourths and fifths sounds very crude and
ugly (just try the example), but these people of the Middle Ages must
have liked it, for it lasted several centuries and was an attempt at
making chords. This music was called organum or diaphony (dia-
two, phony-sound: two sounds). As early as 1100, singers tried out
new effects with the added parts and introduced a few imperfect
intervals, thirds and sixths, and tried singing occasionally in contrary
motion to the subject,—this was called discant from a Greek word
meaning discord. Maybe at first it sounded discordant but soon it
came to mean any part outside of the cantus firmus or subject. (See
musical illustrations.)
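
The rule of the added part, always a fifth above or a fourth below the subject, can be tried out in a few lines of code. This is only a sketch under stated assumptions: pitches are written as MIDI note numbers (60 is middle C), a perfect fifth is taken as seven semitones and a perfect fourth as five, and the name `parallel_organum` and the short tune are made up for illustration.

```python
PERFECT_FIFTH = 7   # semitones in a perfect fifth (assumed modern tuning)
PERFECT_FOURTH = 5  # semitones in a perfect fourth


def parallel_organum(cantus_firmus, above=True):
    """Return the added voice that shadows the subject note for note.

    above=True  -> a perfect fifth above every note of the subject
    above=False -> a perfect fourth below every note of the subject
    """
    shift = PERFECT_FIFTH if above else -PERFECT_FOURTH
    return [note + shift for note in cantus_firmus]


if __name__ == "__main__":
    subject = [60, 62, 64, 62, 60]  # hypothetical tune: C D E D C
    print(parallel_organum(subject, above=True))
    print(parallel_organum(subject, above=False))
```

A fifth above and a fourth below land on the same note names an octave apart, which is one way to see how doubling two parts gives four.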

[Musical illustration: examples of Organum (IXth and Xth Centuries), Diaphony, and Discant (XIIth Century)]

There was also a kind of diaphony in which a third voice was
written as a bass, a fifth below the cantus firmus, but it was actually
sung an octave higher than it was written and sounded much better
that way. As it was not a bass at all it received the name of false bass
or faux bourdon. This was the beginning of chords such as we use.
So, Hucbald started the science of harmony,—the study of chords.
Hucbald called this ars organum—the art of organating or
organizing.
Hucbald also invented a system of writing music on a staff. It was
not a staff such as we use today for he wrote in the spaces the initials
T and S. T meant that the singer was to sing a whole tone, S, a
semitone or half-step. He used a six line staff and wrote words in
script instead of notes like this:
Guido d’Arezzo and His Additions to Music

The next great name in music history is Guido d’Arezzo, a
Benedictine monk (995–1050), famous for his valuable additions to
music.
He invented the four-line staff, using both lines and spaces and
giving a definite place on the staff to each sound:

Yellow line  C ————————————
Black line     ............
Red line     F ————————————
Black line     ............

In the Middle Ages, the men did most of the singing so the music
was written in a range to suit their voices. C is middle C, and F the
bass clef.
All music had to be written by hand and the monks made
wonderful parchment copies of works composed for the church
services. They soon grew careless about the yellow lines and red
lines, so Guido placed the letters C and F at the beginning of the lines
instead of using the colored lines.
Sometimes there were three lines to a staff, sometimes four, five,
and even eleven! The use of clefs showing which line was C or F,
made reading of music much easier. At the end of the 16th century
the question of the number of lines to the staff was definitely
decided, then they used four lines for the plain chant and five for all
secular music. By calling the fifth line of the eleven, middle C, two
staffs of five lines resulted—the grand staff of today.
Here is a table to show you how clefs grew:
Hucbald built his scales in groups of four tones like the Greek
tetrachords but Guido extended this tetrachord to a hexachord or
six-toned scale, and by overlapping the hexachords, he built a series
of scales to which he gave the name, gamut, because it started on the
G which is the first note of our grand staff (lowest line, bass clef) and
the Greek word for G is “Gamma.”

In the lowest hexachord, the B is natural, in the second hexachord
there is no B and in the third hexachord, the B is flattened. Our sign
(♭) for flat comes from the fact that this B was called a round B and
the sign (♮) for natural was called a square B. The sharp (♯) came
from the natural and both meant at first raising the tone a half-step.
Guido once heard the monks in the monastery of Arezzo singing a
hymn in honor of St. John the Baptist. He noticed that each line of
the Latin poem began on ascending notes of the scale,—the first line
on C, the second on D, and so on up to the sixth on A. It gave him the
idea to call each degree of the hexachord by the first syllable of the
line of the Latin hymn, thus:
Ut queant laxis,
Resonare fibris,
Mira gestorum,
Famuli tuorum,
Solve polluti,
Labii reatum.

Hymn to St. John the Baptist

Here is a translation:
Grant that the unworthy lips of Thy servant
May be gifted with due harmony,
Let the tones of my voice
Sing the praises of Thy wonders.

We still call our scale degrees ut (frequently changed to do), re, mi,
fa, sol, la. The French today use these syllables instead of the letters
of the alphabet, and Guido is known as the man who originated this
solmization (the word taken from the syllables sol and mi).
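
Guido's six syllables and his three overlapping hexachords can be laid out with a small sketch. It assumes the usual pattern of steps inside a hexachord (tone, tone, semitone, tone, tone) and modern note names; the names `hexachord`, `STEPS` and `NOTE_NAMES` are made up for illustration.

```python
NOTE_NAMES = ["C", "C#", "D", "Eb", "E", "F", "F#", "G", "Ab", "A", "Bb", "B"]
SYLLABLES = ["ut", "re", "mi", "fa", "sol", "la"]
STEPS = [2, 2, 1, 2, 2]  # tone, tone, semitone, tone, tone (in half-steps)


def hexachord(start):
    """Pair each syllable with its note, starting from the note named `start`."""
    pitch = NOTE_NAMES.index(start)
    notes = [start]
    for step in STEPS:
        pitch = (pitch + step) % 12  # wrap around after twelve half-steps
        notes.append(NOTE_NAMES[pitch])
    return list(zip(SYLLABLES, notes))


if __name__ == "__main__":
    for first_note in ("G", "C", "F"):
        print(first_note, hexachord(first_note))
```

Starting on G the hexachord contains B natural, starting on C it has no B at all, and starting on F it needs B flat, which matches the three hexachords described above. In each one the half-step falls between mi and fa.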
Where did the syllable si, the seventh degree of the scale, come
from? This hymn was written to St. John and in Latin his name is
Sancte Ioannes, the initials of which form the syllable si which came
into use long after Guido’s time.
This system was very difficult for the singers to learn as it was
quite new to them, so Guido used his hand as a guide to the singers.
Each joint represented a different syllable and tone, and a new scale
began on every fourth tone. Look at the Guidonian hand on the next
page.
Guido was so great a teacher and musician that he was given credit
for inventing much that already existed. He gathered all the
knowledge he could find into a book, that was sent to the
monasteries and music schools. He put in much that never before
had been written down, explained many things that had never been
clear, and added much that was new and useful.
Sometimes his name was written Gui or Guion. When he lived
people had no last names but were called by the name of their native
towns; as Guido was born in Arezzo, a town of Tuscany, he was called
Guido d’Arezzo; Leonardo da Vinci, the great painter, was born in
the village of Vinci; and the great Italian composer Pierluigi da
Palestrina came from Palestrina.
Guido’s work was considered revolutionary and not in accord with
the old ways which the church fathers reverenced. Because of plots
against him, he was cast into prison. But the Pope, realizing his
greatness and value, saved him. The inventors of new ideas always
suffer!
Mensural Music or Timed Music

Before Guido invented it, there had been no system of counting
time.
If you are studying music, you know all about time signatures and
what metre a piece is in, from the ¾, ⁶⁄₈, ⁹⁄₈, ²⁄₄, ⁴⁄₄ or sign Ⅽ at the
beginning of the composition, but you probably do not know how or
when these signs came into use. In the Gregorian plain song and in
Organum, there was practically no variety of rhythm and no need for
showing time or marking off the music into measures. The accents
fell quite naturally according to the words that were sung, much as
you would recite poetry. But as music grew up and became more
difficult, it was necessary for a chorus singing in three or four
different parts, to sing in time as well as in tune, in order, at least, to
start and finish together!
The first metre that was used was triple (three beats to the
measure). It was called perfect and was indicated by a perfect circle,
the symbol of the Holy Trinity and of perfection. Duple
metre (two beats to the measure) was imperfect and was indicated by
an incomplete circle. Our sign for common time (four beats to
the measure) comes from this incomplete circle. ⁹⁄₈ was
written [sign]; ¾ was [sign]; ⁶⁄₈ was [sign]; and ⁴⁄₄ was [sign].


A monk named Franco, from Cologne, on the Rhine, early in the
twelfth century, invented these time signatures, and notes which in
themselves indicated different time values. Hucbald’s neumes were
no longer suited to the new music, and besides time signatures it
became necessary to have a music language showing very clearly and
definitely the composer’s rhythm.
Franco used four kinds of notes. Here they are translated into the
time values of today.
Organs

In the 10th century, organs came into use in the churches, but they
were ungainly and crude, sounding only a few tones, and were
probably only used to keep the singers on pitch. The organ had been
invented long before this, and had been used in Greece and Egypt. It
was built on the principle of Pan’s Pipes and was very simple. There
were many portable organs, called portatives, small enough to be
carried about.
One organ (not a portative!) at Winchester, England, had four
hundred pipes and twenty-six pairs of bellows. It took seventy men
to pump air into it and two men to play it by pounding on a key with
their fists or elbows. The tone was so loud that it could be heard all
over the town. Fancy that!
During these centuries, music was growing slowly but surely. Out
of organum and discant and faux bourdon, arose a style called
counterpoint, in which three, four or more melodies were sung at the
same time. The writing of counterpoint, or line over line, is like a
basket weave for the different melodies weave in and out like pieces
of willow or raffia forming the basket. Later will come the chorale,
written in chords or up and down music like a colonnade or series of
columns. Keep this picture in mind. (St. Nicolas Tune, Chapter XI.)
The word point means note so counterpoint means note against
note. The word was first applied in the 13th century to very crude
and discordant part-writing. But, little by little the monks learned
how to combine melodies beautifully and harmoniously and we still
use many of their rules.
Gradually great schools of church music flourished in France,
Germany, Spain, England, Italy and the Netherlands in the 14th, 15th
and 16th centuries.
Bit by bit this vast musical structure was built. It did not grow
quickly; each new idea took centuries to become a part of music, and
as often the idea was not good, it took a long time to replace it.
CHAPTER VIII
Troubadours and Minnesingers Brought Music to Kings and People

Except for the first few chapters in this book, we have told you of
music made by men who wanted to improve it. You have seen how
the fathers of the Church first reformed music, and gave it a
shorthand called neumes; before that, the music laws of the
Egyptians, the scales and modes of the Arab, the Greek scales which
the churchmen used in the Ambrosian and the Gregorian modes.
Then came the two-lined staff, and the beginnings of mensural or
measured music by which they kept time. Then you saw how two
melodies were fitted together and how they grew into four parts. All
this we might call “on purpose” music. At the same time, in all the
world, in every country, there was Song ... and never have the world
and the common people (called so because they are neither of the
nobility nor of the church) been without folk song which has come
from the folks of the world, the farmers, the weavers, and the
laborers.
The best of these songs have what the great composers try to put
into their music—a feeling of fresh free melody, design, balance, and
climax, but more of this in the chapter on Folk Song.
This chapter is to be about Troubadours, Trouvères, and
Minnesingers, who have left over two thousand songs. In most of
these, they made up both words and music, but sometimes they used
new words made up for folk tunes that everyone knew, or for
melodies from the plain-chants which they had heard in church;
sometimes they used the same melody for several different poems,
and often they set the same words to several melodies. Many of these
troubadour songs and minnelieder became the people’s own folk
songs.
But now you must hear of the folk who lived hundreds of years
before these poet singers. Unknowingly, out of the heart and soul
and soil of their native lands, they made songs and sang poetry and
played sometimes other peoples’ song, scattering their own wherever
they went.
From these traveling singers and players, in all countries, came the
professional musicians who were minstrels, bards, troubadours, etc.,
according to when, where and how they lived.
The Why of the Minstrel

The people sang and played not only because they wanted to, or
because they loved it, but because they were the newspaper and the
radio of their time, singing the news and doings of the day. These
minstrels who traveled from place to place “broadcasted” the events.
No music was written down and no words were fastened by writing
to any special piece. The singer would learn a tune and when he sang
a long story (an epic) he would repeat the tune many times so it was
necessary to find a pleasing melody, or singers would not have been
very welcome in the courts and market places. These musical news
columns entertained the people who had few amusements. The
wandering minstrels with their harps or crwth (Welsh harps), or
whatever instrument they might have used in their particular
country, were welcomed with open arms and hearts.
This sounds as if these singers and players traveled, and indeed
they did! They sprang up from all parts of Europe and had different
names in different places. There were bards from Britain and
Ireland, skalds from the Norse lands, minstrels from “Merrie
England,” troubadours from the south of France, trouvères from the
north of France, jongleurs from both north and south who danced
and juggled for the joy of all who saw them, and minnesingers and
meistersingers in Germany.
Druids and Bards

Centuries before this, Homer the great Greek poet was called the
Blind Bard and he chanted his poems, the Iliad and the Odyssey, to
the accompaniment of the lyre, the favorite instrument of the Greeks.
But when we speak of bards in this chapter we mean the poets and
musicians of ancient Britain, when that island was inhabited and
ruled by the Druids, 1000 B.C. We do not know when the bards first
began to make music or when they were first called bards, but it is
certain that for many centuries before the Christian era, these rude,
barbarous people of the countries we know as Wales, England,
Ireland and Scotland, had many songs, dances and musical
instruments.
Look at a map of France, and see how much like a teapot it is
shaped. The western part, the spout, is Brittany! As its name shows,
this part of France was inhabited by the same race of people as were
in Britain, they spoke the same language, had the same religion and
made the same music. These people were Celts and their priests were
called Druids. Much we said about primitive people is true of these
early Britons. They expressed their feelings, and tried to protect
themselves from Nature and human foes by means of religious rites
and ceremonies in which music and dancing played the leading part.
They had no churches, but held religious services in the open
under the oak trees. They piled boulders on top of each other to form
altars, or built large circular enclosures of huge flat rocks, inside of
which they gathered for worship, or to assist at some ceremonial in
which sacrifices of animals and occasionally of human beings were
made. These human sacrifices occurred once a year at the Spring
Festival which was celebrated in much the same fashion as in Greece.
Such masses of stone are found in the British Isles, where the
most famous is Stonehenge (which was recently bought by
an American), and there are also many of these so-called cromlechs
and menhirs in Brittany.
It is curious how often men and women do the same things at
times and places so completely separated that they could not have
been influenced by each other, but did what was natural for them. It
seems that between the state of being primitive or savage and of
being cultured, mankind must pass through certain states of mind
and certain bodily actions common to all men. In tracing the growth
of any habits and actions of people,—in government, religion,
amusements, art, music, manners and customs, and language, we
find the same customs constantly repeated among different races. If
you remember this point, you will be interested to watch, in this
book, the difference between these experiences common to all
mankind and those which later on, were caused by the influence that
one race had on another through meeting, through conquest and
through neighborly contact.
The bards belonged to the priesthood and were Druids. They sang
in verse the brave deeds of their countrymen, praises of the gods and
heroes, and legends of war and adventure, accompanying themselves
on primitive harps, or on an instrument something like the violin
without a neck, called a crwth. They wore long robes and when they
were acting as priests, these were covered with white surplices
somewhat like the gowns of our own clergy. From a bit of
information handed down by the bards, we learn that in Ireland, the
graduate bard wore six colors in his robes, said to be the origin of the
plaid of the Scotch Highlanders; the king wore seven colors; lords
and ladies, five; governors of fortresses, four; officers and gentlemen,
three; soldiers, two, and the people were allowed to wear only one.
Even their dress seemed important and marked the rank!
There were three kinds of bards: priestly bards who took part in
the religious rituals and were also the historians, domestic bards who
made music in honor of their masters, and heraldic bards whose
duties were to arouse patriotism through songs in praise of their
national heroes. They had to pass examinations to become bards,
and the lower ranks were tested for knowledge and ability before
being promoted to the higher ranks. Recently there has been a
revival in Wales of the Eisteddfod, or song contests of the Druids.
“Minstrelsy,” or singing to the crwth or harp, lived on long after
Druidism had been replaced by the Christian faith. Did you ever
wonder where the custom came from of mistletoe at Christmas time?
Or of dancing around a Maypole? Or building bonfires for May-day
and St. John’s eve? Celebrating All-Hallowe’en with pumpkins and
black cats? And of having Christmas trees? Well, these customs are
all relics of Druidism of 2000 or more years ago.
Skalds

In the land of the fierce Vikings or Norsemen, who inhabited
Scandinavia, Iceland and Finland before and during the Middle Ages,
there were bards called Skalds or Sagamen. They recited and sang
stories telling of their Norse gods, goddesses and heroes, Woden,
Thor, Odin, Freya, Brynnhild, and of the abode of the gods, Walhalla.
These ballads formed the national epics called sagas and eddas,
from which Richard Wagner drew the story for his immortal music
dramas, the Nibelungenlied.
Odin, who was considered a Norse god, probably was a Saxon
prince who lived in the 3rd century, A.D. He revived the Norse
mythology and rites with the aid of minstrels, seers, and priests. His
teachings lasted until the reign of Charlemagne, a devout Christian,
who put an end to pagan rites.
In the 5th century came the Saxons, Hengist and Horsa,
descendants of Odin, and much of Britain fell under their rule; with
them, came the skalds whose duty it was to celebrate the deeds of
their lords. They appeared at the great state banquets, and also on
the battle fields, encouraging the warriors with their songs of
heroism, and comforting the wounded soldiers.
When the Danes, the Angles and the Jutes came to Britain in this
same century, the country was called England or Angle-land.
Harpers and gleemen followed in the footsteps of the Scandinavian
skalds. These musician-singers went as honored guests from court to
court, and received valuable presents. A popular gleeman was given
the title of poet-laureate, and crowned with a laurel wreath.
The songs were taught orally and learned by heart, as there was no
notation at this early date (500 A.D.). They accompanied themselves
on small harps which could be carried easily. The harp was handed
around the banquet table so that each guest in turn might sing a song
as his share of the entertainment. Singing and composing poetry
were a necessary part of a gentleman’s education.
The “Venerable” Bede (Chapter VII) wrote that “Cædmon the poet
(600 A.D.) never could compose any trivial or vain songs, but only
such as belonged to a serious and sacred vein of thought ... he was
