
Performance Analysis of Transformation Methods in Multi-label Classification

P.K.A. Chitra¹, S. Appavu Alias Balamurugan²

pkachitra @,2app_s @
¹Research Scholar, ²Professor and Research Coordinator
Anna University,
Department of Electronics and Communication Engineering,
K.L.N. College of Information Technology,
Sivagangai,Tamil Nadu, India

Abstract. In traditional single-label classification the class labels are mutually exclusive:
each query instance is associated with exactly one label. In real-life applications such as
music categorization, functional genomics, and text and document categorization, one query
instance may belong to a subset of the class labels, i.e. the labels are mutually inclusive.
Because of this highly correlated label structure, traditional single-label classification
algorithms are not sufficient, and effective algorithms are needed to work with multiple labels.
Multi-label classification algorithms fall into two groups: (i) those that transform the
multi-label problem into single-label binary problems and (ii) those that adapt existing
single-label algorithms to cope with multi-label problems. In this paper we present the
theoretical concepts behind multi-label classification and carry out a comparative analysis of
transformation methods with two tools, MEKA and MULAN, over different application domains.
Six example-based, six label-based, and four ranking-based measures are used to evaluate the
efficacy of the different transformation methods.

Keywords: Multi-label Classification, Problem Transformation Method, Binary Relevance,
Label Power Set, Pruned Set, Classifier Chain, RAkEL

1 Introduction
Traditional single-label supervised classification maps an example to exactly one
output label. Let $Q$ be the set of examples, $Q = \{q_1, q_2, q_3, \ldots, q_n\}$, and let
$\mathcal{L}$ be the set of labels, $\mathcal{L} = \{\ell_1, \ell_2, \ell_3, \ldots, \ell_m\}$.
Single label classification (SLC) is defined as
    $\mathrm{SLC}(\hbar): q_i \mapsto \ell_i$ where $|\ell| = 1$    (1)
In real life, however, one example may be associated with many labels. For example,
in scene classification an instance may belong to beach, tree, people, or city. In music
categorization, an instance may express several emotions such as happy, sad, or pleased.
Similarly, in medical diagnosis a patient may suffer from both diabetes and cancer. An
instance may therefore belong to more than one label, and multi-label classification meets
this need by assigning a single query example to many labels, i.e. one idea to multiple
concepts.
Multi-Label Classification (MLC) [8] is a generalization of the supervised single-label
classification task in which each data instance may be associated with a set of class labels
rather than one label. Each individual label still takes only a binary value. The task of
multi-label classification is to map an example instance $q_i \in Q$ to a label set
$\ell \subseteq \mathcal{L}$:
    $\hbar: \chi \to 2^{\mathcal{L}}$    (2)
    $\mathrm{MLC}(\hbar): q_i \mapsto \bigcup_{i=1}^{m} \ell_i$ where $|\ell| \ge 2$

The number of applications involving data with multiple target labels is growing, and
multi-label classification has therefore received increased attention in recent years.
Multi-label classification methods fall into two categories: (i) Problem Transformation
methods and (ii) Algorithm Adaptation methods. A problem transformation method transforms
the multi-label data into single-label data and then applies traditional single-label
classification methods to the transformed data. Because the transformed problems are binary,
traditional single-label classifiers are enough to build the classifier model. The drawback
is that a lot of information may be lost during the multi-label to single-label conversion.
So although problem transformation methods are fast, this loss of information makes them less
effective. Algorithm adaptation methods, on the other hand, take existing single-label
classifiers and modify them to handle multi-label data directly. They are very effective
because no information in the data is lost. In this paper, however, we carry out an
experimental evaluation of 5 problem transformation methods over data sets using machine
learning tools for multi-label data.

1.1 Objective of multi – label learning

A supervised learning algorithm first learns the association between the examples
and their labels, and then uses the gathered knowledge to build a model that predicts the
labels of unseen examples. By analyzing training examples with known label sets, multi-label
learning aims to
(i) Predict the label set of unseen examples,
(ii) Rank all labels according to their relevance to unseen examples.

1.2 Challenges
(i) Loss of label correlation, i.e. discovering and modeling label dependencies
(ii) Output label sparsity, i.e. the output space is $2^{\mathcal{L}}$ instead of $\mathcal{L}$
(iii) Insufficient measurements to include label and feature dimensions.

1.3 Organization of the Paper

Section 1 gives the introduction and the need for multi-label classification, together
with the objective and the challenges faced by researchers working with multi-label data.
Section 2 gives an overall view of the Problem Transformation (PT) methods and describes
5 PT methods: Binary Relevance (BR), Label Power Set or Label Combination (LP / LC),
Pruned Set (PS), Classifier Chain (CC), and RAndom k-labELsets (RAkEL). Section 3 describes
the base classifiers chosen. Section 4 gives the details of the application domains and the
tools needed to perform the experimentation. Section 5 describes the evaluation metrics used
for multi-label classification. Section 6 presents the experimental results, and Section 7
ends with the conclusion.

2 Problem Transformation Methods

A problem transformation method transforms the multi-label learning task into several
single-label learning tasks, i.e. it decomposes the multi-label problem into several
independent single-label classification problems. A single-label classification algorithm
is then used as the base classifier to solve the converted problems and produce the output
model. Figure 1 gives an overview of the Problem Transformation method. The problem
transformation methods chosen for the experimentation here are Binary Relevance (BR),
Label Power Set (LP), Pruned Sets (PS), Classifier Chains (CC), and RAndom k-labELsets
(RAkEL).


[Figure 1: block diagram with boxes "Multi-label Problem", "Single-label Problem",
"Single-label Base" and "Output Model"]

Fig 1 – Overview of PT Method

2.1 Binary Relevance (BR)

BR transforms a multi-label problem into m binary problems, one for each of the m labels
in ℒ, and each traditional single-label binary base classifier is responsible for predicting
the association of a single label. It does not model label correlations explicitly [1];
instead it constructs a decision boundary for each label individually. BR follows the
one-vs-all paradigm: it learns one classifier for each label, using all the examples that
carry the respective label as positives and the remaining examples as negatives. Each binary
classifier then predicts whether its label is relevant for a given example, and BR outputs
the set of labels predicted relevant as the final prediction.
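
The data transformation behind BR can be sketched in a few lines of Python. This is only an illustration of the idea described above, not the MEKA or MULAN implementation; the function names and the toy scene labels are assumptions made for the example, and the base classifier itself is left out.

```python
from typing import Dict, List, Set


def binary_relevance_targets(label_sets: List[Set[str]],
                             labels: List[str]) -> Dict[str, List[int]]:
    """Build one 0/1 target vector per label (one binary problem per label)."""
    return {lab: [1 if lab in y else 0 for y in label_sets] for lab in labels}


def combine_predictions(per_label_prediction: Dict[str, int]) -> Set[str]:
    """Final BR output: the set of labels whose binary classifier said 1."""
    return {lab for lab, p in per_label_prediction.items() if p == 1}


# Toy data (hypothetical): three scene examples and four labels.
labels = ["beach", "tree", "people", "city"]
label_sets = [{"beach", "tree"}, {"city"}, {"people", "city"}]

targets = binary_relevance_targets(label_sets, labels)
# targets["city"] is the target column for the "city" classifier: [0, 1, 1]

predicted = combine_predictions({"beach": 1, "tree": 0, "people": 1, "city": 0})
```

Each target vector would be paired with the unchanged feature vectors and handed to any binary base classifier.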

2.2 Label Power set (LP / LC)

LP treats each unique label set as one output class, i.e. a single class stands for all
the labels of the query concerned [3]. The Label Power Set method thus creates a new set of
class labels as the output. For example, if a query instance is associated with C1, C2, and
C4, the newly transformed single-label output is C1,2,4. The transformed problem is a
single-label problem, and any single-label base classifier can be used on it. LP can predict
a label set for a new instance only if that label set is already present in the training
data; otherwise it cannot, i.e. it overfits the training data set. For a new instance the LP
method assigns the most probable class.
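
A minimal sketch of the LP transformation, assuming label sets are given as Python sets: each distinct label set receives its own class id, and the inverse map turns a predicted class back into a label set. The function name and the integer class ids are illustrative choices, not part of MEKA or MULAN.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def label_powerset_transform(
    label_sets: List[Set[str]],
) -> Tuple[List[int], Dict[int, FrozenSet[str]]]:
    """Map every distinct label set to one single-label class id."""
    class_of: Dict[FrozenSet[str], int] = {}
    y: List[int] = []
    for label_set in label_sets:
        key = frozenset(label_set)
        if key not in class_of:
            class_of[key] = len(class_of)  # next unused class id
        y.append(class_of[key])
    inverse = {cid: key for key, cid in class_of.items()}
    return y, inverse


# Toy data (hypothetical): the first and third examples share a label set.
label_sets = [{"C1", "C2", "C4"}, {"C3"}, {"C1", "C2", "C4"}, {"C2"}]
y, inverse = label_powerset_transform(label_sets)
# y == [0, 1, 0, 2]; inverse[0] is the label set {C1, C2, C4}
```

Because the inverse map only contains label sets seen during the transformation, a classifier trained on `y` can never output an unseen combination, which is the limitation noted above.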

2.3 Pruned Set (PS)

PS is an extension of Label Power Set. It addresses the label sparsity problem of Label
Power Set by pruning the less frequent label sets from the data set [4], while still taking
label correlations into account. It identifies the less important label sets, i.e. those
that occur infrequently in the data set, and prunes them according to a pruning parameter
set by the user. Pruning the unnecessary label sets reduces the complexity. This method is
suitable for data sets with a large number of labels, and it preserves the core label
relationships even after the pruning.
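
The pruning step can be illustrated as follows. This sketch assumes a hypothetical parameter `min_count` standing in for the user-set pruning parameter, and it keeps only examples whose label set occurs at least that many times. It covers the pruning alone; any further handling of the pruned examples is outside the sketch.

```python
from collections import Counter
from typing import List, Set, Tuple

Example = Tuple[str, Set[str]]  # (example id, label set)


def prune_label_sets(examples: List[Example], min_count: int) -> List[Example]:
    """Keep examples whose exact label set occurs at least min_count times."""
    counts = Counter(frozenset(y) for _, y in examples)
    return [(x, y) for x, y in examples if counts[frozenset(y)] >= min_count]


# Toy data (hypothetical): {a, b} occurs three times, the other sets once.
examples = [
    ("q1", {"a", "b"}),
    ("q2", {"a", "b"}),
    ("q3", {"c"}),
    ("q4", {"a", "b"}),
    ("q5", {"a", "c"}),
]
kept = prune_label_sets(examples, min_count=2)
kept_ids = [x for x, _ in kept]  # ["q1", "q2", "q4"]
```

The kept examples can then be passed to a Label Power Set transformation, which now sees far fewer distinct classes.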

2.4 RAndom k-labELsets (RAkEL)

RAkEL breaks the initial set of labels into n small random subsets, called k-labelsets,
each containing k randomly selected labels, and learns a Label Power Set classifier for each
subset [9]. The outputs of all the LP classifiers are collected and averaged per label, and
a threshold value t is applied to the average to make the final decision. This method takes
advantage of the LP method and makes a better prediction. The values n and k play a crucial
role in the prediction process: a smaller value of k and a larger value of n give better
predictions.
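
The two ingredients of RAkEL, drawing random k-labelsets and thresholding the averaged votes, might be sketched as below. The helper names, the fixed seed, and the hand-written member predictions are assumptions for illustration. The sketch does not train the LP members, and it may draw the same k-labelset twice.

```python
import random
from typing import Dict, List, Set


def sample_k_labelsets(labels: List[str], k: int, n: int,
                       seed: int = 0) -> List[List[str]]:
    """Draw n random subsets of k labels each (repeats are not excluded here)."""
    rng = random.Random(seed)
    return [sorted(rng.sample(labels, k)) for _ in range(n)]


def rakel_vote(labelsets: List[List[str]], member_predictions: List[Set[str]],
               labels: List[str], t: float = 0.5) -> Set[str]:
    """Average each label's votes over the members that cover it; keep those above t."""
    votes: Dict[str, int] = {lab: 0 for lab in labels}
    seen: Dict[str, int] = {lab: 0 for lab in labels}
    for subset, predicted in zip(labelsets, member_predictions):
        for lab in subset:
            seen[lab] += 1
            if lab in predicted:
                votes[lab] += 1
    return {lab for lab in labels if seen[lab] and votes[lab] / seen[lab] > t}


labels = ["l1", "l2", "l3", "l4"]
drawn = sample_k_labelsets(labels, k=2, n=3)

# Fixed labelsets and hypothetical LP member outputs for one query instance.
labelsets = [["l1", "l2"], ["l2", "l3"], ["l1", "l3"]]
member_predictions = [{"l1"}, {"l2"}, {"l1", "l3"}]
final = rakel_vote(labelsets, member_predictions, labels, t=0.5)
# l1 gets 2 of 2 votes, l2 and l3 get 1 of 2, l4 is never covered
```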

2.5 Classifier Chain (CC)

CC consists of m binary classifiers, one per label, linked in a chain [11].
The i-th classifier in the chain is responsible for the label $\ell_i \in \mathcal{L}$. The
feature space of each link in the chain is extended with the 0/1 label associations of all
previous links.
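
The feature-space extension can be made concrete with a short sketch. It assumes features are plain Python lists and labels are 0/1 vectors in chain order. The three rule functions are hypothetical stand-ins for trained binary classifiers, included only so the chain can be run end to end.

```python
from typing import Callable, List, Tuple

Features = List[float]


def chain_training_sets(X: List[Features],
                        Y: List[List[int]]) -> List[List[Tuple[Features, int]]]:
    """For link i, append the true labels of links 0..i-1 to every feature vector."""
    m = len(Y[0])
    return [[(x + y[:i], y[i]) for x, y in zip(X, Y)] for i in range(m)]


def chain_predict(x: Features,
                  link_predictors: List[Callable[[Features], int]]) -> List[int]:
    """Run the links in order, feeding each one the predictions made so far."""
    preds: List[int] = []
    for predict in link_predictors:
        preds.append(predict(x + preds))
    return preds


# Toy data (hypothetical): two examples, two features, three labels.
X = [[0.2, 1.0], [0.9, 0.1]]
Y = [[1, 0, 1], [0, 1, 1]]
sets = chain_training_sets(X, Y)
# sets[2][0] == ([0.2, 1.0, 1, 0], 1): the third link sees the first two labels


def rule_l1(f: Features) -> int:   # uses the original features only
    return 1 if f[0] < 0.5 else 0


def rule_l2(f: Features) -> int:   # uses the prediction for label 1
    return 1 - int(f[-1])


def rule_l3(f: Features) -> int:   # uses the predictions for labels 1 and 2
    return 1 if (f[-2] or f[-1]) else 0


prediction = chain_predict([0.2, 1.0], [rule_l1, rule_l2, rule_l3])
```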

3 Base Class Configurations

The base classifiers used in this study are Decision Tree (J48), Support Vector
Machines (SVM), K-Nearest Neighbor (KNN), Random Forest (RF), Naïve Bayes (NB), and
Multi-Layer Perceptron (MLP). All of these base classifiers are capable of solving binary
classification problems. We used the implementations of the base classifiers from the WEKA
library, which is the foundation of the MULAN library.
4 Data Sets & Tools
The data sets used here for the experimentation are Solar_flare, Emotions [7], and
Yeast [2]. These 3 data sets are described in Table 1.

Name          # instances   # dimensions   # labels   Label Cardinality
Solar_flare   320           26             5
Emotions      593           72             6          1.87
Yeast         2417          103            14         4.24
Table 1 – Description of data sets
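
Label cardinality is commonly defined as the average number of labels per instance. The sketch below assumes that definition and uses a small made-up list of label sets, not the data sets of Table 1.

```python
from typing import List, Set


def label_cardinality(label_sets: List[Set[str]]) -> float:
    """Average number of labels per example."""
    return sum(len(y) for y in label_sets) / len(label_sets)


# Toy data (hypothetical): four examples with 2, 1, 3 and 2 labels.
toy_label_sets = [{"a", "b"}, {"a"}, {"a", "b", "c"}, {"b", "c"}]
cardinality = label_cardinality(toy_label_sets)  # (2 + 1 + 3 + 2) / 4 = 2.0
```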
The comparisons were done with the following machine learning tools: MULAN [5] [10],
a Java library for multi-label learning built on the machine learning framework WEKA, and
MEKA [6], an extension of WEKA that supports multi-label data.

5 Evaluation Measures
The evaluation measures for single-label classification problems express the performance
of the classifier in terms of the correctness of each example-label pair. In a multi-label
classification problem each example instance is associated with a label set, so the
classification of an example may be fully correct, partially correct, or incorrect. The
evaluation measures used for single-label classification problems are therefore inadequate
for multi-label classification problems. Multi-label data can be characterized by the number
of examples, the number of attributes in the input space, and the number of labels.
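
A small worked example shows how partially correct predictions are scored. The sketch assumes the commonly used set-based definitions of Hamming loss, subset accuracy, and example-based accuracy (intersection over union). The exact formulas implemented in MEKA and MULAN may differ in detail, and the toy emotion labels are made up.

```python
from typing import List, Set


def hamming_loss(true_sets: List[Set[str]], pred_sets: List[Set[str]],
                 num_labels: int) -> float:
    """Fraction of example-label pairs that are predicted wrongly."""
    wrong = sum(len(y ^ z) for y, z in zip(true_sets, pred_sets))
    return wrong / (len(true_sets) * num_labels)


def subset_accuracy(true_sets: List[Set[str]], pred_sets: List[Set[str]]) -> float:
    """Fraction of examples whose predicted label set matches exactly."""
    return sum(y == z for y, z in zip(true_sets, pred_sets)) / len(true_sets)


def example_accuracy(true_sets: List[Set[str]], pred_sets: List[Set[str]]) -> float:
    """Mean of |Y and Z| / |Y or Z|; an empty-vs-empty pair counts as 1."""
    scores = [len(y & z) / len(y | z) if (y | z) else 1.0
              for y, z in zip(true_sets, pred_sets)]
    return sum(scores) / len(scores)


# Toy data (hypothetical) over 4 labels: happy, sad, pleased, angry.
true_sets = [{"happy", "pleased"}, {"sad"}, {"happy"}]
pred_sets = [{"happy"}, {"sad"}, {"sad", "pleased"}]

hl = hamming_loss(true_sets, pred_sets, num_labels=4)   # 4 wrong pairs out of 12
sa = subset_accuracy(true_sets, pred_sets)              # only the second example matches
acc = example_accuracy(true_sets, pred_sets)            # (0.5 + 1.0 + 0.0) / 3
```

The first example is half right: subset accuracy counts it as a miss, while Hamming loss and accuracy give it partial credit.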
There are 3 types of evaluation measures for multi-label learning: (i) Example-based,
(ii) Label-based, and (iii) Ranking-based measures. An example-based measure calculates the
average difference between the actual and the predicted sets of labels over the examples in
the test data set. The example-based measures discussed here are Hamming Loss, Accuracy,
Precision, Recall, F1_Score, and Subset Accuracy. The label-based measures used here are
Macro_Precision, Macro_Recall, Macro_F1, Micro_Precision, Micro_Recall, and Micro_F1. The
ranking-based measures work on the ranking of the labels: they compare the predicted ranking
of the labels against the actual labels in the data set. The ranking-based measures used for
this experimentation are one-error, coverage, ranking loss, and average precision.
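
The difference between macro and micro averaging, and one of the ranking-based measures, can be sketched as follows. The code assumes the usual definitions: macro averages the per-label F1 scores, micro pools the counts over all labels first, and one-error is the fraction of examples whose top-ranked label is not a true label. The function names, toy labels, and scores are illustrative only.

```python
from typing import Dict, List, Set, Tuple

Counts = Dict[str, Tuple[int, int, int]]  # label -> (tp, fp, fn)


def per_label_counts(true_sets: List[Set[str]], pred_sets: List[Set[str]],
                     labels: List[str]) -> Counts:
    """Count true positives, false positives and false negatives per label."""
    counts: Counts = {}
    for lab in labels:
        tp = sum(lab in y and lab in z for y, z in zip(true_sets, pred_sets))
        fp = sum(lab not in y and lab in z for y, z in zip(true_sets, pred_sets))
        fn = sum(lab in y and lab not in z for y, z in zip(true_sets, pred_sets))
        counts[lab] = (tp, fp, fn)
    return counts


def f1(tp: int, fp: int, fn: int) -> float:
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(counts: Counts) -> float:
    """Average the F1 of each label, so every label weighs the same."""
    return sum(f1(*c) for c in counts.values()) / len(counts)


def micro_f1(counts: Counts) -> float:
    """Pool the counts of all labels, then compute a single F1."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f1(tp, fp, fn)


def one_error(true_sets: List[Set[str]], scores: List[Dict[str, float]]) -> float:
    """Fraction of examples whose highest-scored label is not a true label."""
    misses = sum(max(s, key=s.get) not in y for y, s in zip(true_sets, scores))
    return misses / len(true_sets)


labels = ["a", "b", "c"]
true_sets = [{"a", "b"}, {"a"}, {"c"}, {"a", "c"}]
pred_sets = [{"a"}, {"a", "b"}, {"c"}, {"a"}]
counts = per_label_counts(true_sets, pred_sets, labels)
macro = macro_f1(counts)   # (1 + 0 + 2/3) / 3
micro = micro_f1(counts)   # 8 / 11

scores = [{"a": 0.9, "b": 0.4, "c": 0.1}, {"a": 0.3, "b": 0.6, "c": 0.1},
          {"a": 0.2, "b": 0.1, "c": 0.7}, {"a": 0.8, "b": 0.1, "c": 0.5}]
oe = one_error(true_sets, scores)   # only the second example ranks a wrong label first
```

Here the rare label b is always wrong, which pulls the macro score well below the micro score.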

6 Results and discussion

Figure 2 shows that, for the solar_flare data set, the BR method performs well
with MLP as its base classifier, with the SVM base classifier next to MLP. With the LP
method, Naïve Bayes suits the situation well as the base classifier, and both RF and MLP
perform well next to NB. In terms of the Hamming loss measure, SVM performs well for the LP
method. For the CC method, the J48, SVM, and NB base classifiers give better predictions for
the solar_flare data set. For the PS method, the J48 and NB base classifiers perform well,
while SVM performs worse than the other base classifiers. With the RAkEL method, the
tree-based classifiers perform well. Overall, Figure 2 shows that the BR method performs
well for the solar_flare data, and the KNN, MLP, and RF base classifiers give better
predictions.

[Figure 2: chart titled "Solar_Flare Data"; legend entries recovered: BR, LP, CC, PS]

Fig 2 – PT methods Vs Hamming Loss for Solar_flare data

Figure 3 shows the results of all PT methods for the Emotions data set. For the BR
method, the J48 base classifier performs worse than the other 5 classifiers. With the LP
method, SVM performs best and, as with the BR method, the tree-based classifiers perform
poorly. With the CC method, the RF classifier performs best and gives a low Hamming loss
value, which means less misclassification error. For the PS method, the MLP classifier does
a better job than the others. With the RAkEL method, the RF classifier performs better.
Figure 3 shows that the BR and LP methods perform well for the emotions data set, and that
the J48 and KNN classifiers contribute fully to the prediction process with this data set.
[Figure 3: chart titled "Emotions Data"; legend entries recovered: BR, LP]

Fig 3 – PT methods Vs Hamming Loss for Emotions data set

Figure 4 shows the results of the PT methods for the yeast data set. For the BR method,
all classifiers except NB perform well in all aspects. For the LP method, the J48 and MLP
classifiers perform better. For the CC method, the RF, KNN, and NB classifiers perform well.
For the PS method, the J48 and SVM classifiers predict the labels better. For the RAkEL
ensemble method, the tree-based classifiers J48 and RF perform well, but the MLP classifier
performs poorly. Figure 4 shows the Hamming loss measure for the yeast data with the 5 PT
methods; from this figure it is clear that the BR and LP methods perform well.
[Figure 4: chart titled "Yeast Data"; legend entries recovered: BR, LP, CC]

Fig 4 – PT methods Vs Hamming Loss for yeast data set

7 Conclusion
In this paper, an experimental study of five different multi-label problem
transformation methods was presented, using different evaluation metrics and different
application domains. The study gives useful insights into the working principles of the
different methods, and a comparative performance analysis was carried out to assess the
efficacy of the different problem transformation methods. On all 3 data sets the BR method
shows better performance than the other methods, because it treats the labels independently
of their correlations and because of its speed.

8 References

[1] Boutell, M.R., Luo, J., Shen, X., Brown, C.M.: Learning multi-label scene classification. Pattern
Recognition 37(9), 1757–1771 (2004)
[2] A. Elisseeff, J. Weston, A Kernel method for multi-labelled classification, in: Proceedings of the
Annual ACM Conference on Research and Development in Information Retrieval, 2005, pp. 274–
[3] J. Read. A Pruned Problem Transformation Method for Multi-label classification. In Proc. 2008 New
Zealand Computer Science Research Student Conference (NZCSRS 2008), pages 143–150, 2008.
[4] Jesse Read, Bernhard Pfahringer, and Geoff Holmes. Multi-label Classification Using Ensembles of
Pruned Sets. In ICDM ’08: Proceedings of the 2008 Eighth IEEE International Conference on Data
Mining, volume 0, pages 995–1000, Washington, DC, USA, 2008. IEEE Computer Society
[7] K. Trohidis, G. Tsoumakas, G. Kalliris, I. Vlahavas, Multilabel classification of music into
emotions, in: Proceedings of the 9th International Conference on Music Information Retrieval, 2008,
pp. 320–330.
[8] G. Tsoumakas, I. Katakis, and I. Vlahavas, “Mining Multi-label Data”, Data Mining and
Knowledge Discovery Handbook, O. Maimon, L. Rokach (Ed.), Springer, 2nd edition, 2010.
[9] G. Tsoumakas and I. Vlahavas. Random k-Labelsets: An Ensemble Method for Multilabel
Classification. In Proceedings of the 18th European Conference on Machine Learning (ECML 2007),
pages 406–417, Warsaw, Poland, September 2007
[10] G. Tsoumakas, R. Friberg, E. Spyromitros-Xioufis, I. Katakis, and J. Vilcek, “Mulan software - java
classes for multi-label classification”, available at:
[11] A. Wieczorkowska, P. Synak, and Z. Ras, “Multi-label classification of emotions in music”,
Proc of the International Conference on Intelligent Information Processing and Web Mining ,
307–315, 2006