
Mar 06, 2018


basics of multi label learning


¹pkachitra@gmail.com, ²app_s@yahoo.com

¹Research Scholar, ²Professor and Research Coordinator

¹Anna University,

²Department of Electronics and Communication Engineering, K.L.N. College of Information Technology, Sivagangai, Tamil Nadu, India

Abstract. In traditional single-label classification, the class labels associated with a query example are mutually exclusive. In real-life applications such as music categorization, functional genomics, and text and document categorization, however, one query example may belong to a subset of the class labels, i.e. the labels are mutually inclusive. Because the labels are often highly correlated, traditional single-label classification algorithms are not sufficient, and effective algorithms for working with multiple labels are needed. Multi-label classification algorithms fall into two groups: (i) those that transform the multi-label problem into single-label binary problems and (ii) those that adapt existing single-label algorithms to cope with multi-label problems. In this paper we present the theoretical concepts behind multi-label classification and a comparative analysis of transformation methods, carried out with two tools, MEKA and MULAN, over different application domains. Six example-based, six label-based, and four ranking-based measures are used to evaluate the efficacy of the different transformation methods.

Keywords: Label Power Set, Pruned Set, Classifier Chain, RAkEL

1 Introduction

Traditional single-label supervised classification maps an example to exactly one output label. Let $Q = \{q_1, q_2, q_3, \ldots, q_n\}$ be the set of examples and $\mathcal{L} = \{\ell_1, \ell_2, \ell_3, \ldots, \ell_m\}$ be the set of labels. Single-label classification (SLC) is defined as

$$\mathrm{SLC}(\hbar) : q_i \mapsto \ell_i, \quad |\ell_i| = 1 \tag{1}$$

where $|\ell_i| = 1$ states that exactly one label is assigned.

In real life, however, one example may be associated with many labels. In scene classification, for example, an instance may belong to beach, tree, people, or city. In music categorization, an instance may express several emotions such as happy, sad, or pleased. Likewise, in medical diagnosis a patient may suffer from both diabetes and cancer. An instance may therefore belong to more than one label, and multi-label classification meets this need by assigning a single query example to many labels, i.e. one idea to multiple concepts.

Multi-label classification (MLC) [8] is a generalization of the supervised single-label classification task in which each data instance may be associated with a set of class labels rather than one label. Each individual label still takes only a binary value. The task of MLC is to map an example instance $q_i \in Q$ to a label set $\ell \subseteq \mathcal{L}$:

$$\hbar : \chi \to 2^{\mathcal{L}} \tag{2}$$

i.e.

$$\mathrm{MLC}(\hbar) : q_i \mapsto \ell \subseteq \{\ell_1, \ldots, \ell_m\}, \quad |\ell| \ge 2$$
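Because each label takes only a binary value, a label set is commonly stored as a 0/1 indicator vector with one position per label. The following is a minimal sketch of that encoding; the label names are taken from the scene classification example, and the function names `to_indicator` and `from_indicator` are hypothetical, chosen only for illustration.

```python
# Hypothetical label vocabulary, for illustration only.
LABELS = ["beach", "tree", "people", "city"]


def to_indicator(label_set, labels=LABELS):
    """Encode a set of labels as a 0/1 vector, one position per label."""
    return [1 if name in label_set else 0 for name in labels]


def from_indicator(vector, labels=LABELS):
    """Decode a 0/1 vector back into the set of relevant labels."""
    return {name for name, bit in zip(labels, vector) if bit == 1}


scene = {"beach", "people"}
encoded = to_indicator(scene)
print(encoded)                           # [1, 0, 1, 0]
print(from_indicator(encoded) == scene)  # True
```

With m labels there are 2^m possible indicator vectors, which is the output space referred to in equation (2).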

The number of applications involving data with multiple target labels keeps increasing, and multi-label classification has consequently received growing attention in recent years. Multi-label classification methods fall into two categories: (i) problem transformation methods and (ii) algorithm adaptation methods. A problem transformation method transforms the multi-label data into single-label data and then applies traditional single-label classification methods to the transformed data. Because the transformed problems are binary, traditional single-label classifiers are enough to build the classifier model. The drawback is that a lot of information may be lost during the multi-label to single-label conversion. Problem transformation methods are therefore fast, but this loss of information makes them less effective. Algorithm adaptation methods, on the other hand, take single-label classifiers and modify them to handle multi-label data directly. They are very effective because no information in the data is lost. In this paper, we present an experimental evaluation of five problem transformation methods over several data sets, using machine learning tools for multi-label data.

A supervised learning algorithm first learns the association between the examples and their labels and then, based on the gathered knowledge, builds a model to predict the labels of unseen examples. The objectives of multi-label learning, achieved by analyzing training examples with known label sets, are to

(i) predict the label set of unseen examples, and

(ii) rank all labels according to their relevance to unseen examples.

1.2 Challenges

(i) Loss of label correlation, i.e. label dependencies must be discovered and modeled.

(ii) Output label sparsity, i.e. the output space is $2^{\mathcal{L}}$ instead of $\mathcal{L}$.

(iii) Insufficient measures that take both the label and the feature dimensions into account.

Section 1 of this paper gives the introduction and the need for multi-label classification, together with the objectives and the challenges faced by multi-label researchers. Section 2 gives an overview of problem transformation (PT) methods and describes five of them: Binary Relevance (BR), Label Power Set or Label Combination (LP / LC), Pruned Sets (PS), Classifier Chains (CC), and Random k-Labelsets (RAkEL). Section 3 describes the base classifiers chosen. Section 4 gives details of the application domains and the tools needed to perform the experiments. Section 5 describes the evaluation metrics used for multi-label classification. Section 6 presents the experimental results, and Section 7 concludes.

A problem transformation method transforms the multi-label learning task into several single-label learning tasks, i.e. it decomposes the multi-label problem into several independent single-label classification problems. A single-label classification algorithm is then used as the base classifier to solve the converted problems and produce the output model. Figure 1 illustrates the overall problem transformation approach. The problem transformation methods chosen for the experiments are Binary Relevance (BR), Label Power Set (LP), Pruned Sets (PS), Classifier Chains (CC), and RAndom k-labELsets (RAkEL).

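Figure 1. Overview of the problem transformation method: multi-label training and test data are transformed into a single-label problem, and a single-label base classifier builds the model used for prediction.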

Binary Relevance (BR) transforms a multi-label problem into one binary problem per label, and each traditional single-label binary base classifier is responsible for predicting the association of a single label. BR does not model label correlations explicitly [1]; instead, it constructs a decision boundary for each label individually. BR follows the one-vs-all paradigm: it learns one classifier for each label using all the examples, treating the examples that carry the label as positive and the remaining examples as negative. Each binary classifier then predicts whether its label is relevant to a given example, and the set of labels predicted as relevant forms the final prediction.
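The Binary Relevance idea can be sketched in a few lines. The example below is illustrative only and is not the MULAN or MEKA implementation: it assumes labels are given as a 0/1 indicator matrix `Y` and uses a toy 1-nearest-neighbour rule as a stand-in base classifier. The function names and the data are hypothetical.

```python
def fit_1nn(X, y):
    """Toy base classifier (an assumption): memorise the training pairs."""
    return list(zip(X, y))


def predict_1nn(model, x):
    """Return the target of the closest stored example (squared Euclidean)."""
    def dist(pair):
        return sum((a - b) ** 2 for a, b in zip(pair[0], x))
    return min(model, key=dist)[1]


def br_fit(X, Y):
    """Train one independent binary model per label column of Y."""
    n_labels = len(Y[0])
    return [fit_1nn(X, [row[j] for row in Y]) for j in range(n_labels)]


def br_predict(models, x):
    """Ask every per-label model separately and collect the 0/1 answers."""
    return [predict_1nn(model, x) for model in models]


# Hypothetical data: four examples, two features, three labels.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = [[0, 0, 1], [1, 0, 0], [0, 1, 0], [1, 1, 0]]

models = br_fit(X, Y)
print(br_predict(models, [0.9, 0.1]))  # [1, 0, 0]
```

Each model sees only its own label column, which is why this scheme cannot exploit correlations between labels.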

Label Power Set (LP) treats each unique label set in the training data as one output class, i.e. a single class that contains all the labels of the query concerned [3]. It thereby creates a new set of class labels as the output. For example, if a query instance is associated with C1, C2, and C4, the newly transformed single-label output is C1,2,4. The transformed problem is a single-label problem, and any single-label base classifier can be applied to it. LP can predict the label set of a new instance only if that label set already occurs in the training data; otherwise it cannot, i.e. it overfits the training data. For a new instance, LP assigns the most probable class.
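The Label Power Set transformation itself is a simple relabelling step. The sketch below assumes labels are stored as 0/1 indicator vectors; `lp_transform` is a hypothetical helper name and the data are made up for illustration.

```python
def lp_transform(Y):
    """Map each distinct label vector to one class id.

    Returns the class id of every example and a lookup table that turns a
    predicted class id back into its label vector.
    """
    class_of = {}
    classes = []
    for row in Y:
        key = tuple(row)
        if key not in class_of:
            class_of[key] = len(class_of)  # next unused class id
        classes.append(class_of[key])
    labelset_of = {cid: list(key) for key, cid in class_of.items()}
    return classes, labelset_of


# Hypothetical data with four labels C1..C4; the first row is {C1, C2, C4}.
Y = [[1, 1, 0, 1], [0, 0, 1, 0], [1, 1, 0, 1], [1, 0, 0, 0]]
classes, labelset_of = lp_transform(Y)
print(classes)         # [0, 1, 0, 2]
print(labelset_of[0])  # [1, 1, 0, 1]
```

Any single-label classifier can then be trained on `classes`. Because only label vectors present in `labelset_of` can ever be returned, unseen label combinations cannot be predicted.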

Pruned Sets (PS) is an extension of Label Power Set. It addresses the label sparsity problem of Label Power Set by pruning the infrequent label sets from the data set [4], while still taking label correlations into account. It identifies the less important label sets, i.e. those that occur infrequently in the data set, and prunes them according to a pruning parameter set by the user. Pruning the unnecessary label sets reduces complexity, which makes the method suitable for data sets with a large number of labels. The core label relationships are preserved even after pruning.
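The pruning step can be illustrated as follows. This sketch covers only the removal of examples whose label set is rare; any further handling of the pruned examples is omitted. The parameter name `min_count` and its exact meaning (keep label sets seen at least `min_count` times) are assumptions made for this example, not the definition used by any particular tool.

```python
from collections import Counter


def prune_rare_labelsets(X, Y, min_count):
    """Keep only the examples whose label set occurs at least min_count times."""
    counts = Counter(tuple(row) for row in Y)
    kept = [(x, y) for x, y in zip(X, Y) if counts[tuple(y)] >= min_count]
    X_kept = [x for x, _ in kept]
    Y_kept = [y for _, y in kept]
    return X_kept, Y_kept


# Hypothetical data: the label set [1, 0, 1] occurs three times, the others once.
X = [[0.1], [0.2], [0.3], [0.4], [0.5]]
Y = [[1, 0, 1], [1, 0, 1], [0, 1, 0], [1, 0, 1], [0, 1, 1]]

X_kept, Y_kept = prune_rare_labelsets(X, Y, min_count=2)
print(X_kept)  # [[0.1], [0.2], [0.4]]
print(Y_kept)  # [[1, 0, 1], [1, 0, 1], [1, 0, 1]]
```

The pruned data can then be passed to a Label Power Set transformation, which now has fewer classes to learn.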

RAkEL (RAndom k-labELsets) breaks the label set into n small random subsets, each containing k randomly selected labels, and learns each subset with the Label Power Set method [9]. The outputs of all LP classifiers are collected and averaged, and a threshold value t is applied to the average to make the final decision. The method thus takes advantage of LP and makes better predictions. The values of n and k play a crucial role in the prediction process: a smaller k and a larger n give better predictions.
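The random selection of labels and the thresholded averaging can be sketched as below, following the k-labelset idea in [9]. The sketch leaves out the training of the LP members and takes their 0/1 outputs as given. The function names, the fixed seed, and the strict comparison with t are assumptions for illustration; this simple draw may also return the same labelset more than once.

```python
import random


def draw_labelsets(n_labels, k, n_models, seed=0):
    """Draw n_models random subsets of k distinct label indices each."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n_labels), k)) for _ in range(n_models)]


def combine_votes(n_labels, member_outputs, t=0.5):
    """Average the 0/1 votes each label received and apply threshold t.

    member_outputs holds one (label_indices, predicted_bits) pair per
    LP member. Labels that no member covers are predicted as 0.
    """
    votes = [0] * n_labels
    seen = [0] * n_labels
    for indices, bits in member_outputs:
        for j, bit in zip(indices, bits):
            votes[j] += bit
            seen[j] += 1
    return [1 if seen[j] and votes[j] / seen[j] > t else 0
            for j in range(n_labels)]


# Hypothetical outputs of three LP members over three labels, with k = 2.
outputs = [([0, 1], [1, 0]), ([1, 2], [1, 1]), ([0, 2], [1, 0])]
print(combine_votes(3, outputs))         # [1, 0, 0]
print(combine_votes(3, outputs, t=0.4))  # [1, 1, 1]
```

Label 0 receives two positive votes out of two, while labels 1 and 2 each receive one out of two, so the outcome for those labels depends on the threshold.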

A Classifier Chain (CC) consists of one binary classifier per label, all linked in a chain [11]. The i-th classifier in the chain is responsible for label $\ell_i \in \mathcal{L}$. The feature space of each link in the chain is extended with the 0/1 label associations of all previous links.
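The feature extension along the chain can be sketched as follows. As an assumption for illustration, the base classifier is a toy 1-nearest-neighbour rule and the labels are chained in their given order; the function names and data are hypothetical and do not reflect the MEKA or MULAN code.

```python
def fit_1nn(X, y):
    """Toy base classifier (an assumption): memorise the training pairs."""
    return list(zip(X, y))


def predict_1nn(model, x):
    """Return the target of the closest stored example (squared Euclidean)."""
    def dist(pair):
        return sum((a - b) ** 2 for a, b in zip(pair[0], x))
    return min(model, key=dist)[1]


def cc_fit(X, Y):
    """Train link j on the features plus the true values of labels 0..j-1."""
    chain = []
    for j in range(len(Y[0])):
        X_aug = [list(x) + list(row[:j]) for x, row in zip(X, Y)]
        chain.append(fit_1nn(X_aug, [row[j] for row in Y]))
    return chain


def cc_predict(chain, x):
    """Predict labels in chain order, feeding earlier predictions forward."""
    predicted = []
    for model in chain:
        predicted.append(predict_1nn(model, list(x) + predicted))
    return predicted


# Hypothetical data: three examples, one feature, two labels.
X = [[0.0], [1.0], [2.0]]
Y = [[0, 0], [1, 1], [1, 0]]

chain = cc_fit(X, Y)
print(cc_predict(chain, [0.9]))  # [1, 1]
print(cc_predict(chain, [2.1]))  # [1, 0]
```

During training each link sees the true earlier labels, whereas at prediction time it sees the chain's own earlier predictions, so an early mistake can propagate down the chain.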

The base classifiers used in this study are Decision Tree (J48), Support Vector Machine (SVM), k-Nearest Neighbor (KNN), Random Forest (RF), Naïve Bayes (NB), and Multi-Layer Perceptron (MLP). All of them are capable of solving binary classification problems. We used the implementations of the base classifiers from the WEKA library, which is the foundation of the MULAN library.

4 Data Sets & Tools

The data sets used for the experiments are Solar_flare, Emotions [7], and Yeast [2]. These three data sets are described in Table 1.

Data set      Examples   Attributes   Labels   Label cardinality
Solar_flare   320        26           5
Emotions      593        72           6        1.87
Yeast         2417       103          14       4.24

Table 1. Description of the data sets
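The last column of Table 1 appears to report label cardinality, which is commonly defined as the average number of labels per example. A minimal sketch of that statistic, together with the related label density, is given below. The indicator matrix is made-up data, not one of the three data sets.

```python
def label_cardinality(Y):
    """Average number of relevant labels per example."""
    return sum(sum(row) for row in Y) / len(Y)


def label_density(Y):
    """Label cardinality divided by the total number of labels."""
    return label_cardinality(Y) / len(Y[0])


# Hypothetical indicator matrix: four examples, three labels.
Y = [[1, 0, 1], [0, 1, 0], [1, 1, 1], [0, 0, 1]]
print(label_cardinality(Y))  # 1.75
print(label_density(Y))
```

A cardinality close to 1 means the data are almost single-label, while a higher value, such as the 4.24 listed for Yeast, means each example carries several labels on average.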

The comparisons were carried out with the following machine learning tools: MULAN [5] [10], a Java library for multi-label learning built on the machine learning framework WEKA, and MEKA [6], an extension of WEKA that supports multi-labelled data.

5 Evaluation Measures

The evaluation measures for single-label classification problems express the performance of the classifier in terms of the correctness of each example-label pair. In a multi-label classification problem, each example instance is associated with a label set, and the classification of an example may be fully correct, partially correct, or incorrect. The evaluation measures used for single-label classification are therefore inadequate for multi-label classification. A multi-label data set can be characterized by the number of examples, the number of attributes in the input space, and the number of labels.

There are three types of evaluation measures for multi-label learning: (i) example-based, (ii) label-based, and (iii) ranking-based measures. Example-based measures calculate the average difference between the actual and the predicted label sets over the examples in the test data set. The example-based measures discussed here are Hamming Loss, Accuracy, Precision, Recall, F1_Score, and Subset Accuracy. The label-based measures used here are Macro_Precision, Macro_Recall, Macro_F1, Micro_Precision, Micro_Recall, and Micro_F1. Ranking-based measures work on the ranking of the labels and compare the predicted label ranking against the actual labels in the data set. The ranking-based measures used for the experiments are one-error, coverage, ranking loss, and average precision.
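To make the three families concrete, the sketch below computes one or two measures from each, using their commonly used definitions: Hamming loss and subset accuracy (example-based), macro and micro F1 (label-based), and one-error (ranking-based). The data are made up, and returning 0.0 for an F1 with no positives at all is a convention assumed here; MULAN and MEKA may handle that case differently.

```python
def hamming_loss(Y_true, Y_pred):
    """Fraction of example-label pairs that are predicted wrongly."""
    wrong = sum(t != p for yt, yp in zip(Y_true, Y_pred) for t, p in zip(yt, yp))
    return wrong / (len(Y_true) * len(Y_true[0]))


def subset_accuracy(Y_true, Y_pred):
    """Fraction of examples whose whole label set is predicted exactly."""
    return sum(yt == yp for yt, yp in zip(Y_true, Y_pred)) / len(Y_true)


def f1(tp, fp, fn):
    """F1 from counts; defined as 0.0 when there are no positives (assumed)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def label_counts(Y_true, Y_pred, j):
    """True positives, false positives, and false negatives for label j."""
    pairs = [(yt[j], yp[j]) for yt, yp in zip(Y_true, Y_pred)]
    return (pairs.count((1, 1)), pairs.count((0, 1)), pairs.count((1, 0)))


def macro_f1(Y_true, Y_pred):
    """Average of the per-label F1 scores."""
    m = len(Y_true[0])
    return sum(f1(*label_counts(Y_true, Y_pred, j)) for j in range(m)) / m


def micro_f1(Y_true, Y_pred):
    """F1 computed from counts pooled over all labels."""
    totals = [label_counts(Y_true, Y_pred, j) for j in range(len(Y_true[0]))]
    tp, fp, fn = (sum(c[i] for c in totals) for i in range(3))
    return f1(tp, fp, fn)


def one_error(Y_true, scores):
    """Fraction of examples whose top-ranked label is not relevant."""
    misses = 0
    for yt, s in zip(Y_true, scores):
        top = max(range(len(s)), key=lambda j: s[j])
        misses += yt[top] == 0
    return misses / len(Y_true)


# Hypothetical ground truth, predictions, and label scores.
Y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
Y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 1]]
scores = [[0.9, 0.1, 0.4], [0.2, 0.8, 0.3], [0.3, 0.2, 0.7]]

print(hamming_loss(Y_true, Y_pred), subset_accuracy(Y_true, Y_pred))
print(macro_f1(Y_true, Y_pred), micro_f1(Y_true, Y_pred))
print(one_error(Y_true, scores))
```

In this toy case three of the nine example-label pairs are wrong, yet only one of the three examples is predicted exactly, which shows how Hamming loss credits partially correct predictions while subset accuracy does not.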

Figure 2 shows that, for the solar_flare data set, the BR method performs well with MLP as its base classifier, and the SVM base classifier comes next after MLP. With the LP method, Naïve Bayes suits the situation well as the base classifier, and both RF and MLP perform well next to NB. In terms of the Hamming loss measure, SVM performs well for the LP method. For the CC method, the J48, SVM, and NB base classifiers give better predictions for the solar_flare data set. For the PS method, the J48 and NB base classifiers perform well, while SVM performs worse than the other base classifiers. With the RAkEL method, the tree-based classifiers perform well. Overall, Figure 2 shows that the BR method performs well for the solar_flare data and that the KNN, MLP, and RF base classifiers give better predictions.

Figure 2. Solar_Flare data: results of the BR, LP, CC, PS, and RAkEL methods with the J48, SVM, NB, KNN, MLP, and RF base classifiers.

Figure 3 shows the results of all PT methods for the Emotions data set. For the BR method, the J48 base classifier performs worse than the other five classifiers. With the LP method, SVM performs best and, as with BR, the tree-based classifiers perform poorly. With the CC method, the RF classifier performs best and gives a low Hamming loss value, which means less misclassification error. For the PS method, the MLP classifier does a better job than the others. With the RAkEL method, the RF classifier performs better. Figure 3 shows that the BR and LP methods perform well for the Emotions data set and that the J48 and KNN classifiers contribute fully to the prediction process on this data set.

Figure 3. Emotions data: results of the BR, LP, CC, PS, and RAkEL methods with the J48, SVM, NB, KNN, MLP, and RF base classifiers.

Figure 4 shows the results of the PT methods for the Yeast data set. For the BR method, all classifiers except NB perform well in all respects. For the LP method, the J48 and MLP classifiers perform better. For the CC method, the RF, KNN, and NB classifiers perform well. For the PS method, the J48 and SVM classifiers make better label predictions. For the RAkEL ensemble method, tree-based classifiers such as J48 and RF perform well, but the MLP classifier performs poorly. Figure 4 shows the Hamming loss measure for the Yeast data with the five PT methods; from this figure it is clear that the BR and LP methods perform well.

Figure 4. Yeast data: Hamming loss of the BR, LP, CC, PS, and RAkEL methods with the J48, SVM, NB, KNN, MLP, and RF base classifiers.

7 Conclusion

In this paper, an experimental study of five multi-label problem transformation methods, evaluated with different metrics over different application domains, was presented. The study gives useful insight into the working principles of the methods, and a comparative performance analysis shows the efficacy of the different problem transformation methods. On all three data sets, the BR method performs better than the other methods because it treats each label independently and is fast.

8 References

[1] M.R. Boutell, J. Luo, X. Shen, C.M. Brown. Learning multi-label scene classification. Pattern Recognition 37(9), pages 1757–1771, 2004.

[2] A. Elisseeff, J. Weston. A Kernel method for multi-labelled classification. In Proceedings of the Annual ACM Conference on Research and Development in Information Retrieval, pages 274–281, 2005.

[3] J. Read. A Pruned Problem Transformation Method for Multi-label classification. In Proc. 2008 New Zealand Computer Science Research Student Conference (NZCSRS 2008), pages 143–150, 2008.

[4] J. Read, B. Pfahringer, G. Holmes. Multi-label Classification Using Ensembles of Pruned Sets. In ICDM '08: Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, pages 995–1000, Washington, DC, USA, 2008. IEEE Computer Society.

[5] http://mulan.sourceforge.net/

[6] http://meka.sourceforge.net/

[7] K. Trohidis, G. Tsoumakas, G. Kalliris, I. Vlahavas. Multilabel classification of music into emotions. In Proceedings of the 9th International Conference on Music Information Retrieval, pages 320–330, 2008.

[8] G. Tsoumakas, I. Katakis, I. Vlahavas. Mining Multi-label Data. In Data Mining and Knowledge Discovery Handbook, O. Maimon, L. Rokach (Eds.), Springer, 2nd edition, 2010.

[9] G. Tsoumakas, I. Vlahavas. Random k-Labelsets: An Ensemble Method for Multilabel Classification. In Proceedings of the 18th European Conference on Machine Learning (ECML 2007), pages 406–417, Warsaw, Poland, September 2007.

[10] G. Tsoumakas, R. Friberg, E. Spyromitros-Xioufis, I. Katakis, J. Vilcek. Mulan software - java classes for multi-label classification. Available at: http://mlkd.csd.auth.gr/multilabel.html#Software

[11] A. Wieczorkowska, P. Synak, Z. Ras. Multi-label classification of emotions in music. In Proc. of the International Conference on Intelligent Information Processing and Web Mining, pages 307–315, 2006.
