Reverse Clustering Formulation, Interpretation and Case Studies


Studies in Computational Intelligence 957

Jan W. Owsiński · Jarosław Stańczak · Karol Opara · Sławomir Zadrożny · Janusz Kacprzyk

Reverse Clustering: Formulation, Interpretation and Case Studies
Studies in Computational Intelligence

Volume 957

Series Editor
Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland
The series “Studies in Computational Intelligence” (SCI) publishes new developments and advances in the various areas of computational intelligence—quickly and
with a high quality. The intent is to cover the theory, applications, and design
methods of computational intelligence, as embedded in the fields of engineering,
computer science, physics and life sciences, as well as the methodologies behind
them. The series contains monographs, lecture notes and edited volumes in
computational intelligence spanning the areas of neural networks, connectionist
systems, genetic algorithms, evolutionary computation, artificial intelligence,
cellular automata, self-organizing systems, soft computing, fuzzy systems, and
hybrid intelligent systems. Of particular value to both the contributors and the
readership are the short publication timeframe and the world-wide distribution,
which enable both wide and rapid dissemination of research output.
Indexed by SCOPUS, DBLP, WTI Frankfurt eG, zbMATH, SCImago.
All books published in the series are submitted for consideration in Web of Science.

More information about this series at http://www.springer.com/series/7092


Jan W. Owsiński · Jarosław Stańczak · Karol Opara · Sławomir Zadrożny · Janusz Kacprzyk
Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

ISSN 1860-949X    ISSN 1860-9503 (electronic)
Studies in Computational Intelligence
ISBN 978-3-030-69358-9    ISBN 978-3-030-69359-6 (eBook)
https://doi.org/10.1007/978-3-030-69359-6
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Switzerland AG 2021
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of
illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, expressed or implied, with respect to the material contained
herein or for any errors or omissions that may have been made. The publisher remains neutral with regard
to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface

We witness nowadays an explosive growth and development of methods and techniques related to data analysis, this growth being conditioned, on the one hand, by the rapidly expanding availability of data in virtually all domains of human activity and, on the other hand, by the very substantial progress in technical and scientific capabilities of dealing with the increasing volumes of data. All this amounts to a dramatic change, especially in quantitative terms.
Yet, as researchers and practitioners involved in work on the methodological side of data analysis know very well, many of the fundamental substantive problems in this domain still require solutions, or at least better solutions than those available now. This concerns, in particular, such fundamental areas as clustering, classification, rule extraction, and so on. The primary issue here is the opposition between precision or accuracy and speed or computational cost (when the problem at hand is already truly well defined). Nor can one forget the very strong data dependence of the effectiveness and efficiency of many of the methodologies being applied nowadays, which makes the situation even more difficult.
The present book addresses this nexus of issues, aiming, in this case, apparently
at the interface of clustering and classification, but, in fact, being relevant to a much
broader domain, with much broader implications in terms of applicability and
interpretation. Namely, it describes the paradigm of “reverse clustering”, introduced by the present authors. The paradigm concerns the situation in which we are given a certain data set, composed of entities, observations, objects…, as is usual in data analysis, and, at the same time, we are given, or we consider, a certain partition of this data set. We do not assume a priori anything about the data set, nor about the partition, nor, most importantly, about the relation between the data set and the partition. Thus, the partition may be the result of a definite kind of analysis of the given data set, but may as well result from quite a different mechanism (e.g. a division of the set of objects according to some variable or criterion not contained in the data set at hand).


Under these circumstances—the data set and the partition being given—we try to reconstruct the partition on the basis of the data set, using cluster analysis. We try to find the entire clustering procedure that will yield, for this given data set, a partition that is as close to the given one as possible. Thus, the result of the procedure is both the clustering procedure itself, defined by a number of attributes (clustering method, its parameters, variable selection, distance definition, …), and the concrete partition found.
It is obvious that the paradigm borders upon classification (for a very specific
formulation/interpretation of the situation faced), but extends to a much broader
domain, in which the perception of the problem itself and the meaning of solutions
can vary very widely. This is, in particular, shown in the present book.
In the current stage of work, the results obtained and largely contained in this
book pertain mainly to the substantive aspect of the paradigm, while the technical
aspects of the respective algorithms are, as of now, left to future research.
The reverse clustering paradigm constitutes a new perspective on quite a broad
spectrum of problems in data analysis, and, as the book shows, it can provide very
interesting, instructive and significant results under a wide variety of interpretational assumptions. We sincerely hope, therefore, that this book not only gives the Readers new material and fresh insight into some problems of data analysis, but also provokes them to deeper studies in the direction indicated here.

Warsaw, Poland

Jan W. Owsiński
Jarosław Stańczak
Karol Opara
Sławomir Zadrożny
Janusz Kacprzyk
Introduction

This book is devoted to an approach, or a paradigm, developed by the authors and applied to a series of cases of diverse character, mostly based on real-life data; the approach (or paradigm) belongs to the broadly understood domain of data analysis—more precisely: classification and cluster analysis. We call the approach “reverse clustering” because of its logic, which is formulated as follows:

Assume we dispose of a set of data, X, composed of n objects or observations, indexed i, i = 1,…,n, each of these being described by a vector of m features or variables, indexed k, the respective vector being denoted xi = (xi1,…,xik,…,xim). At the same time, assume we dispose of a partition of the set X of objects into subsets, this partition being denoted PA. For these data, we try to obtain a partition PB that is as close to PA as possible, by applying clustering algorithms to the set X. Thereby, we find both the partition PB that is as close as possible to PA and the concrete clustering procedure, with all its parameters, which yields the partition PB.

The above does not explicitly state the purpose of the exercise (to say nothing of the technical details), but it can easily be deduced that what is aimed at is closely related to the notion of classification. While the close relation with classification is not only obvious, but definitely true, the paradigm has a much wider spectrum of applications and meanings, as explained in Chap. 2 of the book, following the more precise presentation of the paradigm given in Chap. 1.
The paradigm is constituted, first, by the above statement of the problem, which then has to be expressed in pragmatic technical terms, involving
(1) the space of clustering algorithms with its granularity (what algorithms are accounted for, and what parameters, defining the entire clustering procedure, are the subject of the search for PB);


(2) the measure of similarity between the partition of the set X given at the outset, i.e. PA, and the partitions obtained from the clustering algorithms, this measure being maximised (or the measure of distance between them being minimised); and
(3) the technique of search for PB, given the data of the concrete problem.
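These three components can be illustrated with a deliberately minimal sketch in pure Python (a hypothetical toy implementation of ours, not the authors' evolutionary or differential-evolution code): the configuration space is a small grid over the number of clusters and variable weights, the similarity measure is the Rand index, and the search is exhaustive enumeration.

```python
import random
from itertools import combinations

def rand_index(pa, pb):
    # Fraction of object pairs on which the two partitions agree
    # (both together, or both apart); 1.0 means identical partitions.
    pairs = list(combinations(range(len(pa)), 2))
    agree = sum((pa[i] == pa[j]) == (pb[i] == pb[j]) for i, j in pairs)
    return agree / len(pairs)

def kmeans(points, k, iters=50, seed=0):
    # Plain Lloyd's algorithm; returns a cluster label per point.
    rng = random.Random(seed)
    centres = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        for i, x in enumerate(points):
            labels[i] = min(range(k), key=lambda c: sum(
                (a - b) ** 2 for a, b in zip(x, centres[c])))
        for c in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:
                centres[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def reverse_clustering(X, pa, ks=(2, 3),
                       weightings=((1.0, 1.0), (1.0, 0.0), (0.0, 1.0))):
    # (1) configuration space: number of clusters k and variable weights;
    # (2) criterion: Rand index between PA and the obtained partition PB;
    # (3) search: exhaustive enumeration of all configurations.
    best = None
    for k in ks:
        for w in weightings:
            Xw = [tuple(wi * xi for wi, xi in zip(w, x)) for x in X]
            pb = kmeans(Xw, k)
            q = rand_index(pa, pb)
            if best is None or q > best[0]:
                best = (q, {"k": k, "weights": w}, pb)
    return best

# Two groups separated in the first variable; the second variable is noise.
X = [(0.0, 5.1), (0.2, 0.3), (0.1, 9.7), (5.0, 4.8), (5.2, 0.1), (5.1, 9.9)]
PA = [0, 0, 0, 1, 1, 1]
q, z, pb = reverse_clustering(X, PA)
print(q, z)
```

On this toy data the search recovers PA by choosing k = 2 and suppressing the noise variable; in the book's experiments the configuration space is far richer and the search is evolutionary rather than exhaustive.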
This paradigm is, however, also, and perhaps even more importantly, constituted
by the interpretation of the entire setting, and the particular instances of this
interpretation—as mentioned, treated at length in Chap. 2. This is important insofar
as it places the paradigm against the background of the data analysis domain, with
special emphasis on classification and related fields. These various interpretation
instances are associated primarily with the status of the partition PA, namely its
source, the degree of credibility we assign to it, as well as its actual or presumed
connection with the data set X. Depending on these, and on the results obtained, the
status of the obtained partition PB, including validity and applicability, will also
vary significantly.
Owing to this variety of interpretations, the paradigm may find application in a
broad spectrum of analytic, but also cognitive, situations. The subsequent chapters
of the book, starting with the third one, are devoted precisely to the presentation of the cases treated, which differ not only as to their subject matter (the domain from which the data come), but also, largely, as to the interpretation of the actual problem and the results obtained. The implication is that the paradigm can be used in many data-analytic circumstances for diverse purposes, whenever structuring the data set into groups is appropriate.
The paradigm of reverse clustering has already been presented in several papers by the same team of authors, e.g. in Owsiński et al. (2017a, b) and Owsiński, Stańczak and Zadrożny (2018). The present book aims at a more complete presentation of the paradigm and its interpretations. The book does not go into the computational and numerical issues and details, which are, of course, of very high importance. Rather, the main purpose of the book is to present the approach and its capabilities in terms of various kinds of situations, problems and interpretations of the respective results. We do indeed hope it conveys the intended message in an effective and interesting manner.
The book is structured in the following manner: first, Chap. 1 presents the scheme of the approach, characterised, in particular, as it has been used in the cases illustrated in this book, along with the notation used. Then, Chap. 2 outlines the context of reverse clustering, starting with other approaches that concern similar kinds of data-analysis problems, including an ample reference to the very general idea of reverse engineering, as well as to explainable artificial intelligence and data analysis. Then, the context is briefly analysed in terms of more detailed specific problems arising in connection with both the reverse clustering procedure and data-analytic methods in a more general perspective (like, e.g., the selection of variables, or the definitions of distance). This chapter also contains a very important section on the potential interpretations of the reverse clustering paradigm and its results. Chapter 3 constitutes a very short introduction to the cases studied

and illustrated in the book, which are then presented in the consecutive chapters:
Chap. 4 is devoted to the motorway traffic data, Chap. 5 to environmental con-
tamination data, Chaps. 6 and 7 to two separate cases of typologies or classifications
of administrative units in Poland, and, finally, Chap. 8 to some more academic
exercises. The book closes with Chap. 9 summarising the work done and proposing
some new vistas.
This book is intended to offer the Readers truly interesting and novel perspectives in data analysis, regarding the diverse ways of formulating and approaching problems and understanding the results, and we shall be very satisfied if it does so at least to a perceptible degree.

Jan W. Owsiński
Jarosław Stańczak
Karol Opara
Sławomir Zadrożny
Janusz Kacprzyk
Contents

1 The Concept of Reverse Clustering .... 1
  1.1 The Concept .... 1
  1.2 The Notation .... 4
  1.3 The Elements of Vector Z: The Dimensions of the Search Space .... 5
  1.4 The Criterion: Maximising the Similarity Between Partitions PA and PB .... 11
  1.5 The Search Procedure .... 12
  References .... 13

2 Reverse Clustering—The Essence and The Interpretations .... 15
  2.1 The Background and the Broad Context .... 15
  2.2 Some More Specific Related Work .... 20
  2.3 The Interpretations .... 25
  References .... 31

3 Case Studies: An Introduction .... 37
  3.1 A Short Characterisation of the Cases Studied .... 37
  3.2 The Interpretations of the Cases Treated .... 40
  References .... 41

4 The Road Traffic Data .... 43
  4.1 The Setting .... 43
  4.2 The Experiments .... 45
  4.3 Conclusions .... 51
  References .... 52

5 The Chemicals in the Natural Environment .... 53
  5.1 The Data and the Background .... 53
  5.2 The Procedure: Determining the Partition PA .... 56
  5.3 The Procedure: Reverse Clustering .... 58
  5.4 Discussion and Conclusions .... 61
  References .... 62

6 Administrative Units, Part I .... 63
  6.1 The Background: Polish Administrative Division and the Province of Masovia .... 63
  6.2 The Data .... 64
  6.3 The Analysis Regarding the Administrative Categorization of Municipalities .... 67
  6.4 A Verification .... 71
  6.5 The Analysis Regarding the Functional Categorization of Municipalities .... 72
  6.6 Conclusions and Discussion .... 76
  References .... 78

7 Administrative Units, Part II .... 79
  7.1 The Background .... 79
  7.2 The Computational Experiments .... 80
  7.3 Discussion and Conclusions .... 87
  References .... 88

8 Academic Examples .... 89
  8.1 Introduction .... 89
  8.2 Fisher’s Iris Data .... 89
  8.3 Artificial Data Sets .... 91
  8.4 Conclusions .... 94
  References .... 94

9 Summary and Conclusions .... 95
  9.1 Interpretation and Use of Results .... 95
  9.2 Some Final Observations .... 98
List of Figures

Fig. 2.1 The scheme of the reverse clustering problem formulation .... 25
Fig. 2.2 The scheme of potential cases of interpreting the paradigm of reverse clustering .... 28
Fig. 2.3 An illustration of division of a set of objects according to the rule of “putting together the dissimilar and separating the similar”: colours indicate the belongingness to three groups: blue, red and green .... 30
Fig. 3.1 Rough indication of interpretations of the cases treated against the framework of Fig. 2.1 .... 41
Fig. 4.1 Median hourly profiles of traffic for the classes of the days of the week .... 44
Fig. 4.2 Hourly profiles of traffic intensity for individual hours of the week. Colours, assigned to successive days, denote the clusters forming the initial partition PA .... 44
Fig. 4.3 Visual interpretation of clusters described in Table 4.2 .... 49
Fig. 5.1 Concentration levels for Pb: areas in the order of increasing Pb concentrations .... 55
Fig. 5.2 Concentration levels for Cd: areas in the order of increasing Cd concentrations .... 55
Fig. 5.3 Concentration levels for Zn: areas in the order of increasing Zn concentrations .... 56
Fig. 5.4 Concentration levels for S: areas in the order of increasing S concentrations .... 56
Fig. 5.5 The distribution of points (“areas”) in the space of concentrations for (a) particular elements and pairwise; (b) enlarged for Zn and S (upper box) and Pb and Cd (lower box); see the text further on for the interpretation of colours .... 57
Fig. 6.1 Data on municipalities of the province of Masovia with administrative categorisation into three categories on the plane of the first two principal components (colours refer to the results from Table 6.6) .... 69
Fig. 6.2 Map of the province of Masovia with the indication of the municipalities classified in three clusters resulting from the reverse clustering according to the data from Table 6.3. The red area in the middle corresponds to Warsaw and its neighbourhood; the bigger red blobs correspond to subregional centres (Radom, Płock, Siedlce and Mińsk Mazowiecki) .... 70
Fig. 6.3 Map of Masovia province with the partition PB from Table 6.11 .... 76
Fig. 6.4 Map of Masovia province with the partition PB from Table 6.12 .... 77
Fig. 7.1 Two examples of the procedures leading to the potential prior categorization of the sort similar to the one of interest here .... 80
Fig. 7.2 Map of Poland with indication of municipalities which belonged, in the solution of Table 7.2, to the “correct” categories from the initial partition and those that belonged to the other ones (“incorrect”) .... 84
Fig. 7.3 Map of Poland showing the partition of the set of Polish municipalities obtained with the own evolutionary method and the k-means algorithm, composed of 12 clusters, corresponding to Table 7.2 .... 85
Fig. 8.1 An example of the artificial data set with “nested clusters”, subject to experiments with reverse clustering .... 92
Fig. 8.2 An example of the artificial data set with “linear broken structure”, subject to experiments with reverse clustering .... 92
Fig. 9.1 Map of the province of Masovia showing the municipality types obtained from the reverse clustering performed with the DBSCAN algorithm, characterised in Table 9.1 .... 100
Fig. 9.2 The meta-scheme of application of the reverse clustering paradigm .... 101
List of Tables

Table 1.1 Values of the Lance-Williams coefficients for the most popular of the hierarchical aggregation clustering algorithms .... 8
Table 1.2 Elements of calculation of the Rand index of similarity between partitions .... 11
Table 4.1 Summary of results for the first series of experiments with traffic data .... 46
Table 4.2 Results for traffic data for the entire vector of parameters Z, with the use of hierarchical aggregation (values of Rand index = 0.850, of adjusted Rand = 0.654). The upper part of the table shows the coincidence of patterns in particular Aq, based on the days of the week, and obtained Bq .... 47
Table 4.3 Results for the traffic data obtained with the “pam” algorithm .... 48
Table 5.1 Pollution data for Baden-Württemberg (Germany), used in the exemplary calculations: total concentrations, in mg/kg of dry weight (Pb: lead, Cd: cadmium, Zn: zinc, S: sulphur) .... 54
Table 5.2 Numbers of areas in the classes defined for the elements Zn and S contents in the herb layer .... 58
Table 5.3 Contingency table for the partition PA assumed and the one obtained in Series 1 of calculations, PB, with the k-means algorithm and data only for Pb and Cd .... 59
Table 5.4 Contingency table for the partition PA assumed and the one obtained in Series 1 of calculations, PB, with the hierarchical aggregation algorithm and data only for Pb and Cd .... 60
Table 5.5 Contingency table for the partition PA assumed and the one obtained in Series 1 of calculations, PB, with the DBSCAN algorithm and data for all four elements .... 60
Table 5.6 Contingency table for the partition PA assumed and the one obtained in Series 2 of calculations, PB, with the hierarchical merger algorithm and data for all four elements .... 60
Table 6.1 Functional typology of municipalities of the province of Masovia (data as of 2009) .... 65
Table 6.2 Variables describing municipalities, accounted for in the study .... 66
Table 6.3 Contingency matrix for the administrative breakdown of municipalities of the province of Masovia in Poland and reverse clustering performed with own evolutionary algorithm using k-means .... 67
Table 6.4 Contingency matrix for the administrative breakdown of municipalities of the province of Masovia in Poland and reverse clustering performed with own evolutionary algorithm using hierarchical aggregation .... 67
Table 6.5 Contingency matrix for the administrative breakdown of municipalities of the province of Masovia in Poland and reverse clustering performed with own evolutionary algorithm using DBSCAN .... 68
Table 6.6 Contingency matrix for the administrative breakdown of municipalities of the province of Masovia in Poland and reverse clustering performed with DE algorithm using “pam” .... 68
Table 6.7 Contingency matrix for the administrative breakdown of municipalities of the province of Masovia in Poland and reverse clustering performed with DE algorithm using “agnes” .... 68
Table 6.8 Examples of variable weights for two runs of calculations, presented in Tables 6.3 and 6.4 .... 71
Table 6.9 Contingency matrix for the administrative breakdown of municipalities of the province of Wielkopolska in Poland and clustering performed with the Z vector obtained for Masovia in the case shown in Table 6.3 (k-means algorithm) .... 71
Table 6.10 Contingency matrix for the administrative breakdown of municipalities of the province of Wielkopolska in Poland and clustering performed with the Z vector obtained for Masovia in the case shown in Table 6.4 (hierarchical aggregation algorithm) .... 72
Table 6.11 The contingency matrix for the functional typology of municipalities of Masovia from Table 6.1 and reverse clustering with own evolutionary method using the k-means algorithm .... 73
Table 6.12 The contingency matrix for the functional typology of municipalities of Masovia from Table 6.1 and reverse clustering with own evolutionary method using hierarchical aggregation algorithm .... 73
Table 6.13 The contingency matrix for the functional typology of municipalities of Masovia from Table 6.1 and reverse clustering with DE using “pam” algorithm .... 74
Table 6.14 The contingency matrix for the functional typology of municipalities of Masovia from Table 6.1 and reverse clustering with DE using “agnes” algorithm .... 75
Table 7.1 Functional typology of Polish municipalities .... 81
Table 7.2 Contingency table for the proposed functional typology of Polish municipalities and the reverse clustering partition obtained with own evolutionary method using k-means algorithm .... 82
Table 7.3 Variable weights in the solution illustrated in Table 7.2 .... 83
Table 7.4 Contingency table for the proposed functional typology of Polish municipalities and the reverse clustering partition obtained with own evolutionary method using hierarchical aggregation algorithm .... 86
Table 8.1 The results obtained for the Iris data with the DE method—comparison of “pam” and “agnes” algorithms and two selections of vector Z components (notation as in Table 4.1) .... 90
Table 8.2 Contingency table for the DE method applied to the Iris data with the “pam” algorithm .... 90
Table 8.3 Contingency table for the DE method applied to the Iris data with the “agnes” algorithm .... 90
Table 8.4 The reverse clustering results for the Iris data obtained with the own evolutionary method using DBSCAN, k-means and hierarchical merger algorithms .... 91
Table 9.1 Contingency matrix for the typological categorisation of the municipalities of the province of Masovia in Poland obtained with reverse clustering using own evolutionary algorithm and the DBSCAN algorithm (for explanations see Chap. 6) .... 99
Chapter 1
The Concept of Reverse Clustering

1.1 The Concept

This book presents an approach, or a paradigm, within which we try to develop a reverse engineering type of procedure, aimed at reconstructing a certain partition¹ of a data set X, X = {xi}, i = 1,…,n, into p subsets (clusters), Aq, q = 1,…,p. We assume that each object, indexed by i, is characterized by m variables, so that xi = (xi1,…,xik,…,xim).
Having the partition PA = {Aq}q, given in some definite manner, we now try to figure out the details of the clustering procedure which, when applied to X, would have produced the partition PA, or as accurate an approximation of it as possible.
That is, we search in the space of configurations, with a particular configuration denoted by Z, this space being spanned by the following parameters:
(i) the choice of the clustering algorithm and the characteristic parameters of the respective algorithm(s);
(ii) the selection of, or other operations on, the set of variables (e.g. weighting, subsetting, aggregation); and
(iii) the definition of a similarity/distance measure between objects, used in the algorithm.
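In code, a candidate configuration Z spanning these three groups of parameters might be represented as follows (a purely illustrative sketch of ours; the field names and defaults are assumptions, not the encoding actually used in the book's experiments):

```python
from dataclasses import dataclass, field
from typing import Tuple

@dataclass
class Configuration:
    """One point Z in the search space: a complete clustering procedure."""
    algorithm: str = "k-means"                 # (i) choice of clustering algorithm
    algorithm_params: dict = field(            # (i) its characteristic parameters
        default_factory=lambda: {"k": 3})
    variable_weights: Tuple[float, ...] = ()   # (ii) weighting/selection of variables
    distance: str = "euclidean"                # (iii) inter-object distance definition

# A concrete candidate configuration (hypothetical parameter values):
z = Configuration(algorithm="dbscan",
                  algorithm_params={"eps": 0.5, "min_samples": 4},
                  variable_weights=(1.0, 0.0, 0.5),
                  distance="manhattan")
print(z.algorithm, z.algorithm_params)
```

A search procedure then explores the space of such objects, evaluating each by the partition it produces on X.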
The partition resulting from applying the clustering procedure with a candidate configuration of the above parameters is denoted PB and is composed of clusters Bq′, q′ = 1,…,p′, PB = {Bq′}q′. The search is performed by optimizing with respect to a certain criterion, denoted Q(PA, PB), defined on the pairs of partitions.
So, as we denote the set of parameters, comprising a configuration, that is being
optimized in the search, by Z (notwithstanding the potential differences in the actual
content of Z), and the space of values of these parameters by Ω, then we are looking
in Ω for a Z* that minimizes Q(PA, PB), where PB(Z*) is a partition of X obtained

1 The concept of a Reverse Cluster Analysis has been introduced by Ríos and Velásquez (2011) in
case of the SOM based clustering, but it is meant there in a rather different sense than associating
original data points with the nodes in the trained network.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 1
J. W. Owsiński et al., Reverse Clustering, Studies in Computational Intelligence 957,
https://doi.org/10.1007/978-3-030-69359-6_1

using the configuration Z*. Formally, we can treat Z as a transformation of the data
set X (a cluster operator) and, thus, denote the optimization problem for a given data
set X and its known partition PA ∈ ΠX, where ΠX denotes the set of all
partitions of X, as follows:

PB = Z(X), PB ∈ ΠX, (1.1)

Z* = arg minZ∈Ω Q(PA, Z(X)). (1.2)

(Notice that this optimization problem is in line with the reverse engineering, or
backward engineering paradigm, i.e. a procedure that aims at finding out for some
object or process what has been the underlying design, architecture or implementation
process that led to the appearance of the object in question—more on this subject in
Chap. 2.)
Because of the irregularity of circumstances of this search (the nature of the search
space and the values of the performance criterion, see later on for some more details
on this), the solution of the optimization problem defined above is a challenging task.
In our experiments, presented later on in the book, optimisation is performed with
the use of evolutionary algorithms.
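As a minimal, purely illustrative sketch of this search (not the evolutionary procedure used in the book), one may enumerate candidate configurations Z, here reduced, for a toy one-dimensional data set, to a single gap threshold of a hypothetical clusterer, and retain the Z whose partition PB = Z(X) scores best against PA under the Rand index discussed in Sect. 1.4; all function names below are hypothetical:

```python
from itertools import combinations

def rand_index(p1, p2):
    """Fraction of object pairs treated consistently by the two partitions."""
    pairs = list(combinations(range(len(p1)), 2))
    agree = sum((p1[i] == p1[j]) == (p2[i] == p2[j]) for i, j in pairs)
    return agree / len(pairs)

def gap_clusterer(values, threshold):
    """Toy 1-D clusterer: start a new cluster whenever the gap between
    consecutive sorted values exceeds `threshold`."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    cluster = 0
    for prev, cur in zip(order, order[1:]):
        if values[cur] - values[prev] > threshold:
            cluster += 1
        labels[cur] = cluster
    return labels

def reverse_cluster(values, pa, candidate_thresholds):
    """Search the (here one-dimensional) configuration space for the Z whose
    induced partition PB = Z(X) is closest to PA under the Rand index."""
    best = max(candidate_thresholds,
               key=lambda t: rand_index(pa, gap_clusterer(values, t)))
    return best, rand_index(pa, gap_clusterer(values, best))

x = [1.0, 2.0, 10.0, 11.0, 20.0, 21.0]
pa = [0, 0, 1, 1, 2, 2]                      # the given partition PA
z_star, q = reverse_cluster(x, pa, [0.5 * k for k in range(1, 20)])
```

In the actual reverse clustering setting the configuration Z is, of course, far richer (algorithm choice, its parameters, variable weights, the distance exponent), and the grid scan is replaced by an evolutionary search.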
Altogether, we try to reconstruct the way in which PA has been obtained from X,
this way being represented through the configuration Z, pertaining to the broadly
conceived procedure of cluster analysis, for the very simple reason that clustering is
a natural way to produce partitions of sets of data. In some specific circumstances
one might imagine other approaches to the said reconstruction, but we stick to the
apparently most natural one. A short discussion of this subject is also provided in
Chap. 2 of the book.
In order to bring the problem formulated here closer to some real life situation,
let us consider the following three examples of situations:

Example 1: a car dealer. Assume a second-hand car dealer has at its disposal a set of
data on (potential) customers, who visit the website of the dealer, call this set Y,
and the set of data on those, who actually bought a car from the dealer, the set X.
Naturally, the set X is much smaller than Y (it may constitute, say, less than 1% of
Y ). In this case, a partition of X, PA, might be based on the makes and/or types of
cars purchased by customers represented in the data set X. The dealer might wish to
identify the "rules" leading to the partition PB of the set of the purchasing customers,
X, disregarding the labeling by car makes/types, yet one that approximates PA as
well as possible.
interested in particular makes/types of cars may form. Having such a procedure
(Z * ) identified one may apply it to the set Y and obtain the “classes” of (potential)
customers of particular makes/types which, in turn, may be more effectively targeted
with the promotional offers or just information, regarding definite makes / types
of cars. Thus, upon finding the Z * that produces PB that is the closest to PA , one
might hope that by applying Z * to Y it would be possible to define the classes of

the (potential) customers, at whom appropriate offers could be effectively addressed


during their search through the website. These classes would form the partition PB
= Z * (Y ).

Example 2: categorization of chemical compounds. Assume that i ∈ I is the


index of a set X of chemical compounds, which are classified in PA according to
their theoretically known properties, primarily related to their toxic properties, or,
more generally, their environmental impact. These properties, along with the associ-
ated classification PA , are based on their composition and structure, and the known
consequences thereof. On the other hand, let us assume that for each compound
i, a vector/object x i of the actual measurements and assessments “from the field”
is available, reflecting the actual action and characteristics of the respective i-th
compound in the concrete, diverse, environmental conditions. Thus, the set X may
be interpreted as a set of such vectors, X = {x i }i∈I or as a matrix X = [x ik ], where k
is an index of an attribute characterizing object x i . These may be related to both the
(induced or deduced) impact on the biotic and abiotic environment, and the charac-
teristics of more physical character, such as penetration speed and reach, persistence,
adhesion, etc. In addition, there may be multiple observations for a single compound
i and, thus, X is actually a bag (multiset). Now, the (“best”, i.e., “closest” to PA )
partition PB we obtain for X = {x i }i∈I , especially regarding the clustering of x i ’s,
reveals partly the influence of the variety of environmental situations on the actual
action of the compounds, but, definitely, also sheds light "backwards" on the
appropriateness of the categorization PA, motivating, perhaps, the search for additional
dimensions, characterizing the compounds analyzed. This can take on the form of
an iterative back-and-forth procedure, with subsequent PA (t) and PB (t), obtained in
consecutive iterations t, hopefully getting closer to each other, if not converging.

Example 3: the actual toxicity of mushrooms. Even though this case might be
regarded as anecdotal, mushrooms do constitute an important part of cuisine and
diet in many cultures, and also in many of them lead, every year, to deaths or severe
hospitalizations. It is also well known that owing to the biological properties of
mushrooms their toxicity is highly variable, and the actual effects heavily depend
upon the way they are prepared (e.g. boiling mushrooms in water and then pouring
this water away) and consumed, as well as upon the consumer, her/his general and
current characteristics (like, e.g., age, weight, or alcohol currently consumed). The
partition PA is meant to correspond to the classes of toxicity / edibility of the particular
species, with the aim of communicating these characteristics to the wide public
in a possibly clear manner. Thus, PA , prepared by the experts is juxtaposed with
the partition PB , obtained from the set X of descriptions x i of the actual medically
described poisoning cases, as well as interviews with experienced cooks, specialized
in mushroom dishes. The juxtaposition is intended to lead to better justified and
cogently characterized classification PB(Z*), supposedly communicated to the wide
public, including general edibility assessments, cooking indications, and advice as to
identification and first aid, etc.

1.2 The Notation

We shall now sum up the notation already introduced, extending it whenever


necessary with the notions that will be used further on:
X = {xi}i∈I – a set of objects under consideration; this symbol, depending
on the context, may be interpreted in a slightly different way
(see further below);
n – number of objects (observations) in the data set considered;
i – index of the objects, i = 1,…,n;
I = {1,…,i,…,n} – the set of indices of the objects; this set of indices is often
equated, for simplicity, with the set of objects;
m – number of variables (features, attributes, characteristics),
describing the objects2;
k – index of the variables of the objects, k = 1,…,m;
xik – value of variable k for object i; this value belongs to a domain
Dk associated with variable k;
xi – complete description of the object i in the form of the vector
of values xi = [xi1,…,xik,…,xim];
X – also: an n × m matrix, containing the descriptions of all n
objects, according to all m variables;
Ex – the Cartesian product D1 × … × Dk × … × Dm of the domains of
all variables/attributes which are used to characterize object x;
P – a partition of the set of objects X = {xi}i∈I (often understood
as the set of their indices, I) into disjoint non-empty subsets
(clusters), P = {Aq}q=1,…,p, jointly covering the whole set X,
i.e., ∀q Aq ≠ ∅, ∀q1≠q2 Aq1 ∩ Aq2 = ∅, ⋃q Aq = X;
Aq – a cluster (subset of I), indexed by q; q = 1,…,p, where p is the
number of clusters; thus, P = {Aq}; the clusters are assumed
to be disjoint and to exhaust (cover) the set I (hence, we do not
consider, at least not in this book, the fuzzy or rough clusters);
PA – the partition, which is provided together with X as the datum
of the concrete problem;
Z – the vector of parameters (a configuration) of the clustering
procedure, comprising the very procedure itself, applied to X,
yielding a partition P = Z(X) of X;
Ω – the universe of possible/considered vectors (configurations)
Z;

2 We do not consider here, in this book, the issue of missing data. Thus, it is assumed that for all
n objects each of m variable values is specified. Although the reverse clustering paradigm applies
also to the case of missing values, the book is devoted to the presentation of the main aspects and
implications of the paradigm, without delving into the multiple, even if important, side issues.

Q(.,.) – a measure of similarity or distance between two partitions; we
shall also use notation Q for the quality functions of partitions,
when referred to explicitly;
PB – the partition, obtained from the entire procedure, as supposedly
the closest to PA;
d(.,.) – the distance measure between objects; for objects, characterised
in X, we admit a simpler notation: dij, where i,j ∈ I;
D(.,.) – the distance measure between sets of objects;
A, B, … – a general notation of subsets of I;
X, Y, … – also: general notation of the data sets, describing sets of
objects.

1.3 The Elements of Vector Z: The Dimensions of the Search Space

We shall now give some additional details, which are associated with the concrete
implementation of the concept introduced here, according to the three aspects of the
space of configurations, specified before. Thereby, we shall be specifying the content
of the vector Z, composed of the individual parameters, subject to choice.
The choice of the clustering algorithms and their parameters.
Concerning the search with respect to the clustering algorithm, throughout this
volume we shall be confined to three families of algorithms:
1. The k-means-type algorithms with some of its varieties, like, e.g. k-medoids;
2. The classical progressive merger algorithms, such as single linkage, complete
linkage etc., and
3. A representative of the local density based algorithms, in this case the DBSCAN
algorithm.
No other kinds of clustering algorithms were accounted for in the experiments
reported in this volume, but the ones mentioned represent the major part of the
numerous clustering algorithms proper that could have been included in the search.
It was important for us to consider approaches that are by their very nature oriented
at solving the clustering problem. It should be mentioned, for clarification, that the
metaheuristics, very often used for clustering purposes as well, are by no means
clustering algorithms themselves: they do not embody a rationale oriented at a
possibly good partitioning of a data set, but, quite generally, at finding an optimum
solution.
We by no means provide here any review of clustering methods, this domain
being the subject of a multitude of books and papers, both general, survey-like, and
devoted to concrete methods and algorithms, to say nothing of a myriad of appli-
cations. For the sake of completeness we mention such general references, dealing
with clustering, as Mirkin (1996), de Falguerolles (1977), Hayashi et al. (1996),

Banks et al. (2004, 2011), Wierzchoń and Kłopotek (2018), Bramer (2007), Owsiński
(2020), as well as, more focused on specific problems in clustering, Adolfsson et al.
(2019), Figueiredo et al. (1999), Guha et al. (2003), or Simovici and Hua (2019).
The k-means-type algorithms.
The k-means algorithms are based on the following general procedure:
1. for the given data set X = {x i }i∈I generate in some way p points3 in E x
(centroids), denote them x q , q = 1,…,p;
2. assign each object x i from X to the closest centroid x q , thus, for each x i distances
d(x i ,x q ) are calculated for q = 1,…,p, and x i is assigned to x q* , for which d(x i ,x q* )
= minq d(x i ,x q ); thereby, the clusters Aq are formed;
3. for the obtained clusters Aq determine the new centroids x q , being the “repre-
sentatives” of the clusters, e.g. as the means of the elements of clusters, assigned
to clusters in the previous step;
4. if the stopping criterion, e.g., the lack of essential changes between the centroids
in subsequent steps of the algorithm, is not satisfied (yet), go to 2, otherwise
terminate.
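The four steps above can be sketched as follows; this is a minimal, deterministic illustration with squared Euclidean distance, in which the initial centroids are simply the first p objects (an assumption made here for reproducibility; actual implementations usually randomize the start):

```python
def sq_dist(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))

def k_means(points, p, max_iter=100):
    # Step 1: take the first p objects as the initial centroids (deterministic here)
    centroids = [list(pt) for pt in points[:p]]
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every object to its closest centroid
        new_labels = [min(range(p), key=lambda q: sq_dist(pt, centroids[q]))
                      for pt in points]
        # Step 4: stop when no assignment has changed
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: recompute each centroid as the mean of its cluster members
        for q in range(p):
            members = [pt for pt, lab in zip(points, labels) if lab == q]
            if members:
                centroids[q] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

pts = [(0.0, 0.0), (10.0, 0.0), (0.0, 1.0), (10.0, 1.0)]
labels, centroids = k_means(pts, p=2)
```

Being deterministic, this sketch always reaches the same local minimum; randomized restarts, as discussed in this section, improve the chances of finding the global one.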
This simple procedure was initially formulated by Steinhaus (1956), and soon
afterwards was also developed by Lloyd (1957), but the main impact came from
Forgy (1965), Ball and Hall (1965), and MacQueen (1967). The fuzzy-set based
version of the general k-means method, which became enormously popular and
known as fuzzy c-means, was formulated by Bezdek (1981) (see also, for fuzzy
partitions, Dunn 1974, and Bezdek et al. 1999), following which quite a number of
varieties and algorithmic proposals within the k-means-like algorithm family were
forwarded (see, for instance, Lindsten et al. 2011, Dvoenko 2014, the recent work
of Kłopotek, 2020, or the discussions of equivalence with the Kohonen’s SOMs,
originally formulated by Kohonen 2001).
Nowadays, this generic procedure is being implemented in a variety of manners,
differing, in particular, as to the status of the x q —whether they are chosen from
among the objects x i (k-medoids version) or can be any elements of E x (the classical
k-means) and the way, in which they are determined, and it is available through a
number of open access and paid libraries.
The procedure, along with its varieties, is known to converge quickly (in a couple
or a dozen of iterations of the procedure above) to a local minimum, depending upon
the starting point (the initial points, “centroids”, from step 1) and the nature of the
set X. Since it converges quickly, it remains feasible to start it many times over from
diverse initial sets of centroids in order to increase the chances of finding the global
optimum.
The local minimum that is reached through the functioning of the above procedure
is, naturally, the minimum of the following criterion function:
Q(P) = Σq Σi∈Aq d(xi, xq).

3 Usually, instead of p we would use k, as in "k-means", but this would overlap with the earlier
assumed meaning of k as an index of variables/attributes characterizing objects to be clustered.

The distance function used is the Euclidean metric squared, in order to preserve
the properties, associated with the choice of cluster mean as the representative of the
cluster. It is obvious that the above Q(P) is monotonic with respect to p, its minimum
for consecutive p’s decreasing with the increase of p down to Q(P) = 0 for p = n.
That is why the k-means type algorithms are applied with the number of clusters, p,
specified.
In the light of the above it becomes clear that the parameters of the vector Z,
associated with the k-means algorithm are the very choice of the algorithm (k-means
or one of its varieties, usually k-medoids as an alternative) and the number of clusters.
Although the choice of the distance definition appears to have an influence on the
results obtained from the k-means algorithms, it is not treated here, being discussed
later on in this chapter.
The classical hierarchical merger algorithms.
The second group of algorithms accounted for in the study of reverse clustering
reported here is the group of the most classical clustering algorithms, consisting in
stepwise mergers of objects and then of clusters. These algorithms are all constructed
as follows:
1. start from the set of objects, X, treating each object as a separate cluster (p = n);
calculate the distances d qq’ for all pairs of objects (indices) in I; these distances
are, therefore, treated in this step as inter-cluster distances, Dqq’ ;
2. find the minimum distance Dq*q** = minqq’ Dqq’ ; join/merge the clusters, indexed
by q* and q**, between which the distance is minimum, thereby forming a new
partition, with p: = p − 1;
3. check, whether p > 1; if not, terminate the procedure (all objects have been
merged into one all-embracing cluster);
4. recalculate the inter-cluster distances (i.e. the distances between the cluster
resulting from the merging of Aq* and Aq** in the previous step, on the one hand,
and all the other clusters on the other hand, the distance Dq*q** “disappearing”);
go to 2.
This—again—very simple procedure gives rise to a variety of concrete algorithms,
which differ by the inter-cluster distance recalculation step 4. The algorithms from
this group find their ancestor in the so-called “Wrocław taxonomy” by Florek et al.
(1956), who were the first to formulate what is now called “single-linkage” algorithm,
along with some of its more general properties. The essential step in the development
of the family of these algorithms came with the papers by Lance and Williams
(1966, 1967). They introduced the general formula, according to which the distance
recalculation step is performed:

Dq ∗ ∪q ∗∗ ,q = a1 Dq ∗ q + a2 Dq ∗∗ q + bDq ∗ q ∗∗ + c|Dq ∗ q − Dq ∗∗ q |

where q* ∪ q** denotes the index of the cluster resulting from the merging of
clusters q* and q**, with the values of the coefficients, corresponding to the particular

Table 1.1 Values of the Lance-Williams coefficients for the most popular of the hierarchical
aggregation clustering algorithms

Algorithm                             a1                 a2                 b                        c
Single linkage (nearest neighbor)     1/2                1/2                0                        −1/2
Complete linkage (farthest neighbor)  1/2                1/2                0                        1/2
Unweighted average (UPGMA)            nq*/(nq* + nq**)   nq**/(nq* + nq**)  0                        0
Weighted average (WPGMA)              1/2                1/2                0                        0
Centroid (UPGMC)                      nq*/(nq* + nq**)   nq**/(nq* + nq**)  −nq*nq**/(nq* + nq**)²   0
Median (WPGMC)                        1/2                1/2                −1/4                     0

implementations of the procedure, i.e. the particular progressive merger algorithms,


shown in Table 1.1 for the most popular of these algorithms.
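The Lance-Williams recalculation of step 4 is a one-line formula; the sketch below (helper name hypothetical) checks that the single and complete linkage rows of Table 1.1 indeed reduce the update to the minimum and the maximum, respectively, of the two distances to the merged clusters:

```python
def lw_update(d_q1_q, d_q2_q, d_q1_q2, a1, a2, b, c):
    """Lance-Williams recalculation of the distance between the newly merged
    cluster q1 ∪ q2 and any other cluster q."""
    return a1 * d_q1_q + a2 * d_q2_q + b * d_q1_q2 + c * abs(d_q1_q - d_q2_q)

# Coefficient rows taken from Table 1.1
SINGLE   = (0.5, 0.5, 0.0, -0.5)   # reduces to min(d_q1_q, d_q2_q)
COMPLETE = (0.5, 0.5, 0.0,  0.5)   # reduces to max(d_q1_q, d_q2_q)

d1, d2, d12 = 3.0, 5.0, 2.0
nearest  = lw_update(d1, d2, d12, *SINGLE)
farthest = lw_update(d1, d2, d12, *COMPLETE)
```

Allowing the four coefficients to take arbitrary values, as done in the experiments, simply means feeding this update with a tuple outside the rows of Table 1.1.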
These algorithms have become quite commonly used because of their intuitive
appeal and the fact that the consecutive mergers lead to the tree-like image (the
dendrogram), which, accompanied by the value of distance, for which the mergers
occur, provides very valuable information. Like in the case of k-means, a choice of
these algorithms is available from multiple sources. Yet, the applicability of these
algorithms is negatively affected by the fact that the entire distance matrix has to be
kept, searched through and updated.
It must be added here that the algorithms from the group differ as to the shape
of clusters they can detect or form, a clear difference separating, in particular,
single linkage from virtually all other algorithms. Namely, the single linkage has a
tendency towards the formation of chains of points (objects), of whatever shapes and
dimensions, while the remaining algorithms tend to form compact, usually spherical
groups.
The obvious parameters of this group of algorithms in terms of the elements of
vector Z are the above listed values of a1, a2, b and c. Thereby, no special distinction
of the particular algorithms is necessary. However, it must be added that in many
cases we allowed these coefficients to vary more freely than is envisaged by the
Lance-Williams formula and the corresponding table of coefficient values (i.e. only
with some constraints on the values of these coefficients), implying, potentially,
algorithms that do not, as of now, exist.4
The density based algorithms—DBSCAN.
The local density-based algorithms form a much less compact and consistent group
than the two previously considered types of algorithms. A more systematic approach
to the density-based techniques was initiated by Raymond Tremolières (Tremolières

4 Actually, the Lance-Williams parameterisation was extended later on in order to encompass yet
more similar algorithms, but this is of no interest for the main purpose of this book.

1979, 1981), but then they were virtually forgotten for a long time, mainly in view
of computational issues. They gained again popularity when, on the one hand, the
requirement of single-passage analysis of data sets became important (even before
the time of data streams analysis), in view of the volumes of available data to consider,
and, on the other hand, the new kinds of density techniques, much more computa-
tionally effective than those from before, have been proposed (see, e.g., Yager and
Filev 1994, or, more recently, Rodriguez and Laio 2014). These algorithms, in prin-
ciple, analyse the interrelations, based on distances/proximities of a limited number
of objects. One of the most commonly used algorithms in this group is DBSCAN,
due in its most popular form mainly to Ester et al. (1996), although it is claimed that
already Ling (1972) proposed the algorithm that was very similar to DBSCAN.
In this algorithm, the objects (points in E x ) are classified into three categories: core
points (implying that they are “inside” clusters), density reachable points (which may
form the “border” or the “edges” of clusters), and outliers or noise points. This clas-
sification is based on an essentially heuristic procedure, which refers to two param-
eters (these two parameters being, therefore, also the elements of the vector Z in our
approach), namely: the radius ε, within which we look for the “closest neighbours”
of a given point, and the minimum number of points, required to classify a given
region in E x as “dense”, originally denoted minPts. Based on these two parameters the
procedure classifies the objects into the three categories mentioned, and afterwards
establishes the clusters on the basis of the notion of density connectedness.
The algorithm is popular due to its fast performance and also owing to its inde-
pendence of the shape of the clusters it identifies, or forms. On the other hand, it
definitely strongly depends upon the choice of the two parameters. A similar criticism
is true for, say, k-means and its parameter p, but executing k-means for a (short)
series of p's is not a problem and may circumvent the arbitrariness of the choice
of p, while finding the right pair of ε and minPts is, in general, quite challenging.
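The core / density-reachable (border) / noise classification itself can be sketched in a few lines; a minimal one-dimensional illustration, assuming (a convention of this sketch, shared by some common implementations) that a point counts itself among its ε-neighbours:

```python
def classify(points, eps, min_pts, dist=lambda a, b: abs(a - b)):
    """Classify each point as 'core', 'border' or 'noise' in the DBSCAN sense.
    A point is core if its eps-neighbourhood (here including the point itself)
    holds at least min_pts points; border if it lies within eps of some core
    point without being core itself; noise otherwise."""
    n = len(points)
    neigh = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
             for i in range(n)]
    core = {i for i in range(n) if len(neigh[i]) >= min_pts}
    out = []
    for i in range(n):
        if i in core:
            out.append("core")
        elif any(j in core for j in neigh[i]):
            out.append("border")
        else:
            out.append("noise")
    return out

labels = classify([0.0, 0.5, 1.0, 2.0, 10.0], eps=1.1, min_pts=3)
```

The full algorithm then links density-connected core points into clusters; even this fragment shows how strongly the outcome hinges on the pair (ε, minPts).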
The weighing or selection of the variables.
In the search for a partition possibly similar to the given PA, operations may also
be performed on the set of variables accounted for. Thus, two alternative options
can be applied: (i) weighing of each of the variables, preferably on the scale between 0
(not considered at all) and 1 (considered as in the original data set); (ii) the binary
choice of variables, i.e. either considered or dropped (corresponding to the choice of
weights from among 1 and 0).
It is definitely not typical for clustering to proceed explicitly with such operations
on variables. Usually, such an operation is performed in the preprocessing phase, often
even without explicit consideration of clustering as a possible next phase. Yet, in the
framework of reverse clustering, in some cases this appears to be justified, especially
as it may not be known where the partition PA comes from and what its relation is
to the characterization of X.
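Operationally, both options amount to rescaling the columns of the data matrix before any distances are computed; a minimal sketch (function name hypothetical):

```python
def weigh_variables(data, weights):
    """Rescale every variable k by its weight w_k in [0, 1]: w_k = 0 drops the
    variable entirely, w_k = 1 keeps it as in the original data set. The binary
    selection of variables is the special case of weights taken from {0, 1}."""
    return [[w * v for w, v in zip(weights, row)] for row in data]

X = [[1.0, 100.0],
     [2.0, 300.0]]
# Binary choice: keep variable 1, drop variable 2
Xw = weigh_variables(X, [1.0, 0.0])
```

Any distance computed on Xw then reflects only the retained (or down-weighted) variables.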

Distance definitions.
It is well known that some of the clustering procedures depend to an extent, some-
times considerably, on the distance definitions used. This is absolutely clear for
the k-means family of algorithms, where squared Euclidean distance is virtually a
“must”, for formal reasons, although in some variations of this algorithm this is
no longer a strict requirement. Some definite implementations of specific algorithms
(e.g. from the hierarchical aggregation family) also work differently, depending upon
the distance definitions. The most important aspect in this regard is connected with
the influence, exerted by the objects, located far away from the other ones, the impact
of the increasing dimensionality on the significance of distance, or the differences
in densities in various regions of E x . In view of this influence, it was assumed in
the exercises in reverse clustering, illustrated in this book, that a flexible distance
definition be adopted, namely the general Minkowski distance:

d(xi, xj) = (Σk |xik − xjk|^h)^(1/h),

where for h = 1 we get the Manhattan (city-block) metric, and for h = 2 the Euclidean
metric. When h tends to infinity, the distance above approximates the Chebyshev
metric, according to which, simply,
d(xi, xj) = maxk |xik − xjk|.

Again, like with the Lance-Williams parameters of the hierarchical aggregation


algorithms, we allow for arbitrary (non-negative) values of h, when trying to recon-
struct the way PA has been obtained. Thereby, non-classical distance definitions could
be ultimately used.
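The Minkowski family is straightforward to implement; a small sketch that treats h = ∞ as the Chebyshev limit explicitly:

```python
def minkowski(x, y, h):
    """General Minkowski distance: h = 1 gives the Manhattan metric, h = 2 the
    Euclidean one, and h = inf the Chebyshev limit."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if h == float("inf"):
        return max(diffs)
    return sum(d ** h for d in diffs) ** (1.0 / h)

x, y = (0.0, 0.0), (3.0, 4.0)
d1 = minkowski(x, y, 1)             # Manhattan
d2 = minkowski(x, y, 2)             # Euclidean
dc = minkowski(x, y, float("inf"))  # Chebyshev
```

Non-integer, non-classical values of h, as allowed in the search, plug into the same formula.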
Summing up the set of parameters, constituting the vector Z, let us enumerate
them again:

1:           the indicator of the choice of the clustering algorithm (k-means,
             hierarchical merger, or DBSCAN);
2 to 5:      the parameters of the clustering algorithms (a maximum of 4 numbers,
             for the hierarchical merger algorithms);
6 to 6+m−1:  the variables and their weights or binary indicators;
6+m:         the exponent h.
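The enumeration above can be mirrored by a small record type; a sketch with hypothetical field names, not the encoding actually used in the book's evolutionary algorithms:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Configuration:
    """One point Z of the search space: position 1 is the algorithm choice,
    positions 2 to 5 its (up to four) parameters, positions 6 to 6+m-1 the
    variable weights, and position 6+m the Minkowski exponent h."""
    algorithm: str              # "k-means", "hierarchical" or "dbscan"
    algo_params: List[float]    # at most 4 numbers (the Lance-Williams case)
    weights: List[float]        # one weight in [0, 1] per variable
    h: float = 2.0              # Minkowski exponent (2 = Euclidean)

    def as_vector(self) -> List[float]:
        """Flatten into the 6+m positions enumerated in the text."""
        algo_code = float(["k-means", "hierarchical", "dbscan"].index(self.algorithm))
        params = (list(self.algo_params) + [0.0] * 4)[:4]   # pad to 4 slots
        return [algo_code] + params + list(self.weights) + [self.h]

z = Configuration("dbscan", [0.5, 4.0], [1.0, 1.0, 0.0], h=2.0)
vec = z.as_vector()
```

Such a flat vector is what an evolutionary individual would encode.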

Table 1.2 Elements of calculation of the Rand index of similarity between partitions

                                          Partition P1
                                          In the same cluster    In different clusters
Partition P2   In the same cluster        a                      b
               In different clusters      c                      d

1.4 The Criterion: Maximising the Similarity Between


Partitions P A and P B

The search, realised in the space, outlined in the previous section, is performed
with respect to the fundamental criterion of the difference / affinity between the
two partitions, i.e. partition PA , which is given, and P, which is produced by the
clustering procedure, defined by Z, that is, the partition P = Z(X). Ultimately, for
the assessment of the clustering results, the classical Rand index (see Rand 1971)
was selected.5 The Rand index measures the similarity of two partitions, P1 and P2, of
a set of objects, in the following, simplest and highly intuitive manner, based on
the categorisation of pairs of objects, illustrated in Table 1.2. Namely, we
consider two partitions, P1 and P2, and check, for each pair of objects from X (or I),
whether they are in the same cluster or in different clusters.
Of course, a + b + c + d = n(n−1)/2. We aim at a (objects in the same clusters
in both partitions) and d (objects in different clusters in both partitions) as high as
possible, with b and c being as small as possible, according to the formula

  a+d
Q P 1, P 2 = .
a+b+c+d

Thus, if the two partitions are identical, then Q(P1, P2) = 1, while Q(P1, P2)
= 0 when they are "completely different" (actually, this occurs only in one very
specific case: when one of the two partitions is constituted by a single,
all-embracing cluster, and the other one is composed of all objects being separate
singleton clusters).
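A direct computation over all pairs, following Table 1.2, can be sketched as (helper name hypothetical):

```python
from itertools import combinations

def rand(p1, p2):
    """Rand index: the share of object pairs on which the two partitions agree,
    i.e. (a + d) / (a + b + c + d) in the notation of Table 1.2."""
    a = b = c = d = 0
    for i, j in combinations(range(len(p1)), 2):
        same1, same2 = p1[i] == p1[j], p2[i] == p2[j]
        if same1 and same2:
            a += 1          # together in both partitions
        elif same2:
            b += 1          # together in P2 only
        elif same1:
            c += 1          # together in P1 only
        else:
            d += 1          # apart in both partitions
    return (a + d) / (a + b + c + d)

identical = rand([0, 0, 1, 1], [1, 1, 0, 0])   # relabelled but identical
crossed   = rand([0, 0, 1, 1], [0, 1, 0, 1])   # maximally mixed here
```

Note that only the cluster memberships matter, not the cluster labels: the relabelled partition still scores 1.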
In view of the probabilistic properties of this Rand index (its expected value for
two random partitions is not zero), its adjusted version (see Hubert and Arabie
1985), denoted Qa(.,.), is often used, accounting for the deviation of the index from
its expected chance value. This adjusted Rand index is defined as:
Qa(P1, P2) = (a − Exp(a)) / (Max(a) − Exp(a))

5 Some more general remarks on this subject shall be forwarded in the next chapter, when discussing
the broader background of the entire approach.



where Exp(a) is the expected value of a, while the introduction of Max(a)
ensures that the maximum value of the respective measure is equal to 1. These two
values can be calculated for two partitions, of which one consists of p1 clusters,
having, respectively, n11, n12,…,n1p1 elements (objects), while the other partition
is composed of p2 clusters, having, respectively, n21, n22,…,n2p2 elements, as
follows:

Exp(a) = [Σq=1..p1 C(n1q, 2)] · [Σq=1..p2 C(n2q, 2)] / C(n, 2)

and

Max(a) = (1/2) [Σq=1..p1 C(n1q, 2) + Σq=1..p2 C(n2q, 2)],

where C(s, 2) = s(s − 1)/2 denotes the number of pairs that can be formed out of
s elements.

Denœud and Guénoche (2006) suggested that for larger datasets, this kind of
adjustment increases the discriminatory power of the Rand index. Therefore, in some
of the cases reported in this book, we use it as the similarity measure between
partitions. Likewise, in some calculations, definite penalty terms were introduced
to constrain the values of the elements of Z where the possibility arose of their
uncontrolled increase. Generally, however, the original Rand index was kept as
the main criterion of the search for PB, and it is virtually kept in all cases as the index
of quality of the solution, if not the actual optimisation criterion (in some cases boiling
down to simply the number of "wrongly classified" objects).

1.5 The Search Procedure

Although this book is not devoted to the analysis of the numerical and computational
aspects of the reverse clustering approach—a definitely very important issue—we
shall briefly characterise the computational aspect here as well, in the framework of
the presentation of the gist and the interpretations of the paradigm.
Thus, in view of the expected very cumbersome landscape and the highly complex
choice conditions ("constraints"), it was decided to use evolutionary algorithms
as the search tools. In actual experiments two kinds of evolutionary algorithms were
used (see also a slightly ampler description in Sect. 4.2 of the book). The first of
them was developed by one of the authors of this book (see Stańczak 2003) and is
characterised by two-level adaptation, namely at the level of individuals, which
is standard for evolutionary algorithms, and also at the level of operators, which
are used in a highly flexible manner with respect to different individuals, depending

upon the history of modifications, concerning the given individual. The second evolu-
tionary algorithm tried out in some of the experiments was the differential evolution
method (see Storn and Price 1997), the version from the R package (Mullen et al.
2011; R Core Team 2014) being used.
In both these versions of the evolutionary algorithms the individuals are coded
in a relatively straightforward way according to the parameters of the vector Z,
characterised before. The certainty of reaching the proper solution was not always
satisfactory, a fact that is appropriately noted and commented upon in the reports
from the particular experiments, contained in the successive chapters of the book.
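As an illustration of the mechanics of differential evolution in its basic DE/rand/1/bin form (the book used the version from the R package; the sketch below is an independent, simplified reimplementation on a toy objective standing in for the partition-similarity criterion, with all names hypothetical):

```python
import random

def differential_evolution(f, bounds, pop_size=15, f_w=0.8, cr=0.9,
                           generations=150, seed=1):
    """Basic DE/rand/1/bin minimiser (after Storn and Price 1997)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [f(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: three distinct individuals other than the target
            r1, r2, r3 = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)          # ensure at least one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < cr or j == j_rand:
                    v = pop[r1][j] + f_w * (pop[r2][j] - pop[r3][j])
                    lo, hi = bounds[j]
                    v = min(max(v, lo), hi)      # clamp to the box constraints
                else:
                    v = pop[i][j]
                trial.append(v)
            # Selection: keep the trial if it is at least as good
            ft = f(trial)
            if ft <= fit[i]:
                pop[i], fit[i] = trial, ft
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]

# Toy objective standing in for 1 - Q(PA, Z(X)); its minimum, 0, lies at (1, 2)
sphere = lambda z: (z[0] - 1.0) ** 2 + (z[1] - 2.0) ** 2
z_best, q_best = differential_evolution(sphere, [(-5, 5), (-5, 5)])
```

In the reverse clustering setting the decision vector encodes the configuration Z and the objective runs the corresponding clustering procedure before comparing its partition with PA.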

References

Adolfsson, A., Ackerman, M., Brownstein, N.C.: To cluster, or not to cluster: an analysis
of clusterability methods. Pattern Recognit. 88, 13–26 (2019)
Ball, G., Hall, D.: ISODATA, a novel method of data analysis and pattern classification. Technical
report NTIS AD 699616. Stanford Research Institute, Stanford, CA (1965)
Banks, D., McMorris, F., Arabie, Ph., Gaul, W. (eds.): Classification, clustering, and data mining
applications. In: Proceedings of the Meeting of the International Federation of Classification
Societies (IFCS’2004). Springer, Berlin (2004)
Banks, D., House, L., McMorris, F., Arabie, Ph., Gaul, W.: Classification, Clustering, and Data
Mining Applications. Springer, Berlin (2011)
Bezdek, J.C.: Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press, New
York (1981)
Bezdek, J.C., Keller, J., Krisnapuram, R., Pal, N.R.: Fuzzy Models and Algorithms for Pattern
Recognition and Image Processing. The Handbooks of Fuzzy Sets, vol. 4. Springer Verlag (1999)
Bramer, M.: Principles of Data Mining. Springer, New York (2007)
de Falguerolles, A.: Classification automatique: un critère et des algorithms d’échange. In: Diday, E.,
Lechevallier, Y. (eds.) Classification automatique et perception par ordinateur. IRIA, Le Chesnay
(1977)
Denœud, L., Guénoche, A.: Comparison of distance indices between partitions. In: Batagelj,
V., Bock, H.H., Ferligoj, A., Žiberna, A. (eds.) Data Science and Classification. Studies in
Classification, Data Analysis, and Knowledge Organization, pp. 21–28. Springer, Berlin (2006)
Dunn, J.: Well separated clusters and optimal fuzzy partitions. J. Cybern. 4(1), 95–104 (1974)
Dvoenko, S.: Meanless k-means as k-meanless clustering with the bi-partial approach. In: Proceed-
ings of the 12th International Conference on Pattern Recognition and Image Processing,
pp. 50–54. Minsk, Belarus, UIIP NASB, 28–30 May 2014
Ester, M., Kriegel, H.-P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters
in large spatial databases with noise. In: Simoudis, E., Han, J., Fayyad, U.M. (eds.) Proceedings
of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96),
pp. 226–231. AAAI Press (1996)
Figueiredo, M.A.T., Leitão, J.M.N., Jain, A.K.: On fitting mixture models. In: Hancock, E.R.,
Pelillo, M. (eds.) Energy Minimization Methods in Computer Vision and Pattern Recognition
(EMMCVPR 1999). Lecture Notes in Computer Science, vol. 1654. Springer, Berlin, Heidelberg
(1999)
Florek, K., Łukaszewicz, J., Perkal, J., Steinhaus, H., Zubrzycki, S.: Taksonomia Wrocławska (The
Wrocław Taxonomy; in Polish). Przegląd Antropologiczny 17 (1956)
Forgy, E.W.: Cluster analysis of multivariate data: efficiency versus interpretability of classifications.
In: Biometric Society Meeting, Riverside, California (1965). Abstract in Biometrics (1965) 21,
768
Guha, S., Meyerson, A., Mishra, N., Motwani, R., O’Callaghan, L.: Clustering data streams: theory
and practice. IEEE Trans. Knowl. Data Eng. 15(3), 515–528 (2003)
Hayashi, Ch., Yajima, K., Bock, H.H., Ohsumi, N., Tanaka, Y., Baba, Y. (eds.): Data Science,
Classification, and Related Methods. Springer (1996)
Hubert, L., Arabie, Ph.: Comparing partitions. J. Classif. 2(1), 193–218 (1985)
Kłopotek, M.A.: An aposteriorical clusterability criterion for k-means++ and simplicity of
clustering. SN Comput. Sci. 1(2), 80 (2020)
Kohonen, T.: Self-Organizing Maps. Springer, Berlin-Heidelberg (2001)
Lance, G.N., Williams, W.T.: A Generalized Sorting Strategy for Computer Classifications. Nature
212, 218 (1966)
Lance, G.N., Williams, W.T.: A general theory of classification sorting strategies: 1 hierarchical
systems. Comput. J. 9, 373–380 (1967)
Lindsten, F., Ohlsson, H., Ljung, L.: Just Relax and Come Clustering! A Convexification of k-
Means Clustering. Technical Report, Automatic Control, Linköping University, LiTH-ISY-R-
2992 (2011)
Ling, R.F.: On the theory and construction of k-clusters. Comput. J. 15(4), 326–332 (1972). https://
doi.org/10.1093/comjnl/15.4.326
Lloyd, S.P.: Least squares quantization in PCM. In: Bell Telephone Labs Memorandum. Murray
Hill, NJ (1957); reprinted in IEEE Trans. Information Theory, IT-28 (1982), 2, 129–137
MacQueen, J.: Some methods for classification and analysis of multivariate observations. In: LeCam,
L.M., Neyman, J. (eds.) Proceedings of the 5th Berkeley Symposium Mathematics Statistics
Probability, 1965/66, pp. 281–297. University of California Press, Berkeley, I (1967)
Mirkin, B.: Mathematical Classification and Clustering. Springer, Berlin (1996)
Mullen, K.M., Ardia, D., Gil, D.L., Windover, D., Cline, J.: DEoptim: an R package for global
optimization by differential evolution. J. Stat. Softw. 40(6), 1–26 (2011). https://www.jstatsoft.org/
v40/i06/
Owsiński, J.W.: Data Analysis in Bi-Partial Perspective: Clustering and Beyond. Springer Verlag
(2020)
R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for
Statistical Computing, Vienna, Austria (2014). https://www.R-project.org/.
Rand, W.M.: Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 66
(336), 846–850 (1971)
Ríos, S.A., Velásquez, J.D.: Finding representative web pages based on a SOM and a reverse cluster
analysis. Int. J. Artif. Intell. Tools 20(1), 93–118 (2011)
Rodriguez, A., Laio, A.: Clustering by fast search and find of density peaks. Science 344(6191),
1492–1496 (2014)
Simovici, D.A., Hua K.X.: Data ultrametricity and clusterability. CoRR abs/1908.10833 (2019)
Stańczak, J.: Biologically inspired methods for control of evolutionary algorithms. Control Cybern.
32(2), 411–433 (2003)
Steinhaus, H.: Sur la division des corps matériels en parties. Bulletin de l’Académie Polonaise des
Sciences IV (Cl. III), 801–804 (1956)
Storn, R., Price, K.: Differential Evolution—a Simple and Efficient Heuristic for Global Optimiza-
tion over Continuous Spaces. J. Global Optim. 11(4), 341–359 (1997)
Tremolières, R.: The percolation method for an efficient grouping of data. Pattern Recognit.
11 (1979)
Tremolières, R.: Introduction aux fonctions de densité d’inertie, p. 234. Université Aix-Marseille,
WP, IAE (1981)
Wierzchoń, S.T., Kłopotek, M.A.: Modern Algorithms of Cluster Analysis. Studies in Big Data, vol.
34. Springer (2018)
Yager, R.R., Filev, D.P.: Approximate clustering via the mountain method. IEEE Trans. Syst. Man
Cybern. 24, 1279–1284 (1994)
Chapter 2
Reverse Clustering—The Essence
and The Interpretations

2.1 The Background and the Broad Context

In general, the very essence of clustering is to group a set of data into subsets (groups)
such that the elements assigned to a particular cluster are more similar to each other
than to the elements assigned to different clusters. This simple, maybe even primitive,
definition is very powerful, as it reflects an extremely wide class of problems we face
both in everyday life and in virtually all of the more sophisticated acts we undertake.
Cluster analysis is the field of knowledge, at the crossroads of applied mathematics
and computer science, that is concerned with such a class of problems.
The most straightforward and basic justification for carrying out cluster analysis
of a data set is to gain insight into the (“geometrical” or “model-wise”) structure
of such data set, primarily in terms of the possibility of dividing it into plausible
subsets (including, potentially, singletons), or in terms of the very existence of such
a division. If this is so, i.e. the division is sound and the subsets are well conditioned,
one may go further in inferring the nature and very meaning of subsets, their origins,
and mechanisms of appearance (“models”, “processes”,…); see, e.g., Kaufman and
Rousseeuw (1990, 37–50), Gan et al. (2007, 6–10), or Xu and Wunsch (2009, 263
ff.).
The essentially primeval character of the task of clustering (“putting together the
similar and distinguishing the dissimilar”) clearly represents, as we have already
mentioned, a multitude of various tasks and acts. This can be illustrated by the fact
that it directly corresponds to the way in which human language has developed in
various populations and cultures (with clusters corresponding to notions, words, and
expressions). In this context—related to the eternal human quest for attaining some
best solution, or choosing a best, or at least good enough, option—it is clear that there
is an optimization aspect to the clustering task (concerning the effectiveness and
efficiency of the division obtained, its “veracity” put apart, and then its capacity and
suitability in practical use).
For these obvious reasons cluster analysis has for decades been one of the most
fundamental data analysis techniques, but, at the same time,

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
J. W. Owsiński et al., Reverse Clustering, Studies in Computational Intelligence 957, https://doi.org/10.1007/978-3-030-69359-6_2
it is one of the most powerful tools of data analysis; see, for instance, Ducimetière
(1970), Kaufman and Rousseeuw (1990), Arabie et al. (1996), Mirkin (1996), Bramer
(2007), Böhm et al. (2006), Miyamoto et al. (2008). It is most often used when no
extra information is available and, thus, there is no clue as to the structure of the data
set at hand.
In order to actually execute a cluster analysis exercise one has to make a number of
choices. First of all, the data set must be characterized in terms of attributes and their
values for particular data points,1 and this characterization may be task dependent.
Then, a clustering technique, some measure of distance/similarity, and other parameters,
either related to the chosen clustering technique or of a more general character, have
to be assumed.
The choice of parameters may be guided by the experience of a user, availability of
data (values of the attributes to be used), some available metadata, or it may be based
on the results of some preliminary data analysis; see, e.g., Torra et al. (2011).
Having made these initial choices, one can run the selected clustering algorithm
and obtain some groups of data points, i.e., a partition of the data set under consider-
ation. Usually, such an exercise is repeated several times for different configurations
of the above mentioned parameters in order to find an “optimal”/”correct” partition.
The whole process may be seen as a kind of a transformation, which turns a data set
of individual data points into groups of such data points.
In this book we consider a problem which, in one of its potential interpretations,
may be treated as a kind of reverse engineering applied to the (hypothetical)
results of the previously described (potential) clustering process.
As we have already mentioned in Chap. 1, the very essence of reverse engineering,
sometimes also termed back engineering, boils down to a procedure that aims at finding
out, for some object or process, the underlying design, architecture, or
implementation procedures or processes that have resulted in the formation of the
object in question. In other words, reverse engineering is about the deduction
of the design features and goals of the object in question without deep or
additional knowledge about which procedures and/or processes have implied
those features; see Eilam (2005), Chikofsky and Cross (1990), Raja and Fernandes
(2008).
As a side remark, let us notice that some work proposing the use of clustering
for solving various reverse engineering problems has already been done; see, for
instance, Govin et al. (2016), Kuhn et al. (2005), Raffo (2019), Raffo et al. (2020),
Laube (2020), Quigley et al. (2000), Shim et al. (2020), or Travkin et al. (2011). Yet,
in spite of the very general similarity of the domains involved, namely reverse engineering
and clustering, these studies are, in fact, oriented at objectives different from, and more
specific than, what we propose in the present volume.
The general philosophy of reverse engineering is in a certain manner employed for
the clustering problem considered in this book. Namely, we assume that a partition

1 It is possible to start with a data similarity/distance matrix, if available, without an explicit char-
acterization of the data in terms of some attribute values, and such a setting also seems to provide a
reasonable context for the paradigm proposed here, but we will leave this case for a possible
further study.
of the data set is given and we want to discover the parameters of the process (transfor-
mation) that may have resulted in the given partition. Ultimately, though, we may
have to settle—as is usually the case in the analysis of complex problems—
for an approximation, concerning both the shape of the division and the parameters
of the clustering procedure (as we may not be able to reproduce exactly the division
given together with the data set).2
It is very common to consider data sets that are divided into subsets (clusters,
classes, groups,…) in a definite manner, which is more or less “certain” or “justified”,
and to attempt the reconstruction of the given division using some data “other” than
just the respective group labels. This is most often done in order to validate or check
the quality of classification, clustering, machine learning, etc. schemes.
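The agreement between a reconstructed partition and the prior one can be quantified with, e.g., the classical Rand index (Rand 1971). The following pair-counting sketch is our own illustrative implementation, not code from the book:

```python
from itertools import combinations

def rand_index(p_a, p_b):
    # Fraction of object pairs on which the two partitions agree, i.e. pairs
    # that are either together in both partitions or separated in both.
    n = len(p_a)
    agree = sum(
        (p_a[i] == p_a[j]) == (p_b[i] == p_b[j])
        for i, j in combinations(range(n), 2)
    )
    return agree / (n * (n - 1) / 2)

# Identical up to relabelling of the clusters: full agreement.
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```

The index is insensitive to cluster labels, which matters here, since a clustering procedure reproducing a given partition is free to number the groups differently.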
More advanced purposes in this respect may involve model building and checking,
as well as—quite to the contrary—a verification of the original, prior division of the
data set. There may also be other objectives, some of them quite classical, like the
detection of outliers, and also very specific ones, like the assessment of adequacy of
the classification data to the labels or forming some descriptions of known groups
of data in terms of the values of some of the attributes characterizing them.
It is easy to notice that the results of the above reverse engineering type clustering
process can provide the analyst and the user with very useful information. In general,
for the information obtained from a data analysis study to be implementable and
usable in practice, that is, also for novice users, domain specialists, very often with
a limited command of mathematics, numerical algorithms, data analysis, clustering,
etc., who are now presumably the largest target group of users in virtually all real
world applications, some obvious requirements and limitations should be followed.
These requirements and limitations concern both the procedures and ways of
attaining results and reaching conclusions, and the form of providing data and infor-
mation and of obtaining the results. Basically, these should respect the limited cognitive
capabilities of the human being (see Rashidi et al. 2011; Perera et al. 2014; Tsai et al.
2014; Tervonen et al. 2014—these references concern the topic mentioned in the
very relevant and modern context of the Internet of Things).
Basically, for our purposes, the accounting for the limited and specific human
cognitive capabilities boils down to the necessity of involving:
• a broadly perceived granulation of data and information (see Bargiela and Pedrycz
2002; Pedrycz 2013a, b; Pedrycz et al. 2008; Lin et al. 2002);
• a summarisation, of both numerical and linguistic character (see Kacprzyk et al.
2000; Kacprzyk and Yager 2001; Kacprzyk and Zadrożny 2005, 2009, 2010, 2016;
Kacprzyk et al. 2008, 2010; Reiter and Dale 2000; Reiter et al. 2006, as well as
Sripada et al. 2003; Yu et al. 2007).
In this context the problem of comprehensibility of data analysis, data mining,
machine learning, etc. results (patterns) had been known for some time, and it had

2 Actually, a perfect reconstruction of the given partition PA may be quite possible (as also illustrated
by some of the cases in this book), in the sense that PB = PA, but this is by no means equivalent to
finding the rationale behind this initial partition.
been presumably Michalski who already in the early 1980s devised the so called
postulate of comprehensibility whose essence can be summarized as (Michalski
1983): “…The results of computer induction should be symbolic descriptions of
given entities, semantically and structurally similar to those a human expert might
produce observing the same entities. Components of these descriptions should be
comprehensible as single ‘chunks’ of information, directly interpretable in natural
language, and should relate quantitative and qualitative concepts in an integrated fash-
ion…”. Michalski’s vision has had a great impact on machine learning, data mining,
data analysis, etc. research, and has been further developed by many authors, see,
e.g., Craven and Shavlik (1995), Zhou (2005), Pryke and Beale (2004), Fisch et al.
(2011), to name just a few. A recent study on the comprehensibility of linguistic
summaries by Kacprzyk and Zadrożny (2013) combines many ideas from those
works, and recasts them in the context of a very natural representation of results via
natural language.
Most of the above mentioned works on the comprehensibility of data anal-
ysis/mining results emphasize, as the main reasons for the importance of comprehen-
sibility (see, for instance, Kacprzyk and Zadrożny 2013), in particular, the following
ones:
1. To be confident in the performance and usefulness of the algorithms, and to be
willing to use them, the users have to understand how the result is obtained and
what it means,
2. The results obtained should be novel and unexpected, in one sense or another,
and these results can only be accessible to the human if they are understandable,
3. Usually, the results obtained may imply some action to be taken, and hence their
comprehensibility is clearly crucial,
4. The results obtained may provide much insight into a potential better feature
representation which, to be meaningful, should be comprehensible, and
5. The results obtained can be employed for refining knowledge about a domain
or field in question, and the more comprehensible, the better.
The postulate of comprehensibility has recently been strongly emphasized in the frame-
work of the tools and techniques of artificial intelligence. Namely, there is much
evidence that the wide use of powerful artificial intelligence tools and techniques,
the essence of which is that they provide solutions to complex problems, and of the
solutions they yield, is essentially subject to human acceptance. However,
for human beings the goodness of the results obtained is, as a rule, not enough in itself;
they want to understand how and why these results have been obtained, why
they can be trusted, and then accepted and implemented. Unfortunately, most of the
powerful artificial intelligence tools and techniques do not have such capabilities. For
instance, deep neural networks and almost all data analysis, data mining, machine
learning, etc. algorithms do show very good and increasing performance, but are
generally not “transparent” to the analysts or users with respect to how and why they
yield their results.
This dangerous phenomenon can have a detrimental effect on the use of those
powerful, effective and efficient tools and techniques, and hence on proliferation of
artificial intelligence. As a way out many ideas have been proposed, and among them
the recently highly advocated concept of the so-called “Explainable AI” has enjoyed
an extremely high popularity.
The motivation behind Explainable AI (XAI) is simple: it concerns methods
and techniques, to be used in the broadly perceived area of artificial intelligence
technology, which would yield results that can be understood by human beings.
Notice that this deeply departs from the concept of the “black box” in, for instance,
machine learning, where even the developers and analysts cannot really explain why
a particular tool or technique arrived at a specific decision. Needless to say, a lack of
such knowledge can prohibit people from applying such a tool.
The concept of XAI has been quickly considered of utmost importance, and—first
of all—some manifesto type statements have been issued, for instance by military
agencies and institutions, as exemplified by DARPA (Defense Advanced Research
Projects Agency), which has explicitly stated that “… Explainable AI—especially
explainable machine learning—will be essential if future warfighters are to under-
stand, appropriately trust, and effectively manage an emerging generation of artifi-
cially intelligent machine partners…” (Gunning and Aha 2019). Many non-military
think tanks and policy making institutions have issued similar statements.
These statements have been strongly supported by an extremely active and inten-
sive research effort all over the world, at the majority of research, R&D and scholarly
institutions, and among hundreds of relevant publications one can quote the papers
and books, including some state of the art and position papers, by: Adadi and Berrada
(2018), Doran et al. (2017), Hall and Gill (2018), Miller (2017), Molnar (2020),
Murdoch et al. (2019), Ribeiro et al. (2016), Rudin (2019), Zhang and Chen (2018),
Biecek and Burzykowski (2020), to quote just a few. One of the studies, devoted
to the connection between interpretability and summarization of data and results of
analysis is that by Lesot et al. (2016).
Essentially, virtually all of the authors mentioned above stress that the results obtained
using modern AI tools should be explainable/interpretable/transparent, which goes
hand in hand with the notion of comprehensibility. Thus, our approach may be seen
as belonging to this line of reasoning even if “Explainable AI” is mostly focused on
and concerned more explicitly with a black box type, notably deep learning kind of
methodology. (Actually, one of the interpretations, of which we speak in Sect. 2.3,
further on in this chapter, is exactly that the given partition PA is a kind of black
box product, of which we know very little. This is also why the reverse engineering
interpretation is quite to the point here.)
It is, however, worth stating clearly that the AI community, from the very begin-
ning, within the framework of the traditional “symbolic AI” (inherently much more
amenable to such comprehensibility), was deeply concerned with providing the users
of AI tools and techniques with means to convince them of the correctness
of the provided results, securing their high interpretability and transparent provenance.
This may be exemplified with the concept and actual implementations of an explana-
tion module in an expert system (see, for instance, Castillo and Alvarez 1991; Baker
and O’Conor 1989; Berka et al. 2004, as well as, e.g., http://www.esbuilder.com/).
It is quite clear, and will also be seen further on in greater detail, that the reverse
engineering type approach to clustering can be viewed as following, at the concep-
tual level, the above mentioned philosophy of attaining comprehensibility. First,
the clustering process itself implies an increase of the comprehensibility of the data, as
it produces per se representations of results that are closer to human perception.
Then, which is the essence of the approach proposed in this volume, we go further
and try to find those parameters of the clustering algorithm (potentially) employed
that have led to the results obtained. That is, extremely important additional
knowledge is derived via our approach about the algorithms, parameters, types of
distance/similarity functions, etc. All of this is clearly useful and can greatly help a
potential user in understanding (comprehending) the intrinsic relations between many
aspects of the data set analysed and, more generally, the respective analysis process.
As a result, the possibility of acceptance, and hence implementation, of the result
obtained can be greatly enhanced.
Notwithstanding the enhanced comprehension of the data themselves and the
essence of the process, the approach can be useful for quite pragmatic purposes in
a variety of manners. This fact is amply illustrated in the successive chapters of the
book, in which a variety of examples is treated for diverse data sets and concrete
problem formulations. It is exactly with respect to the latter, i.e. the various problem
formulations, that the reverse clustering paradigm offers quite broad possibilities.
This is closely associated with the interpretation of the generic problem and hence
of the results obtained. The next section is devoted exactly to an ampler discussion
of this issue.

2.2 Some More Specific Related Work

The idea of a reverse engineering approach relative to clustering as presented in
Chap. 1 of the book is obviously novel. However, the procedure, formally presented
here, may be seen as referring to many facets which are individually addressed in
the literature by multiple authors in various contexts and settings. We shall start with
some approaches or problem formulations which reveal an overall pattern apparently
similar to the reverse clustering considered here. Then, we shall pass to the more
specific questions which are somehow addressed in the reverse clustering paradigm,
but very often appear as separate issues, important for data analysis procedures
and hence treated through appropriate techniques.
Among the few significant references which can be provided for the approach
presented here as proposing a similar overall perspective, we should indicate the LCE
(Learning from Cluster Examples) approach of Kamishima and associates (see Kamishima
et al. 1995, and, first of all, Kamishima and Motoyoshi 2003). The LCE approach
starts with a number, M, of initial partitions, PA1, …, PAM, of the object sets
X1, …, XM, and, having these partitions of sets of “analogous” objects (i.e. objects
situated in the same kind of space EX), proposes an approach to derive a procedure for
partitioning other sets of “analogous” objects, Y. Thus, the obvious similarity with the
reverse clustering paradigm lies in having some initial partition (in LCE: a number of
partitions) and intending to use the knowledge therefrom to partition some other data
sets.3 Similarly to reverse clustering, this problem and approach cannot be considered
to constitute, nor provide, a classifier in the standard sense. Yet, there are essential
differences with respect to the reverse clustering paradigm, namely: (1) the assump-
tion of M > 1 is fundamental; (2) the partitions PA1, …, PAM are all considered
to be “objectively certain”; (3) the procedure does not, in fact, involve clustering as
such: based primarily on probabilistic precepts, related to the co-appearance of objects at
definite locations in space in the same clusters, it assigns the new objects to the same
or different groups. The procedure itself is quite complicated and, even though it refers
to the probabilistic framework, involves quite important arbitrary assumptions,
meant largely to tone down the overall complexity of the procedure.
There is also another domain, which has been very dynamically explored and
developed over the last two decades, in a certain manner associated with the LCE, but
of a definitely broader character, namely that of “domain adaptation” and “transfer
learning”. In short, the issue is in devising the principles and methods of using
knowledge—in the form of rules, clusters, models, etc.—acquired on the basis of a
certain data set, in the situation when one deals with data whose characteristics have
either somehow changed, or which we know to be different to some extent. This concerns
primarily the suspicion (“hypothesis” or “assumption”) that the distribution behind
the data analysed is different (but not “totally different”) from the one on which
our knowledge is based. While the problem is obvious and very general, indeed, its
actual significance stems from some very specific application areas (again, as in the
case of the LCE—image processing, but also, to a large extent—document analysis
and information retrieval, associated with sentiment analysis). A very good survey
of this domain is provided by Kouw and Loog (2019) (evidence that research in
this field indeed has quite some history is also provided by the valuable surveys of
Pan and Yang 2010, or Gopalan et al. 2012).
The essential difference with respect to the reverse clustering paradigm consists
in the fact that the investigation domains mentioned above stem from a definite
problem, which is being solved in a variety of manners, depending upon the concrete
circumstances (e.g. assumptions as to the probabilistic or statistical nature of the
problem). In this context, very popular are the kernel-based methodologies, but they
also do not provide sufficient coverage for many of the more specialised problems in
domain adaptation. Further, similarly as in the LCE, in the language of clustering
it is assumed that: (a) PA is definitely based on the respective X, and (b) it is
“correct” (or we dispose of a concrete measure of its “correctness”). Under these
assumptions we wish to obtain a new partition, PB, that would be also (similarly)
“correct” for a data set Y that is somewhat different from X. Thus, obviously,
“reverse clustering” might find some application in the areas of domain adaptation
and transfer learning, under certain (mild) conditions and for a definite class of

3 A typical, and, indeed, very adequate application consists in the pattern segmentation partitions
PA1, …, PAM being provided by humans, and the automatic generation of pattern segmentations for
other patterns.
problems, but, in fact, we deal here not only with a different kind of general task, but
also a different methodological perspective.
Finally, as characterised here, the latter research area borders upon the one of
“incremental” or “adaptive” clustering or learning, which, in the case of clustering,
refers to a situation in which a partition PA, obtained on the basis of some data set
X = {x1, …, xn}, has to be adapted to a change in this data set, resulting in some X’.
The difference with respect to domain adaptation/transfer learning is that we do not
assume any essential change in the “nature” of X, when it turns into X’, but rather a
kind of “parametric shift”, or, as this is often referred to, a “drift”. A natural choice
for the techniques applied in this case is the k-means family of clustering algorithms,
see, e.g. the seminal paper of Pedrycz and Waletzky (1997) (although on a more
general level one might wish to consult some of the earlier works, e.g. the one of
Fisher 1987), this approach being still currently applied in the conditions of massive
data flows, like, e.g., in Casalino et al. (2019).
The closeness of incremental or adaptive clustering to domain adaptation and
transfer learning is perhaps best expressed through the manner in which X’ may
differ from X. In the simplest case, the difference boils down to the addition of some
new observations, so that, in fact, X’ is just a superset of X, X ⊂ X’, ultimately—just the
addition of a single observation (xn+1). This, indeed, is the “classification” situation,
for which the k-means methodology appears to be very well fitted: starting from the partition
PA for X, we run the procedure for X’ and easily find the solution assigning xn+1 to
the cluster with the closest/most similar centroid (if not simply assigning xn+1 to it).
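The incremental step just described can be sketched as follows; this is a schematic illustration with made-up centroids, not code from any particular k-means library:

```python
import numpy as np

def assign_incrementally(centroids, x_new):
    # k-means style incremental step: assign the new observation to the
    # cluster whose centroid is the nearest.
    distances = np.linalg.norm(centroids - x_new, axis=1)
    return int(np.argmin(distances))

# Two illustrative cluster centroids in a 2-dimensional space.
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])
print(assign_incrementally(centroids, np.array([4.2, 4.9])))  # cluster 1
```

In a full incremental scheme the chosen centroid would then be updated towards the new observation, which this sketch omits.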
In many cases, though, X’ simply “preserves the character” of X but, generally,
contains other observations (e.g. X∩X’ = ∅). Here, simply starting from PA may not
be sufficient, even though k-means is still applicable. Going further towards domain
adaptation and transfer learning, we encounter cases in which, e.g., the very
description of the observations (the set of variables/attributes) changes. Such divergences
may go in various directions. Examples of studies devoted to such diverse cases
of adaptive clustering are those of Bagherjeiran et al. (2005), Câmpan and Şerban (2006),
Ntoutsi et al. (2009), Rokach et al. (2009), or Shi et al. (2018).
Yet another domain, worth noting, which has a definite technical association with
the reverse clustering paradigm, and, indeed, with the domains mentioned before, is
learning with partial supervision, to an important degree linked, in particular, with
incremental learning and hence also with some of the problem formulations and
methodologies referred to above (see, e.g., Bouchachia and Pedrycz 2006).
With respect to the areas commented upon here, it can be stated that although
reverse clustering may clearly be used as a tool for this kind of problems, and that in
a variety of manners, its formulation and procedure are fundamentally different.
Turning now to the more specific questions, let us note that the choice of the
attributes has been thoroughly studied, in particular, in the context of classification
(Hastie et al. 2009), but also in the area of more broadly meant data analysis. Many
different approaches have been proposed, which are applicable for our purposes.
Some of them take into account the information on classes, to which elements of
X are assigned, some not. In our case, both modes are possible, as we start with a
partition, which may be interpreted as the classification. The choice of an appropriate
family of techniques may be based on the aspects discussed at length later
on in this chapter. Namely, if the partition PA is to be seen as the valid one, then
taking into account the information on class assignments is more justified than in the
other cases. In our experiments, reported in the subsequent chapters, we associate
weights with the attributes and the values of the weights are optimized during the
evolutionary procedure. This may effectively lead to completely ignoring some of
the attributes, characterizing the data set X. Notice that the choice of attributes has
also been discussed in the literature on the comprehensibility of data analysis, data
mining, machine learning, etc. results, and Craven and Shavlik (1995) may be here
a good source of information.
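The effect of such attribute weighting can be illustrated with a minimal sketch (our own example; in the book the weights are tuned by the evolutionary procedure, which is not reproduced here — a zero weight simply makes the corresponding attribute invisible to the distance):

```python
import numpy as np

def weighted_distance(x, y, w):
    """Weighted Euclidean distance; a zero weight w[k] drops attribute k."""
    return float(np.sqrt(np.sum(w * (x - y) ** 2)))

x = np.array([1.0, 10.0, 3.0])
y = np.array([2.0, -10.0, 3.0])

print(weighted_distance(x, y, np.array([1.0, 1.0, 1.0])))  # ≈ 20.025, all attributes used
print(weighted_distance(x, y, np.array([1.0, 0.0, 1.0])))  # 1.0, attribute 2 ignored
```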
Another important decision concerns the choice of the distances/similarities from
among the plethora of those proposed in the literature (Cross and Sudkamp 2002).
This choice has, of course, to take into account the scale with which a given attribute
is endowed, i.e., nominal, ordinal, interval or ratio. For the latter types of attributes
it may be convenient to assume a parametrized family of distances, e.g., the Minkowski
distance, which simplifies the representation for the purposes of evolutionary
optimization; this is actually done in virtually all of the experiments reported
in this book. One can even go further, using a fuller survey of (binary, in this case)
similarity/dissimilarity measures, like the one presented in Choi et al. (2010), where
those measures are categorised into classes, and a similar reverse type analysis is
performed. This will not be, however, considered in this book and will be left as a
potential direction of further studies.
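Treating the Minkowski exponent p as a single tunable parameter is what keeps the representation simple: the whole family of distances is then encoded by one number that an optimizer can vary. A small sketch (our illustration, not code from the book):

```python
import numpy as np

def minkowski(x, y, p):
    """Minkowski distance: p = 1 gives Manhattan, p = 2 Euclidean."""
    return float(np.sum(np.abs(x - y) ** p) ** (1.0 / p))

x, y = np.array([0.0, 0.0]), np.array([3.0, 4.0])
print(minkowski(x, y, 1))  # 7.0
print(minkowski(x, y, 2))  # 5.0
```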
The essence of the problem of the reverse clustering as meant in this book is
the formulation and solution of the optimization problem, described in Chap. 1.
Its important component is the performance criterion, denoted Q, which is identified
here with a measure of the fit between two partitions. In particular, we shall, as a rule,
interpret Q in such a way that it should measure how well the partition PB , produced
using Z * , matches the originally given partition PA . We have already indicated that in
our experiments we refer, in general, to the Rand index. Let us, however, mention that
such measures belong to a broader, and very deeply discussed family of the cluster
validity measures, which are meant to evaluate the quality of the partition produced
by a clustering algorithm; see, e.g. Wagner and Wagner (2006); Desgraupes (2013);
Vendramin et al. (2010); Rendón et al. (2011); Halkidi et al. (2001); or Arbelaitz
et al. (2013).
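For concreteness, the (plain) Rand index can be computed directly from its definition: the fraction of object pairs on which the two partitions agree, i.e. pairs placed either together in both partitions or apart in both. A self-contained sketch (our illustration); note that the cluster labels themselves do not matter, only the grouping:

```python
from itertools import combinations

def rand_index(a, b):
    """Fraction of object pairs on which two partitions agree
    (both together or both apart)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)

pa = [0, 0, 1, 1]
pb = [1, 1, 0, 0]          # the same grouping under different labels
print(rand_index(pa, pb))  # → 1.0, a perfect match
```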
According to Brun et al. (2007), three broad classes of such measures may be
distinguished, which do not necessarily refer to a gold standard (or, otherwise, a reference)
partition, in our case denoted as PA . The first class comprises internal measures,
which are based on the postulated properties of the clusters produced (such as, for
instance, the classical Calinski-Harabasz index, see Calinski and Harabasz 1974; for
a more general treatment see Liu et al. 2010). This class is often treated as the one
of the “proper” cluster validity measures, and quite a lot of attention is devoted to
this class in the literature, based on various prerequisites (see, e.g., Zhao et al. 2009;
Zhao and Fränti 2014; Xie and Beni 1991; Van Craenendonck and Blockeel 2015;
or Meila 2005).4
The second class of the relative measures “is based on comparisons of partitions
generated by the same algorithm with different parameters or different subsets of the
data” (Brun et al. 2007).
And finally, the third class comprises the external measures referring to the
comparison of partitions produced by the clustering algorithms and a partition known
a priori, usually assumed to be a valid one. As our primary goal is the reconstruction
of a cluster operator, which could have produced a given partition PA for a given
data set X, then we are first of all interested in the usage of the external validity
measures. However, it should be stressed that in different scenarios, discussed later
on in this chapter, also other types of measures could be of use. In particular, if our
belief in the validity of a given partition PA is not full, then we can define the quality
criterion as a combination of an external and internal one, for instance, favoring PB
which provides a balance between the matching of PA and having a high quality
(e.g., internal consistency) in terms of one or more internal measures.
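Such a combined criterion might look, schematically, as follows (the weight alpha, the dispersion-based internal score and its mapping to (0, 1] are our illustrative choices, not a formula from the book):

```python
import numpy as np
from itertools import combinations

def rand_index(a, b):
    """External criterion: pairwise agreement with the reference partition PA."""
    pairs = list(combinations(range(len(a)), 2))
    return sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs) / len(pairs)

def within_cluster_dispersion(X, labels):
    """Internal criterion: mean squared deviation from own-cluster centroids."""
    labels = np.asarray(labels)
    total = sum(np.sum((X[labels == c] - X[labels == c].mean(axis=0)) ** 2)
                for c in set(labels))
    return total / len(X)

def combined_quality(X, pa, pb, alpha=0.7):
    """alpha trades off matching PA against internal compactness of PB."""
    internal = 1.0 / (1.0 + within_cluster_dispersion(X, pb))  # mapped to (0, 1]
    return alpha * rand_index(pa, pb) + (1 - alpha) * internal

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
pa = [0, 0, 1, 1]
pb = [0, 0, 1, 1]
print(combined_quality(X, pa, pb))  # 0.94 = 0.7·1 + 0.3·0.8 for this toy data
```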
Another parameter of the clustering procedure whose choice attracted a lot of
attention in the literature is the fixed number of clusters assumed, e.g., for the k-
means family of clustering algorithms (actually, Bock 1994, proposed the issue of
the number of clusters as one of the essential ones to be resolved in the domain). The
choice of the value for this parameter has evidently a far reaching influence on the
obtained partition while it may seem rather arbitrary. Thus, a number of approaches
has been proposed which usually base on the earlier mentioned validity measures.
Namely, the optimal number of clusters is recognized as the number for which a
given validity measure attains its extremum or satisfies some specific formula (see
Milligan and Cooper 1985; Libert 1986; Sugar and James 2003; Wagner et al. 2005,
or, for the more recent publications, Charrad et al. 2014, and Patil and Baidari 2019).5
In our general approach, the actual number of clusters present in the partition PA is
a natural candidate for the value of this parameter. However, such a choice may be
questioned when taking into account the various assumptions as to PA , or considering
the reverse engineering of PA to obtain Z * as a first step towards partitioning other,
possibly much larger, datasets using Z * .6
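The extremum-based selection of the cluster number described above can be sketched as follows (an assumption of ours, using scikit-learn's k-means and the Calinski-Harabasz index; the book's experiments rely on an evolutionary procedure instead):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score

rng = np.random.default_rng(0)
# Three well-separated Gaussian blobs, so the "true" number of clusters is 3.
X = np.vstack([rng.normal(c, 0.1, size=(30, 2)) for c in (0.0, 5.0, 10.0)])

# Score each candidate number of clusters with a validity measure and
# pick the value for which the measure attains its extremum (here: maximum).
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = calinski_harabasz_score(X, labels)

best_k = max(scores, key=scores.get)
print(best_k)  # → 3 for this toy data
```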

4 It should be noted that the so-called bi-partial approach, developed by one of the present authors
(see, e.g., Owsiński 2020), allows for obtaining a natural solution to the clustering problem in general,
without the need of referring to any (additional) clustering quality measure. It also provides the
answer to the problem of the cluster number, discussed further on, without solving it explicitly,
simply as a part of the global clustering solution.
5 There is, of course, the other side to the cluster number issue, namely that of the basic question
“what is a cluster?”. If we knew the answer to this question, we would not have at all to determine
the cluster number, see, e.g. Davies and Bouldin (1979), Chiu (1994) or Hennig (2015). For the
number of partitions of a set, see Rota (1964).
6 Here, another related problem arises, namely that of preservation of validity and stability of the
structure, obtained for a smaller set X, including the number of clusters, when applied to a much
bigger one. This issue was perceived long ago, as witnessed, e.g., by the work of Guadagnoli
and Velicer (1988).
An example of a software package which combines all the above-mentioned
main aspects and, thus, is very relevant for our idea of the reverse engineering type
clustering, is the NbClust package (Charrad et al. 2014) available on the R platform.
It is primarily oriented at supporting the choice of the number of clusters. However,
this package actually implements a number of:
• clustering algorithms,
• cluster validity indexes (measures), and
• distance measures (dissimilarity measures)
and makes it possible to use them in various configurations, together with a varying
number of clusters, where appropriate, to search for the “best” one for a given data
set. A configuration is pointed out as the best when the majority of the validity
measures involved confirm its superiority. Thus, the NbClust package may be seen as
a valuable and extremely relevant tool to carry out the endeavor laid out in this book.
However, our proposal provides a broader framework for the emerging type of data
analysis and makes it possible to envision some interesting directions for the further
research. Moreover, it adds to the analysis an important aspect of comprehensibility
of results obtained.
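The majority-vote idea behind NbClust can be mimicked in a few lines (a Python sketch of ours using three scikit-learn validity indices, not the NbClust code itself): each index votes for the cluster number it favours, and the most frequent vote wins.

```python
import numpy as np
from collections import Counter
from sklearn.cluster import KMeans
from sklearn.metrics import (calinski_harabasz_score, davies_bouldin_score,
                             silhouette_score)

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.2, size=(40, 2)) for c in (0.0, 4.0, 8.0)])

candidates = range(2, 7)
partitions = {k: KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
              for k in candidates}

# Each validity index votes for one candidate number of clusters
# (Davies-Bouldin is minimised, the other two maximised).
votes = [
    max(candidates, key=lambda k: calinski_harabasz_score(X, partitions[k])),
    max(candidates, key=lambda k: silhouette_score(X, partitions[k])),
    min(candidates, key=lambda k: davies_bouldin_score(X, partitions[k])),
]
best_k = Counter(votes).most_common(1)[0][0]
print(best_k)  # → 3 on these well-separated blobs
```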

2.3 The Interpretations

As was already mentioned in the preceding section, an essential question,
which arises in connection with the new perspective or problem formulation
forwarded here, is its interpretation, closely linked with the potential use of its results.
We shall show now that there exist quite a variety of ways, in which this formulation
can be treated and used.
Figure 2.1 puts together in a visual manner the essential components of the
paradigm and their interrelations: the data set analysed, X; the prior partition of X,
i.e. PA; the clustering algorithms and the data processing parameters, Ω = {Zi};
the resulting partition of X, PB; the criterion Q(PA, PB), expressing the similarity
of the two partitions; and the search (optimisation) procedure, maximising
Q(PA, PB) and ultimately yielding Z*.

Fig. 2.1 The scheme of the reverse clustering problem formulation

At this point of our considerations we would like to
indicate two aspects, which are of importance for the content of the present section:
1. The initial data, X and PA, are not connected in any way in the diagram: this empha-
sizes the possible a priori lack of knowledge of any association between the
two;
2. The ultimate result, Z*, depends upon the definition of the search space, Ω (with
the particular role of the definite clustering algorithms), and the characteristics of
the optimization procedure.

Relation to classification
Thus, in a way, the formulation forwarded is reminiscent of identifying the best classifier,
expressed in this case through Z * . Definitely, the setting described may be perceived
as typical for supervised learning in that some a priori known grouping PA
constitutes the starting point (known labels of particular objects). By identifying Z *
we seem to obtain the best possible tool, in the class defined by Ω, for classifying
instances belonging to X and, possibly, also other instances later on. Still, the differ-
ences with the standard scheme of obtaining and using classifiers are quite evident,
as commented upon below.
Thus, first, for quite obvious reasons, we are not interested here in devising a
scheme for classifying the further sequentially incoming data points, even though
Z * may certainly be interpreted as the essential part of some classification scheme.
Not being, in principle, meant for this purpose it can, of course, be, ultimately, used
with this aim. Under this kind of circumstances, classification could be carried out

for the subsequent observations x i , i = n + 1,…, in such a manner that the partition PB ’
= Z * (X ∪ {x i }) would be computed, resulting in the placement of x i in one of the
clusters of PB ’ (not necessarily identical with those of PB = Z * (X)).
Further, it must be emphasized that the partition PB , generated by Z * , would be,
in general, different from PA . This difference does not apply just to the content of
clusters (“classification errors”), but may also concern the essential features of the
partitions: first, the number of clusters, second, the potential indication of outliers.
Regarding the functioning of the paradigm in the classification mode, the above
two distinctions lead to the conclusion that it is, in fact, not meant for this mode. This will be
especially true when we consider the batch classification, i.e., when {x i } is replaced
with a whole new set of data points to be classified simultaneously. Yet, such a classi-
fication procedure would usually be prohibitively expensive. The excessive cost may
be avoided only for some special situations, through the use of the methods, called
by Miyamoto et al. (2008), Chap. 6, the inductive clustering techniques. Namely,
if Z * obtained for PA refers to an inductive clustering technique (e.g., belonging
to the k-means algorithms family), then new data points can be directly classified
to one of the clusters of PA (using the 1-nn classification technique with respect to
the centroids of the clusters of PA , in this case) with the same effect, which would
result from going through the complete classification scheme. The use of some other
incremental clustering techniques available for a clustering algorithm being a part
of Z * may also be possible, even if only approximately equivalent to carrying out
the whole clustering procedure from scratch.
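For the k-means case, the shortcut mentioned above can be checked directly: 1-nn classification with respect to the cluster centroids coincides with what scikit-learn's `KMeans.predict` does for new points (a small sketch of ours on synthetic data):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(c, 0.1, size=(25, 2)) for c in (0.0, 3.0)])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# For an inductive technique such as k-means, 1-nn classification with
# respect to the cluster centroids replaces a full re-clustering.
new_points = np.array([[0.1, -0.1], [2.9, 3.1]])
d = np.linalg.norm(new_points[:, None, :] - km.cluster_centers_[None, :, :], axis=2)
nearest = d.argmin(axis=1)

print(np.array_equal(nearest, km.predict(new_points)))  # → True
```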
It should also be stressed that the approach we propose, due to the adopted problem
setting, will not, in general, take into account the criterion of generalization, which
is so important for the design of any effective classifier.
Furthermore, in the classification task, there is a fixed number of classes, whereas
the number of clusters for a given parameter set Z* may vary. For instance, if a
new batch of data contains an outlier dissimilar from the whole training dataset,
the progressive-merger or density-based algorithms would generate an additional,
isolated cluster. On the other hand, classifiers would try to assign it to one of the
predefined categories.
Hence, a standard classification procedure might be one of the interpretations,
but, definitely, only a marginal one. Another interpretation of our approach may be
considered, which is also related to the task of classification albeit of a rather specific
type. First of all let us notice that the results of the application of our proposed
approach may be understood either more narrowly, i.e. as a given Z * , obtained for
the assumed space Ω, or more broadly, i.e. as the entire procedure, leading from
some PA through choices, related to Ω and Z, down to PB = Z * (X). In both cases,
such results may serve for a non-standard kind of classification: when we intend to
partition different, usually much bigger, sets of data than the original set X. Under
such circumstances, we
(a) do not expect an absolute or ideal accuracy of results over large data sets, but
we wish to preserve the essential character of the original partition PA , and
(b) would like to check (and preserve) the general validity of the Z * (up to a certain
fine tuning) for various, though definitely in some way similar PA .

The prerequisites for the scope of interpretations
In order to consider the potential entire scope of interpretations of the problem and
the approach, let us take a look at an important aspect, related to the status of the
prior partition PA . Two issues are essential here:
1. where does this partition come from (what is its relation to the data set X and
what do we know about it)? and
2. what is the degree of certainty of this prior partition (degree of our belief in its
validity)?
Although these two aspects are often tightly associated in practice, they are,
in general, quite independent. So, there is a whole range of situations, arising in
connection with the (definitely qualitative) answers to these questions (schematically
illustrated in Fig. 2.2). The situations arising can be “ordered” from some sort of
extreme, “absolute” case, which takes place when (see the limiting lines in Fig. 2.2)
Fig. 2.2 The scheme of potential cases of interpreting the paradigm of reverse clustering:
the limiting lines correspond to (a), the maximum degree of independence of PA from X,
and (b), the maximum degree of credibility of PA; in one direction PA is increasingly
based on information from X, i.e. well founded on its content rather than being
treatable as some sort of random partition with respect to the attributes of X, while in
the other direction PA becomes increasingly “just a hypothesis” to be verified or
replaced; the corner combining full credibility with full grounding in X corresponds
to the standard classification task

(a) the partitioning PA has been imposed on the data set X fully irrespectively of
the values of the particular attributes, except possibly for an identifier attribute
which makes it possible to distinguish particular elements of X, i.e., PA has (at
least apparently) nothing to do with the data set X (i.e. either it is simply given,
and we do not know anything about the relation between PA and the attributes
k = 1,…,m, or we know that the division was performed on the basis of the
attribute(s) not accounted for in X), and
(b) the partition PA is fully certain and thus we are fully convinced of its validity—it
is certainly a correct partition of X in a given context.
It is quite obvious, why these two form the “absolute” extreme: the certainty as to
the validity of the partition PA comes from a source that is outside of the data set X and
its specifications. In a way, then, finding of Z * (and PB ) would correspond to finding
a kind of model, in the space E x , to which x i belong, or of the “criterion” or “rule”
that produced PA . Under such circumstances it may happen that the partition PA has
nothing to do with the set X (i.e., its characterization with the values of available
attributes), and so there would be no hope of obtaining any good match between PA
and PB . If, however, we are certain of PA , and it is based on the characteristics of X,
then we deal with a true reconstruction of some Z A that produced PA (assuming this
partition arose from a procedure that can be cast in the clustering framework).7 The
latter case is, indeed, very much the one of classical, standard classification tasks.
The extreme case of (a) and (b) (see Fig. 2.2), i.e. when we are certain that PA
is valid and “true”, but we do not know (and will not know) anything about its
connection with X (or even know that it is not related to X) is softened towards the
partitions PA , which, for instance, may be produced by experts, who,

7 We put aside here the considerations, involving the direct study of relations between the labels of
x i , associated with PA , and the characteristics of x i , and we assume that the task at hand replaces
such a study.
(c) take into account, even if implicitly, the actual attributes characterizing data
from X, and
(d) the opinions, appearing in the form of PA , can be called into doubt, or at least
discussed; thereby, we come to a situation in which PB may give rise to
feedback, enriching the knowledge that had led to PA .8
The scenarios that can be formulated for the thus outlined range of situations lead
to problem statements of quite varying interpretations and (potential) utility. Thus,
say, if we have little faith in PA , why bother referring to it? The reason may lie in the
hypothetical character of PA , which is then subject to verification, and/or modifica-
tions (hence, we would be looking for the potential support for such a hypothesis or
its negation, provided other elements of the rationale of the entire reverse clustering
paradigm hold).
On the other hand, in the scenario arising when (a) and (d) are true, i.e., the
given partition PA is somehow transcendent with respect to the set of attributes
characterizing X, and, at the same time, we are not fully convinced as to its validity,
we would be interested in whether it is possible to recover partition PA using some Z * , but
we should expect that the best PB obtained may be quite different from PA , and,
moreover, we can think of PB as a legitimate replacement for PA , which may be,
therefore, treated just as the starting point for getting to a “real” partition of X.
In yet another scenario, arising when we know or assume that (b) and (c) are true,
i.e., when we treat the partition PA as a valid one and at the same time we know
that it has been established with the reference to the actual values of the attributes
characterizing the data set X, we will be more concerned with recovering exactly
PA and the benefits from carrying out the reverse engineering procedure would be
primarily related to getting a better insight into the meaning of the particular attributes
and their role in justifying the real/valid partition PA . This case, then, as noted also
in Fig. 2.2, is quite akin to the standard case of classification and the search for the
best classifier.
Yet, what we obtain, in a way in addition to the original partition PA , which
we may treat as “correct”, is a mechanism for the relatively easy partitioning of
other data sets of similar character (sets of objects located in an analogous space and
originating from a similar kind of process or phenomenon). On top of this, the
mechanism obtained can be parameterised in a simple manner, quite in line with the
logic of the vector Z.
At the end of this section a couple of remarks are perhaps due on the very orientation
towards clustering within the proposed paradigm. As already mentioned in Chap. 1,
once we deal with some data set X and some partition of it, PA , and we wish to recon-
struct, in some formal, procedural manner, the way this partition has been arrived at,
with the procedural manner invoked being reflected in the parameters forming the vector
Z, the choice of clustering, together with the relevant procedure, seems to be natural,
for clustering is exactly meant to produce partitions of data sets. However, this parti-
tion, in the case of clustering, is directed by the primeval task of clustering, i.e. “to

8 Actually, even in the “absolute” case, doubts may arise, if the situation resembles the one of
multiple (and multiply) overlapping distributions.
Fig. 2.3 An illustration of division of a set of objects according to the rule of “putting together
the dissimilar and separating the similar”: colours indicate the belongingness to three groups: blue,
red and green

put together the similar (the affine) and to separate the dissimilar (the distant)”, this
formulation having definite formal and pragmatic (technical) consequences, see, e.g.,
Fortier and Solomon (1966), Kemeny and Snell (1960), Marcotorchino and Michaud
(1979, 1982), Mulvey and Beck (1984), Rubin (1967), Wallace and Boulton (1968),
or Owsiński (2020) for various perspectives on what this formulation leads to.
Thus, it should be emphasized very strongly that although PA may be known to
be directly related to X, and even result from it through a certain procedure, this is by
no means to say that reverse clustering, as outlined here, would in principle be the
way to reconstruct its provenience. Although broadly understood clustering appears
to be a highly rational prerequisite to partitioning of a set of objects (observations),
it is not unique.
The partition PA may, for instance, and contrary to the rationale of reverse
clustering, have resulted from the opposite approach, namely “to put together the
dissimilar and to separate the similar”, as exemplified in Fig. 2.3. This is not just a
spiteful example, but a procedure which is, for definite reasons, sometimes applied.
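A toy illustration of this opposite rule (our own heuristic, not a procedure from the book): sorting one-dimensional objects by value and dealing them out round-robin makes every group span the whole range, i.e. puts the dissimilar together, while the resulting groups become mutually similar.

```python
import numpy as np

# Sort the objects and deal them out round-robin over k groups: each group
# then gathers mutually dissimilar objects (a small, a medium and a large
# value), while the groups themselves are similar to one another.
values = np.array([1.0, 1.1, 5.0, 5.2, 9.0, 9.1])
order = np.argsort(values)
k = 2
groups = {g: values[order[g::k]].tolist() for g in range(k)}
print(groups)  # {0: [1.0, 5.0, 9.0], 1: [1.1, 5.2, 9.1]}
```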
Another important and relevant example is provided in Chap. 4 of the book, where
PA is based on a simple categorization that is not contained in the very data, although
it is supposed to be associated with it. This categorization is very simple and is
based on just a few potential nominal (or, at best, ordinal) categories. It might have
happened, though, that such categorization leads to a structure that can hardly be
reconstructed through clustering.
Yet, the use of reverse clustering may also be helpful in such situations as the
ones depicted above, since it is obvious that this procedure would be applied only
in case the analyst had not been aware of the source of PA , the results from reverse
clustering showing, for such circumstances, the wide discrepancy between PA and
the attainable PB .
Finally, let us add that each of the algorithms of cluster analysis is based on
a somewhat different understanding of the basic task of clustering (including, in
particular, application of somewhat different criteria), and hence the search for PB
consists in a way in looking for the basic principle that is as close as possible to the
one that stands behind PA , irrespective of the fact whether it was applied directly
or indirectly (the simplest example being that of k-means tending towards spherical
or ellipsoidal clusters, while some of hierarchical aggregation algorithms may tend
towards formation of complex chaining clusters). This interplay, depicted in Fig. 2.1,
between PA , X and the principles and limitations of the individual clustering proce-
dures, yields ultimately the partition PB , which ought, therefore, to be interpreted in
this quite complex perspective.

References

Adadi, A., Berrada, M.: Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelli-
gence (XAI). IEEE Access 6, 52138–52160 (2018). https://doi.org/10.1109/ACCESS.2018.2870052
Arabie, P., Hubert, L.J., De Soete, G.: Clustering and Classification. World Scientific (1996)
Arbelaitz, O., Gurrutxaga, I., Muguerza, J., Pérez, J.M., Perona, I.: An extensive comparative study
of cluster validity indices. Pattern Recogn. 46(1), 243–256 (2013)
Bagherjeiran, A., Eick, C.F., Chen, C.S., Vilalta, R.: Adaptive clustering: obtaining better clusters
using feedback and past experience. In: Fifth IEEE International Conference on Data Mining
(ICDM’05), Houston, TX (2005). https://doi.org/10.1109/icdm.2005.17
Baker, V.E., O’Conor, D.E.: Expert system for configuration at Digital: XCON and beyond.
Commun. ACM 32(3) (1989)
Bargiela, A., Pedrycz, W.: Granular Computing: An Introduction. Kluwer Academic Publishers,
Boston (2002)
Berka, P., Laš, V., Svátek, V.: NEST: Re-engineering the compositional approach to rule-based
inference. Neural Netw. World 5(04), 367–379 (2004)
Biecek, P., Burzykowski, T.: Explanatory Model Analysis. Explore, Explain and Examine Predictive
Models. With examples in R and Python. Chapman & Hall/CRC, New York (2021)
Bock, H.-H.: Classification and clustering: problems for the future. In: Diday, E., et al. (eds.) New
Approaches in Classification and Data Analysis, pp. 3–24. Springer, Berlin (1994)
Böhm, C., Faloutsos, C., Pan, J.Y., Plant, C.: Robust Information-Theoretic Clustering. In: KDD’06,
Philadelphia, Pennsylvania, USA. ACM Press, 20–23 Aug 2006
Bouchachia, A., Pedrycz, W.: Data clustering with partial supervision. Data Min. Knowl.
Disc. 12, 47–78 (2006)
Bramer, M.: Principles of Data Mining. Springer, New York (2007)
Brun, M., Sima, Ch., Hua, J.-P., Lowey, J., Carroll, B., Suh, E., Dougherty, E.R.: Model-based
evaluation of clustering validation measures. Pattern Recogn. 40(3), 807–824 (2007)
Calinski, T., Harabasz, J.: A dendrite method for cluster analysis. Commun. Stat. 3, 1–27 (1974)
Câmpan, A., Şerban, G.: Adaptive clustering algorithms. In: Lamontagne, L., Marchand, M. (eds.):
Canadian AI 2006. LNAI vol. 4013, pp 407–418. Springer Verlag, Berlin-Heidelberg (2006)
Casalino, G., Castellano, G., Mencar, C.: Data stream classification by dynamic incremental semi-
supervised fuzzy clustering. Int. J. Artif. Intell. Tools (2019)
Castillo, E., Alvarez, E.: Introduction to Expert Systems: Uncertainty and Learning. Elsevier Science
Publishers, Essex (1991)
Charrad, M., Ghazzali, N., Boiteau, V., Niknafs, A.: NbClust: An R package for determining the
relevant number of clusters in a data set. J. Stat. Softw. 61(6), 1–36 (2014)
Chikofsky, E.J., Cross, J.H.: Reverse engineering and design recovery: a taxonomy. IEEE Softw.
7(1), 13–17 (1990). https://doi.org/10.1109/52.43044
Chiu, S.L.: Fuzzy model identification based on cluster estimation. J. Intell. Fuzzy Syst. 2, 267–278
(1994)
Choi, S.S., Cha, S.H., Tappert, Ch.C.: A survey of binary similarity and distance measures. Syst.
Cybern. Inform. 8(1), 43–48 (2010)
Craven, M.W., Shavlik, J.W.: Extracting comprehensible concept representations from trained
neural networks. In: Working Notes of the IJCAI’95 Workshop on Comprehensibility in Machine
Learning, Montreal, Canada, 61–75 (1995)
Cross, V.V., Sudkamp, Th.A.: Similarity and Compatibility in Fuzzy Set Theory: Assessment and
Applications. Physica-Verlag, Heidelberg (2002)
Davies, D.L., Bouldin, D.W.: A cluster separation measure. IEEE Trans. Pattern Anal. Mach. Intell.
1(2), 224–227 (1979)
Desgraupes, B.: Clustering Indices. CRAN-R-Project (2013). https://cran.r-project.org/web/packages/clusterCrit/…/clusterCrit.pdf
Doran, D., Schulz, S., Besold, T.R.: What does explainable AI really mean? A new conceptualization
of perspectives (2017). arXiv:1710.00794
Ducimetière, P.: Les méthodes de la classification numérique. Rev. Stat. Appl. 18(4), 5–25 (1970)
Eilam, E.: Reversing: Secrets of Reverse Engineering. Wiley (2005)
Fisch, D., Gruber, T., Sick, B.: SwiftRule: mining comprehensible classification rules for time series
analysis. IEEE Trans. Knowl. Data Eng. 23(5), 774–787 (2011)
Fisher, D.: Knowledge acquisition via incremental conceptual clustering. Mach. Learn. 2, 139–172
(1987)
Fortier, J.J., Solomon, H.: Clustering procedures. In: Krishnaiah, P. (ed.) Multivariate Analysis I,
pp. 493–506. Academic Press, London (1966)
Gan, G., Ma, Ch., Wu, J.: Data Clustering: Theory, Algorithms and Applications. SIAM & ASA,
Philadelphia (2007)
Gopalan, R., Li, R., Patel, V.M., Chellappa, R.: Domain adaptation for visual recognition. Found.
Trends Comput. Graph. Vision 8(4) (2012). http://dx.doi.org/10.1561/0600000057
Govin, B., du Sorbier, A.M., Anquetil, N., Ducasse, S.: Clustering technique for conceptual clusters.
In: Proceedings of the IWST’16 International Workshop on Smalltalk Technologies, Prague,
Czech Republic, August (2016). https://doi.org/10.1145/2991041.2991052
Guadagnoli, E., Velicer, W.: Relation of sample size to the stability of component patterns. Psychol.
Bull. 103, 265–275 (1988)
Gunning, D., Aha, D.: DARPA’s Explainable artificial intelligence (XAI) Program. AI Mag. 40(2),
44–58 (2019)
Halkidi, M., Batistakis, Y., Vazirgiannis, M.: On clustering validation techniques. J. Intell. Inform.
Syst. 17(2–3), 107–145 (2001)
Hall, P., Gill, N.: An Introduction to Machine Learning Interpretability: An Applied Perspective on
Fairness, Accountability, Transparency, and Explainable AI. O’Reilly Media, Inc (2018)
Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning Data Mining, Inference,
and Prediction, 2nd edn. Springer, New York (2009)
Hennig, C.: What are the true clusters? Pattern Recogn. Lett. 64, 53–62 (2015)
Kacprzyk, J., Yager, R.R.: Linguistic summaries of data using fuzzy logic. Int. J. Gen Syst 30(2),
133–154 (2001)
Kacprzyk, J., Zadrożny, S.: Linguistic database summaries and their protoforms: towards natural
language based knowledge discovery tools. Inf. Sci. 173(4), 281–304 (2005)
Kacprzyk, J., Zadrożny, S.: Protoforms of linguistic database summaries as a human consistent tool
for using natural language in data mining. Int. J. Softw. Sci. Comput. Intell. 1(1), 100–111 (2009)
Kacprzyk, J., Zadrożny, S.: Computing with words is an implementable paradigm: fuzzy queries,
linguistic data summaries and natural language generation. IEEE Trans. Fuzzy Syst. 18(3), 461–
472 (2010)
Kacprzyk, J., Zadrożny, S.: Comprehensiveness of linguistic data summaries: a crucial role of
protoforms. In: Moewes, Ch., Nürnberger, A. (eds.) Computational Intelligence in Intelligent
Data Analysis, 207–221. Springer-Verlag, Berlin, Heidelberg (2013)
Kacprzyk, J., Zadrożny, S.: Fuzzy logic-based linguistic summaries of time series: a powerful tool
for discovering knowledge on time varying processes and systems under imprecision. Wiley
Interdisc. Rev. Data Min. Knowl. Discovery 6(1), 37–46 (2016)
Kacprzyk, J., Yager, R.R., Zadrożny, S.: A fuzzy logic based approach to linguistic summaries of
databases. Int. J. Appl. Math. Comput. Sci. 10(4), 813–834 (2000)
Kacprzyk, J., Wilbik, A., Zadrożny, S.: Linguistic summarization of time series using a fuzzy
quantifier driven aggregation. Fuzzy Sets Syst. 159(12), 1485–1499 (2008)
Kacprzyk, J., Wilbik, A., Zadrożny, S.: An approach to the linguistic summarization of time series
using a fuzzy quantifier driven aggregation. Int. J. Intell. Syst. 25(5), 411–439 (2010)
Kamishima, T., Motoyoshi, F.: Learning from Cluster Examples. Mach. Learn. 53, 199–233 (2003)
Kamishima, T., Minoh, M., Ikeda, K.: Rule formulation based on inductive learning for extraction
and classification of diagram symbols. Trans. Inform. Process. Soc. Japan 36(3), 614–626 (1995).
(in Japanese)
Kaufman, L., Rousseeuw, P.J.: Finding Groups in Data: An Introduction to Cluster Analysis. Wiley,
New York (1990)
Kemeny, J., Snell, L.: Mathematical Models in the Social Sciences. Ginn, Boston (1960)
Kouw, W.M., Loog, M.: An introduction to domain adaptation and transfer learning. Technical report (2019). arXiv:1812.11806v2. Accessed 14 Jan 2019
Kuhn, A., Ducasse, S., Girba, T.: Enriching reverse engineering with semantic clustering. In:
Proceedings of the 12th Working Conference on Reverse Engineering (WCRE’05), Pittsburgh,
PA, pp 1–14. IEEE Xplore (2005). https://doi.org/10.1109/wcre.2005.16
Laube, P.: Machine Learning Methods for Reverse Engineering of Defective Structured Surfaces.
Springer (2020)
Lesot, M.-J., Moyse, G., Bouchon-Meunier, B.: Interpretability of fuzzy linguistic summaries. Fuzzy
Sets Syst. 292, 307–317 (2016)
Libert, G.: Compactness and number of clusters. Control Cybern. 15 (2), 205–212 (1986) (special
issue on Optimization approaches in clustering, edited by J. W. Owsiński)
Lin, T.Y., Yao, Y.Y., Zadeh, L.A.: Data Mining, Rough Sets and Granular Computing. Springer,
(Physica) (2002)
Liu, Y., Li, Z., Xiong, H., Gao, X., Wu, J.: Understanding of internal clustering validation measures. In: 2010 IEEE International Conference on Data Mining, pp. 911–916. IEEE (2010). https://doi.org/10.1109/ICDM.2010.35
Marcotorchino, F., Michaud, P.: Optimisation en Analyse Ordinale des Données. Masson, Paris
(1979)
Marcotorchino, F., Michaud, P.: Aggrégation de similarités en classification automatique. Revue de
Stat. Appl. 30, 2 (1982)
Meila, M.: Comparing clusterings—an axiomatic view. In: Proceedings of the 22nd International
Conference on Machine Learning. Bonn, Germany (2005)
Michalski, R.: A theory and methodology of inductive learning. Artif. Intell. 20(2), 111–161 (1983)
Miller, T.: Explanation in artificial intelligence: insights from the social sciences. Artif. Intell. 267,
1–38 (2017)
Milligan, G.W., Cooper, M.C.: An examination of procedures for determining the number of clusters
in a data set. Psychometrika 50(2), 159–179 (1985)
Mirkin, B.: Mathematical Classification and Clustering. Springer, Berlin (1996)
Miyamoto, S., Ichihashi, H., Honda, K.: Algorithms for Fuzzy Clustering: Methods in c-Means
Clustering with Applications. Studies in Fuzziness and Soft Computing, vol. 229. Springer,
Berlin (2008)
Molnar, C.: Interpretable Machine Learning: A Guide for Making Black Box Models Explainable.
Lulu Publisher (2020), eBook (GitHub, 2020–04-27). ISBN-13: 978-0244768522
Mulvey, J.M., Beck, M.P.: Solving capacitated clustering problems. Eur. J. Oper. Res. 18, 339–348
(1984)
Murdoch, W.J., Singh, C., Kumbier, K., Abbasi-Asl, R., Yu, B.: Interpretable machine learning: definitions, methods, and applications. Proc. Nat. Acad. Sci. USA 116(44), 22071–22080 (2019)
Ntoutsi, I., Spiliopoulou, M., Theodoridis, Y.: Tracking cluster transitions for different cluster types.
Control Cybern. 38(1), 239–260 (2009)
Owsiński, J.W.: Data Analysis in Bi-Partial Perspective: Clustering and Beyond. Springer Verlag
(2020)
Pan, S.J., Yang, Q.: A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22(10), 1345–1359
(2010)
Patil, C., Baidari, I.: Estimating the optimal number of clusters k in a dataset using data depth. Data
Sci. Eng. 4, 132–140 (2019)
Pedrycz, W., Waletzky, J.: Fuzzy clustering with partial supervision. IEEE Trans. Syst. Man Cybern. B Cybern. 27(5), 787–795 (1997)
Pedrycz, W.: Granular Computing: Analysis and Design of Intelligent Systems. Taylor and Francis (2013a)
Pedrycz, W.: Granular Computing and Intelligent Systems Design with Information Granules of Higher Order and Higher Type. Springer (2013b)
Pedrycz, W., Skowron, A., Kreinovich, V.Y. (eds.): Handbook of Granular Computing. Wiley (2008)
Perera, B., Zaslavsky, A., Christen, P., Georgakopoulos, D.: Context aware computing for the internet of things: a survey. IEEE Commun. Surveys Tutorials 16(1), 414–454 (2014). https://doi.org/10.1109/surv.2013.042313.00197
Pryke, A., Beale, R.: Interactive Comprehensible Data Mining. In: Cai, Y. (ed.) Ambient Intelligence
for Scientific Discovery. LNCS 3345, pp. 48–65. Springer (2004)
Quigley, J., Postema, M., Schmidt, H.: ReVis: reverse engineering by clustering and visual object
classification. In: Proceedings 2000 Australian Software Engineering Conference, pp. 119–125,
Canberra, ACT, Australia (2000). https://doi.org/10.1109/aswec.2000.844569
Raffo, A.: CAD reverse engineering based on clustering and approximate implicitization. erga.di.uoa.gr/meetings/RAFFOpresentation.pdf (2019)
Raffo, A., Barrowclough, O. J. D., Muntingh, G.: Reverse engineering of CAD models via clustering
and approximate implicitization. Computer Aided Geometric Design, 80, June 2020, 101876
(2020)
Raja, V., Fernandes, K.J.: Reverse Engineering: An Industrial Perspective. Springer (2008).
ISBN 978-1-84628-856-2
Rashidi, P., Cook, D.J., Holder, L.B., Schmitter-Edgecombe, M.: Discovering activities to recognize
and track in a smart environment. IEEE Trans. Knowl. Data Eng. 23, 527–539 (2011)
Reiter, E., Dale, R.: Building Natural Language Generation Systems. Cambridge University Press
(2000)
Reiter, E., Hunter, J., Mellish, C.: Generating English summaries of time series data using the
Gricean maxims. In: Proceedings of the Ninth ACM SIGKDD International Conference on
Knowledge Discovery and data Mining, 187–196. ACM (2006)
Rendón, E., Abundez, I., Arizmendi, A., Quiroz, E.M.: Internal versus external cluster validation
indexes. Int. J. Comput. Commun. 5 (1), 27–34 (2011)
Ribeiro, M.T., Singh, S., Guestrin, C.: Why should I trust you?: explaining the predictions of any
classifier. In: Proceedings of the 22nd ACM SIG KDD International Conference on Knowledge
Discovery and Data Mining, pp. 1135–1144 (2016)
Rokach, L., Naamani, L., Shmilovici, A.: Active learning using pessimistic expectation estimators.
Control Cybern. 38(1), 261–280 (2009)
Rota, G.C.: The number of partitions of a set. Am. Math. Mon. 71(5), 498–504 (1964)
Rubin, J.: Optimal classification into groups: an approach for solving the taxonomy problem. J.
Theoret. Biol. 15 (1), 103–144 (1967)
Rudin, C.: Stop explaining black box machine learning models for high stakes decisions and use
interpretable models instead. Nat. Mach. Intell. 1(5), 206–215 (2019)
Shi, B., Han, L.X., Yan, H.: Adaptive clustering algorithm based on kNN and density. Pattern
Recogn. Lett. 104, 37–44 (2018). https://doi.org/10.1016/j.patrec.2018.01.020
Shim, K.S., Goo, Y.H., Lee, M.S., Kim, M.S.: Clustering method in protocol reverse engineering
for industrial protocols. Int. J. Network Manage 30(6), 1–15 (2020)
Sripada, S.G., Reiter, E., Hunter, J., Yu, J.: Segmenting time series for weather forecasting. In:
MacIntosh, A., Ellis, R., Coenen, F. (eds.) Applications and Innovations in Intelligent Systems
X, pp. 193–206. Springer (2003)
Sugar, C.A., James, G.M.: Finding the number of clusters in a data set: an information-theoretic approach. J. Am. Stat. Assoc. 98(January), 750–763 (2003). https://doi.org/10.1198/016214503000000666
Tervonen, J., Mikhaylov, K., Pieskä, S., Jamsä, J., Heikkilä, M.: Cognitive Internet-of-Things solutions enabled by wireless sensor and actuator networks. In: 5th IEEE International Conference on Cognitive Infocommunications (CogInfoCom 2014), pp. 97–102. IEEE (2014)
Torra, V., Endo, Y., Miyamoto, S.: Computationally intensive parameter selection for clustering
algorithms: The case of fuzzy c-means with tolerance. Int. J. Intell. Syst. 26(4), 313–322 (2011)
Travkin, O., von Detten, M., Becker, S.: Towards the combination of clustering-based and pattern-
based reverse engineering approaches. In: Reussner, R.H., Pretschner, A., Jähnichen, S. (eds.)
Software Engineering 2011 Workshopband (inkl. Doktorandensymposium), Fachtagung des GI-
Fachbereichs Softwaretechnik, vol. LNI 184, 21–25 Feb 2011, Karlsruhe, Germany, 23–28.
Springer (2011)
Tsai, C., Lai, C., Chiang, M., Yang, L.T.: Data mining for internet of things: A survey. IEEE
Commun. Surveys Tutorials 16, 77–97 (2014)
Van Craenendonck, T., Blockeel, H.: Using internal validity measures to compare clustering algorithms. In: Poster from Benelearn Conference (2015). https://lirias.kuleuven.be/handle/123456789/504705
Vendramin, L., Campello, R.J.G.B., Hruschka, E.R.: Relative clustering validity criteria: a
comparative overview. Wiley InterScience (2010). https://doi.org/10.1002/sam.10080
Wagner, S., Wagner, D.: Comparing clusterings—an overview. Technical Report 2006–04, Faculty
of Informatics, University of Karlsruhe, TH (2006)
Wagner, R., Scholz, S.W., Decker, R.: The number of clusters in market segmentation. In: Baier,
D., Decker, R., Schmidt-Thieme, L. (eds.) Data Analysis and Decision Support, pp. 157–176.
Springer, Springer (2005)
Wallace, C.S., Boulton, D.M.: An information measure for classification. Comput. J. 11(2), 185–194
(1968)
Xie, X.L., Beni, G.: A validity measure for fuzzy clustering. IEEE Trans. Pattern Anal. Mach. Intell.
13(8), 841–847 (1991)
Xu, R., Wunsch II, D.C.: Clustering. Wiley/IEEE Press, Hoboken (2009)
Yu, J., Reiter, E., Hunter, J., Mellish, C.: Choosing the content of textual summaries of large time-series data sets. Nat. Lang. Eng. 13, 25–49 (2007). Cambridge University Press. https://doi.org/10.1017/s135132490500403
Zhang, Y., Chen, X.: Explainable recommendation: a survey and new perspectives (2018). arXiv:1804.11192
Zhao, Q., Fränti, P.: WB-index: a sum-of-squares based index for cluster validity. Data Knowl. Eng.
92, 77–89 (2014)
Zhao, Q., Xu, M., Fränti, P.: Sum-of-squares based cluster validity index and significance analysis.
In: Kolehmainen, M., et al. (eds.) ICANNGA 2009. LNCS vol. 5495, 313–322. Springer Verlag
(2009)
Zhou, Z.H.: Comprehensibility of data mining algorithms. In: Wang, J. (ed.) Encyclopedia of Data
Warehousing and Mining, 190–195. IGI Global, Hershey (2005)
Chapter 3
Case Studies: An Introduction

3.1 A Short Characterisation of the Cases Studied

This short chapter is devoted to an introduction to and an overview of the cases studied with the reverse clustering approach in the present book: their short characterisation and an indication of the respective interpretations of the problems formulated, and hence also of the results obtained, in the vein of the framework introduced in the preceding chapters. It should be emphasised that almost all of the cases presented here are based on concrete, real-life data. On the other hand, even though the problems are in themselves intuitively appealing and comprehensible, the individual interpretations of the problems and their solutions are not always obvious, for two kinds of reasons:
– first, reasons related to the potential formulation of the purpose or objective of the reverse clustering exercise, but also of the original problem, corresponding to the data set X and, especially, the partition PA, and
– second, reasons related to the actual status of both PA and X (validity, certainty) and to the knowledge of this status.
Yet, it is definitely possible to draw constructive conclusions from the results obtained; as we shall see, these are conditional, of course, on the circumstances mentioned above, relative to the “precision of interpretation”. It should also be emphasised that these constructive conclusions relate both to the paradigm proposed here, reverse clustering, and to the substantive essence of the problems considered.
1. The motorway traffic case
In this case, described in Chap. 4 of the book, the set of actual data on car traffic (numbers of cars passing per hour) at a point on a state road has been analysed. The data have been collected for a variety of practical purposes, including long-term maintenance planning and safety evaluation, by an enterprise responsible for road servicing. One of the purposes has been to appropriately model the traffic for the potential case in which a sensor registering the traffic at a given point is

not functional. Knowing the “hourly traffic profile model” would help in mitigating the loss of information related to the lack of current data from this particular sensor. (Of course, data from other, neighbouring sensors might have constituted a basis for an appropriate adjustment, but the general shapes of the hourly profiles would anyway have to be identified first to make a meaningful analysis possible.)
Thus, X was constituted by the data on hourly traffic collected during a definite period of time, here equivalent to a year: the numbers of cars passing per hour for the 24 h of the day, meaning 24 variable values plus the date. The initial partition, PA, was based on the classification according to the days of the week, the founding assumption being that the day of the week is the most important aspect determining the shape of the hourly traffic profile.

Traffic analysis, modelling and design constitute a very popular and important subject of research, and Readers more deeply interested in this subject are advised to consult, out of the truly vast literature, such representative works as Salter (1989), May (1990), Gazis (2002), Kerner (2004), Treiber and Kesting (2013) or Kessels (2019).
2. Chemicals in the natural environment
The second of the cases presented here (in Chap. 5) was based on data originating from Germany. Namely, the data set X was constituted by the averages of measurements of the contents of certain chemical elements in the herb layer of the counties (Kreise) of a province (Land) in Germany. The data concerned four different elements (i.e. four variables characterising the objects, the counties, in X) and, actually, no explicit initial partition PA was given. Upon inspection of the data it turned out that the distributions of the four elements among the counties differ significantly in character, separating the elements very clearly into two pairs: one pair, for which the distributions among the counties display a characteristic step-like shape, and the other pair, for which no such quasi-regularity can be perceived.
Hence, the initial partition PA of the set of counties into subsets was based on the values of the two step-like distributions. Then, attempts were made to reconstruct the way this partition may be obtained on the basis of X, taken either in its entirety or without the data on the elements which served to establish the partition PA. These attempts differed, therefore, as to their significance and interpretation, and the observed differences were highly telling for the potential ultimate use of the results.

This use of the results reported here is potentially important insofar as contamination of the natural environment (here: the herb layer) is one of the most significant environmental phenomena of our age, bringing enormous consequences in terms of farming, human and animal health, ecosystem dynamics and resistance, extending to leisure and recreation, and, more generally, ecosystem survival. At the same time, the treatment of the related problems is much more complicated than in, say, the case of atmospheric pollution. That is why any tangible and verifiable results, like those obtained here, may be of very high value.
3.1 A Short Characterisation of the Cases Studied 39

3. Administrative units in Poland


Clustering-type analyses of administrative units abound. There are literally thousands of studies, conducted over decades, involving some sort of grouping of administrative units for various purposes and reasons, and in quite a variety of substantive and theoretical perspectives. These studies may concern very general issues of planning and socio-economic development (see, e.g., Senetra and Szarek-Iwaniuk 2020; or Brauksa 2013, among a multitude of instances), or very particular questions, like (to point out just a few selected examples): desertification (Zolfaghari et al. 2019), soil classification, very much in line with the previous case study (Minasny and McBratney 2007), health care incidence studies (Rodriguez-Villamizar et al. 2020; or Umer et al. 2018), or, say, the connection between crime incidence and economic development (Lombardo and Falcone 2011). See also Boschma and Kloosterman (2005) for a general treatment of the use of clustering in the spatial perspective. Hence, while the application of clustering to sets of administrative or other spatial units is a kind of routine undertaking, the analysis we propose here has, indeed, a very specific character.
Regarding the investigations related to reverse clustering, quite a number of exercises have been carried out in the framework of the analysis of administrative units in Poland. The primary objects here are the communes or municipalities (some 2,500 in Poland). In this particular study these objects are characterised by a set of socio-economic and spatial features (around 20, the actual number depending on the exercise), taken from the official public statistics and amounting to the corresponding data sets X. The respective case analyses are presented in two separate chapters of this book because of two essential differences between the two groups of analyses: first, they differed as to the spatial scale, the first group being oriented mainly at the provincial scale and the second mainly at the national scale; second, they definitely differed as to the nature of the initial partition PA.

Namely, in the first case (Chap. 6) attention was mainly focused on the “stiff” administrative division of the municipalities into three formal categories, having a rather loose connection with the socio-economic characteristics of the municipalities, i.e. the respective set X, since these categories arose in a historical-political process rather than through an analytically based decision-making procedure. (This, of course, does not mean that they are completely unrelated to the respective X.)
In the second case considered (Chap. 7) we analysed the initial partition PA of the whole set of Polish municipalities, elaborated for the purposes of planning procedures and hence based on the explicit features of the municipalities. Although the process leading to this initial partition was described by its authors, so that its connection to actual data on municipalities is known, this process had a character that cannot be directly translated into clustering terms (see the corresponding remarks at the end of Chap. 2, accompanied by an extreme example, shown in Fig. 2.1). So, in this case, it was expected that the results obtained might point out possible biases of the initial partition and potential improvements or alternatives. (A similar, smaller study at the provincial level closes the preceding Chap. 6.)
40 3 Case Studies: An Introduction

4. The academic examples


Finally, in order to verify some specific hypotheses concerning the functioning of the reverse clustering approach, a series of academic examples was treated: the first related to the well-known Iris data of Anderson (1935) and Fisher (1936), and the second based on a set of artificial data examples, all of them two-dimensional, consisting of several dozen points in various configurations on the plane. In the majority of configurations of the artificial data it was obvious at first sight what the “correct” partition of the respective data set was. In the case of the Iris data the respective PA was constituted by the known division into the flower varieties. These experiments are briefly reported in Chap. 8 of the book.
Hence, in this manner a particular PA could be established for the cases analysed. However, regarding the artificial data, in some cases evidently “nested clusters” (i.e. clusters contained in other clusters) were subject to analysis, forming several distinct “levels of resolution”, and in these cases it was supposed that reverse clustering would tend to uncover one of these levels, depending upon the setting of the parameters in Z.

The experimental calculations carried out with reverse clustering revealed, for the artificial data, that in the unambiguous cases the proper partition PA could be fully reconstructed, while in the cases of “nested clusters” the solutions obtained usually “focused” on one level (typically the one most apparent upon visual inspection), the other levels being either not identified at all, or identified only for narrow intervals of quite extreme values of the controlled parameters.
The results of these experiments confirmed, for both kinds of data, on the one hand, the intuitive “correctness” of the results produced by reverse clustering, but, on the other hand, also indicated some avenues for future research, especially regarding more complex cases, like those with “nested” cluster structures, which are by no means rare in reality (similarly to overlapping clusters).

3.2 The Interpretations of the Cases Treated

As already mentioned in the short introduction of the preceding section, the cases commented upon in this book represent quite a variety of interpretations within the reverse clustering paradigm. These various interpretations are very roughly illustrated in Fig. 3.1, based on Fig. 2.1 from the preceding chapter.

The locations corresponding to the particular kinds of exercises, shown in Fig. 3.1, are, definitely, largely subjective. In principle, one would have to propose a measure of credibility (“likelihood”) of the partition PA, and another measure of its association with the data set X. However, somewhat paradoxically, we are exactly in the situation in which our knowledge of the two is either limited or absent. Were this knowledge closer to allowing the formulation and estimation of the values of such measures, we would definitely not be motivated to resort to the reverse clustering paradigm.
3.2 The Interpretations of the Cases Treated 41

[Fig. 3.1: a schematic diagram with the degree of credibility of PA on the horizontal axis (extreme a (max)) and the degree of independence of PA from X on the vertical axis (extreme b (max)); the cases treated (Administrative units I, Administrative units II, Chemicals in the soil, Motorway traffic, Academic data) are placed within it, alongside the standard classification task.]

Fig. 3.1 Rough indication of interpretations of the cases treated against the framework of Fig. 2.1

A very telling example is provided by the second case of administrative data: even if the procedure leading from the properties of the particular units to the proposed partition PA is, in principle, known, it can hardly be assessed, in view of the nature of this procedure, to what extent it can be considered associated with the unified data set on the units, X, to say nothing of the relation of this procedure to any potential clustering.
All in all, though, the cases presented here definitely span quite some region of the space of potential interpretations, providing sufficient material for evaluating the utility of the reverse clustering approach for other analytic situations which can be cast in the form of the reverse clustering paradigm as considered in this book.

Side by side with the issue of interpretation, in almost every study reported some technical or methodological issues arise; these are discussed in the final sections of the respective chapters, along with the closely related issue of interpretation.

References

Anderson, E.: The irises of the Gaspé Peninsula. Bull. Am. Ir. Soc. 59, 2–5 (1935)
Boschma, R.A., Kloosterman, R.C. (eds.): Learning from Clusters: A Critical Assessment from an
Economic-Geographical Perspective. Springer (2005)
Brauksa, I.: Use of cluster analysis in exploring economic indicator differences among regions: the
case of Latvia. J. Econ. Bus. Manag. 1(1), 42–45 (2013)
Fisher, R.A.: The use of multiple measurements in taxonomic problems. Ann. Eugenic. 7(2), 179–
188 (1936)
Gazis, D.C.: Traffic Theory. Springer (2002)
Kerner, B.S.: The Physics of Traffic. Springer, Berlin, New York (2004)
42 3 Case Studies: An Introduction

Kessels, F.: Traffic Flow Modelling—Introduction to Traffic Flow Theory Through a Genealogy of
Models. Springer (2019)
Lombardo, R., Falcone, R.: Crime and Economic Performance. A cluster analysis of panel data on
Italy’s NUTS 3 regions. Working Paper no. 12 / 2011, Department of Economics and Statistics,
University of Calabria (2011) https://www.ecostat.unical.it/
May, A.: Traffic Flow Fundamentals. Prentice Hall, Englewood Cliffs, NJ (1990)
Minasny, B., McBratney, A.B.: Incorporating taxonomic distance into spatial prediction and digital
mapping of soil classes. Geoderma 142, 285–293 (2007)
Rodriguez-Villamizar, L.A., Rojas Díaz, M.P., Acuña Merchán, L.A., et al.: Space-time clustering of childhood leukemia in Colombia: a nationwide study. BMC Cancer 20, 48 (2020). https://doi.org/10.1186/s12885-020-6531-2
Salter, R.J.: Highway Traffic Analysis and Design. Springer (1989)
Senetra, A., Szarek-Iwaniuk, P.: Socio-economic development of small towns in the Polish Cittaslow Network—A case study. Cities 103, 102758 (2020)
Treiber, M., Kesting, A.: Traffic Flow Dynamics. Springer (2013)
Umer, M.F., Zofeen, Sh., Majeed, A., Hu, W.-B., Qi, X., Zhuang, G.-H.: Spatiotemporal clustering
analysis of Malaria infection in Pakistan. Int. J. Environ. Res. Public Health 15(6), 1202 (2018).
https://doi.org/10.3390/ijerph15061202
Zolfaghari, F., Khosravi, H., Shahriyari, A., Jabbari, M., Abolhasan, A.: Hierarchical cluster analysis to identify homogeneous desertification management units. PLoS ONE (2019). https://doi.org/10.1371/journal.pone.0226355
Chapter 4
The Road Traffic Data

4.1 The Setting

The data set on vehicle traffic came from a measurement station on a state road. The individual objects (observations) xi were the days, characterized by the numbers of vehicles passing through the station every hour. So, the vectors xi were composed of m = 24 values, corresponding to the hours of the day, and xik was the number of cars passing during the kth hour on day i. Thus, altogether, the xi were the daily temporal profiles of traffic intensity by hours. Besides, the days were labeled with the day of the week, although this label was not included in the characterization of xi in terms of the analysed X.
The data came from a company engaged in servicing the road system, and one of the purposes of the analysis leading to the establishment of the initial partition of traffic intensity profiles, PA, was to obtain well-justified daily profiles of traffic intensity, so that in the situation of a lack of signal from a given measurement station, its hypothetical indications could be at least approximately recovered on the basis of “model profiles” (potentially adjusted to those from the neighbouring stations). (For an earlier account of these experiments, see Owsiński et al. 2017a.)
In this case, the initial partition PA was based on the classification of the days of the week, holidays etc., and reflected the expert opinion on how these days ought to be treated in terms of the distinction of daily profiles of traffic intensity. This case is illustrated below in Fig. 4.1 in accordance with the assumed partition PA, coming from the experts in the field, namely the following partition based on the days of the week: {Monday}, {Tuesday, Wednesday, Thursday}, {Friday}, {Saturday}, {Sunday}, i.e. the number of clusters in PA, denoted pA, equals 5: pA = 5.
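As a minimal sketch (the names and the representation are our own assumptions, not the authors' code), this expert partition can be encoded as a mapping from weekday labels to cluster indices:

```python
# Hypothetical encoding of the expert partition P_A:
# {Mon}, {Tue, Wed, Thu}, {Fri}, {Sat}, {Sun}  ->  p_A = 5 clusters.
P_A_CLASS = {
    "Mon": 0,
    "Tue": 1, "Wed": 1, "Thu": 1,
    "Fri": 2,
    "Sat": 3,
    "Sun": 4,
}

def expert_partition(weekday_labels):
    """Map a sequence of weekday labels to P_A cluster indices."""
    return [P_A_CLASS[day] for day in weekday_labels]
```

For instance, `expert_partition(["Mon", "Wed", "Sun"])` yields the labels `[0, 1, 4]`.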
The expert opinion, reflected in the median profiles for the clusters forming the initial partition PA, shown in Fig. 4.1, is better justified by the more explicit rendition of Fig. 4.2, where the profiles corresponding to the particular days of the week are shown on a diagram presenting the entire week, divided into hours, with different colours corresponding to the clusters forming PA.

Fig. 4.1 Median hourly profiles of traffic for the classes of the days of the week

Fig. 4.2 Hourly profiles of traffic intensity for individual hours of the week. Colours, assigned to successive days, denote the clusters forming the initial partition PA

Note that although Mondays often tend to have distinct traffic patterns due to weekly commuting, this is not apparent in the data from the analyzed station.
In this study, we were simply interested in checking whether the daily traffic profiles themselves, when treated through reverse clustering, would yield the partition suggested by the experts (and, if so, under what parameters forming Z; and, if not, what the differences are and what their potential justification is). In this case it can be assumed that the credibility of PA, even if by no means low, cannot be treated as very high either, and similarly for its association with the data in X. Thus, in a way, we aimed at checking the initial partition as a kind of “working hypothesis”, and, if a different one were obtained, at checking whether it might perhaps be better justified.
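Checking whether a reconstructed clustering reproduces PA presupposes some measure of agreement between two partitions. As an illustrative sketch (the criterion actually used in the book may differ), the classical Rand index counts the fraction of object pairs on which two labelings agree:

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of object pairs treated consistently by both partitions:
    a pair is 'consistent' if it is together in both partitions or
    apart in both.  Returns 1.0 exactly when the partitions coincide."""
    pairs = list(combinations(range(len(labels_a)), 2))
    consistent = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return consistent / len(pairs)
```

Note that the index is invariant to a renaming of cluster labels, which is essential here: reverse clustering needs to reproduce the grouping itself, not the arbitrary label values.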
4.2 The Experiments

We have performed two series of experiments, using various choices within each series, the two series differing primarily in the selection of the search algorithms. The following two subsections summarize the results from these two series of experiments.
Experiment series 1.
In this series the following assumptions have been made: the algorithms used have been from the k-means and hierarchical agglomeration families. Their implementation in the R package “cluster” by Maechler et al. (2015), based on Kaufman and Rousseeuw (1990), has been employed.
The “pam” (partitioning around medoids) method is a variant of the k-means with
the number of clusters, “k” (p in our notation) as the sole parameter. The algorithm
“agnes” (agglomerative nesting) is a hierarchical clustering method, parameterized
by the number of clusters, p, and a scalar a. The latter is used to obtain coefficients of
the Lance-Williams (Kaufman and Rousseeuw 1990) formula in a highly simplified
manner, as a1 = a2 = a, b = 1 − 2a, and c = 0, where the coefficients a1, a2, b and
c appear in the original Lance-Williams formula, quoted in this book in Sect. 1.3.
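For illustration, the resulting distance update in the agglomerative procedure can be sketched as follows (a minimal Python sketch under the simplification above; the function name is ours, not code from the study):

```python
def lance_williams_update(d_ki, d_kj, d_ij, a):
    """Distance from cluster k to the merged cluster (i u j) under the
    simplified flexible scheme a1 = a2 = a, b = 1 - 2a, c = 0 of the
    Lance-Williams formula."""
    return a * d_ki + a * d_kj + (1.0 - 2.0 * a) * d_ij
```

For a = 0.5 the term in b vanishes and the update is just the average of d(k, i) and d(k, j).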
The distances between the data points $x_i, x_j \in X$ have been defined as weighted
Minkowski distances, i.e.

$$d(x_i, x_j) = \Big( \sum_k w_k \, |x_{ik} - x_{jk}|^h \Big)^{1/h}, \qquad (4.1)$$

where $w_k$ are the weights assigned to the variables, which can assume values from
0 to 1. In this manner, both the vector $\{w_k\}_k$ and the exponent $h$ could be treated
as parameters, defining the space of the values of Z, along with the parameters
proper of the clustering algorithms.
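The weighted distance (4.1) can be sketched in Python as follows (our own minimal illustration; the actual experiments used the R package "cluster"):

```python
def weighted_minkowski(x, y, w, h):
    """Weighted Minkowski distance of Eq. (4.1).

    w -- per-variable weights in [0, 1]; h -- the exponent (for h < 1 the
    triangle inequality no longer holds, but the measure remains usable).
    """
    return sum(wk * abs(xk - yk) ** h
               for wk, xk, yk in zip(w, x, y)) ** (1.0 / h)

# With unit weights and h = 2 this reduces to the Euclidean distance.
```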
To solve the optimization problem we used the Differential Evolution (DE) meta-
heuristic (Storn and Price 1997), which is an evolutionary global optimization algo-
rithm, and its implementation in the “DEoptim” R package by Mullen et al. (2011).
Variants of DE are among the state-of-the-art real-parameter global optimizers. Their
various modifications and applications are surveyed in Das and Suganthan (2011)
and, more recently Das, Mullick and Suganthan (2016).
Chromosomes are represented as vectors of real numbers Z = (π, a, ν, w1,…,wn).
The first parameter, after rounding, gives the number of clusters p = round(π); the
second is the parameter a, related to the Lance-Williams formula (and not used in
the k-means algorithm); the third, ν, is the exponent h of the Minkowski distance;
and the subsequent ones are the weights w1,…,wn of the variables, cf. (4.1).
The search space was defined by constraining the possible values of the individual
elements of Z to p ∈ [1, 10], a ∈ [0, 1], h ∈ [0.1, 4], and wi ∈ [0, 1], and by limiting
the choice of clustering algorithms to “pam” and “agnes”, as previously mentioned.
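The decoding of a chromosome into a clustering configuration, together with the clipping to the search space just described, may be sketched as follows (a hypothetical helper of our own, not the authors' code):

```python
def decode(z, n_vars=24):
    """Decode a real-valued chromosome Z = (pi, a, nu, w1..wn) into
    clustering parameters, clipping each element to the search space."""
    clip = lambda v, lo, hi: max(lo, min(hi, v))
    return {
        "p": round(clip(z[0], 1, 10)),        # number of clusters
        "a": clip(z[1], 0.0, 1.0),            # Lance-Williams parameter
        "h": clip(z[2], 0.1, 4.0),            # Minkowski exponent
        "w": [clip(v, 0.0, 1.0) for v in z[3:3 + n_vars]],  # weights
    }
```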

Table 4.1 Summary of results for the first series of experiments with traffic data

Algorithm   Optimized parameters (Z)   Adjusted Rand index   Rand index
pam         p, h, w1,…,w24             0.600                 0.821
pam         p                          0.581                 0.823
agnes       p, a, w1,…,w24             0.654                 0.850
agnes       p, a                       0.625                 0.837

The distinctive feature of classical DE is the differential mutation operator, which
boils down to the addition of a scaled difference between two randomly chosen
individuals $Z_2$ and $Z_3$ to a third randomly picked vector $Z_1$:

$$Z'_1 = Z_1 + F(Z_2 - Z_3) \qquad (4.2)$$

where F is a scalar parameter known as the scaling factor, with typical values
F ∈ [0.4, 1). Because new individuals are created from differences between current
population members, the search direction adapts to the local shape of the objective
function.
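This mutation step (DE/rand/1) can be sketched as follows (names of our own choosing; the study used the "DEoptim" R package):

```python
import random

def de_mutant(population, F=0.7, rng=random):
    """DE/rand/1 mutation, cf. Eq. (4.2): a scaled difference of two
    randomly chosen vectors is added to a third, randomly picked one."""
    z1, z2, z3 = rng.sample(population, 3)
    return [a + F * (b - c) for a, b, c in zip(z1, z2, z3)]
```

With F = 0 the mutant coincides with one of the population members, which makes the role of the scaled difference easy to see.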
In all experiments the fitness function was identified with the Rand index for the
partition PA and a partition PB resulting from carrying out clustering on the basis of
a given vector of parameters Z.
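The Rand index used as the fitness can be computed directly from its definition (our own minimal implementation):

```python
from itertools import combinations

def rand_index(pa, pb):
    """Rand index of two partitions given as label sequences: the fraction
    of object pairs treated consistently by both partitions (put in the
    same cluster in both, or in different clusters in both)."""
    pairs = list(combinations(range(len(pa)), 2))
    agreements = sum((pa[i] == pa[j]) == (pb[i] == pb[j]) for i, j in pairs)
    return agreements / len(pairs)

# Identical partitions (up to relabelling) give a Rand index of 1.0.
```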
For each dataset and algorithm we have run the DE algorithm for 1 000 iterations.
In Table 4.1 we provide the best values of the Rand index and its adjusted variant
(see Chap. 1, Sect. 1.4) obtained. For this setting we performed two series of
calculations: in the first one, only the optimal values of the basic parameters of the
algorithms (i.e. p and a) were sought; in the second series we have also optimized
the exponent h of the Minkowski distance measure and the vector of weights of
the attributes, w.
In all cases treated the optimization of the whole configuration Z of the clustering
procedure (i.e. the second series of calculations, mentioned above) leads to better
results. Sometimes the difference is significant. We have also observed that the use
of hierarchical clustering allows for a more accurate reconstruction of the reference
partitions in our datasets than the use of k-means (in the form of “pam”).
The optimal value of the exponent h varied in the solutions obtained over the
range from 0.8 to 2.3, which confirms that the whole family of distance measures
is useful, despite the loss of formal properties for the values of h below 1. This is in
agreement with the results from the paper by De Amorim (2015), in which feature
rescaling with the use of the Lp norm proves useful for hierarchical clustering.
The regularization with the L1 norm resulted in a selection of attribute weights
for the traffic dataset in which some of them are zero.
Table 4.2 provides highly interesting results for the traffic data dealt with using the
full Z and choosing the “agnes” algorithm, showing, in particular, the correspondence
between the clusters in PA , i.e. Aq , and the clusters in PB , i.e. Bq .

Table 4.2 Results for traffic data for the entire vector of parameters Z, with the use of hierarchical
aggregation (Rand index = 0.850, adjusted Rand index = 0.654). The upper part of the table shows
the coincidence of the classes Aq, based on the days of the week, and the clusters obtained, Bq

Days of the week, Aq   Clusters obtained, Bq                  “Errors”
                       1     2    3    4    5    Totals
Friday                 1     2    42   0    3    48         6
Monday                 45    2    0    0    2    49         47
Saturday               0     1    0    46   1    48         2
Sunday                 0     1    0    1    47   49         2
Tu-We-Th               140   3    0    0    4    147        7
Totals                 186   9    42   47   57   341        64

Parameters: p = 5, a = 0.78, h = 0.91; w1 through w24 are the weights of the variables,
corresponding to the consecutive hours of the day:
w1–w6:   0.47  0.45  0.62  0.17  0.48  0.84
w7–w12:  1.00  0.00  0.90  0.58  0.83  0.33
w13–w18: 0.07  0.09  0.48  0.00  0.00  0.96
w19–w24: 0.30  0.43  0.79  0.53  0.25  0.90

Thus, this result, to a large extent induced by the conditions of computation rather
than by a “natural” tendency of the method, shows that perhaps the expert’s opinion
as to the original classes ought to be verified (Monday traffic intensity profiles being
classified along with those for Tuesday, Wednesday and Thursday).
This result, not only interesting but also telling in practical terms, is strongly
corroborated by the slightly worse, but still close to optimal, results obtained
with the “pam” algorithm, which are shown in Table 4.3.
What is also highly interesting in the case of Table 4.3 is that the “pam” algorithm
“got rid” of quite a proportion of the variables (seven variable weights being equal
to zero), and still obtained very valuable results.
The result from Table 4.2 is illustrated graphically in Fig. 4.3 where, indeed, it
can be clearly seen that the hourly traffic distribution on Mondays is not at all very
different from the distributions for other weekdays except for Friday. This is exactly
the instance of the potential feedback from the reverse clustering that we mentioned
before.
Cluster 2 is visualized in Fig. 4.3 as subplot (e). In this case, apart from the
typical morning and afternoon peaks, we observe a very high traffic intensity late in
the night. The days for which such an unusual phenomenon is observed are scattered
throughout the year and represent different days of the week.
This type of traffic intensity curve could constitute an effect of measurement errors
and has therefore been subject to additional plausibility checks. However, given its
relatively systematic character, this can also be a true-to-life case of feast

Table 4.3 Results for the traffic data obtained with the “pam” algorithm

Days of the week, Aq   Clusters obtained, Bq                  “Errors”
                       1     2    3    4    5    Totals
Friday                 3     2    42   0    1    48         6
Monday                 47    2    0    0    0    49         47
Saturday               0     1    0    46   1    48         2
Sunday                 7     0    0    2    40   49         9
Tu-We-Th               142   3    0    0    2    147        5
Totals                 199   8    42   48   44   341        69

Parameters: k = 5, h = 1.29; w1 through w24 are the weights of the variables,
corresponding to the consecutive hours of the day:
w1–w6:   0.76  0.00  1.00  0.72  0.18  0.74
w7–w12:  0.61  0.85  0.85  0.50  0.53  0.19
w13–w18: 0.84  0.05  0.73  0.28  0.21  0.00
w19–w24: 0.00  0.00  0.00  0.00  0.00  0.75

or special event days, when people tend to, say, come back home late at night. If
it were so (and, definitely, such an explanation appears to be quite plausible), then
the identification of such a cluster brings, indeed, valuable additional knowledge,
since the traffic on such occasions can, potentially, also be characterized by other,
“special” features (e.g. associated with the degree of safety).
A separate remark can be made here concerning the treatment of this case in
terms of “anomalies”. In the case of the traffic intensity data, anomalies are typically
detected through the assessment of specially trained operators. The ability to
approximate the partitions of the daily profiles by means of an appropriately tuned
clustering algorithm makes it possible to automate this procedure. Anomaly
detection differs from classification in that the training samples consist nearly
entirely of typical observations. Moreover, a new observation can be anomalous in a
great variety of ways. This is a one-class learning problem, consisting of
identifying whether a given observation is typical, rather than of distinguishing
between different classes.
Experiment series 2.
In this series three kinds of clustering algorithms have been used: DBSCAN, the
classical k-means, and the general progressive merger, as defined by the entire Lance-
Williams formula. The evolutionary algorithm that served to find the partition PB ,
composed of clusters Bq , has been developed by one of the present authors (Stańczak
2003).
The use of specialized genetic operators requires a method for selecting which of
them to execute in each iteration of the algorithm. The traditional scheme, with
a small probability of mutation and a high probability of crossover, is not applicable

Fig. 4.3 Visual interpretation of clusters described in Table 4.2



in this case, because the number of operators in use is greater than two and their
properties cannot be easily described as either exploration or exploitation (often
deemed to be realized by, respectively, mutation and crossover operators). In the
approach used here, following Stańczak (2003), it is assumed that an operator that
has so far generated good results should have a higher probability of execution and
should more frequently affect the population. But it is very likely that an operator
which is proper for one individual would give worse effects for another one, for
instance because of its location in the domain of possible solutions. Thus, each
individual may have its own preferences. Therefore, each individual carries, in
addition to the encoded solution, a vector of floating-point numbers, one per genetic
operation: a measure of the quality of that operator (a quality factor). The higher the
factor, the higher the probability of using the operator. The ranking of qualities
becomes the basis for computing the probabilities of appearance and execution of
the particular genetic operators; a simple normalization of the vector of quality
coefficients turns it into a vector of operator execution probabilities. This set of
probabilities can be treated as a base of experience of each individual, according
to which an operator is chosen in each epoch of the algorithm. Owing to the
experience gathered, an individual can maximize the chances of its offspring surviving.
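The mechanism described above amounts to roulette-wheel selection over the normalized quality vector, which may be sketched as follows (an illustrative sketch of our own, not the authors' code):

```python
import random

def choose_operator(qualities, rng=random):
    """Pick the index of a genetic operator with probability proportional
    to the individual's operator-quality factors (simple normalization)."""
    total = sum(qualities)
    r, acc = rng.random() * total, 0.0
    for i, q in enumerate(qualities):
        acc += q
        if r < acc:
            return i
    return len(qualities) - 1  # guard against floating-point round-off
```

An operator with quality 0 is never chosen, while the probabilities of the others follow the ranking of their quality factors.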
The essential criterion used was simply the number of “misclassified” objects
for a given data set, i.e. with respect to the prior PA (this number being, of course,
minimized), in practical terms equivalent to the original Rand index (due to the
correspondence of clusters and categories). Because of the differences related to the
implementations of the evolutionary procedures, the content of the parameter vectors
Z, and the form of the criterion, somewhat different results have been obtained
than in series 1 of the experiments (although, as in series 1, all variables have been
explicitly weighted, and the distance exponent has varied as well). For brevity, we
will characterize these results below in more general terms.
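As a rough illustration, a per-category majority count of this criterion can be written as follows (a simple proxy of our own; the tables in this chapter use a fixed category-to-cluster assignment, so their "error" counts can differ from this sketch):

```python
from collections import Counter

def misclassified(pa, pb):
    """Count objects outside the majority cluster of their PA category --
    a simple proxy for the 'number of misclassified objects' criterion."""
    errors = 0
    for cat in set(pa):
        # cluster labels in PB of the objects belonging to this category
        counts = Counter(b for a, b in zip(pa, pb) if a == cat)
        errors += sum(counts.values()) - max(counts.values())
    return errors
```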
The results obtained with DBSCAN, parameterized with the number of neighbours
and the maximum distance, have been the poorest among the clustering algorithms
tried out: in the “best” partitions obtained with DBSCAN, no fewer than 88 objects
out of the total of 341 were misclassified.
For the classical k-means, the results, with respect to the criterion assumed, but
also quite intuitively, have been much better: 57 misclassified objects, with the
“optimum” number of clusters equal to 5, resulting from the operations related to the
treatment of particular clusters (empty and close-to-empty ones). One can compare
these results with the 64 misclassified objects, shown in Table 4.2 for series 1, where
the “proper” number of clusters, i.e. 5, was actually enforced for a better comparability
with the initial partition PA .
In the case of the hierarchical merger procedures, the difference with respect to series 1
additionally involved the complete parameterization of the procedure, according to
the Lance-Williams formula (five coefficients with varying values, subject only to
some quite “liberal” constraints). The results obtained are comparable with those for
the k-means in terms of their “quality”: 60 misclassified objects (with 6 clusters as
the “optimum” number).

4.3 Conclusions

Substantive interpretation.
First, let us note that the experiments with the road traffic data ended with a success
from the substantive point of view. It was shown that the expert-based categorization
of the daily traffic intensity profiles can be improved for the analysed measurement
station, while keeping, to quite a high extent, to the same basic classification principle,
i.e. reference to the days of the week. These experiments showed that the expert-
defined partition of the daily traffic profiles into 5 clusters, according to the days of the
week, cannot be assumed to be the result of a disciplined application of a clustering
algorithm belonging to an assumed class.
The “identified” new class of profiles constituted an interesting subject of additional
analysis, meant to verify whether these are “anomalous” profiles, resulting
from an error or being quite incidental, or whether they truly represent a small,
“exceptional” class of days. In this perspective, the best result for k-means in the
second series of experiments, showing 5 clusters, ought also to be considered indicative.
All in all, while we certainly have not been dealing with a “classification” case, the
suggested broader interpretation of the reverse clustering approach turned out to be
quite adequate in this case, with PA treated as a “leading hypothesis”, having a partial
relation to the content of the data set X (the initial partition being devised with the data
from X in mind, but along the lines—a definite division of the days of the week—that
are not explicitly associated with the traffic intensity profiles themselves).
Technical aspects.
It turned out that the most important influence on the quality and character
of the results obtained was exerted by the choice of the clustering algorithm and then
of its parameters. In this particular case, hierarchical aggregation and k-means type
algorithms turned out to bring results of similar quality. The local density algorithm
DBSCAN appeared to yield worse results, which is not surprising, as this
algorithm is meant for large sets of data, in which mainly dense local groups are
identified, rather than for smaller sets, for which more refined distinctions may be
significant.
On the other hand, even though optimization with respect to the variable weights and
the distance exponent proved to be rational in the sense of quite clear choices being
made (e.g. zero weights for some variables, exponent values well beyond the range
[1, 2]), their influence on the ultimate results appears to be rather limited.
It can therefore be supposed that the criterion of similarity of the two partitions
(in whatever of its forms) might be quite “flat” with respect to the latter variables.
This could be expected, since usually some variables are to a definite extent
redundant, but the selection of the “leading” ones and the “echoing” ones is not
performed uniquely by the different combinations of evolutionary and clustering
algorithms. Similarly, in many tasks a limited change of the Minkowski distance
exponent has very little influence on the results obtained.

As this case was the first one to treat through an extended series of experiments,
the above conclusions, although appearing as quite plausible, were to be verified in
the other cases, reported in the subsequent chapters of the book.

References

Das, S., Suganthan, P.N.: Differential evolution: a survey of the state-of-the-art. IEEE Trans. Evol.
Comput. 15(1), 4–31 (2011)
Das, S., Mullick, S.S., Suganthan, P.N.: Recent advances in differential evolution–an updated survey.
Swarm Evol. Comput. 27, 1–30 (2016)
De Amorim, R.: Feature relevance in Ward’s hierarchical clustering using the Lp norm. J. Classif.
32, 46–62 (2015). https://doi.org/10.1007/s00357-015-9167-1
Kaufman, L., Rousseeuw, P.J.: Finding Groups in Data: An Introduction to Cluster Analysis. Wiley,
New York (1990)
Maechler, M., Rousseeuw, P., Struyf, A., Hubert, M., Hornik, K., Studer, M., Roudier, P., Gonzalez,
J., et al.: cluster: Cluster Analysis Basics and Extensions. R package version 2.0.3 (2015)
Mullen, K.M., Ardia, D., Gil, D.L., Windover, D., Cline, J.: DEoptim: An R package for global
optimization by differential evolution. J. Stat. Softw. 40 (6), 1–26 (2011). https://www.jstatsoft.
org/v40/i06/.
Owsiński, J.W., Kacprzyk, J., Opara, K., Stańczak, J., Zadrożny, Sł.: Using a reverse engineering
type paradigm in clustering: an evolutionary programming based approach. In: Torra, V., Dalbom,
A., Narukawa, Y. (eds.) Fuzzy Sets, Rough Sets, Multisets and Clustering, pp. 137–155. Springer,
Heidelberg (2017a). ISBN 978-3-319-47556-1. https://doi.org/10.1007/978-3-319-47557-8
Stańczak, J.: Biologically inspired methods for control of evolutionary algorithms. Control Cybern.
32(2), 411–433 (2003)
Storn, R., Price, K.: Differential evolution—a simple and efficient heuristic for global optimization
over continuous spaces. J. Global Optim. 11(4), 341–359 (1997)
Chapter 5
The Chemicals in the Natural
Environment

5.1 The Data and the Background

The subsequent case, treated in the framework of the reverse clustering paradigm,
concerned the content of selected chemicals in the herb layer of a set of administrative
units in Germany. A broader description of this particular study is provided in
Owsiński et al. (2017b).
This illustrative case study was based on data provided by the courtesy of Dr
Rainer Brüggemann (see Brüggemann et al. 1998). The particular data set obtained
by us has the advantage of having been the subject of several analytical and comparative
studies (see, for instance, De Loof et al. 2008, or Brüggemann, Mucha and Bartel
2012), which adds to its value as a testbed, as well as to the comparative value of the
present example.
The data reflect selected chemical characterizations, namely herb layer pollution
levels in terms of the total concentrations of four chemical elements: Pb, Cd, Zn
and S (corresponding to the variables), in mg/kg of dry weight, for n = 59 areas, deemed
uniform in this respect, in Baden-Württemberg in Germany. The entirety of the
data, used in the illustrative calculations, is provided in Table 5.1.
In this study, we treat the “areas” inside the land of Baden-Württemberg as the
objects. These objects are described by the data given in Table 5.1. We are looking for
a categorisation of the areas according to the levels of pollution (chemical content),
shown in Table 5.1.
Yet, these data, by themselves, do not provide or contain any initial partition
or classification of the areas, just the variable values. Thus, for the purposes of this
exercise, a simple but straightforward hypothesis was formulated: that it is possible
to classify the areas on the basis of perhaps just one or two indicators (chemical
element concentrations) out of the four available. The feasibility of such a hypothesis
was verified by plotting the values from the table in increasing order along the
“areas” for each of the elements separately. The results are shown here in the four
consecutive figures (Figs. 5.1 through 5.4).

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
J. W. Owsiński et al., Reverse Clustering, Studies in Computational Intelligence 957,
https://doi.org/10.1007/978-3-030-69359-6_5

Table 5.1 Pollution data for Baden-Württemberg (Germany), used in the exemplary calculations:
total concentrations, in mg/kg of dry weight (Pb-Lead, Cd-Cadmium, Zn-Zinc, S-Sulphur)
Areas Pb Cd Zn S Areas Pb Cd Zn S
6 1 0.07 29 1750 36 1.2 0.05 31 1570
8 1.5 0.07 27 1750 46 0.8 0.09 33 1680
7 1.2 0.09 28 1600 50 1.4 0.13 29 1730
17 0.6 0.06 36 1820 53 1 0.12 36 1750
9 0.09 0.2 850 580 45 1.5 0.17 45 1780
16 1 0.12 32 1520 54 0.7 0.1 26 1750
22 1 0.03 28 2150 59 1.3 0.13 26 1470
18 0.5 0.43 28 4030 60 1 0.2 32 2160
30 0.8 0.08 27 1610 58 1 0.11 28 1980
23 1.1 0.04 42 2000 57 1.7 0.15 39 1850
15 0.9 0.1 24 1670 35 0.08 0.24 720 1960
14 1 0.17 34 1830 34 0.14 0.39 950 400
5 1.1 0.1 32 1990 33 0.16 0.26 800 530
28 0.9 0.05 34 1670 25 0.9 0.09 35 1460
39 1 0.1 38 1740 12 0.16 0.23 910 1460
40 0.7 0.06 34 1770 21 0.06 0.24 830 620
29 0.6 0.14 27 1680 11 0.9 0.08 27 1720
41 0.7 0.17 39 1840 2 0.7 0.14 27 1770
42 0.7 0.1 33 1690 1 1 0.04 21 1540
27 0.1 0.12 26 1600 10 1 0.03 29 1780
38 1.7 0.18 34 1720 20 1.5 0.14 32 1730
49 0.8 0.11 37 1680 24 1.7 0.18 39 1740
37 0.6 0.12 33 1580 31 1.1 0.15 28 1740
47 1.1 0.11 25 1650 32 1.2 0.03 35 1820
48 2.3 0.42 33 1600 19 0.8 0.01 18 4030
51 0.8 0.14 22 1640 43 0.5 0.11 39 4030
4 0.8 0.02 26 1790 44 0.8 0.08 38 1800
3 0.8 0.14 31 1710 52 2 0.23 36 4030
13 0.18 0.18 1160 350 56 1 0.11 34 1970
26 0.8 0.05 19 1620
Source Lfu Baden Württemberg, after Brüggemann et al. (1998), courtesy of Dr Rainer Brüggemann

When looking at Figs. 5.1 through 5.4, one should remember that in each
of these illustrations the objects are ordered according to the value of the concentration
of the given element, so that in each case the horizontal axis corresponds to a different
sequence of objects (i.e. “areas”).


Fig. 5.1 Concentration levels for Pb: areas in the order of increasing Pb concentrations


Fig. 5.2 Concentration levels for Cd: areas in the order of increasing Cd concentrations

The character of the respective distributions is also illustrated in the two following
figures, Fig. 5.5a and b, which show the respective histograms and the pairwise
distributions. It is clear that there are essential differences between the distributions of,
on the one hand, Zn and S, and, on the other hand, Pb and Cd. Already this observation
provides important information, which might be of high value in terms of
environmental policy, but we intend to go in our study well beyond this straightforward
conclusion.


Fig. 5.3 Concentration levels for Zn: areas in the order of increasing Zn concentrations


Fig. 5.4 Concentration levels for S: areas in the order of increasing S concentrations

5.2 The Procedure: Determining the Partition PA

On the basis of the illustrations provided, and especially Figs. 5.1 through 5.4,
it can easily be seen that the levels for Zn and S appear to indicate clearly
distinct “categories” of “areas”: two such hypothetical categories for Zn and three
for S (this is also indicated by the different colours appearing in Fig. 5.5). One
might, therefore, expect that there exist (at most) 2 × 3 = 6 mixed categories,
based on the values corresponding to these two elements, defined, say, by the
conditions:

Fig. 5.5 The distribution of points (“areas”) in the space of concentrations: a for particular elements
and pairwise; b enlarged for Zn and S (upper box) and for Pb and Cd (lower box); see the text further
on for the interpretation of colours

Table 5.2 Numbers of areas in the classes, defined by the Zn and S contents in the herb layer

            S < 1000   S 1000–2500   S > 2500
Zn < 100    0          48            4
Zn > 100    5          2             0

for Zn: 1: < 100 and 2: > 100, and
for S: 1: < 1000, 2: between 1000 and 2500, and 3: > 2500.
After applying these conditions to the data from Table 5.1, one obtains, actually,
only four categories, since two out of the potential six are empty (for the numbers of
areas in the particular classes thus defined, see Table 5.2).
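The class assignment above amounts to applying a pair of thresholds (a minimal Python sketch of our own; the function name is hypothetical):

```python
def area_category(zn, s):
    """Assign an area to one of the (at most) 2 x 3 = 6 classes defined by
    the Zn and S thresholds; on this data set only four are non-empty."""
    zn_class = 1 if zn < 100 else 2
    s_class = 1 if s < 1000 else (2 if s <= 2500 else 3)
    return (zn_class, s_class)

# E.g. area 9 from Table 5.1 (Zn = 850, S = 580) falls into the class (2, 1).
```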
Thus, having the 59 areas partitioned among 4 categories (5 areas in category
1, 48 in category 2, 2 in category 3, and 4 in category 4, according to Table 5.2),
that is, having the thus determined partition PA, we could perform the exercise
of attempting to obtain clusters that would be as close to these categories as
possible. If we obtained such clusters, well approximating the categories,
especially if we did this on the basis of data other than those for the two elements used
to define the initial partition, it would mean that, on the one hand, the categories are
sound, and, on the other hand, that they can be reconstructed effectively by the reverse
clustering approach, providing the basis for a hypothetical broader mechanism of
categorization. Figure 5.5, however, suggests that this might, indeed, be difficult to
achieve.

5.3 The Procedure: Reverse Clustering

For purposes of this book we shall only cursorily illustrate the exercises and their
results, mainly in order to show the functioning and the effectiveness of the approach,
as well as to indicate the possibility of drawing the substantive conclusions.
Series 1 of calculations.
The first series of calculations was performed with the evolutionary algorithm
developed by one of the authors (Stańczak 2003), using the k-means, hierarchical
agglomerative and DBSCAN clustering algorithms. The parameter values sought for
each of those were (besides the algorithmic parameters themselves) the weights of
the variables, the exponent of the Minkowski distance, and the number of clusters.
The k-means algorithm. For the k-means method, “optimized” for the data set X,
concerning all four chemical elements, the weights of the respective variables were:

Table 5.3 Contingency table for the partition PA assumed and the one obtained in Series 1 of
calculations, PB, with the k-means algorithm and data only for Pb and Cd

Initial, PA   Obtained, PB
              1    2
1             5    0
2             0    48
3             2    0
4             1    3

Pb: 0.28, Cd: 0.06, Zn: 0.40, S: 0.26. This result comes, indeed, to some extent as a
surprise, since the weights of Pb and Cd could have been expected to be close or equal
to zero. Evidently, however, the perfect fit of PA and PB could be achieved without
zeroing the weights of the two “additional” variables. The Minkowski exponent obtained
was 0.49, and the number of clusters was, indeed, 4, with no misclassifications, meaning
that the assumed partition PA was reconstructed perfectly.
However, if we gave up the two variables underlying the initial partition, and
optimized only with respect to the limited data set X based on the data for Pb and Cd,
the results got, naturally, worse. The weights of the two variables were, respectively,
0.45 and 0.55, and the value of the Minkowski exponent was 3.54. Only two clusters
were obtained, with 6 misclassified areas. The respective contingency matrix is provided
in Table 5.3.
Thus, even if 6 objects (10%) were misclassified, and two clusters were obtained
instead of four, the result is striking in that not only is the partition obtained for a
different data set than the one used to determine PA so close to the original, but
also only one of the original categories (no. 4) got split between the two clusters
forming the partition PB.
The hierarchical aggregation algorithm. The results obtained with the use of the
general agglomerative clustering algorithm, in the case when all the variables were
accounted for, were also perfect in terms of the lack of misclassifications, and the
number of clusters was four. However, since a different type of algorithm, with different
parameters, was used, the details of the solution obtained were also different. Thus,
for instance, the weights of the variables were: Pb: 0.000, Cd: 0.333, Zn: 0.333,
and S: 0.333 (or, alternatively, Pb: 0.333, Cd: 0.000, Zn: 0.333, and S: 0.333, also
without misclassifications). Here, as before, it appears that it was not necessary to
assign non-zero weights only to the variables associated with Zn and S in order to
obtain a perfect reconstruction of the assumed partition PA. (In addition, of course,
the coefficients of the Lance-Williams formula were also obtained.)
Now, for the case when only the data for Pb and Cd were used, again, the results
contained definite misclassifications (altogether five of them), and the number of
clusters this time was 5, as shown in Table 5.4.
DBSCAN. The third clustering method tried out was DBSCAN. This method
led to the worst results for the case of all four variables considered, although two
misclassifications still constitute quite a plausible result. Three clusters were
obtained, and the respective contingency matrix is given in Table 5.5.

Table 5.4 Contingency table for the partition PA assumed and the one obtained in Series 1 of
calculations, PB, with the hierarchical aggregation algorithm and data only for Pb and Cd

Initial, PA   Obtained, PB
              1    2    3    4    5
1             5    0    0    0    0
2             0    47   1    0    0
3             2    0    0    0    0
4             0    1    0    2    1

Table 5.5 Contingency table for the partition PA assumed and the one obtained in Series 1 of
calculations, PB, with the DBSCAN algorithm and data for all four elements

Initial, PA   Obtained, PB
              1    2    3
1             5    0    0
2             0    48   0
3             2    0    0
4             0    0    4

It is interesting to note that one of the original categories, namely A3, was not
identified by this method in the case considered, and was, in fact, aggregated with
the original category A1. The results for the second case, when the two variables not
considered in the establishment of PA, i.e. Pb and Cd, were accounted for, were
identical in terms of clusters to those characterized in Table 5.5.
Series 2 of calculations.
This series of experiments was performed with the use of the differential evolution
algorithm from the R library. We shall report here only the results from the
hierarchical merger algorithm, which fared best in Series 1, but performed quite
differently in Series 2: as many as 13 clusters were obtained! Actually, side by
side with one big cluster, corresponding to the initial category 2, all the remaining
clusters contained just one object each, according to the contingency table, Table
5.6.
This would confirm (and extend) the treatment, suggested by DBSCAN in Series 1,
of the objects not included in the big, dominating cluster as “outliers”.

Table 5.6 Contingency table for the partition PA assumed and the one obtained in Series 2 of
calculations, PB, with the hierarchical merger algorithm and data for all four elements

Initial, PA   Obtained, PB
              1    2   3   4   5   6   7   8   9   10   11   12   13
1             0    1   0   0   0   1   0   1   1   0    1    0    0
2             46   0   0   1   1   0   0   0   0   0    0    0    0
3             0    0   0   0   0   0   1   0   0   1    0    0    0
4             1    0   1   0   0   0   0   0   0   0    0    1    1
5.3 The Procedure: Reverse Clustering 61

Definitely, upon visual inspection of Figs. 5.1, 5.2, 5.3 and 5.4, as well as Fig. 5.5,
one might have the impression that there is, indeed, rather little premise for any
regularity as to potential groups outside of the dominating one.
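The search underlying such reverse-clustering runs, in Series 2 driven by differential evolution, can be sketched as follows. This is a toy illustration under assumed ingredients: SciPy's `differential_evolution` stands in for the R routine actually used, k-means for the clustering algorithms tuned, and a small two-variable synthetic data set with a known partition P_A replaces the environmental data; only the variable weights are optimised here, whereas the study's vector Z also covers further parameters.

```python
import numpy as np
from scipy.optimize import differential_evolution
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
# synthetic data whose two groups differ along the first variable only
X = np.vstack([rng.normal([0.0, 0.0], 0.3, (20, 2)),
               rng.normal([3.0, 0.0], 0.3, (20, 2))])
P_A = np.array([0] * 20 + [1] * 20)   # the partition to be reconstructed

def disagreement(z):
    """z holds the variable weights; return 1 - adjusted Rand index vs. P_A."""
    labels = KMeans(n_clusters=2, n_init=5, random_state=0).fit_predict(X * z)
    return 1.0 - adjusted_rand_score(P_A, labels)

result = differential_evolution(disagreement, bounds=[(0.05, 1.0)] * 2,
                                seed=1, maxiter=25, tol=1e-6)
print(result.fun)   # 0.0 means P_A was reconstructed exactly
```

The evolutionary search thus treats the clustering pipeline as a black box and minimises the disagreement between the imposed partition and the one the pipeline produces.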

5.4 Discussion and Conclusions

The case here presented is quite specific, first of all in that the partition PA was not
obtained from an “external” source, but resulted from the analysis of data in the form
of an explicit hypothesis as to what the respective partition might look like (based on
two out of four variables, for which the data were available). The results obtained in
terms of PB were, actually, beyond expectations. They were obtained for two quite
distinct sub-cases, namely:
(I) when reverse clustering was run for the complete set of data—i.e. for all four
variables; in this case we can postulate that the partition PA is quite closely
associated with the data set X, being, in fact a direct reflection of a part of it;
and
(II) when the reverse clustering was run for the two variables other than those which
served to produce the partition PA ; in this situation we may rightly postulate
that this partition has “nothing to do” with the analysed data set X.
It ought to be very strongly emphasized that in both cases very promising results
were obtained—i.e. promising in terms of the ultimate goal of the whole exercise:
determination of categories of “areas” for the actions, related to contamination of
their ecosystems.
It appears that this exercise is not quite “in line” with the inner logic of the reverse
clustering, since it is quite far from the “classification paradigm”. In fact, this case
appears to be closer to correlation analysis, factor or principal component analysis—
but this, exactly, demonstrates that the reverse clustering paradigm can be indeed
applied to a very wide scope of analytic situations.
Another comment, motivated by this particular study, is associated with the
interpretation of the results against the background of the full use of the vector Z,
especially its components related to variable choice or weights and to the Minkowski
exponent. It can be justly pointed out that such an intervention changes to a very high
extent the initial geometry of the given data set. This is definitely true, and ought to be
accounted for, when substantive interpretation of the results is formulated. However,
given that we do not know, in principle, the origin of PA , we are justified in trying to
figure out the various shapes and subspaces, in which we might be getting closer to
this partition. In any case, we are not introducing any artificial divisions and twists
in space (like locally valid distances, e.g. changing from cluster to cluster).

References

Brüggemann, R., Voigt, K., Kaune, A., Pudenz, S., Komoßa, D., Friedrich J.: Vergleichende ökol-
ogische Bewertung von Regionen in Baden-Württemberg. GSF-Bericht 20/98. GSF, Neuherberg
(1998)
Brüggemann, R., Mucha, H.J., Bartel, H.G.: Ranking of polluted regions in South West Germany
based on a multi-indicator system. MATCH Commun. Math. Comput. Chem. 69, 433–462 (2012)
De Loof, K., De Baets, B., De Meyer, H., Brüggemann, R.: A hitchhiker’s guide to poset ranking.
Comb. Chem. High Throughput Screening 11, 734–744 (2008)
Owsiński, J.W., Opara, K., Stańczak, J., Kacprzyk, J., Zadrożny, S.: Reverse clustering: an outline
for a concept and its use. Toxicological & Environmental Chemistry (2017). https://doi.org/
10.1080/02772248.2017.1333614
Stańczak, J.: Biologically inspired methods for control of evolutionary algorithms. Control Cybern.
32(2), 411–433 (2003)
Chapter 6
Administrative Units, Part I

6.1 The Background: Polish Administrative Division and the Province of Masovia

This chapter and the next one are devoted to the experiments related to the data
concerning Polish administrative units (some of the early results reported in these
two chapters have already been provided in Owsiński et al. 2018). The administrative
breakdown of Poland is, in its essence, a three-tier one:
– the whole country is divided into 16 provinces (voivodships), for purposes of the
EU statistics, basically, classified as NUTS 2 regions,
– the provinces, in turn, are divided into counties (poviats)—altogether 380 such
units, formally constituting the local administrative units (LAU) of level 1, which,
for purposes of the EU statistics are grouped into NUTS 3 regions, and, finally,
– the counties are divided up into municipalities (or communes, gminas)—alto-
gether some 2,500 of them, for purposes of the EU statistics referred to as LAU 2.
There are formally also yet smaller units, having some administrative functions,
corresponding roughly to villages in the countryside and to quarters in towns, as
well as units bigger than provinces, entirely “virtually” formed for statistical purposes
only, but neither of these is usually accounted for in terms of socio-economic or other
kinds of analyses (although the smallest units, the distinct parts of municipalities,
do play an administrative role). Out of 380 counties 66 are constituted by single
towns—the county-towns, and in this case a municipality is in a way identical with
the county. For these towns, there exist also the so-called landed counties, i.e. the
area around, outside of the town, divided into corresponding municipalities.
Given that Poland is a country of some 38.5 million inhabitants and its surface
is 312,000 km2 , an average municipality has several thousand residents and some
150 sq km of surface, meaning that it is equivalent to a circle of roughly 7 km
of radius. Yet, of course, municipalities are very strongly differentiated—mainly
along the urban–rural axis. They are formally, administratively categorised into three
categories: urban, rural, and urban–rural. The last category is composed of two-part
municipalities, namely the ones consisting of an urban and a rural part, governed
to some extent together. Such municipalities are usually formed when the urban
part is constituted by a really very small town. There are—as of the time of this
writing—1533 rural municipalities, 302 urban municipalities and 642 urban–rural
ones.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
J. W. Owsiński et al., Reverse Clustering, Studies in Computational Intelligence 957,
https://doi.org/10.1007/978-3-030-69359-6_6
The fact that a locality is formally a town is the effect of a political and historical
process that is, of course, correlated with the character of the locality, but at the
edge of the category (smallest towns and biggest rural municipalities) one can easily
encounter definite inconsistencies. Thus, just to give an example—the population-
wise biggest rural municipality has some 25,000 inhabitants, while the smallest
formal town has only 330 of them! (Of course, this exceptionally small town is a
part of an urban–rural commune. The fact that it currently has the status of a town is
entirely due to historical reasons.)
Hence, we can already see a motivation for a study of the reverse clustering
type: the initial partition, PA , being the one resulting from the formal categorisation
of communes into the three categories mentioned, and the data set X appropriately
describing the character of the communes analysed. It would certainly be interesting
to see how far the potential reconstruction of this PA may go, what the differences
between PA and PB are, and what their potential reasons might be.
We shall be dealing with this problem in the present chapter. First, however, we
shall deal with it at the provincial level, i.e. for the set of municipalities of a single
province, and, second, we shall also deal, at the same level, with a different problem,
in which the initial partition is determined by experts in the field.
The province for which the experiments reported here were performed is the
biggest of the Polish provinces, Masovia, whose capital, Warsaw, is at the same time
the capital of Poland. For purposes of characterising the set of objects we shall be
dealing with here (the municipalities of Masovia), we bring in Table 6.1, containing
the data on these municipalities, presented according to the partition into functional
types of municipalities, mentioned above, determined by the experts from the
Institute of Geography and Spatial Organization of the Polish Academy of Sciences,
mainly for reasons related to planning.
Thus, in the light of Table 6.1 we can see that the task related to the reconstruction
of the administrative categorisation of municipalities means trying to classify the
total of 314 municipal units into three categories, while the other task—namely to
reconstruct the categorisation presented in Table 6.1—would amount to trying to
classify these 314 units into 9 categories.

6.2 The Data

Regarding the first of the tasks mentioned, it is obvious that there are no “a priori”
given corresponding data forming the set X, and so it is up to the analyst to conceive
the most appropriate characterisation of the municipalities that would possibly allow
for their categorisation in the administrative framework, and at the same time constitute

Table 6.1 Functional typology of municipalities of the province of Masovia (data as of 2009)
Description of municipalities              Name   Number     Population number    Area      Population density
                                                  of units   thousand   % in     (km2 )    (persons per 1 km2 )
                                                                        towns
1. Core of the national and provincial     MS         1      1 714.4    100.0        517    3 315
   capital (Warsaw)
2. Suburban zone of Warsaw                 PSI       27        725.1     72.9      1 297      559
3. Outer suburban zone of Warsaw           PSE       31        393.0     33.6      2 897      136
4. Cores of the urban areas of             MG         5        526.4    100.0        293    1 797
   subregional centres
5. Suburban zones of subregional           PG        20        182.8      4.5      2 236       82
   centres
6. County seats                            MP        22        433.9     82.7      1 871      232
7. Intensive development of                O         29        241.1     24.7      3 529       68
   non-agricultural functions
8. Intensive development of farming        R        112        615.1      4.9     13 912       44
9. Extensive development, mainly           E         67        390.3      4.2      9 006       43
   farming
Totals                                              314      5 222.1     64.6     35 558      147
Source Courtesy of Śleszyński and Komornicki (2009), Institute of Geography and Spatial
Organization of the Polish Academy of Sciences

a reasonably comprehensive, even though “shorthand”, description of these municipalities.


With this in mind, we used the set of features as given in Table 6.2.
It must be noted that an obvious effort has been made to reflect in the set of
selected variables such essential characteristics of the communal units as: (a) the
degree and the nature of their urban / rural character, (b) their demographic features,
and (c) their socio-economic characteristics. In addition, no absolute quantities, like,
say, the total population, the area of a municipality, sales value etc., are contained in
the vectors x i , i = 1,…,314, as specified in Table 6.2.

Table 6.2 Variables describing municipalities, accounted for in the study


No.  Variable
 1   Population density, persons / sq. km
 2   Share of agricultural land, %
 3   Share of overbuilt areas, %
 4   Share of forests, %
 5   Share of population in excess of 60 years, %
 6   Share of population below 20 years, %
 7   Birthrate for the last 3 years
 8   Migration balance rate for the last 3 years
 9   Average farm acreage indicator, hectares
10   Share of registered employed persons, %
11   Number of businesses per 1,000 inhabitants
12   Average employment in a business indicator
13   Share of businesses from manufacturing and construction, %
14   Number of pupils and students per 1,000 inhabitants
15   Number of students in above-primary schools per 1,000 inhabitants
16   Own revenues of the municipality per capita
17   Share of revenue from Personal Income Tax in municipal revenues, %
18   Share of expenditures into social care in total expenditures from the municipal budget, %
Source Own elaboration

The situation is different with respect to the initial partition PA as characterized in
Table 6.1, i.e. the expert-provided partition into functional types of municipalities.
This partition was performed on the basis of a definite set of data, according to a well-
described analytical procedure. Yet, for purposes of our experiments, we adopted the
same data set as presented in Table 6.2. This decision was based on two important
premises:
1. The procedure, which led to the determination of the functional types from
Table 6.1, was not a simple “linear” analytic procedure, based on a unified
set of data—it involved a number of decision points, depending upon specific,
threshold-like, or nominal values;
2. We wished to preserve as much comparability as possible with the other case
analysed here, that of the administrative categorization.

6.3 The Analysis Regarding the Administrative Categorization of Municipalities

We shall provide here the results for some selected experiments, distinguishing
between the clustering methods and the evolutionary search methods. And so, the
contingency table for the best overall result, attained with the evolutionary algorithm
of one of the present authors (Stańczak 2003), is shown in Table 6.3.
This result is, indeed, striking in its clarity: there is no misclassification
between the urban and rural municipalities, while the mixed character of the urban–
rural ones definitely calls, in many concrete situations, for an adjustment or correction
(involving, though, all of the three categories). For comparison, but also in order
to corroborate the above result, we provide the one obtained with the use of the
hierarchical aggregation algorithm in Table 6.4, similarly striking as to its clarity
and facility of interpretation.
Finally, let us also quote the same contingency matrix for the DBSCAN algorithm
(Table 6.5).
Concerning the total number of objects assigned to particular classes in the last
case (313 instead of 314) it must be remembered that DBSCAN classifies explicitly
some of the objects as “outliers”, not necessarily “pushing” them into the classes
determined.
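The way DBSCAN sets such objects aside can be seen in a minimal sketch; the data and the eps / min_samples values are assumptions for illustration, not the settings of the study. In the scikit-learn implementation, noise points receive the label -1:

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
dense = rng.normal(0.0, 0.2, (30, 2))              # one dense group of objects
isolated = np.array([[5.0, 5.0], [-4.0, 6.0]])     # two isolated objects
X = np.vstack([dense, isolated])

labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)
print(np.flatnonzero(labels == -1))   # indices of the objects marked as outliers
```

Such noise points simply do not appear in any column of the contingency matrix, which is why the class totals may fall short of the number of objects.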

Table 6.3 Contingency matrix for the administrative breakdown of municipalities of the province
of Masovia in Poland and reverse clustering performed with own evolutionary algorithm using
k-means
Clusters: Obtained, PB → 1 2 3 Totals
Initial, PA ↓
1. Urban municipalities 33 0 2 35
2. Rural municipalities 0 217 11 228
3. Urban–rural municipalities 1 20 30 51
Totals 34 237 43 314

Table 6.4 Contingency matrix for the administrative breakdown of municipalities of the province
of Masovia in Poland and reverse clustering performed with own evolutionary algorithm using
hierarchical aggregation
Clusters: Obtained, PB → 1 2 3 Totals
Initial, PA ↓
1. Urban municipalities 34 0 1 35
2. Rural municipalities 0 216 12 228
3. Urban–rural municipalities 0 22 29 51
Totals 34 238 42 314

Table 6.5 Contingency matrix for the administrative breakdown of municipalities of the province
of Masovia in Poland and reverse clustering performed with own evolutionary algorithm using
DBSCAN
Clusters: Obtained, PB → 1 2 3 Totals
Initial, PA ↓
1. Urban municipalities 35 0 0 35
2. Rural municipalities 3 216 8 227
3. Urban–rural municipalities 4 25 22 51
Totals 42 241 30 313

Thus, although the results obtained for the own evolutionary method with the
k-means algorithm proved to be, in this case, on a par with the result from the hier-
archical aggregation algorithm, all three algorithms clearly indicated the vagueness
of the “urban–rural” category and the need to treat it in a separate perspective.
Just for the sake of comparison let us quote two results, obtained with the differ-
ential evolution (DE) algorithm, one for the “pam” (partitioning around medoids)
algorithm, from the k-means family, provided in Table 6.6, and another, for the
hierarchical aggregation “agnes” algorithm, provided in Table 6.7.
With respect to the results from the DE algorithm it must be pointed out that
the one obtained with “agnes” had better values of the Rand and adjusted Rand
indices. In any case, these results emphasised once more the uncertain status of the
“urban–rural” category of municipalities.
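The quantities underlying such comparisons, the contingency matrix and the (adjusted) Rand scores, can be computed as in this minimal sketch; the two label vectors are made-up, and scikit-learn's routines merely stand in for whatever implementation the study used:

```python
from sklearn.metrics import adjusted_rand_score, rand_score
from sklearn.metrics.cluster import contingency_matrix

P_A = [0, 0, 0, 1, 1, 1, 2, 2]   # initial categories (rows)
P_B = [0, 0, 1, 1, 1, 1, 2, 2]   # clusters obtained (columns)

print(contingency_matrix(P_A, P_B))      # counts of jointly assigned objects
print(rand_score(P_A, P_B),              # raw agreement over object pairs
      adjusted_rand_score(P_A, P_B))     # the same, corrected for chance
```

The adjusted index is the more conservative of the two, since random labelings score close to zero under it rather than close to the raw Rand value.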
This status may also be well illustrated by the diagram in Fig. 6.1, showing the
locations of the objects, according to their assignment to the three initial categories,

Table 6.6 Contingency matrix for the administrative breakdown of municipalities of the province
of Masovia in Poland and reverse clustering performed with DE algorithm using “pam”
Clusters: Obtained, PB → 1 2 Totals
Initial, PA ↓
1. Urban municipalities 32 3 35
2. Rural municipalities 29 199 228
3. Urban–rural municipalities 19 32 51
Totals 80 234 314

Table 6.7 Contingency matrix for the administrative breakdown of municipalities of the province
of Masovia in Poland and reverse clustering performed with DE algorithm using “agnes”
Clusters: Obtained, PB →        1    2   3  4  5  6  7  8  Totals
Initial, PA ↓
1. Urban municipalities        16    5   5  1  6  1  0  1    35
2. Rural municipalities         0  206  18  0  0  3  1  0   228
3. Urban–rural municipalities   1   36  12  0  0  2  0  0    51
Totals                         17  247  35  1  6  6  1  1   314

Fig. 6.1 Data on municipalities of the province of Masovia with administrative categorisation into
three categories on the plane of the first two principal components (colours refer to the results from
Table 6.6). Note See, e.g., Comrey and Lee (1992)

on the plane of the first two principal components. In this figure, numbers correspond
to the initial categorisation into three administrative categories, while colours—to the
reverse clustering results, reported in Table 6.6. It is particularly well visible how the
(majority of the) urban municipalities distinguish themselves from the “cloud” to the
right, within which it is, definitely, difficult to discern the two remaining categories.
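The projection used in Fig. 6.1, i.e. plotting the objects on the plane of the first two principal components, can be reproduced schematically as follows; a random matrix stands in for the actual 314 × 18 data, and standardising the variables before PCA is our assumption:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(314, 18))             # stand-in for the 314 x 18 data matrix

X_std = StandardScaler().fit_transform(X)  # put all 18 variables on one scale
coords = PCA(n_components=2).fit_transform(X_std)
print(coords.shape)                        # (314, 2): one point per municipality
```

Each row of `coords` then gives the position of one municipality on the PC1-PC2 plane, onto which the category numbers and cluster colours of Fig. 6.1 are overlaid.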
Another illustration is provided by the map of the province of Masovia, shown
in Fig. 6.2, corresponding to the best of the results here reported (see Table 6.3).
The red units, assigned to the “urban” category 1, form a broader area in the centre,
corresponding to Warsaw and some of its neighbouring municipalities, as well as
dispersed “urban islands” throughout the entire province. It is highly telling, though,
that the yellow units, corresponding to “something, which is neither urban (1) nor
rural (2)” do also tend to form compact areas, including very distinct ones in the
vicinity of Warsaw. This would imply that, indeed, the approach interprets these
units as intermediate, but in a different sense from that of the official administrative
breakdown, since an important part of them appears to have a suburban character,
while some others correspond to the highly productive farming areas. It should be
added at this point that all of the communes considered were characterised by their

Fig. 6.2 Map of the province of Masovia with the indication of the municipalities classified in
three clusters resulting from the reverse clustering according to the data from Table 6.3. Red area
in the middle corresponds to Warsaw and its neighbourhood, the bigger red blobs correspond to
subregional centres (Radom, Płock, Siedlce and Mińsk Mazowiecki)

overall features, and no distinction was made of the potential urban and rural parts,
regarding the cases of the urban–rural municipalities.
It may be interesting at this point to also briefly characterise some of the remaining
aspects of the composition of the vector Z found, especially regarding the weights of
variables and the Minkowski exponent. Thus, concerning the calculations performed
with the own evolutionary algorithm, it can be stated that the weights of variables
displayed high lability, presumably in view of the numerous high correlations among
them (different variables possibly representing groups of variables). Yet, in the
majority of runs the variable weight values could be quite clearly classified into
three groups of importance, as shown in Table 6.8 (weights of

Table 6.8 Examples of variable weights for two runs of calculations, presented in Tables 6.3 and
6.4
Calculations illustrated in:    Most important variables    Important variables    Unimportant variables
                                (Nos. of variables from Table 6.2; weight values)
Table 6.3 (k-means)             No. 15; 0.178               No. 3; 0.109           Remaining variables; 0.000–0.099
Table 6.4 (hierarchical         Nos. 1, 14; 0.436, 0.319    –                      Remaining variables; 0.001–0.030
aggregation)

all variables add to 1). The values of the Minkowski exponent ranged generally from
2 upwards to not quite 4.
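A plausible form of the distance family these two components of Z parameterise is a weighted Minkowski metric; the exact formula used in the study is not restated here, so the following sketch is an assumption for illustration (the weights loosely echo the hierarchical-aggregation run of Table 6.8):

```python
import numpy as np

def weighted_minkowski(x, y, w, p):
    """d(x, y) = (sum_j w_j * |x_j - y_j|**p) ** (1/p), with w_j >= 0, sum w_j = 1."""
    diff = np.abs(np.asarray(x, float) - np.asarray(y, float))
    return float(np.sum(w * diff ** p) ** (1.0 / p))

# weights roughly as in the hierarchical-aggregation run of Table 6.8:
# variables 1 and 14 dominate, the rest share what little weight remains
w = np.full(18, (1.0 - 0.436 - 0.319) / 16)
w[0], w[13] = 0.436, 0.319

d = weighted_minkowski(np.zeros(18), np.ones(18), w, p=2.5)
print(round(d, 4))   # 1.0 (weights sum to 1 and every coordinate difference is 1)
```

Changing p and w in such a formula is exactly the kind of intervention in the geometry of the data set discussed above for the case of Chapter 5.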

6.4 A Verification

In the context of the study, described in the preceding section, a kind of verification
was performed. Namely, the vector of clustering parameters, Z, which was found
to perform the best in the case of the province of Masovia, was applied directly to
the data for some other province. Out of the 15 remaining Polish provinces the one
of Wielkopolska, with its capital in Poznan, was selected. This choice is justified
by the fact that Wielkopolska is also a relatively large province, composed of 226
municipalities, featuring quite a diversity of characteristics of these municipalities,
with a large agglomeration as its capital.
It can be said that what we were looking for in this verification exercise was the
comparison of PA (Wielkopolska) and P(X Wielkopolska ,Z Masovia ). The respective result
for Z Masovia , which was established for the k-means algorithm, is shown in Table 6.9.
Because both k-means and hierarchical agglomeration fared virtually equally well
in the study of Masovia, the results for the clustering of the municipalities of
Wielkopolska with the vector Z determined for Masovia for the hierarchical
agglomeration algorithm are also shown here, in Table 6.10.

Table 6.9 Contingency matrix for the administrative breakdown of municipalities of the province
of Wielkopolska in Poland and clustering performed with the Z vector obtained for Masovia in the
case shown in Table 6.3 (k-means algorithm)
Clusters: Obtained, PB → 1 2 3 Totals
Initial, PA ↓
1. Urban municipalities 18 1 0 19
2. Rural municipalities 0 95 20 115
3. Urban–rural municipalities 4 43 45 92
Totals 22 139 65 226

Table 6.10 Contingency matrix for the administrative breakdown of municipalities of the province
of Wielkopolska in Poland and clustering performed with the Z vector obtained for Masovia in the
case shown in Table 6.4 (hierarchical aggregation algorithm)
Clusters: Obtained, PB → 1 2 3 Totals
Initial, PA ↓
1. Urban municipalities 18 1 0 19
2. Rural municipalities 0 109 6 115
3. Urban–rural municipalities 0 74 18 92
Totals 18 184 24 226

It can easily be seen that the results are almost as good as for Masovia – the
misclassifications between “urban” and “rural” municipalities being indeed very
rare. Thus, the clustering procedure determined through the evolutionary algorithm
for Masovia turned out to be well applicable to another, though in general terms
similar, data set. Hence, the verification procedure confirmed the potential of the
reverse clustering paradigm.
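The verification step itself amounts to freezing the parameter vector Z tuned on one data set and re-running only the clustering on another. A toy sketch, under assumed ingredients (k-means, a weights-only Z, and two small synthetic "provinces" sharing the same structure), could look as follows:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def cluster_with_z(X, z, k):
    """Apply the variable weights stored in z, then cluster into k groups."""
    return KMeans(n_clusters=k, n_init=5, random_state=0).fit_predict(X * z)

rng = np.random.default_rng(1)
def make_province(n):
    # two categories per province, differing along variable 0 only
    X = np.vstack([rng.normal([0.0, 0.0], 0.3, (n, 2)),
                   rng.normal([3.0, 0.0], 0.3, (n, 2))])
    return X, np.array([0] * n + [1] * n)

X_a, P_a = make_province(50)     # data set on which Z was tuned
X_b, P_b = make_province(40)     # the "other province" used for verification
z = np.array([1.0, 0.0])         # hypothetical weight vector found on X_a

print(adjusted_rand_score(P_b, cluster_with_z(X_b, z, k=2)))   # 1.0 if recovered
```

A high agreement on the second data set, as in Tables 6.9 and 6.10, indicates that the tuned Z captures something transferable rather than an artefact of the first data set.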

6.5 The Analysis Regarding the Functional Categorization of Municipalities

In this section we shall show the results for the second experiment, in which the func-
tional categories of municipalities, characterised in Table 6.1, were treated as forming
PA , and reverse clustering was applied to reconstruct them. Somewhat surprisingly,
these results were not as strikingly clear as in the preceding case, even though it
could rightly be said that the partition PA was definitely related to the data charac-
terising the municipalities, even if: (a) the data used in designing the typology were
somewhat different, (b) the procedure, as mentioned, was far from straightforward,
involving decision points and (actually) nominal variables, and (c) the number of
categories to reconstruct was definitely higher.
Now, then, Table 6.11 shows the contingency matrix for this case, as obtained for
the own evolutionary method and the k-means algorithm, while Table 6.12 contains
the same for the hierarchical aggregation algorithm.
In this case, again, the k-means algorithm gave better results than the other two
algorithms accounted for, also including the identification of the “correct” number of
clusters. The comparison with the hierarchical clustering is provided by Table 6.12,
where also the somewhat more complicated correspondence between the Bq and Aq
is indicated. Notwithstanding the number of “errors” (close to 1/3 of all objects are
“misclassified” when hierarchical aggregation is applied), we can speak here of a
qualitative reconstruction of the functional typology of municipalities. Yet, even this
debatable result—its reasons having been mentioned above—provides quite some
light on the issue of “functional typology”.

Table 6.11 The contingency matrix for the functional typology of municipalities of Masovia from
Table 6.1 and reverse clustering with own evolutionary method using the k-means algorithm
Categories: obtained, PB → 1 2 3 4 5 6 7 8 9 Totals
initial, PA ↓
1 (MS) 1 0 0 0 0 0 0 0 0 1
2 (PSI) 1 19 1 1 5 0 0 0 0 27
3 (MP) 0 0 16 2 1 0 3 0 0 22
4 (MG) 0 0 1 4 0 0 0 0 0 5
5 (PSE) 0 2 1 2 13 11 2 0 0 31
6 (PG) 0 0 0 0 1 16 1 1 1 20
7 (O) 0 0 1 1 1 8 11 3 4 29
8 (E) 0 0 0 0 0 8 3 49 7 67
9 (R) 0 0 0 0 0 9 0 8 95 112
Totals 2 21 20 10 21 52 20 61 107 314

Table 6.12 The contingency matrix for the functional typology of municipalities of Masovia from
Table 6.1 and reverse clustering with own evolutionary method using hierarchical aggregation
algorithm
Categories: obtained, PB → 1 2 3 4 5 6 Totals
initial, PA ↓ (1 + 2) (3 + 4) (5) (6) (7 + 8) (9)
1 (MS) 1 0 0 0 0 0 1
2 (PSI) 24 1 2 0 0 0 27
3 (MP) 0 18 1 0 3 0 22
4 (MG) 0 5 0 0 0 0 5
5 (PSE) 5 0 21 1 3 1 31
6 (PG) 0 0 6 8 4 2 20
7 (O) 0 4 1 0 17 7 29
8 (E) 0 0 0 4 51 12 67
9 (R) 0 0 1 0 16 95 112
Totals 30 28 32 13 94 117 314

Thus (see Table 6.11), first, some of the initial categories are being reconstructed
with no or only quite limited doubt. These are:
• the core of the provincial (and national) capital, MS
• the cores of the subregional centres, MG
followed by:
• the suburban zones of the subregional centres, PG, and, finally
• the county seats, MP.

Table 6.13 The contingency matrix for the functional typology of municipalities of Masovia from
Table 6.1 and reverse clustering with DE using “pam” algorithm
Categories: obtained, PB → 1 2 3 4 Totals
initial, PA ↓
1 (MS) 1 0 0 0 1
2 (PSI) 15 11 1 0 27
3 (MP) 2 8 11 1 22
4 (MG) 1 4 0 0 5
5 (PSE) 15 2 12 2 31
6 (PG) 2 0 9 9 20
7 (O) 0 1 21 7 29
8 (E) 2 0 17 48 67
9 (R) 1 0 28 83 112
Totals 39 26 99 150 314

With respect to the county seats, MP, which have been “recognised” at quite a
decent rate, let us mention that, if we do not apply any nominal distinctions, these
urban centres tend, in an obvious manner, to be intermingled with other urban,
urban-like and even the exceptional rural units. Against this background, even the
seemingly heavily biased distinctions among the rural units (other than the suburban
ones) come out as relatively well-founded, this being particularly true for the ones
characterised as featuring intensive farming (R). Regarding the suburban zone of Warsaw there
is, definitely, a doubt as to its actual reach (note that the data set here used does not
include any variable, describing the actual connections between the units, like, e.g.,
job and school commuting, shopping etc.). Such doubt is a natural phenomenon,
since various criteria can be applied in order to determine the reach of the suburban
zones.
For purposes of comparison, we shall also quote the analogous contingency
tables, obtained with the DE method for both the k-means-like (“pam”) and hierarchical
agglomeration (“agnes”) algorithms, provided in Tables 6.13 and 6.14.
Of particular interest is the result shown in Table 6.14, suggesting an actual return
to a tripartite categorisation, namely into urban-like, intermediate and rural units.
Yet, altogether, the results obtained with DE were definitely farther away from the
imposed PA than those obtained with the own evolutionary algorithm. However, if
we refer to the image from Fig. 6.1, the doubts, illustrated by Tables 6.11 and 6.12,
concerning the potential division among as many as nine groups of municipalities,
become quite justified. It definitely appears that the urban-suburban-rural1 axis is
so much dominating that any additional division appears to be quite superficial and
to a high extent subjective.

1 This remains valid in spite of the obvious existence of the rural communes, featuring high intensity

of farming production and mixed types of economy, since these units are mostly neighbouring urban
or suburban areas.

Table 6.14 The contingency matrix for the functional typology of municipalities of Masovia from
Table 6.1 and reverse clustering with DE using “agnes” algorithm
Categories: obtained, PB → 1 2 3 Totals
initial, PA ↓
1 (MS) 1 0 0 1
2 (PSI) 26 1 0 27
3 (MP) 11 5 6 22
4 (MG) 5 0 0 5
5 (PSE) 17 10 4 31
6 (PG) 2 7 11 20
7 (O) 1 11 17 29
8 (E) 2 11 54 67
9 (R) 1 13 98 112
Totals 66 58 190 314

Concerning the variable weights and the Minkowski exponent, similar observations
can be made for this series of experiments as for the administrative division of
the set of municipalities. Namely, the variable weights took values of three distinctly
different kinds: the dominating one, the few (mostly 2–3) important ones, and the rest,
including those of no significance at all, while the Minkowski distance definition
exponent varied over a broader interval, from roughly 0.5 to well above 2. This,
apparently, did not exert any substantial influence on the quality of the results, in
line, anyway, with the comments forwarded previously on this subject.
We shall end the presentation of the results for this case, i.e. the functional typology
of municipalities for the province of Masovia, with two maps of the province, one
corresponding to the results, characterised in Table 6.11 (own evolutionary method
and k-means algorithm) and the other one, corresponding to Table 6.12 (own evolu-
tionary method and hierarchical aggregation algorithm). These two maps constitute
Figs. 6.3 and 6.4.
In both these figures several highly characteristic features can be observed:
The first one is the very pronounced area of influence of Warsaw, reaching well
beyond the agglomeration and perhaps even well beyond the functional area (as, for
instance, defined in the report constituting the basis for Table 6.1).
The second, quite similar, is the distinct appearance of the areas of subregional
cities and their zones of influence (see, especially, the one for the city of Radom in
the South of the province).
The third one is the emergence, against the background of the rural units (in the
map of Fig. 6.3 clearly split into two or three sub-categories), of the compact belts
or sequences of municipalities, in some cases associated with transport routes,
evidently displaying definite intermediate characteristics (some of the darker green
belts in Fig. 6.3 and, in a very pronounced manner, some of the darker blue ones
in Fig. 6.4).

Fig. 6.3 Map of Masovia province with the partition PB from Table 6.11

6.6 Conclusions and Discussion

The two separate cases treated in this chapter, though concerning the very same dataset X, gave quite different, and, at the same time, interesting results, in both quantitative and qualitative terms. First, the attempt at reconstructing the official administrative breakdown into three categories of municipalities for the province of Masovia, the capital province of Poland, indicated a very high degree of agreement with the classification into the "urban" and "rural" categories, while suggesting the necessity of distinguishing, and perhaps classifying differently, numerous cases from the hybrid "urban–rural" category. These results were achieved in spite of the fact that the imposed partition, namely the formal one, had, in principle, formally nothing to do with the socio-economic and spatial data on the municipalities concerned.

Fig. 6.4 Map of Masovia province with the partition PB from Table 6.12

On the other hand, the second case, pertaining to the functional typology of municipalities of the same province, Masovia, in which the initial partition was definitely based on the data for the respective municipalities, turned out, through reverse clustering, to produce partitions that were similar to the initial one merely in a qualitative sense. Yet this fact could be explained both by data-related factors and by the quite restrictive requirement of dividing the set of objects into as many as nine definite clusters. Still, even in this case, substantive conclusions, of importance for the subject matter, could be formulated.
Regarding the technical aspect, it turned out again that for data sets of limited dimensions and quite clear content interpretation, the local density algorithm DBSCAN fared much worse than the other two kinds, i.e. k-means and hierarchical agglomeration. Likewise, it also turned out that the selection, or weighting, of variables has an influence on the results obtained, but only in a general sense, that is: a very clear choice of variable weights was made on each occasion, but the indication of variables changed very much. In most cases, the variables were weighted in such a manner that three classes of them could be distinguished:
(i) the dominant variable (defining the "axis" of categorisation);
(ii) the important variables: two or three variables, less important than the dominant one, but still exerting quite an influence on the result (modifying variables); and
(iii) the remaining ones, most of them having, actually, no influence on the results obtained.
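This three-class reading of a weight vector can be operationalised with simple thresholds; the cut-off value below is our illustrative assumption, not one used in the study:

```python
def weight_classes(weights, modifier_frac=0.2):
    # Split a weight vector into the dominant variable, the "modifying"
    # important ones, and the negligible rest; modifier_frac is a
    # hypothetical threshold relative to the dominant weight.
    dominant = max(range(len(weights)), key=lambda i: weights[i])
    important = [i for i in range(len(weights)) if i != dominant
                 and weights[i] >= modifier_frac * weights[dominant]]
    rest = [i for i in range(len(weights))
            if i != dominant and i not in important]
    return dominant, important, rest

# A hypothetical weight vector exhibiting the pattern described above
print(weight_classes([0.36, 0.09, 0.16, 0.10, 0.02, 0.01]))
# (0, [1, 2, 3], [4, 5])
```

The point of the sketch is merely that the classes (i)–(iii) can be read off mechanically once the weights are known; where exactly the thresholds lie is a matter of judgment.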
The reverse clustering approach thus demonstrated, also in this case, its usefulness in terms of reaching telling results, with quite interesting substantive implications for the problems considered.

References

Owsiński, J.W., Stańczak, J., Zadrożny, Sł.: Designing the municipality typology for planning
purposes: the use of reverse clustering and evolutionary algorithms. In: Daniele, P., Scrimali, L.
(eds.) New Trends in Emerging Complex Real Life Problems. ODS, Taormina, Italy, 10–13 Sept
2018. AIRO Springer Series, vol. 1. Springer, Cham (2018)
Śleszyński, P., Komornicki, T.: Courtesy of data and internal report on functional typology of
municipalities of the province of Masovia (2009)
Stańczak, J.: Biologically inspired methods for control of evolutionary algorithms. Control Cybern.
32(2), 411–433 (2003)
Comrey, A.L., Lee, H.B.: A First Course in Factor Analysis, 2nd edn. Lawrence Erlbaum Associates,
Hillsdale, N.J. (1992)
Chapter 7
Administrative Units, Part II

7.1 The Background

In this chapter, we shall analyse data similar to those considered in the preceding chapter, that is, data on the administrative units in Poland. Yet our analysis here focuses on the national level, even though we shall be dealing, as before, with the communes (municipalities). Let us recall that there are some 2,500 municipalities in Poland, these units constituting the elements of the higher administrative level of counties, which, in turn, compose the provinces. In the preceding chapter we analysed the data sets on municipalities at the provincial level; in this chapter we analyse them for the entire country, i.e. some 2,500 units.
As in one of the two cases considered in the preceding chapter, we shall be looking at an initial partition, PA, that was prepared by experts from the Institute of Geography and Spatial Organisation of the Polish Academy of Sciences; see Śleszyński and Komornicki (2016). It was actually meant for a purpose similar to that of the typology used in the preceding chapter for the Masovian province. The similarity extends also to the procedure which led to the establishment of this typology. Of foremost importance is the fact that both procedures were based on a definite data set as well as on a sort of branching procedure. This sort of procedure is sometimes the source of data used for quite pragmatic purposes, as schematically exemplified in Fig. 7.1 for the specific case of social care/unemployment benefits. The resulting data sets are, therefore, not so easily amenable to reverse clustering analysis, since the prior partition PA reflects not only the data themselves, but also some specific decisions, referring to definite thresholds and/or nominal values.
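Such a branching procedure can be pictured as a small decision tree of thresholded rules. The sketch below is purely hypothetical (invented thresholds and category names), intended only to show why a prior partition of this kind reflects administrative decisions as much as the data:

```python
def benefit_category(monthly_income, months_unemployed):
    # Hypothetical branching rule with arbitrary thresholds, loosely
    # mimicking the social care / unemployment benefit example of Fig. 7.1
    if monthly_income >= 2000:
        return "no benefit"
    if months_unemployed >= 12:
        return "social care"
    return "unemployment benefit"

print(benefit_category(1000, 14))  # social care
```

Shifting either threshold reshuffles the resulting categories without any change in the underlying data, which is exactly what makes such prior partitions hard to reconstruct from the data alone.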
The result of the procedure leading to the typology mentioned, described in detail in Śleszyński and Komornicki (2016), is shown in Table 7.1. It can easily be noticed that the categorisation in question very much resembles that of Table 6.1 from the preceding chapter, mainly because of the similarities mentioned above. The primary difference consists in the appearance of some special cases, like those of commune type no. 6 (pronounced transport functions of a commune),

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
J. W. Owsiński et al., Reverse Clustering, Studies in Computational Intelligence 957,
https://doi.org/10.1007/978-3-030-69359-6_7

Fig. 7.1 Two examples of procedures leading to a potential prior categorisation of the sort of interest here

but also no. 7 (actually, this type contains municipalities in which activities of significant scale are located, such as opencast mining, on the one hand, and recreation and leisure, on the other). The total number of categories is bigger than in the previous case by only one, while the categories have to be allocated over the entire territory of the country.
For the purposes of this analysis, the set of variables presented in Table 6.2, all of which were in a way "relative" (such as shares in the area of a municipality or in its population number), was extended with some additional variables: the first two of an absolute character, namely the population number and the overbuilt area in a municipality, and the third referring to category no. 6, namely the share of transport surfaces in the municipality (see Table 7.3 further on). The addition of the two absolute variables turned out to be of essential importance for the results, as we shall see further on. Thus, altogether, in this study, in the majority of calculations, we used 21 instead of 18 variables.

7.2 The Computational Experiments

The calculations followed the path of those presented in the preceding chapter. We shall report on some of these calculations, involving the evolutionary method from Stańczak (2003) and the k-means as well as hierarchical aggregation algorithms.
We start the presentation of these results with Table 7.2. It can easily be seen from this contingency matrix that the initial partition, outlined in Table 7.1,1 may only be considered very vaguely and qualitatively reconstructed. Indeed, for most

1 The quantitative content of Table 7.2 in terms of PA does not fully follow that of Table 7.1, a part of the communes being placed in different categories (which is also why we use somewhat different wording for the descriptions of the categories). The difference results from the dating of the respective typological source studies.

Table 7.1 Functional typology of Polish municipalities

Functional type                                     Units            Population        Area              Density
                                                    no.      %       '000      %       '000 km2   %      persons/km2
1. Urban functional cores of provincial capitals     33     1.3     9 557    24.8       4.72     1.5     2 025
2. Outer zones of urban functional areas of         266    10.7     4 625    12.0      27.87     8.9       166
   provincial capitals
3. Cores of the urban areas of subregional           55     2.2     4 446    11.6       3.39     1.1     1 312
   centres
4. Outer zones of urban areas of subregional        201     8.1     2 409     6.3      21.38     6.8       113
   centres
5. Multifunctional urban centres                    147     5.9     3 938    10.2      10.39     3.3       379
6. Communes having pronounced transport             138     5.6     1 448     3.8      20.06     6.4        72
   function
7. Communes having pronounced                       222     9.0     1 840     4.8      33.75    10.8        55
   non-agricultural functions
8. Communes with intensive farming function         411    16.6     2 665     6.9      55.59    17.8        48
9. Communes with moderate farming function          749    30.2     5 688    14.8      93.83    30.0        61
10. Communes featuring extensive development        257    10.4     1 878     4.9      41.59    13.3        45
Totals for Poland                                 2 479     100    38 495     100     312.59     100       123
Source: Śleszyński and Komornicki (2016)

of the original categories most of their objects are placed in different clusters than they "should" be. Altogether, only about half of the objects, i.e. municipalities, are placed "correctly". Evidently, the errors are smallest for the urban units (especially the original categories 3 and 5), but also, quite surprisingly, the error is relatively small for the rural category 8.
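The error counts quoted here and reported in Table 7.2 follow directly from the contingency matrix: for each original category, the error is the number of its objects falling outside the cluster hypothesised to correspond to it. A sketch on the row for the original category 3, using the totals reported in the table:

```python
def category_errors(row, matched_cluster):
    # all objects of a PA category not placed in its corresponding PB cluster
    return sum(row) - row[matched_cluster]

# Row of Table 7.2 for category 3 (functional urban areas of subregional
# centres); its corresponding cluster is no. 3, i.e. index 2
row3 = [3, 0, 46, 0, 6, 0, 0, 0, 0, 0, 0, 0]
print(category_errors(row3, 2))   # 9, as in the Errors column
print((2478 - 1272) / 2478)       # about 0.49 of the objects placed "correctly"
```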

Table 7.2 Contingency table for the proposed functional typology of Polish municipalities and the reverse clustering partition obtained with the own evolutionary method using the k-means algorithm

No.  Types of communes in partition PA                        Clusters obtained, forming partition PB                Errors
                                                             1    2    3    4    5    6    7    8    9   10   11   12
1   Urban functional areas of provincial capitals           14    0   13    0    0    0    0    0    0    0    1    5      19
2   Outer zones of fua'sa of provincial capitals             0   99   16   76   13   35    8    7   11    0    0    0     166
3   Functional urban areas of subregional centres            3    0   46    0    6    0    0    0    0    0    0    0       9
4   External zones of fua's of subregional centres           0    8    1  101    6   29   33    7   15    1    0    0     100
5   Multifunctional urban centres (other)                    0    0    6    5   94   33    2    0    2    0    0    0      48
6   Communes with developed transport functions              0    0    0   24    4   32   17   30   30    0    0    0     105
7   Communes with other developed non-farming functions      0    1    1   11   14   19   95   43   29    9    0    0     127
    (tourism and large-scale functions, including mining)
8   Communes with intensive farming functions                0    0    0   15    0   18    0  359  104    0    0    0     137
9   Communes with moderate farming functions                 0    1    0   80    7   57   56   98  366    0    0    0     299
10  Extensively developed communes (with forests or          0    0    0   10    5   18  105   32   92    0    0    0     262
    nature protection areas)
Sums                                                        17  109   83  322  149  241  316  576  649   10    1    5   1,272
                                                                                                                  (of 2,478)
a Fua: functional urban area

The phenomena which can be observed in Table 7.2, and which are worth noting, are:
• the disappearance of category 10, "distributed" mainly among categories 7 and 9 (no surprise, indeed);
• the clear distinction of a singleton category (B11 in Table 7.2), composed only of the capital of the country (no wonder, one might say);
• the appearance of category B12, composed of the functional urban areas (fuas) of the biggest metropolises besides the national capital; and
• further, the fact that the very specific initial category 6 got almost entirely "washed away", despite the introduction of the relevant variable.
Table 7.3, on the other hand, shows the weights of variables obtained for the same solution. It now becomes obvious why it was important to include the two (here listed first) absolute variables: their joint weight accounts for 45% of the total weight of all 21 variables! While only one variable got a weight explicitly equal to zero, many others became only marginally important (there are altogether 10 variables with weights below 0.020). The 12 variables with the lowest weights account for a mere 13% of the total weight. On the other hand, there appears another "factor" of high importance, namely the variables "Registered employment indicator" and "Registered businesses per 1,000 inhabitants", together accounting for more than 25% of the total weight. We shall yet return to the issue of variable weights and the related implications.

Table 7.3 Variable weights in the solution illustrated in Table 7.2

Variable                                                              Weight
Population                                                            0.358
Overbuilt area                                                        0.091
Share of transport-related areas                                      0.001
Population density                                                    0.013
Share of agricultural land                                            0.018
Share of overbuilt areas                                              0.024
Share of forest areas                                                 0.009
Share of population over 60 years                                     0.000
Share of population below 20 years                                    0.019
Birthrate for last 3 years                                            0.010
Migration balance for last 3 years                                    0.037
Average farm acreage indicator                                        0.023
Registered employment indicator                                       0.160
Registered businesses per 1,000 inhabitants                           0.096
Employment-based average business scale indicator                     0.018
Share of businesses from manufacturing and construction               0.024
Number of pupils per 1,000 inhabitants                                0.004
Number of over-primary pupils per 1,000 inhabitants                   0.039
Own revenues of municipality per inhabitant                           0.009
Share of revenues from personal income tax in own communal revenues   0.037
Share of social care expenses in communal budget                      0.009

Figure 7.2, showing the map of Poland with an indication of municipality boundaries, illustrates the scale of agreement/disagreement of the results obtained (like those from Table 7.2) with the initial partition. It is highly interesting to note that the municipalities placed "correctly" and "incorrectly" form, in their vast majority, compact areas rather than a haphazard mosaic. This, especially against the background of the next map, in Fig. 7.3, indicates that the "errors" concerned not so much individual communes as their subclasses, often forming compact territories in space. Such a phenomenon definitely calls for a more in-depth substantive analysis (e.g. oriented at the number of clusters).

Fig. 7.2 Map of Poland with indication of municipalities, which belonged in the solution of
Table 7.2 to the “correct” categories from the initial partition and those that belonged to the other
ones (“incorrect”)

Fig. 7.3 Map of Poland, showing the partition of the set of Polish municipalities obtained with the
own evolutionary method and the k-means algorithm, composed of 12 clusters, corresponding to
Table 7.2

The hierarchical aggregation algorithm gave results similar to those of k-means in terms of the similarity between PA and PB (k-means: 1,272 wrong classifications, hierarchical aggregation: 1,240 wrong classifications,2 i.e. hierarchical aggregation fared slightly better in this case). The results obtained with the generalised hierarchical aggregation algorithm are interesting in themselves and qualitatively quite different from those obtained with k-means. They are characterised in Table 7.4. First and foremost, it must be noted that the best number of clusters determined for this algorithm was 5, i.e. just half of that in PA. The table very strongly suggests a much simpler categorisation of the totality of Polish communes, namely into:
(A) urban cores (including some of the suburban municipalities);

2 If the wrongly placed objects (communes) are counted with respect to the clusters forming the obtained partition PB, which in two cases are aggregates of the original categories, then the number of these "erroneously" assigned communes dwindles to 908; see Table 7.4.

Table 7.4 Contingency table for the proposed functional typology of Polish municipalities and the reverse clustering partition obtained with the own evolutionary method using the hierarchical aggregation algorithm

No.  Types of communes in partition PA                Clusters obtained, forming partition PB                  Errorsb
                                                      1 (1,3,5)a   2 (4,6,7,9)a   3 (10)a   4 (2)a   5 (8)a
1   Urban functional areas of provincial capitals         26            1            0         6        0         7
2   Outer zones of fua's of provincial capitals           24           68           14       152        7       113
3   Functional urban areas of subregional centres         46            0            3         6        0         9
4   External zones of fua's of subregional centres        11           85           25        60       20       116
5   Multifunctional urban centres (other)                 90           25            3         8       16        52
6   Communes with developed transport functions            6           74           10         8       39        63
7   Communes with other developed non-farming              8          103           69         4       38       119
    functions (tourism and large-scale functions,
    including mining)
8   Communes with intensive farming functions              0          115            1         6      374       122
9   Communes with moderate farming functions               9          518            6        25      107       147
10  Extensively developed communes (with forests           2          108          104         3       45       158
    or nature protection areas)
Sums                                                     222        1 097          235       278      646       908
                                                                                                          (of 2,478)
a Hypothetical corresponding clusters from the initial partition; in bold: numbers of municipalities from the hypothetically corresponding clusters in the initial partition
b Calculated with respect to the newly established categories, i.e. Bq

(B) suburban municipalities, primarily those of the larger agglomerations, along with some other ones, most presumably of similar socio-economic characteristics;
(C) rural municipalities around smaller urban centres, together with those featuring
not too intensive farming;
(D) rural municipalities with intensive farming; and

(E) the extensively developed rural municipalities, with forests and nature
protection areas.
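The per-category errors of Table 7.4 can be recomputed from its counts: each PB cluster is treated as an aggregate of the hypothetically corresponding PA categories, and a category's error is its row total minus the count falling into its aggregate cluster. A sketch using the counts from the table:

```python
# Contingency counts from Table 7.4 (rows: PA categories 1-10,
# columns: the five PB clusters)
C = [
    [26,   1,   0,   6,   0],
    [24,  68,  14, 152,   7],
    [46,   0,   3,   6,   0],
    [11,  85,  25,  60,  20],
    [90,  25,   3,   8,  16],
    [ 6,  74,  10,   8,  39],
    [ 8, 103,  69,   4,  38],
    [ 0, 115,   1,   6, 374],
    [ 9, 518,   6,  25, 107],
    [ 2, 108, 104,   3,  45],
]
# Hypothetical correspondence: PB cluster -> aggregated PA categories
members = {0: [1, 3, 5], 1: [4, 6, 7, 9], 2: [10], 3: [2], 4: [8]}
matched = {cat: j for j, cats in members.items() for cat in cats}
errors = [sum(C[cat - 1]) - C[cat - 1][matched[cat]] for cat in range(1, 11)]
print(errors)  # the Errors column: [7, 113, 9, 116, 52, 63, 119, 122, 147, 158]
```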
Some other comments on this result will be offered in the discussion section of this chapter.
A good illustration of the characteristics of Polish municipalities across the territory of Poland, supporting some of the observations based on Tables 7.3 and 7.4, is provided in Fig. 7.3, which presents the map of Poland produced in the calculation run performed with the k-means algorithm, in this case resulting in the 12 clusters forming the respective PB, characterised before in Table 7.2.
Namely, this image shows very clearly two aspects of the set of Polish communes:
I. the natural gradation from urban to rural to peripheral units and areas; and
II. the very distinct difference in spatial character between the north-western and
the south-eastern parts of Poland (visual domination of the blue areas in the
former and of the green territory in the latter).
This second aspect greatly disturbs the possibility of a "linear" classification of municipalities across the entire territory of Poland. In very rough terms: the west of Poland is much more urbanised than the east, but it is by no means more densely populated; its settlement system is simply different. Actually, south-eastern Poland is more densely populated than the north-western part. At the same time, the shares of forests are much higher in north-western Poland.

7.3 Discussion and Conclusions

The situation from the second series of experiments, illustrated and commented upon in the preceding chapter, was repeated here in that the solutions obtained were quite far from the initial partition, at least in purely formal terms (the number of misclassified units being around half). As there, though, the qualitative character of the initial partition was to a significant extent preserved, with some telling exceptions, which could be used, in particular, for drawing substantive conclusions. On the technical side, it turned out again that k-means and hierarchical aggregation outperformed DBSCAN.
It was highly important to obtain the explicit weights of variables in this particular exercise, for these weights indicated the "main direction" along which the initial categories were defined, namely, quite naturally, the "urban–rural" direction (as in some other experiments before, the dominating variables were identified, followed by some complementary ones, and then by the group of really unimportant variables, roughly half of them). No wonder, therefore, that the "diverging" clusters (like category no. 6, associated with transport) could be identified (or "reconstructed") with less certainty, or not at all.
Another conclusion of a similar character concerns the fact that, most probably, the population of municipalities of rural character (some 1,500 units or more) does not feature any distinct division into clear subcategories, but rather constitutes a continuum in the socio-economic and spatial dimensions. That is apparently why, although (in distinction from the "diverging" groups commented upon above) it is definitely somehow distributed along the very general "urban–rural" axis, any partition of this large group has to bear an arbitrary or subjective character, at least to a significant extent (although, again, a reference to Owsiński 2012 may be recalled for a potentially "objective" approach to the division of the thus distributed set of objects).

References

Owsiński, J.W.: On dividing an empirical distribution into optimal segments, SIS (Italian Statistical
Society) Scientific Meeting, Rome, June 2012. http://meetings.sis-statistica.org/index.php/sm/
sm2012/paper/viewFile/2368/229
Śleszyński, P., Komornicki, T.: Functional classification of Poland’s communes (gminas) for
the needs of the monitoring of spatial planning (in Polish with English summary). Przegl˛ad
Geograficzny 88, 469–488 (2016)
Stańczak, J.: Biologically inspired methods for control of evolutionary algorithms. Control Cybern.
32(2), 411–433 (2003)
Chapter 8
Academic Examples

8.1 Introduction

This chapter, closing the series of presentations of application cases of the reverse clustering paradigm, is devoted to those experiments whose applied side was, for various reasons, actually void. Naturally, the purpose of such experiments was to test the capacities of the approach and the setting we used for it, as well as the role of the definite parameters composing the vector Z. Not that many such experiments were carried out, as the ones based on real-life data, with potentially applicable results and conclusions, provided quite ample material for testing the methodology and its technical details.
First, a couple of remarks are made concerning the very initial tests, performed with the classical Fisher's Iris dataset (see Fisher 1936 and Anderson 1935). Although the data are empirical, they are treated merely as an academic testbed, mainly because of the ample knowledge of this data set and the broad comparative material.
Then, a somewhat more detailed account is provided of a series of artificial data sets, of similarly small dimensions as the Iris data, but featuring other kinds of potential difficulties. In particular, some of these data sets were clearly composed of "nested clusters", i.e. smaller clusters forming bigger ones. It was interesting to attempt to determine the parameters of Z for which the different "levels of nested clusters" could be recovered.

8.2 Fisher’s Iris Data

Since this data set, due to E. Anderson and R. A. Fisher, is very well known, we shall only report briefly here on the results obtained from the tests of the reverse clustering approach based on it. We speak here, namely, of n = 150 observations, each characterised by m = 4 variables describing flowers, and the


initial partition, PA, is the one into the three varieties: Iris setosa, Iris virginica and Iris versicolor. The calculations, performed with the differential evolution (DE) method using the "pam" (partitioning around medoids) and the hierarchical aggregation "agnes" algorithms, yielded the results shown in Tables 8.1, 8.2 and 8.3. These experiments were first performed with a rather narrow composition of the vector Z, and it soon became obvious that, in order to obtain better results, a possibly broad selection of parameters and their values is necessary. Thus, Tables 8.2 and 8.3 contain the results for the extended composition of the vector Z.
It can easily be seen that in this case the hierarchical aggregation algorithm provided better results than the one belonging to the k-means family.
Actually, in the framework of the experiments with the DE method, the standard fuzzy clustering algorithm "fanny" (again from the R package, following Kaufman and Rousseeuw 1990) was also tried out, yielding, in this particular case, yet somewhat better results. This algorithm, though, was not used in any of the remaining cases commented upon here.
As in several other situations reported in this book, the own evolutionary method (Stańczak 2003) fared altogether rather distinctly better than the DE from the R package, the respective results being summarised in Table 8.4.

Table 8.1 The results obtained for the Iris data with the DE method: comparison of the "pam" and "agnes" algorithms and two selections of vector Z components (notation as in Table 4.1)

Algorithm   Optimized parameters    Adjusted Rand index   Rand index
pam         p, h, w1, ..., w4       0.758                 0.892
pam         p                       0.730                 0.880
agnes       p, a, h, w1, ..., w4    0.922                 0.966
agnes       p, a                    0.759                 0.892
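Table 8.1 reports both the Rand index and its adjusted variant. The plain Rand index is simply the share of object pairs on which two partitions agree (both objects together, or both apart, in each); a minimal sketch:

```python
from itertools import combinations

def rand_index(a, b):
    # a, b: cluster labels of the same objects under two partitions
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)

print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0: identical up to relabelling
```

The adjusted variant additionally corrects this share for chance agreement, which is why the two columns of Table 8.1 differ.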

Table 8.2 Contingency table for the DE method applied to the Iris data with the "pam" algorithm

Iris varieties     Clusters obtained
                   1     2     3
Iris setosa       50     0     0
Iris versicolor    0    48     2
Iris virginica     0    14    36

Table 8.3 Contingency table for the DE method applied to the Iris data with the "agnes" algorithm

Iris varieties     Clusters obtained
                   1     2     3
Iris setosa       50     0     0
Iris versicolor    0    50     0
Iris virginica     0    14    36

Table 8.4 The reverse clustering results for the Iris data obtained with the own evolutionary method using the DBSCAN, k-means and hierarchical merger algorithms

Clustering method/variant    Variable weights              Minkowski exponent   Number of misclassified objects
DBSCAN—1                     0.369, 0.028, 0.229, 0.967    0.95                 6
DBSCAN—2                     0.470, 0.027, 0.217, 0.898    1.33                 6
DBSCAN—3                     0.040, 0.041, 0.076, 0.908    0.53                 5
k-means                      0.052, 0.051, 0.673, 0.224    0.42                 3
hierarchical aggregation—1   0.158, 0.193, 0.439, 0.210    3.27                 2
hierarchical aggregation—2   0.116, 0.174, 0.560, 0.150    2.64                 3

The different variants of the DBSCAN and hierarchical aggregation algorithms testify to the attempts, undertaken for this first-treated data set, at specifying the variants of the algorithms and the conditions on parameter combinations that would be used in further studies.
As in the case of DE, the best results were obtained for hierarchical aggregation (although those for k-means are only slightly worse). This is not really very surprising, since the hierarchical aggregation algorithms can overcome the limitation, proper to the k-means algorithms, of confining clusters to hyperspherical or hyperellipsoidal shapes.
In any case, the comparison of these results with those usually obtained by various clustering and classification methods when tested against the Iris data demonstrated that the approach, equipped with the techniques and parameters assumed, can work quite properly in terms of an adequate reconstruction of the initial partition PA. This was one of the important signals for continuing the study reported in this book.

8.3 Artificial Data Sets

The essential character of the artificial data sets that were the subject of analysis in this series of calculations is shown in Figs. 8.1 and 8.2. The fundamental issue, as already indicated, was related to the "nested" nature of the respective clusters. Thus, depending upon the "level of perception", or "resolution", in the case of Fig. 8.1 one could speak of four, eight, or even more (say, 15) clusters.
It was then of interest which, if any, of the "resolution levels" would be reconstructed through reverse clustering, and, if such a reconstruction turned out to be at

[Scatter plot "Nested clusters - 2D - three levels": points on axes x1 and x2, both ranging from 0 to 8]

Fig. 8.1 An example of the artificial data set with "nested clusters", subject to experiments with reverse clustering

[Scatter plot "Linear broken structure - 2D": points on axes x1 and x2, both ranging from 0 to 8]

Fig. 8.2 An example of the artificial data set with "linear broken structure", subject to experiments with reverse clustering

all possible, whether appropriate tuning of the parameters in vector Z would allow for the identification of the different, successive "resolution levels".
Concerning the "nested" structure, experiments were performed for a number of its variants, differing in the mutual positioning and separation of the "partial clusters" appearing in Fig. 8.1, and implying a more or less distinct division of the bigger clusters into the smaller ones. In the extreme case, there would be no visual distinction inside the four "main" clusters.
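A data set with the structure of Fig. 8.1 can be generated along the following lines; the centres, offsets and spread below are our own assumptions, not the exact test data used in the study:

```python
import random

def nested_clusters(n=20, spread=0.15, seed=0):
    # Four "main" cluster centres, each split into two nearby sub-clusters;
    # each point carries a (coarse, fine) label pair, one per resolution level.
    random.seed(seed)
    main_centres = [(1, 1), (1, 6), (6, 1), (6, 6)]
    points, labels = [], []
    for m, (mx, my) in enumerate(main_centres):
        for s, dx in enumerate((-0.5, 0.5)):
            for _ in range(n):
                points.append((mx + dx + random.gauss(0, spread),
                               my + random.gauss(0, spread)))
                labels.append((m, 2 * m + s))
    return points, labels

points, labels = nested_clusters()
print(len(points), len({c for c, f in labels}), len({f for c, f in labels}))  # 160 4 8
```

Shrinking the sub-cluster offset relative to the spread produces the extreme case just mentioned, with no visible distinction inside the four main clusters.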
The calculations were performed using the k-means algorithm, and the conclusions can be summarised as follows:
• It turned out to be trivial to obtain the required structures using the k-means algorithm when the steering parameter was simply the number of clusters: the algorithm identified the "proper" clusters without any errors for the partitions into 1, 4, 8, 15, 16 and 60 clusters; in addition, once the number of clusters was established, the remaining parameters played either no role whatsoever or only a truly marginal one;
• On the other hand, obtaining the different levels of granulation ("resolution") for a variable (optimised) number of clusters, through manipulation of the other parameters, turned out to be quite a difficult task; actually, it turned out to be possible to obtain only the partitions into 1, 4 and 60 clusters; it is not excluded, of course, that a much finer mesh of parameter values might still yield the other "resolution levels" (e.g. the partitions into 8 and 15 clusters);
• In an obvious manner, the problem with obtaining the "less distinct" (other than mentioned above) "resolution levels" is also associated with an inherent characteristic of the k-means algorithm, namely the monotone dependence of its implicit objective function (the sum over clusters of the cluster-wise sums of distances of the objects in a cluster to the cluster representative) on the number of clusters, decreasing as this number grows; this causes only the extreme or very distinct structures to be identified by the algorithm as the "best" ones.
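The monotone decrease of the implicit k-means objective, invoked in the last point, can be seen on a toy one-dimensional example (squared Euclidean distances to the cluster mean are used here, one common variant of the criterion):

```python
from collections import defaultdict

def kmeans_objective(points, labels):
    # sum over clusters of the within-cluster sums of squared distances
    # of the members to the cluster mean
    groups = defaultdict(list)
    for p, l in zip(points, labels):
        groups[l].append(p)
    return sum(sum((p - sum(g) / len(g)) ** 2 for p in g)
               for g in groups.values())

pts = [0.0, 0.1, 1.0, 1.1, 5.0, 5.1]
objectives = [kmeans_objective(pts, lab) for lab in (
    [0, 0, 0, 0, 0, 0],        # k = 1
    [0, 0, 0, 0, 1, 1],        # k = 2
    [0, 0, 1, 1, 2, 2],        # k = 3
)]
print(objectives)  # strictly decreasing with the number of clusters
```

Since finer partitions always score at least as well, the objective alone cannot single out any intermediate "resolution level"; some extra criterion or constraint is needed.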
Another kind of example analysed is shown in Fig. 8.2; in this case the essential issue was the degree of separation of the segments of the supposedly "linear" structure. If one refers to the case analysed in the preceding chapter (administrative units, case II), one might easily see the association between the two: the municipalities being roughly distributed, in the universe of socio-economic and spatial characteristics, along the "urban–rural" axis. There, however, the main problem consisted not so much in the gaps between the subgroups (as in Fig. 8.2), but mainly in the (potential or even only hypothetical) divergence from the main axis mentioned, featured by some specific kinds of municipalities. Another interpretation of this kind of data set is oriented at chronological data series and changes of behaviour in the underlying model over time.
The results obtained for the test data, whose example is provided in Fig. 8.2,
were very similar, in qualitative terms, to those reported and summarised before for
the “nested clusters” case. Only the most distinct clusters (four clusters in the case
of Fig. 8.2) were identified, along with the extreme ones (the one, all-embracing
cluster and the set of singletons). This, again, has to be mostly attributed to the
specific features of the k-means algorithm, which provided the best results also for
this dataset.

8.4 Conclusions

The analyses discussed here were primarily oriented at verifying the basic
capabilities of the reverse clustering paradigm. We do not report the detailed results
for this group of experiments, since they are not of primary interest: the main issue
in all of these experiments was to verify the capacity of the methodology engaged
in the paradigm (the search procedures, the clustering algorithms, and the sets of
parameters optimised) to reconstruct the basic features of the respective initial
partitions. In these terms, the results of the tests were altogether positive, with
some reservations, which ought to be kept in mind when applying the approach.
The latter concerned mainly the usefulness of the particular clustering algorithms
and their variants, as well as their parameterisations. The conclusions drawn therefrom
are in agreement with the experiences derived from other experiments. Thus, in
particular, k-means and hierarchical aggregation came out as definitely better than
DBSCAN, while the inner limitations of, for instance, k-means become apparent
for appropriately constructed test data, which may actually correspond to some
specific real-life data sets one might deal with in practice.

References

Anderson, E.: The irises of the Gaspé Peninsula. Bull. Am. Iris Soc. 59, 2–5 (1935)
Fisher, R.A.: The use of multiple measurements in taxonomic problems. Ann. Eugenic. 7(2), 179–188 (1936)
Kaufman, L., Rousseeuw, P.J.: Finding Groups in Data: An Introduction to Cluster Analysis. Wiley,
New York (1990)
Stańczak, J.: Biologically inspired methods for control of evolutionary algorithms. Control Cybern.
32(2), 411–433 (2003)
Chapter 9
Summary and Conclusions

We shall now try to sum up the experiences from the study of the reverse clustering
paradigm, whose essential results are presented in this book. Conforming to the
purpose of this book, announced at the outset, we shall concentrate on the sense and
interpretations of the reverse clustering approach, and hence on the potential signifi-
cance and use of its results, but will also devote some attention to the methodological
aspect of the respective procedure. The purely technical, computational side will at
the moment be treated rather marginally.

9.1 Interpretation and Use of Results

Thus, the very first conclusion from the experiments reported here is the wide scope
of potential and actual interpretations, and therefore uses, of the paradigm. Contrary
to what many might think when first confronted with the idea, reverse clustering is
not just another approach to the determination of classifiers. This is well illustrated
by the cases, in which the initial, reference partition PA had a specific relation to the
data set X, e.g. based on a feature that was not present in X (traffic data, environmental
contamination data), and not necessarily easily identifiable through clustering. On
the other hand, the approach enabled a reasoned analysis of the situations, in which
PA was apparently just a hypothesis, or resulted from a procedure that could not
be brought to the framework of clustering (administrative units). The variety of
situations treated, which was announced in Fig. 3.1, allowed for quite thorough
verification of the capacities of the approach. Although the clarity and strength of
results, in terms of their interpretation, differed quite widely among the experiments,
in each case an additional insight into the data set and its structure was gained, in some
cases boiling down to the “better partition than that of PA ” or “general confirmation
of the PA , with definite reservations”. So, it can be stated that reverse clustering is a
new, versatile tool of data analysis, which can be used for a wide variety of problems,
in which one of the essential aspects is partitioning of the set of data objects.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021


J. W. Owsiński et al., Reverse Clustering, Studies in Computational Intelligence 957,
https://doi.org/10.1007/978-3-030-69359-6_9

We would like to emphasise that in some cases we obtained truly surprising
results, regarding both their nature (the relation of the shape and character of partitions
PB and PA) and quality (i.e. either the similarity of PB and PA or the "internal quality"
of the clustering itself). This concerns, first of all, the results for the road traffic
data, where not only was a better partition PB found (notwithstanding the fact that
PA was based on a feature not contained in the data set), but a new category could
also be identified, in this case composed of "outliers" or "anomalous patterns". Similarly,
in the case of environmental contamination data, where no a priori PA was given,
the partition determined on the basis of a superficial visualisation of the data for some
of the variables in X could be quite appropriately reconstructed on the basis of the
remaining ones. Thereby, a strong confirmation was provided for the possibility of
categorising the units analysed according to the levels of heavy metal content in
their ecosystems. Moreover, it turned out that the obtained and confirmed
categorisation is quite simple and effective: only a few categories, having
quite obvious interpretations, adequate from the point of view of environmental
management.
Another case of surprisingly good results was that of the administrative cate-
gorisation of municipalities in Poland. Not only could the three formal categories of
municipalities ("urban", "rural" and "urban–rural") be quite well reconstructed
on the basis of socio-economic and spatial data on these units, but it turned out that the
vector Z, established for one province of Poland (Masovia), produces, when applied
in terms of defining a clustering procedure, almost as good clustering results for
another province (Wielkopolska). This demonstrated that one of the major intended
uses of the approach, i.e. the classification of entire data sets different from the original
X, is not only possible, but has been verified on far from simple empirical
material.
The feasibility of the concept behind the reverse clustering paradigm, as described
in the initial chapters of the book, was confirmed by the calculations related
to what we called the "academic data", including, in particular, the famous
Iris data set of Fisher and Anderson, commented upon in the preceding Chap. 8.
Not only was it possible to reconstruct the imposed initial partitions with acceptable
precision, but this verification-oriented application of reverse clustering also allowed
for some preliminary assessment of the more technical details of the procedure, first
of all concerning the composition of the vector Z. Moreover, positive results—even
if not entirely in agreement with visually-based intuitions—were obtained for the
case of “nested clusters”, quite specific as to the character of hypothetical clusters
and related alternative partitions.
Notwithstanding this confirmation of the very concept, some of the cases treated
turned out to produce results which, in purely quantitative terms, were highly "unsat-
isfactory". Definitely, with only about half of the units getting "correctly" classified in
the best attainable PB, the results can hardly be referred to as "adequate". This was
the case with several experiments concerning the functional typologies of munici-
palities in Poland (Chaps. 6 and 7). Yet, the respective contingency matrices for PA
and PB demonstrated that in qualitative terms the results were not that "bad", and were
indeed telling in some respects. First, some of the initial categories were relatively
well reconstructed (the respective errors being at the levels of 10–20% rather than
50%). Second, these relatively well reconstructed categories were easily interpreted
in quite obvious intuitive terms (“urban cores” of various degrees, for instance).
Third, the biggest errors, and, in fact, even lack of recognition of a definite category,
occurred for the initial categories that “by definition” were highly uncertain, very
likely constituting segments of a continuum, from which they were separated by the
imposition of some thresholds or use of external criteria.1 This applied, in particular,
to such general categories as:
(i) suburban municipalities, forming various concentric areas around bigger
agglomerations, with vague assignment to vaguely defined zones (here the
results provided very interesting alternatives to the reaches and shapes of the
respective “zones of influence” of particular agglomerations), and,
(ii) rural municipalities, with different degrees of farming intensity and also
different degrees of importance of farming for the local economy, usually
meaning mixed local economies (along with different degrees of the advance-
ment of such processes as ageing and depopulation).
Another category, which could hardly fit into the framework of reverse clustering and
was poorly identified within the PB's obtained, was constituted by "special kinds" of
units, determined on the basis of selected sectors and branches of the economy or other
features characteristic of a group of spatial units, such as, for instance:
(iii) municipalities, hosting special kinds of economic activities, such as a large
scale strip mining and associated industries;
(iv) municipalities, with high intensity of recreation and leisure activities, or
(v) municipalities, in which the function of transport played an important role
(junctions, storage and logistic facilities, etc.).
In addition, in some cases such distinctions (“criteria of classification”) were put
together in conjunction (like (iii) and (iv) above), which made their reconstruction
in the framework of any clustering-based PB very difficult, if at all possible.
In the light of these remarks it becomes obvious that for several of the similarly deter-
mined categories distinguished in the initial partitions, a second thought is necessary
concerning their definitions; here the reverse clustering approach may, and
actually does, provide quite significant material for consideration, pertaining to the
substantive, and not merely "mechanical", aspect of the particular categorisations.

1 It is, theoretically, possible to divide a relatively continuous distribution in a supposedly
"optimal" manner, as shown in Owsiński (2012), but this requires applying a special procedure and
a specially devised objective function, and is always based on the use of some divergences from the
smoothness and continuity of the relevant distribution, which was in this case not feasible.

9.2 Some Final Observations

Concerning the technical side of the reverse clustering procedure, we shall forward
here some remarks on the roles of particular elements of the vector Z, as resulting
from the experiments described in the book.
It is quite obvious that the most important element of Z is the clustering algo-
rithm, meaning, in our case, the choice between the k-means-type algorithm, the
general hierarchical aggregation procedure and the DBSCAN algorithm, the latter as
a representative of the local density algorithms. Quite systematically, for the cases
considered here, DBSCAN turned out to give the worst results, as measured by
the similarity of PA and PB, even if in some cases these results were in themselves of
some interest (and in some cases also featured quality similar to one of the two other
kinds of techniques considered). The other two, the k-means-type and hierarchical
aggregation, often gave results of similar quality, although there were cases in
which a distinct difference could also be observed.
Along with the choice of the clustering algorithm, the key parameter was, natu-
rally, the number of clusters, p. In many situations, in order to shorten the calcula-
tions, this number was fixed, usually equal to that of PA. Yet, such a limitation
very strongly influenced the final solution, which, when p was subject to optimisa-
tion, could be different, even though, surprisingly, the resulting PB would be more
similar to PA. This is the outcome of the interplay, mentioned at the end of Chap. 2,
between the data, i.e. PA and X, on the one hand, and the various principles, constraints
and parameters characterising the clustering algorithms, referred to through Z, on the other.
The number of clusters is an explicit parameter for the k-means and the hierarchical
aggregation algorithms, while it is simply an element of the output for DBSCAN. Yet,
other algorithmic parameters were also used, like the Lance–Williams coefficients
in the case of the hierarchical aggregation algorithms and the local distance/density
coefficients in the case of DBSCAN. This aspect was not treated in depth in this
book, for the sake of brevity and clarity, and the only comment we shall forward
here with respect to these elements of Z is that while they were also subject to
optimisation in most of the cases treated, their influence was usually limited, except
for some special situations (e.g. clusters of very specific shapes, densities, etc.).
Through a number of specifically oriented experiments it was demonstrated that
the use of the complete vector Z, including all its elements listed in Chap. 1, yields
better results than the use of only the most important parameters. Thus, in virtually
all cases optimisation was applied to the weights of the variables and the Minkowski
exponent of the distance definition, along with other parameters. This proved to be
fully justified in some of the reported cases, in which quite a proportion of the variables
were neglected or marginalised, and a distinct structure of importance of the variables
could be observed (e.g. one or two leading variables, a few less important ones, and the
rest of no or barely visible importance).
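The two elements mentioned, the variable weights and the Minkowski exponent, enter the distance definition as in the following minimal sketch (an illustration only; the function name and the data are ours, not the book's):

```python
def weighted_minkowski(u, v, weights, p):
    # distance between objects u and v, with per-variable weights and
    # Minkowski exponent p -- both of which are elements of the vector Z
    return sum(w * abs(a - b) ** p
               for w, a, b in zip(weights, u, v)) ** (1.0 / p)

u, v = [0.0, 1.0, 2.0], [3.0, 1.0, 0.0]
print(weighted_minkowski(u, v, [1, 1, 1], 2))  # plain Euclidean: 13 ** 0.5
print(weighted_minkowski(u, v, [1, 1, 0], 1))  # third variable weighted out: 3.0
```

Setting a variable's weight to zero removes it from the distance altogether, which is exactly how the optimisation can neglect or marginalise variables.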
Yet, a high degree of lability was observed both for the actual values of the weights
of the variables (even if the structure of the weight values, mentioned above, was preserved),
and even more so with respect to the Minkowski exponent, which often ranged, within the
same experiment and for otherwise similar conditions of calculation, over a wide interval
of values (e.g. between 0.6 and 3.7 for one of the experiments). This high degree of
changeability of values is due, first, to the low sensitivity of the results, especially in terms of
the Rand index values, with respect to some of the parameter values, and also to the
substitutability among variables within their definite groups (the highly correlated
ones). Computational limitations, related to computation times and facilities, have
not allowed us to carry out a more detailed analysis of the respective phenomena,
but, all in all, they were not so important from the point of view of the interpretation
of the results obtained. Still, given the conclusion that the use of the complete vector
Z yields better results than when only the algorithmic parameters are optimised,
this issue definitely requires further study.
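For reference, the Rand index used throughout as the similarity measure Q(PA, PB) can be computed directly from its definition, as the fraction of object pairs on which two partitions agree (a straightforward sketch, with label lists standing in for the partitions):

```python
from itertools import combinations

def rand_index(pa, pb):
    # pa, pb: cluster labels of the same objects under two partitions;
    # count the pairs that are together in both or apart in both
    pairs = list(combinations(range(len(pa)), 2))
    agree = sum((pa[i] == pa[j]) == (pb[i] == pb[j]) for i, j in pairs)
    return agree / len(pairs)

pa = [0, 0, 1, 1, 2, 2]
pb = [0, 0, 1, 1, 1, 2]
print(rand_index(pa, pa))  # identical partitions: 1.0
print(rand_index(pa, pb))  # 12 of 15 pairs agree: 0.8
```

Since most pairs in a many-cluster partition are "apart in both", the index saturates easily, which is one source of the low sensitivity noted above.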
As a closing illustration of the latter statement we provide Fig. 9.1, which is
related to the study reported in Chap. 6 of the book: the study of the categories of
municipalities within a single province. This figure shows the map of the Polish
province of Masovia, with the national capital, Warsaw, as its centre, as analysed
in the reverse clustering experiments performed with the DBSCAN algorithm, the
one that fared the worst in this series of experiments, the respective results being
briefly characterised in Table 9.1. It can easily be seen from this table that instead
of the initial nine functional types the algorithm was capable of producing only five,
and ones not easily attributable to the initial types, at that, so that even in purely
qualitative terms one finds it difficult to reconstruct the initial partition. The "error"
proportions are, indeed, very significant in the case of this algorithm, and one hardly
finds the result acceptable, definitely so in quantitative terms, but then also in the
qualitative ones.
Notwithstanding all these shortcomings and the related criticism, the map of
Fig. 9.1 clearly shows a distinct and well-justified spatial structure, with the large
agglomeration area of Warsaw in the middle, very distinct urban functional zones
of the subregional centres, and some complementary municipality types, which,

Table 9.1 Contingency matrix for the typological categorisation of the municipalities of the
province of Masovia in Poland, obtained with reverse clustering using our own evolutionary
algorithm and the DBSCAN algorithm (for explanations see Chap. 6)
Categories: obtained →      1 (MS, PSI)   2 (MP, MG)   3 (PSE, PG, O, E, R)   4 (?)   5 (?)
initial functional types ↓
1 (MS)                                1            0                      0       0       0
2 (PSI)                              26            0                      1       0       0
3 (MP)                                5            9                      8       0       0
4 (MG)                                2            3                      0       0       0
5 (PSE)                               8            0                     13       0       7
6 (PG)                                1            0                     15       4       0
7 (O)                                 2            0                     27       0       0
8 (E)                                 3            0                     63       0       0
9 (R)                                 1            0                    111       0       0

Fig. 9.1 Map of the province of Masovia showing the municipality types, obtained from the reverse
clustering performed with DBSCAN algorithm, characterised in Table 9.1

contrary to what one might think by just looking at Table 9.1, are by no means
"outliers", but rather much narrower classes of units, to which, perhaps, special
attention ought to be devoted. Thus, this result, too, provides a valuable insight and
cannot be simply rejected on the basis of a poor quantitative index value.
The striking clarity and simplicity of this spatial image is, indeed, very telling. In
this context two aspects might, or perhaps ought to, be indicated:
• if this result really has a substantive value in itself (and not just as an element of a
technical analysis and debate), then what is the role of the initial partition PA and
the reverse clustering procedure in obtaining it, as opposed to (compared with) the
potentially straightforward application of a clustering procedure to the respective
data on municipalities (or, in fact, of any other procedure leading to a categorisation
of municipalities)?

[Fig. 9.2 (flow diagram): the data set analysed, X, and the prior partition of X, i.e. PA,
enter the loop; the clustering algorithms and data-processing parameters, Ω = {Zi},
produce the resulting partition of X, PB; the criterion Q(PA, PB), i.e. the similarity
of the two partitions, is evaluated; the search (optimisation) procedure maximising
Q(PA, PB) either continues (No) or stops (Yes, STOP), yielding Z*; the relation of
PB(Z*) to PA and the purpose of producing it are then assessed, together with the
quality of the result.]

Fig. 9.2 The meta-scheme of application of the reverse clustering paradigm

• are we capable of reconstructing the (quantitative and qualitative) rationale for this
image? This quite fundamental question (closely related to the one above, though of
a more qualitative character) calls for some kind of approach that would actually
close the superior-level loop (see Fig. 9.2), beyond the one in which we
now work with reverse clustering, and which would (more) explicitly answer
the question: is PA correct / appropriate / optimal, and if not, what
should it be?
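The loop of Fig. 9.2 can be miniaturised in a few lines: a toy one-dimensional "clustering algorithm" with a single parameter (a gap threshold standing in for Z), and a random search maximising the Rand index Q(PA, PB). All names and data here are illustrative assumptions, not the book's actual procedure.

```python
import random
from itertools import combinations

def cluster_by_gap(data, threshold):
    # toy clustering: sort the 1-D data and open a new cluster whenever
    # the gap to the previous value exceeds the threshold (our "Z")
    order = sorted(range(len(data)), key=lambda i: data[i])
    labels = [0] * len(data)
    label = 0
    for prev, cur in zip(order, order[1:]):
        if data[cur] - data[prev] > threshold:
            label += 1
        labels[cur] = label
    return labels

def rand_index(pa, pb):
    # fraction of object pairs on which the two partitions agree
    pairs = list(combinations(range(len(pa)), 2))
    agree = sum((pa[i] == pa[j]) == (pb[i] == pb[j]) for i, j in pairs)
    return agree / len(pairs)

def reverse_clustering(data, pa, trials=200, seed=0):
    # the search procedure of the meta-scheme: sample a candidate Z,
    # cluster, compare the resulting PB with PA, keep the best Z*
    rng = random.Random(seed)
    best_q, best_z = -1.0, None
    for _ in range(trials):
        z = rng.uniform(0.0, max(data) - min(data))
        q = rand_index(pa, cluster_by_gap(data, z))
        if q > best_q:
            best_q, best_z = q, z
    return best_z, best_q

data = [1.0, 1.1, 1.2, 5.0, 5.1, 9.0, 9.2]
pa = [0, 0, 0, 1, 1, 2, 2]
z_star, q_star = reverse_clustering(data, pa)
print(q_star)  # a threshold between the gaps reconstructs PA: 1.0
```

The question raised above goes one level higher than this loop: not only finding the Z* that best reproduces PA, but judging whether PA itself was the right partition to aim at.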

Reference

Owsiński, J.W.: On dividing an empirical distribution into optimal segments. SIS (Italian Statistical Society) Scientific Meeting, Rome, June 2012. http://meetings.sis-statistica.org/index.php/sm/sm2012/paper/viewFile/2368/229
