
Pattern-based Classification via a High Level Approach using Tourist Walks in Networks

Thiago Christiano Silva and Liang Zhao
Institute of Mathematics and Computer Science (ICMC), University of São Paulo (USP)
Av. Trabalhador São-carlense, 400, 13560-970, São Carlos, SP, Brazil
Email: {thiagoch, zhao}@icmc.usp.br

Abstract—Traditional data classification considers only physical features (e.g., geometrical or statistical features) of the input data; here, this is referred to as low level classification. In contrast, the human (animal) brain performs both low and high orders of learning, and it has facility in identifying patterns according to the semantic meaning of the input data. Data classification that considers not only physical attributes but also the pattern formation of the data is here called high level classification. In this paper, we present an alternative technique which combines both low and high level data classification. The low level term can be implemented by any classification technique, while the high level term is realized by means of the extraction of features of the underlying network (graph) constructed from the input data, and measures the compliance of the test instances with the pattern formation of the training data. Out of the various high level perspectives that can be utilized to capture semantic meaning, we utilize the dynamical features generated by a tourist walker in a networked environment. Specifically, a weighted combination of transient and cycle lengths is employed to that end. Furthermore, we show computer simulations on synthetic and widely accepted real-world data sets from the machine learning literature. Interestingly, our study shows that the proposed technique is able to further improve the already optimized performance of traditional classification techniques.
I. INTRODUCTION
Supervised data classification aims at generating a map from the input data to the corresponding desired output, for a given training set. The constructed map, called a classifier, is used to predict the labels of new input instances. Many supervised data classification techniques have been developed [1]–[3], such as k-nearest neighbors, Bayesian classifiers, neural networks, decision trees, committee machines, and so on. In essence, all these techniques train and, consequently, classify unlabeled data items according to the physical features (e.g., distance or similarity) of the input data. Here, techniques that predict using physical features or class topologies, but not the class pattern formation, are called low level classification techniques.

Not rarely, the data items are not isolated points in the attribute space, but instead tend to form certain patterns. The human (animal) brain performs both low and high orders of learning and has facility in identifying patterns according to the semantic meaning of the input data. However, this kind of task, in general, is still hard for computers to perform. Supervised data classification that considers not only physical attributes and class topologies but also pattern formation is here referred to as high level classification.

Following the literature stream on this matter, the developed works most related to high level classification are the contextual classification techniques [4], [5], which consider the spatial relationships between individual pixels and the local and global configurations of neighboring pixels when assigning classes. The underlying assumption is that neighboring pixels are strongly related, i.e., the structures of classes are likely to be found in the presence of others. In the proposed high level classification method, the spatial information is used only in the network construction phase. The difference between contextual classification and the proposed approach lies in the following factor: once the network is constructed from the original data, the proposed classification method no longer uses the spatial relations among the data items; instead, it exploits the topological properties of the network for data analysis. The topological properties have been shown to be quite useful in data analysis. For example, they are applied to detect irregular cluster forms by data clustering or semi-supervised classification algorithms with a unique distance measure [6]–[10].

Therefore, it is clear that, from the viewpoint of high level classification, the available approaches are quite restricted to specific types of data; contextual classification, for instance, is devoted to spatial relationships among pixels in image processing. To our knowledge, the literature still lacks an explicit and general scheme to deal with high level classification, which is quite desirable for many applications, such as invariant pattern recognition. The current paper presents an endeavor in this direction.

As mentioned, low level classification usually presents difficulty in identifying complex relationships among the data items. Consequently, these techniques are not suitable for uncovering semantically meaningful patterns formed by the data. This is because the data patterns often do not follow a fixed shape or distribution; instead, they are frequently determined by the local and/or global interactions among the data items. It is well-known that the network representation can capture arbitrary levels of relationships or interactions of the input data. For this reason, we here show how the topological properties of the input data can help in identifying the pattern formation and, consequently, can be used for general high level classification. In this case, the topological properties are revealed by tourist walks. A tourist walk can be defined as follows: given a set of cities, at each time step the tourist (walker) goes to the nearest city that has not been visited in the past µ time steps [11]. Tourist walks have been shown to be useful for data clustering [12] and image processing [13]. However, all these works are realized on regular lattices. In other words, the walker performs partially self-avoiding deterministic walks over the data set, where the self-avoiding factor is limited to the memory window µ − 1. This quantity can be understood as a repulsive force emanating from the sites in this memory window, which prevents the walker from visiting them in this interval (refractory time). Therefore, a trajectory is prohibited from intersecting itself inside this memory window. In spite of being a simple rule, it has been shown that this movement dynamic possesses complex behavior when µ > 1 [11].
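To make the walk rule concrete, the following minimal Python sketch (our own illustration, not code from the paper) performs a tourist walk on a precomputed distance matrix and returns the transient and cycle lengths of the resulting trajectory. Since the dynamics are deterministic, the walker is trapped as soon as a dynamical state (the ordered tuple of the last µ visited vertices) repeats.

```python
import numpy as np

def tourist_walk(dist, start, mu):
    """Partially self-avoiding deterministic walk with memory window mu >= 1.

    At each step the walker moves to the nearest vertex not visited in the
    previous mu steps. The walk inevitably gets trapped in a cycle; we detect
    it when a dynamical state (the ordered tuple of the last mu visited
    vertices) repeats. Returns (transient_length, cycle_length). Assumes the
    data set has more than mu vertices.
    """
    path = [start]
    seen = {}  # dynamical state -> step at which it first occurred
    while True:
        state = tuple(path[-mu:])      # memory window, current vertex last
        if state in seen:              # state repeated: attractor reached
            t = seen[state]            # steps taken before entering the cycle
            return t, (len(path) - 1) - t
        seen[state] = len(path) - 1
        forbidden = set(state)
        here = path[-1]
        # nearest vertex outside the memory window; in the networked variant
        # the candidates would be restricted to the neighbors of `here`
        nxt = min((v for v in range(len(dist)) if v not in forbidden),
                  key=lambda v: dist[here][v])
        path.append(nxt)

# toy usage: five points on a line; with mu = 1 the walker immediately
# oscillates between the two mutual nearest neighbors (transient 0, cycle 2)
pts = np.array([0.0, 1.0, 2.1, 3.3, 4.6])
dist = np.abs(pts[:, None] - pts[None, :])
print(tourist_walk(dist, start=0, mu=1))
```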
In this paper, we propose a technique that combines low level and high level supervised data classification. The idea is built upon the general framework proposed by [14]. The low level classification can be implemented by any traditional classification technique, while the high level classification exploits the complex topological properties of the underlying network constructed from the input data. Improvements over the low level classification can be achieved by the combination of the two levels of learning. The main contribution of this work is the development of a new method for extracting semantical and organizational features from a network constructed from the input data. This extraction is performed by aggregating different measures derived from the dynamics of tourist walks. Specifically, we will see that the weighted transient and cycle lengths for different memory windows of the tourist walker form the decision inferred by the high level classifier. The construction of such a classifier is built upon the intuitive idea that the topological information of the data relationships is naturally extracted by those two quantities on a local to global basis as we increase the tourist walker's memory window. This happens because, as one increases the memory window, the walker is compelled to explore vertices that are not in the vicinity of its starting region. Therefore, the dynamics of the transient and cycle lengths capture the organizational features of a network from different perspectives.

The remainder of the paper is organized as follows. The model definition is presented in Section II. Computer simulations are performed on synthetic and real-world data sets in Section III. Finally, Section IV concludes the paper.

II. MODEL DESCRIPTION

In this section, the high level classification technique is presented. It is worth noting that the general idea of this paper is built upon the framework proposed by [14].

Fig. 1. (a) Schematic of the network in the training phase. (b) Schematic of how the classification inference is done.

A. Overview of the Technique

The proposed technique works in a fully supervised network-based environment. With this in mind, the induction is performed in two steps:

• Training Phase: the data items are mapped into a graph such that each class holds a unique component (sub-graph). For example, Fig. 1a shows a schematic of how the network looks for a three-class problem when the training phase has been completed. In this case, each class holds a representative component. In the figure, the surrounding circles denote these components: G_{C_1}, G_{C_2}, and G_{C_3}.

• Classification Phase: the test instances are presented to the classifier one by one. Since we do not know the label of the test instance, it is temporarily inserted into the network in such a way that it is connected to its most similar vertices. Once the data item is inserted, each class analyzes, in isolation, its impact on the respective class component using that component's complex topological features. In the high level model, each class retains an isolated graph component. Each of these components calculates the changes that occur in its pattern formation with the insertion of the test instance. If slight or no changes occur, then the test instance is said to be in compliance with that class pattern. As a result, the high level classifier yields a large membership value for that test instance on that class. Conversely, if these changes dramatically modify the class pattern, then the high level classifier produces a small membership value on that class. These changes are quantified via network measures, each of which numerically translates the organization of the component from a local to global fashion. For the sake of clarity, Fig. 1b exhibits a schematic of how the classification process is performed. Note that, once the test instance gets classified, it is either discarded or incorporated into the training set with the corresponding predicted label. In either case, each class is still represented by a single graph component.

B. High Level Classification Framework

The hybrid classifier M (low and high level classifiers jointly considered) consists of a convex combination of two terms: (i) a low level classifier and (ii) a high level classifier, which is responsible for classifying a test instance according to its pattern formation with the data. Mathematically, the membership of the test instance x_i ∈ X_test with respect to the class j ∈ L yielded by the hybrid framework, here written as F_i^{(j)}, is given by:

F_i^{(j)} = (1 - \lambda) L_i^{(j)} + \lambda H_i^{(j)},   (1)

where L_i^{(j)} ∈ [0, 1] denotes the membership of the test instance x_i towards class j produced by an arbitrary traditional (low level) classifier; H_i^{(j)} ∈ [0, 1] indicates the same membership information yielded by a high level classifier; and λ ∈ [0, 1] is the compliance term, which plays the role of counterbalancing the classification decisions supplied by the low and high level classifiers.
(j) (j)
A test instance receives the label of the class j that maximizes (1). Mathematically, the estimated label of the test instance x_i, \hat{y}_{x_i}, is given by:

\hat{y}_{x_i} = \arg\max_{j \in L} F_i^{(j)}.   (2)

Equation (1) supplies a general framework for the hybrid classification process, in the sense that various supervised data classification techniques can be brought into play. The first term of (1) is rather straightforward to implement, since it can be any traditional classification technique. In contrast, high level approaches are relatively scarce in the literature. In the following, we discuss a possible approach to that end.
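As a minimal sketch of how (1) and (2) combine (illustrative code with made-up membership values; in practice L comes from the chosen low level classifier and H from the high level term defined below):

```python
def hybrid_label(low, high, lam):
    """Eqs. (1)-(2): convexly combine the low and high level memberships of
    one test instance and return the winning class. `low` and `high` map each
    class label to a membership in [0, 1]; `lam` is the compliance term."""
    F = {j: (1 - lam) * low[j] + lam * high[j] for j in low}
    return max(F, key=F.get)

# toy usage: the low level term prefers class 'b', but a sufficiently large
# compliance term lets the high level (pattern formation) term prevail
low = {'a': 0.40, 'b': 0.60}
high = {'a': 0.90, 'b': 0.10}
print(hybrid_label(low, high, lam=0.0))   # 'b' (pure low level decision)
print(hybrid_label(low, high, lam=0.5))   # 'a' (high level term dominates)
```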
Motivated by the intrinsic ability of networks to describe topological structures among the data items, we propose a network-based (graph-based) technique for the high level classifier H. Specifically, the inference of pattern formation within the data is processed using the generated network. In order to do so, the following structural constraints must be satisfied by any constructed network:

i. Each class is an isolated subgraph (graph component);
ii. Each class retains a representative and unique graph component.

Having in mind the basic concepts revolving around tourist walks, we proceed to explain the high level classifier based on them. In mathematical terms, its decision output is given by:

H_i^{(j)} = \frac{\sum_{\mu=0}^{\mu_c} \left[ \alpha_t \left(1 - T_i^{(j)}(\mu)\right) + \alpha_c \left(1 - C_i^{(j)}(\mu)\right) \right]}{\sum_{g \in L} \sum_{\mu=0}^{\mu_c} \left[ \alpha_t \left(1 - T_i^{(g)}(\mu)\right) + \alpha_c \left(1 - C_i^{(g)}(\mu)\right) \right]}   (3)

where µ_c is a critical value that indicates the maximum memory length of the tourist walks; α_t, α_c ∈ [0, 1] are user-controllable coefficients that indicate the influence of each network measure in the classification process; and T_i^{(j)}(µ) and C_i^{(j)}(µ) are functions that depend on the transient and cycle lengths, respectively, of the tourist walks applied to the ith data item with regard to class j. These functions are responsible for providing an estimate of whether or not the data item i under analysis possesses the same patterns as component j. The denominator in (3) has been introduced solely for normalization purposes. Indeed, in order for (3) to be a valid convex combination of network measures, α_t and α_c must be chosen so as to satisfy α_t + α_c = 1.

Regarding T_i^{(j)}(µ) and C_i^{(j)}(µ), they are given by the following expressions:

T_i^{(j)}(\mu) = \Delta t_i^{(j)}(\mu)\, p^{(j)}, \qquad C_i^{(j)}(\mu) = \Delta c_i^{(j)}(\mu)\, p^{(j)}   (4)

where ∆t_i^{(j)}(µ), ∆c_i^{(j)}(µ) ∈ [0, 1] are the variations of the transient and cycle lengths that occur on the component representing class j if x_i joins it, and p^{(j)} ∈ [0, 1] is the proportion of data items pertaining to class j.

Next, we explain how to compute the ∆t_i^{(j)}(µ) and ∆c_i^{(j)}(µ) that appear in (4). Firstly, we need to numerically quantify the transient and cycle lengths of a component. Since tourist walks are strongly dependent on their starting vertices, for a fixed µ we perform tourist walks initiating from each one of the vertices that are members of a class component. The transient and cycle lengths of the jth component, ⟨t_i^{(j)}⟩ and ⟨c_i^{(j)}⟩, are simply given by the average transient and cycle lengths over all its vertices, respectively. In order to estimate the variation of the component's network measures, consider that x_i ∈ X_test is a test instance. In relation to an arbitrary class j, we virtually insert x_i into component j using the network formation technique that we have seen, and recalculate the new average transient and cycle lengths of this component. We denote these new values as ⟨t'_i^{(j)}⟩ and ⟨c'_i^{(j)}⟩, respectively. This procedure is performed for all classes j ∈ L. It may occur that some class u ∈ L does not share any connection with the test instance x_i. In this case, ⟨t_i^{(u)}⟩ = ⟨t'_i^{(u)}⟩ and ⟨c_i^{(u)}⟩ = ⟨c'_i^{(u)}⟩, which is undesirable, since this configuration would state that x_i complies perfectly with class u. In order to overcome this problem, a simple post-processing step is necessary: for all components u ∈ L that do not share at least one link with x_i, we deliberately set ⟨t'_i^{(u)}⟩ and ⟨c'_i^{(u)}⟩ to a high value. This high value must be greater than the largest variation that occurs in any component which shares a link with the data item under analysis. One may interpret this post-processing as a way to state that x_i does not share any pattern formation with class u, since it is not even connected to it.

With all this information at hand, we are able to calculate ∆t_i^{(j)}(µ) and ∆c_i^{(j)}(µ), ∀j ∈ L, as follows:

\Delta t_i^{(j)}(\mu) = \frac{\left| \langle t'^{(j)}_i \rangle - \langle t_i^{(j)} \rangle \right|}{\sum_{u \in L} \left| \langle t'^{(u)}_i \rangle - \langle t_i^{(u)} \rangle \right|}, \qquad
\Delta c_i^{(j)}(\mu) = \frac{\left| \langle c'^{(j)}_i \rangle - \langle c_i^{(j)} \rangle \right|}{\sum_{u \in L} \left| \langle c'^{(u)}_i \rangle - \langle c_i^{(u)} \rangle \right|}   (5)

where the denominators are introduced only for normalization. According to (5), for insertions that result in a considerable variation of the component's transient and cycle lengths, ∆t_i^{(j)}(µ) and ∆c_i^{(j)}(µ) will be high. In view of (4), T_i^{(j)}(µ) and C_i^{(j)}(µ) are then expected to be high as well, yielding a low membership value predicted by the high level classifier H_i^{(j)}, as (3) reveals. On the other hand, for insertions that do not significantly interfere in the pattern formation of the data, ∆t_i^{(j)}(µ) and ∆c_i^{(j)}(µ) will be low and, as a result, T_i^{(j)}(µ) and C_i^{(j)}(µ) are expected to be low as well, producing a high membership value for the high level classifier H_i^{(j)}, as (3) exposes.
The network-based high level classifier quantifies the variations of the transient and cycle lengths of tourist walks with limited memory µ that occur in the class components when a test instance artificially joins each of them in isolation. According to (3), this procedure is performed for several values of the memory length µ, ranging from 0 (memoryless) to a critical value µ_c. This is done in order to capture complex patterns of each of the representative class components in a local to global fashion. When µ is small, the walks tend to possess small transient and cycle parts, so that the walker does not wander far away from the starting vertex. In this way, the walking mechanism is responsible for capturing the local structures of the class component. On the other hand, when µ increases, the walker is compelled to venture deep into the component, possibly very far away from its starting vertex. In this case, the walking process is responsible for capturing the global features of the component. In summary, the fundamental idea of the high level classifier is to make use of a mixture of local and global features of the class components by means of a combination of tourist walks with different values of µ.
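Putting (3)-(5) and the class-proportion weighting together, the following simplified sketch (our own illustration) computes the high level memberships for one test item. It reuses `tourist_walk` from the earlier sketch, starts the sum at µ = 1 rather than 0, inserts the test item by plain distance rather than the network formation rule, omits the post-processing for components that share no link with the item, and assumes every component has more than µ_c vertices.

```python
import numpy as np

# uses tourist_walk() from the earlier sketch

def component_walk_stats(dist, vertices, mu):
    """Average transient and cycle lengths of one class component, obtained
    by starting a tourist walk from each of its vertices."""
    sub = dist[np.ix_(vertices, vertices)]
    stats = [tourist_walk(sub, s, mu) for s in range(len(vertices))]
    return np.mean(stats, axis=0)          # (<t>, <c>)

def high_level_memberships(dist, components, x, mu_c, alpha_t=0.5, alpha_c=0.5):
    """Simplified sketch of eqs. (3)-(5) with the p(j) weighting of (4)/(7).

    `components` maps each class label to the list of vertex indices of its
    component; `x` is the row/column index of the test item in `dist`."""
    V = sum(len(v) for v in components.values())
    scores = {j: 0.0 for j in components}
    for mu in range(1, mu_c + 1):
        dt, dc = {}, {}
        for j, verts in components.items():
            t0, c0 = component_walk_stats(dist, verts, mu)
            t1, c1 = component_walk_stats(dist, verts + [x], mu)
            dt[j], dc[j] = abs(t1 - t0), abs(c1 - c0)
        zt = sum(dt.values()) or 1.0       # eq. (5) normalizers (guard /0)
        zc = sum(dc.values()) or 1.0
        for j, verts in components.items():
            p = len(verts) / V             # class proportion, eq. (7)
            T = dt[j] / zt * p             # eq. (4)
            C = dc[j] / zc * p
            scores[j] += alpha_t * (1 - T) + alpha_c * (1 - C)
    z = sum(scores.values())               # denominator of eq. (3)
    return {j: s / z for j, s in scores.items()}
```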

More generally, following the framework of [14], each network measure employed by the high level classifier can be written through a function f_i^{(j)}(u), of which T_i^{(j)} and C_i^{(j)} in (4) are instances. It possesses the general closed form:

f_i^{(j)}(u) = \Delta G_i^{(j)}(u)\, p^{(j)},   (6)

where ∆G_i^{(j)}(u) ∈ [0, 1] is the variation of the uth network measure that occurs on the component representing class j if x_i joins it, and p^{(j)} ∈ [0, 1] is the proportion of data items pertaining to class j. Remembering that each class has a component representing itself, the strategy to check the pattern compliance of a test instance is to examine whether its insertion causes a great variation of the network measures representing the class component. In other words, if there is a small change in the network measures, the test instance is in compliance with all the other data items that comprise that class component, i.e., it follows the same pattern as the original members of that class. On the other hand, if its insertion is responsible for a significant variation of the component's network measures, then the test instance probably does not belong to that class. This is exactly the behavior that (3) together with (6) propose, since a small variation of f(u) causes a large membership value output by H, and vice versa.

Next, we proceed to explain the role of p^{(j)} ∈ [0, 1] in (6). In real-world databases, unbalanced classes are usually encountered. In general, a database frequently encompasses several classes of different sizes. A great portion of the network measures is very sensitive to the size of the components. In an attempt to soften this problem, we introduce in (6) the term p^{(j)}, which is the proportion of vertices belonging to class j. Mathematically, it is given by:

p^{(j)} = \frac{1}{V} \sum_{u=1}^{V} \mathbb{1}_{\{y_u = j\}},   (7)

where V is the number of vertices and \mathbb{1}_{\{\cdot\}} is the indicator function, which yields 1 if its argument is logically true and 0 otherwise. In view of the introduction of this mechanism, we expect to obviate the effects of unbalanced classes in the classification process.

III. COMPUTER SIMULATIONS

In this section, we present simulation results in order to assess the effectiveness of the proposed high level classification model. In the adopted methodology, the error estimation is performed by employing stratified 10-fold cross-validation.

A. Synthetic Data Sets

Fig. 2. Minimum value of the compliance term, λ_min, that results in the correct classification of the missing test instances. Traditional techniques would definitely fail to correctly classify the straight line that diametrically crosses the densely connected component pertaining to the blue or "square" class. (Embedded graph: λ_min, on a 0 to 1 scale, for the test items 1 through 14.)

In this section, we consider a very interesting problem of data classification in networks presenting clear patterns along different classes. The following example gives a gist of the role that the compliance term plays in the inference process of the high level classifier. To this end, consider the classification problem arranged in Fig. 2. Here, we are going to empirically calculate the minimum required compliance term λ_min for which the data items from the test set are classified as members of the red or "circle" class. In the figure, one can see that there is a line segment representing the red or "circle" class (9 vertices) and also a condensed rectangular component pertaining to the blue or "square" class (1000 vertices). The network formation in the training phase uses k = 1 and ε = 0.07 (this radius covers, for any vertex in the straight line, 2 adjacent vertices, except for the vertices at each end). In the classification phase, we use the same ε = 0.07. The fuzzy SVM technique with RBF kernel (C = 2^2 and γ = 2^{-1}) is employed as the traditional low level classifier. The task is to classify the 14 test instances depicted by the big triangle-shaped items from left to right. After a test instance is classified, it is incorporated into the training set with the corresponding predicted label. The graphic embedded in Fig. 2 shows the minimum required value of λ_min for which the triangle-shaped items are classified as members of the red or "circle" class. This graphic is constructed according to the triangle-shaped element that is exactly at the same position with respect to the x-axis in the scatter plot drawn above it. For example, the first triangle-shaped data item can be correctly classified if one chooses λ ≥ λ_min ≈ 0.37. The second and third data items would require at least λ_min = 0.81 and λ_min = 0.96, respectively, and so on. Specifically, as the straight line crosses the condensed region of the blue class, the compliance term approaches λ → 1, since the decision cannot be established based on the low level classifier, which would erroneously decide in favor of the blue or "square" class.
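The λ_min values embedded in Fig. 2 have a simple closed form in the two-class case: setting the two sides of (1) equal and solving for λ gives λ_min = (L_o − L_t) / ((L_o − L_t) + (H_t − H_o)), where subscripts t and o denote the target and the other class. A small sketch with illustrative membership values (not the paper's actual numbers):

```python
def lambda_min(L_target, L_other, H_target, H_other):
    """Smallest compliance term for which eq. (1) prefers the target class,
    assuming the low level term favors the other class (L_target < L_other)
    while the high level term favors the target (H_target > H_other)."""
    dL = L_other - L_target        # low level margin against the target
    dH = H_target - H_other        # high level margin in favor of the target
    return dL / (dL + dH)

# illustrative only: a test item that the low level classifier gets wrong
print(round(lambda_min(0.30, 0.70, 0.85, 0.15), 2))  # 0.36
```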
From this analysis, we can see that the value of the compliance term is responsible for changing the vision of the classifier towards the data relationships. That is, when λ is low, the perspective that the classifier employs is the one based on mere physical or statistical properties of the data. However, as λ increases, the reigning perspective becomes the organizational features of the data relationships. Midpoints between these two extremes lead to classification schemes with complex behavior, where a convex combination of the different perspectives (low and high level) is taken into account.

B. Simulations on Real-World Data Sets

In this section, we apply the proposed framework to several well-known UCI data sets. The most relevant metadata of each data set are given in Table I. For a detailed description, refer to [15]. Concerning the numerical attributes, the reciprocal of the Euclidean distance is employed as the similarity measure. For categorical attributes, the overlap similarity measure [16] is utilized. All data sets are submitted to a standardization pre-processing step.
TABLE I. BRIEF INFORMATION OF THE DATA SETS.

Data Set   # Samples   # Dimensions   # Classes   α_t    α_c
Yeast      1484        8              10          0.40   0.60
Teaching   151         5              3           0.50   0.50
Zoo        101         16             7           0.30   0.70
Wine       178         13             3           0.40   0.60
Iris       150         4              3           0.40   0.60
Glass      214         9              6           0.50   0.50
Vehicle    846         18             4           0.60   0.70
Letter     20000       16             26          0.40   0.60
the data sets under analysis. Furthermore, we can see that the
Here, the high level classifier is composed of the best networked high level classifier can outperform the networkless
weighted combination of transient and cycle lengths. The version.
optimization process is done by encountering αt × αc ∈
{0, 0.1, . . . , 1} × {0, 0.1, . . . , 1} (search space), subjected to IV. C ONCLUSIONS
αt + αc = 1, which result in the highest accuracy rate of the
model. The critical memory length is fixed to µc = 0.3nmax , In this work, we have proposed a new technique of data
where nmax indicates the size of the largest component. The classification combining low and high level classifiers. The
determination of the critical memory length is left as future former classifies data instances by their physical features and
work, due to the page limit. However, it is worth noting that the latter measures the compliance of the test instance with
the maximum memory length is dependent on the data set’s the pattern formation of the input data. To this end, tourist
characteristics and reflects its complexity in terms of data walk has been employed to capture the complex topological
relationships. This conjecture will be shown in a future work. properties of the network constructed from the input data.
The parameter optimization results are given in the last two Several experiments are conducted on synthetic and real-world
columns of Table I. data sets, so that we can better assess the performance of
the proposed framework. A quite interesting feature of the
Here, we will deal with two kinds of high level classifiers: proposed technique is that the high level term influence has
(i) one in which the tourist walks are performed in a network to be increased in order to get correct classification as the
constructed from the vector-based data set and (ii) one in complexity of the class configuration increased. This means
which the tourist walks are realized in a lattice, i.e., the tourist that the high level term is specially useful in complex situations
is free to visit any other data site apart from the ones in of classification. Also, it is worth observing that the application
TABLE II. ACCURACY RATE ACHIEVED BY SEVERAL LOW LEVEL CLASSIFICATION TECHNIQUES AND THE HIGH LEVEL CLASSIFIER WITH AND WITHOUT NETWORKS. IN THE ROW NAMED "PURE", THE OPTIMIZED PARAMETERS OF EACH LOW LEVEL TECHNIQUE ARE REPORTED AS FOLLOWS: WEIGHTED kNN (k), MLP (NUMBER OF LAYERS, LEARNING RATE, MOMENTUM), AND FUZZY M-SVM (C, γ).

Data Set   Type          Bayesian Networks        Weighted kNN             MLP                        Fuzzy M-SVM
Yeast      Pure          57.8 ± 2.6               60.9 ± 3.6 (16)          56.2 ± 3.9 (4, 0.3, 0.2)   58.9 ± 4.8 (2^11, 2^0)
           Networkless   58.6 ± 2.3 (0.04)        62.0 ± 3.2 (0.07)        56.9 ± 2.5 (0.05)          60.2 ± 4.6 (0.14)
           Network       66.3 ± 2.6 (0.05, 0.28)  65.7 ± 4.0 (0.03, 0.19)  63.3 ± 2.9 (0.05, 0.23)    69.8 ± 3.8 (0.05, 0.28)
Teaching   Pure          61.3 ± 8.8               63.0 ± 12.3 (9)          60.9 ± 9.4 (7, 0.2, 0.4)   52.5 ± 7.9 (2^6, 2^3)
           Networkless   63.5 ± 9.3 (0.18)        63.8 ± 10.6 (0.12)       62.0 ± 7.7 (0.13)          55.3 ± 8.6 (0.18)
           Network       67.2 ± 6.6 (0.03, 0.24)  68.5 ± 7.4 (0.04, 0.19)  67.8 ± 6.1 (0.04, 0.26)    62.7 ± 6.9 (0.04, 0.33)
Zoo        Pure          95.9 ± 4.3               96.2 ± 5.8 (1)           96.1 ± 6.9 (3, 0.4, 0.5)   96.3 ± 6.4 (2^1, 2^1)
           Networkless   96.0 ± 3.6 (0.02)        96.5 ± 5.2 (0.04)        96.4 ± 6.6 (0.04)          96.5 ± 4.5 (0.05)
           Network       97.3 ± 4.3 (0.02, 0.06)  97.5 ± 4.4 (0.02, 0.09)  97.5 ± 4.2 (0.02, 0.09)    97.5 ± 2.3 (0.02, 0.08)
Wine       Pure          98.8 ± 0.7               94.6 ± 1.4 (1)           97.8 ± 0.5 (3, 0.6, 0.4)   98.9 ± 0.2 (2^11, 2^2)
           Networkless   98.8 ± 0.7 (0.00)        94.6 ± 2.1 (0.00)        97.9 ± 0.3 (0.03)          98.9 ± 0.2 (0.00)
           Network       98.8 ± 0.7 (-, 0.00)     96.3 ± 1.0 (0.03, 0.07)  98.6 ± 0.3 (0.02, 0.09)    98.9 ± 0.2 (-, 0.00)
Iris       Pure          92.7 ± 1.2               97.9 ± 3.3 (19)          94.0 ± 2.9 (1, 0.3, 0.2)   97.0 ± 4.6 (2^-2, 2^3)
           Networkless   93.2 ± 1.9 (0.07)        97.9 ± 3.3 (0.00)        94.8 ± 2.8 (0.10)          97.2 ± 3.7 (0.05)
           Network       94.9 ± 0.4 (0.01, 0.15)  98.3 ± 0.6 (0.01, 0.05)  96.5 ± 1.1 (0.02, 0.21)    98.1 ± 1.0 (0.01, 0.13)
Glass      Pure          70.6 ± 7.7               71.8 ± 9.0 (1)           67.3 ± 5.0 (7, 0.1, 0.3)   72.4 ± 5.6 (2^10, 2^4)
           Networkless   71.5 ± 5.7 (0.14)        72.7 ± 7.1 (0.16)        68.8 ± 3.2 (0.22)          73.3 ± 3.9 (0.12)
           Network       79.2 ± 5.3 (0.03, 0.32)  79.7 ± 5.0 (0.35)        77.4 ± 5.5 (0.02, 0.30)    80.1 ± 4.3 (0.03, 0.31)
Vehicle    Pure          68.1 ± 3.8               67.6 ± 4.1 (5)           69.0 ± 4.4 (5, 0.3, 0.2)   84.4 ± 3.4 (2^10, 2^3)
           Networkless   70.0 ± 2.6 (0.19)        69.4 ± 2.5 (0.18)        70.3 ± 3.8 (0.13)          84.4 ± 3.4 (0.00)
           Network       74.1 ± 2.9 (0.05, 0.26)  73.6 ± 3.0 (0.05, 0.24)  74.7 ± 3.6 (0.07, 0.29)    84.9 ± 2.7 (0.04, 0.07)
Letter     Pure          74.4 ± 5.6               96.0 ± 7.6 (1)           88.9 ± 9.9 (3, 0.2, 0.4)   94.8 ± 1.7 (2^6, 2^4)
           Networkless   75.5 ± 4.6 (0.14)        96.0 ± 7.6 (0.00)        89.3 ± 7.4 (0.09)          94.8 ± 1.7 (0.00)
           Network       77.8 ± 3.4 (0.04, 0.17)  96.0 ± 7.6 (-, 0.00)     92.1 ± 4.1 (0.04, 0.19)    94.8 ± 1.7 (-, 0.00)

IV. CONCLUSIONS

In this work, we have proposed a new technique of data classification combining low and high level classifiers. The former classifies data instances by their physical features, while the latter measures the compliance of the test instance with the pattern formation of the input data. To this end, tourist walks have been employed to capture the complex topological properties of the network constructed from the input data. Several experiments are conducted on synthetic and real-world data sets, so that we can better assess the performance of the proposed framework. A quite interesting feature of the proposed technique is that the influence of the high level term has to be increased in order to obtain correct classification as the complexity of the class configuration increases. This means that the high level term is especially useful in complex classification scenarios. Also, it is worth observing that the application of the tourist walk dynamics in the context of a high level classifier is itself a novel approach in the literature. We have shown that, even though such a walk is constructed based on very simple rules, it is able to successfully capture topological features of the underlying network on a local to global basis.

We hope our work can provide an alternative way to the understanding of high level semantic machine learning. As future work, the high level classifier will be extended by considering new network measures, such as degree entropy, component spectrum, average edge reciprocity, and matching indices, among many others (see [17] for details about these measures). Additionally, mechanisms for taking representative samples, rather than recalculating all the tourist walks for all the vertices in the network, are going to be studied. This would enable the application of this technique to large-scale data. Finally, more advanced network construction techniques are going to be explored, such as adaptive k and ε-radius techniques.

ACKNOWLEDGMENT

This work is supported by the São Paulo State Research Foundation (FAPESP) and by the Brazilian National Research Council (CNPq).

REFERENCES

[1] V. N. Vapnik, Statistical Learning Theory. New York: Wiley-Interscience, 1998.
[2] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed. New York, NY: John Wiley & Sons, Inc., 2001.
[3] C. M. Bishop, Pattern Recognition and Machine Learning. Springer, 2006.
[4] R. W. Donaldson and G. T. Toussaint, "Use of contextual constraints in recognition of contour-traced handprinted characters," IEEE Trans. Computers, pp. 1096–1099, 1970.
[5] A. Micheli, "Neural network for graphs: A contextual constructive approach," IEEE Trans. Neural Networks, vol. 20, no. 3, pp. 498–511, 2009.
[6] G. Karypis, E.-H. Han, and V. Kumar, "Chameleon: hierarchical clustering using dynamic modeling," IEEE Computer, vol. 32, no. 8, pp. 68–75, 1999.
[7] S. Fortunato, "Community detection in graphs," Physics Reports, vol. 486, pp. 75–174, 2010.
[8] T. C. Silva and L. Zhao, "Stochastic competitive learning in complex networks," IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 3, pp. 385–398, 2012.
[9] ——, "Network-based stochastic semisupervised learning," IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 3, pp. 451–466, 2012.
[10] ——, "Detecting and preventing error propagation via competitive learning," Neural Networks, vol. 41, pp. 70–84, 2013.
[11] G. F. Lima, A. S. Martinez, and O. Kinouchi, "Deterministic walks in random media," Phys. Rev. Lett., vol. 87, p. 010603, 2001.
[12] M. G. Campiteli, P. D. Batista, O. Kinouchi, and A. S. Martinez, "Deterministic walks as an algorithm of pattern recognition," Physical Review E, vol. 74, p. 026703, 2006.
[13] A. R. Backes, W. N. Gonçalves, A. S. Martinez, and O. M. Bruno, "Texture analysis and classification using deterministic tourist walk," Pattern Recognition, vol. 43, pp. 685–694, 2010.
[14] T. C. Silva and L. Zhao, "Network-based high level data classification," IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 6, pp. 954–970, 2012.
[15] A. Frank and A. Asuncion, "UCI machine learning repository," 2010.
[16] S. Boriah, V. Chandola, and V. Kumar, "Similarity measures for categorical data: A comparative evaluation," in SIAM Data Mining Conference, 2008, pp. 243–254.
[17] L. F. Costa, F. A. Rodrigues, G. Travieso, and P. R. V. Boas, "Characterization of complex networks: A survey of measurements," Advances in Physics, vol. 56, no. 1, pp. 167–242, 2007.
