
Stochastic Competitive Learning Applied to Handwritten Digit and Letter Clustering

Thiago Henrique Cupertino, PhD Candidate
Thiago Christiano Silva, PhD Candidate
Liang Zhao, PhD

Department of Computer Sciences
Institute of Mathematics and Computer Science (ICMC)
University of São Paulo (USP)

Maceió, Alagoas - Brazil

TOPICS
- Motivation
- Objectives
- Approach Using Particle Competition
- Model Description
- Simulation Results
- Conclusion and Future Works
- Acknowledgements

29/1/2012

MOTIVATION
A network presenting cluster structure

Mapping data items into an underlying network can reveal topological relationships. We can then apply community detection techniques to find groups of nodes and use those groups to cluster the original data.

S. Fortunato, Community detection in graphs, Physics Reports, vol. 486, pp. 75-174, 2010.


OBJECTIVE
Develop a competition mechanism in which particles walking in a network, constructed from data, compete with each other to occupy and identify clusters.
Methodology:
- Map data items into an underlying graph
  Nodes = data items
  Links = similarity between data items
- Use the competition mechanism to cluster data

Our method identifies groups of nodes which correspond to data clusters.


APPROACH USING PARTICLE COMPETITION


Similar to many natural and social processes, such as resource competition among animals, territory exploration by humans (or animals), election campaigns, etc.
Several particles walk in the network and compete with each other to mark their own territory (occupy as many nodes as possible), while attempting to reject intruder particles.
Each particle performs:
- a random walk, choosing any neighbor to visit;
- a biased walk, choosing the node with the highest domination to visit; or
- a combination of the two.

- M. G. Quiles, L. Zhao, R. L. Alonso, and R. A. F. Romero, Particle competition for complex network community detection, Chaos, vol. 18, no. 3, p. 033107, 2008.
- F. A. Breve, L. Zhao, M. G. Quiles, W. Pedrycz, and J. Liu, Particle competition and cooperation in networks for semi-supervised learning, IEEE Transactions on Knowledge and Data Engineering, 2011 (in press).


MODEL DESCRIPTION: General Dynamics


A particle's energy increases when it visits a node that it owns.


MODEL DESCRIPTION: General Dynamics


A particle's energy decreases when it visits a node owned by another particle.


MODEL DESCRIPTION: General Dynamics


When a particle visits a node, it increases its dominance on that node.
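
The three rules of the general dynamics (energy gain on an owned node, energy loss on a rival's node, and a dominance increase on every visit) can be sketched as follows. This is only an illustration under stated assumptions: the step size `delta`, the energy bounds `e_min` and `e_max`, the use of visit counts as the dominance measure, and the rule that a node's owner is the particle with the strictly highest count are all hypothetical choices, not values taken from the slides.

```python
def owner_of(counts):
    """Return the particle id with the strictly highest visit count, or None on a tie.

    Assumption: ownership is decided by visit counts; ties mean "no owner".
    """
    best = max(counts.values())
    leaders = [k for k, c in counts.items() if c == best]
    return leaders[0] if len(leaders) == 1 else None


def visit(energy, counts, k, delta=0.25, e_min=0.0, e_max=1.0):
    """Particle k visits a node whose per-particle visit counts are `counts`.

    Returns the particle's new energy and updates `counts` in place.
    `delta`, `e_min` and `e_max` are hypothetical parameters.
    """
    owner = owner_of(counts)  # ownership is evaluated before the visit is recorded
    if owner == k:
        energy = min(e_max, energy + delta)  # visiting an owned node: energy goes up
    elif owner is not None:
        energy = max(e_min, energy - delta)  # visiting a rival's node: energy goes down
    counts[k] += 1  # every visit raises k's dominance on this node
    return energy
```

For example, with `counts = {0: 3, 1: 1}`, a visit by particle 0 raises its energy and its count, while a visit by particle 1 lowers particle 1's energy.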


MODEL DESCRIPTION: General Modeling


Convex combination between biased and random walks:

If particle k is alive, S = 0. Otherwise, S = 1: particle k is dead and is restored to a randomly chosen node that it dominates.
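
One way to write a transition rule consistent with the description above is sketched here. The symbols are assumptions introduced for illustration: a mixing weight \lambda in [0, 1], the biased-walk matrix P_pref, the random-walk matrix P_rand, and a reanimation matrix P_rean that sends a dead particle back to a node it dominates. Only the switch S comes from the text.

```latex
P^{(k)}_{\mathrm{transition}}
  = \bigl(1 - S^{(k)}\bigr)
    \Bigl[\lambda\, P^{(k)}_{\mathrm{pref}} + (1 - \lambda)\, P_{\mathrm{rand}}\Bigr]
  + S^{(k)}\, P^{(k)}_{\mathrm{rean}},
  \qquad \lambda \in [0, 1]
```

With S = 0 the bracketed convex combination governs the move: \lambda = 0 gives a purely random walk and \lambda = 1 a purely biased one. With S = 1 only the reanimation term remains.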

MODEL DESCRIPTION: Random Walk


The transition probability is based on the strength of the link connecting the particle's current node i and a neighbor node j.
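
A minimal sketch of such a random-walk step, assuming the usual normalization in which the probability of moving from i to j is the link strength a_ij divided by the total strength of the links leaving i. The dictionary-of-dictionaries graph layout and the function names are hypothetical.

```python
import random


def random_walk_probs(weights, i):
    """Return {j: P(i -> j)} with P(i -> j) = a_ij / sum_u a_iu.

    `weights[i][j]` holds the strength of the link between nodes i and j.
    """
    total = sum(weights[i].values())
    if total <= 0:
        raise ValueError("node has no outgoing link strength")
    return {j: a / total for j, a in weights[i].items()}


def random_walk_step(weights, i, rng=random):
    """Sample the next node for a particle currently at node i."""
    probs = random_walk_probs(weights, i)
    neighbors = list(probs)
    return rng.choices(neighbors, weights=[probs[j] for j in neighbors])[0]
```

Note that this probability does not depend on the particle: every particle at node i sees the same distribution over neighbors.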


MODEL DESCRIPTION: Biased Walk


The transition probability is based on the strength of the link connecting the particle's current node i to a neighbor node j and on the particle's domination level on node j.

Relative domination level of particle k on node i:
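
The two quantities above can be sketched together. The assumptions here are that the relative domination level of particle k on a node is k's share of all visits made to that node, and that the biased walk weights each neighbor j by a_ij times that share and then normalizes. The data layout, the function names, and the fallback used when the particle has no domination on any neighbor are hypothetical.

```python
def relative_domination(counts):
    """counts[k] = visits by particle k to one node; return each particle's share."""
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}


def biased_walk_probs(weights, visits, i, k):
    """Return {j: P(i -> j)} for particle k, favoring neighbors that k dominates.

    weights[i][j] is the link strength; visits[j][k] is the number of visits
    particle k has made to node j.
    """
    scores = {
        j: a * relative_domination(visits[j])[k]
        for j, a in weights[i].items()
    }
    total = sum(scores.values())
    if total == 0:
        # Assumed fallback: with no domination anywhere, use link strength alone.
        scores = dict(weights[i])
        total = sum(scores.values())
    return {j: s / total for j, s in scores.items()}
```

With equal link strengths, a particle holding three quarters of the visits on one neighbor and one quarter on the other moves to them with probabilities 0.75 and 0.25.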


SIMULATION RESULTS: Data Sets


Modified NIST (MNIST) Data Set
- Handwritten numerical digits (0 to 9)
- Training set: 60 000
- Test set: 10 000
- Images normalized to 20 x 20 pixels

Letter Recognition Data Set
- 26 classes
- 20 000 samples
- 16 attributes

- Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
- A. Frank and A. Asuncion, UCI machine learning repository, 2010.


SIMULATION RESULTS: Network Formation


k-nearest neighbors
Dissimilarity measure based on eigenvalues:

(.) is monotonically decreasing:

MNIST: k=3
Letter Recognition: k=6
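
A minimal sketch of k-nearest-neighbor network formation. It does not reproduce the eigenvalue-based dissimilarity used in the presentation: plain Euclidean distance is swapped in as a stand-in, and `exp(-d)` is an assumed example of a monotonically decreasing function that turns a dissimilarity into a link strength. The function name and graph layout are hypothetical.

```python
import math


def knn_graph(points, k, decreasing=lambda d: math.exp(-d)):
    """Link every point to its k nearest neighbors and return weights[i][j].

    Distance is Euclidean (a stand-in); `decreasing` maps a dissimilarity to
    a link strength, so closer points get stronger links.
    """
    n = len(points)
    weights = {i: {} for i in range(n)}
    for i in range(n):
        # Sort the other points by distance to point i (ties broken by index).
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in dists[:k]:
            w = decreasing(d)
            weights[i][j] = w
            weights[j][i] = w  # keep the graph undirected
    return weights
```

Because links are symmetrized, a node can end up with more than k neighbors when it is among the nearest neighbors of many other points.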



SIMULATION RESULTS: Optimal Number of Particles


How many particles should be placed in the network? We can check the maximum domination level on each node, summarized by an indicator R:

R near 0 indicates intense competition
R near 1 indicates the competition has ceased
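
A sketch of how such an indicator could be computed, assuming R is the average over all nodes of the largest relative domination level found on each node. The averaging, the use of visit shares as domination levels, and the function name are assumptions made for illustration.

```python
def average_max_domination(visits):
    """Mean over nodes of the largest visit share held by any single particle.

    visits[node][k] is the number of visits particle k has made to `node`.
    """
    total = 0.0
    for counts in visits.values():
        # Share of the leading particle on this node.
        total += max(counts.values()) / sum(counts.values())
    return total / len(visits)
```

A node visited by only one particle contributes 1, while a node shared evenly among many particles contributes a small value. Evaluating the indicator for several candidate particle counts is one way to compare them.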



SIMULATION RESULTS: Comparison With Other Techniques

- F. Ratle, J. Weston, and M. L. Miller, Large-scale clustering through functional embedding, in Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases - Part II, ser. ECML PKDD '08. Springer-Verlag, 2008, pp. 266-281.
- J. Liu, D. Cai, and X. He, Gaussian mixture model with local consistency, in AAAI '10, vol. 1, 2010, pp. 512-517.
- C. M. Bishop, Pattern Recognition and Machine Learning. Springer, 2006.
- J. B. MacQueen, Some methods for classification and analysis of multivariate observations, in Proc. of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1. University of California Press, 1967, pp. 281-297.
- J. Shi and J. Malik, Normalized cuts and image segmentation, Berkeley, CA, USA, Tech. Rep., 1997.


CONCLUSION AND FUTURE WORKS


A mathematical model for competitive learning in networks has been formulated.
Experiments on synthetic and real-world data sets assessed the performance of the proposed framework.
An embedded technique for determining the number of clusters has also been derived.
Future work may consider applications and extensions of the model:
- Detection of overlapping structures or vertices in the network
- Use of different numbers of particles to provide hierarchical clustering


ACKNOWLEDGEMENTS
