
Unsupervised Learning

Chapter 9

Unsupervised Learning
No target data are available in the training examples. We want to explore the data to find some intrinsic structure in them.

Why Unsupervised Learning?


- if labeling is expensive, train with a small labeled sample, then improve with a large unlabeled sample
- if labeling is expensive, train with a large unlabeled sample, then learn classes with a small labeled sample
- tracking concept drift over time by unsupervised learning
- learning new features by clustering, for later use in classification
- exploratory data analysis with visualization

Clustering

Clustering is a technique for finding similarity groups in data, called clusters.


It groups data instances that are similar to (near) each other into one cluster, and data instances that are very different (far away) from each other into different clusters.

Clustering (contd..)
Clustering is often called an unsupervised learning task, as no class values denoting an a priori grouping of the data instances are given (as they would be in supervised learning). For historical reasons, clustering is often considered synonymous with unsupervised learning.

K-Means Clustering
Suppose that you know the number of clusters, but not what the clusters look like

How do you assign each data point to a cluster?
- Position k centers at random in the space
- Assign each point to its nearest center according to some chosen distance measure
- Move each center to the mean of the points that it represents
- Iterate

K-Means Algorithm
Initialization
- choose a value for k
- choose k random positions in the input space
- assign the cluster centers μj to those positions

Learning
- Repeat
* for each datapoint xi:
  - compute the distance d(xi, μj) = ||xi − μj|| to each cluster center μj
  - assign the datapoint to the nearest cluster center, i.e. the one with minimum distance

K-Means Algorithm (contd..)


* For each cluster center: move the position of the center to the mean of the points in that cluster (Nj is the number of points in cluster j): μj = (1/Nj) Σ xi, summing over the points xi assigned to cluster j

- Until the cluster centers stop moving

Usage
- for each point:
  * compute the distance to each cluster center
  * assign the point to the nearest cluster center, i.e. the one with minimum distance
(a NumPy sketch of the whole algorithm follows below)
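A minimal NumPy sketch of this loop, written as one possible implementation (the function name, parameters, and the choice of initializing the centers at randomly chosen datapoints are illustrative assumptions, not from the slides):

    import numpy as np

    def kmeans(X, k, max_iters=100, rng=None):
        """Minimal k-means sketch: X is an (N, d) array of datapoints, k the number of clusters."""
        rng = np.random.default_rng(rng)
        # Initialization: place the k centers at randomly chosen datapoints
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(max_iters):
            # Distance from every point to every center, shape (N, k)
            dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
            # Assign each point to its nearest center
            labels = dists.argmin(axis=1)
            # Move each center to the mean of the points assigned to it
            new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                    else centers[j] for j in range(k)])
            # Stop when the centers no longer move
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        return centers, labels

Usage on new points then amounts to recomputing the distance matrix against the returned centers and taking the argmin along the cluster axis.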

K-Means Algorithm (contd..)


4 means
[Figure: example run of k-means with 4 means on a 2-D dataset]

K-Means Algorithm (contd..)


These are local minima solutions
[Figure: local-minimum placements of the means on the same data]

K-Means Algorithm (contd..)


More solutions that are perfectly valid, but wrong
[Figure: further valid but wrong placements of the means on the same data]

K-Means Algorithm (contd..)


If you don't know the number of means, the problem is worse
[Figure: the same data clustered with an ill-chosen number of means]

Dealing with Noise in the k-Means Algorithm


The mean average is susceptible to outliers.
One way to avoid the problem is to replace the mean with the median, which is what is known as a robust statistic (i.e. it is not affected by outliers).
The only change to the algorithm is to replace the computation of the mean with the computation of the median.
However, calculating the median is computationally more expensive.
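As a sketch, only the center-update step of the k-means loop changes; a possible k-medians update (variable names match the earlier sketch and are illustrative):

    import numpy as np

    def update_centers_median(X, labels, centers):
        """k-medians step: move each center to the component-wise median of its points.
        The rest of the k-means loop is unchanged; the median is robust to outliers."""
        return np.array([np.median(X[labels == j], axis=0) if np.any(labels == j)
                         else centers[j] for j in range(len(centers))])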

k-Means Neural Network


The neuron activation measures the distance between the input and the neuron's position in weight space.

k-Means Neural Network (contd..)


Weight space: imagine we plot the neurons' positions according to their weights.

[Figure: three neurons plotted in weight space, with axes w1, w2, w3]

k-Means Neural Network (contd..)


Use winner-take-all neurons: the winning neuron is the one closest to the input, i.e. the best-matching cluster.

How do we do training?
- Update the weights, i.e. move the neuron positions
- Move the winning neuron towards the current input
- Ignore the rest

Normalization
Suppose the weights of three neurons are:
(0.2, 0.2, -0.1)
(0.15, -0.15, 0.1)
(10, 10, 10)
The input is (0.2, 0.2, -0.1)
[Figure: the three weight vectors plotted in weight space with axes w1, w2, w3]

Normalization (contd..)
For a perfect match with the first neuron, the activations (dot products of weights and input) are:
0.2*0.2 + 0.2*0.2 + (-0.1)*(-0.1) = 0.09
0.15*0.2 + (-0.15)*0.2 + 0.1*(-0.1) = -0.01
10*0.2 + 10*0.2 + 10*(-0.1) = 3
We can only compare activations if the weights are about the same size.

Normalization (contd..)
- Make the distance between each neuron and the origin be 1
- All neurons then lie on the unit hypersphere
- This is also needed to stop the weights growing unboundedly
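A small sketch of this normalization step, assuming NumPy and weight/input vectors stored as rows:

    import numpy as np

    def normalize_rows(V, eps=1e-12):
        """Scale each row vector to unit length so it lies on the unit hypersphere."""
        norms = np.linalg.norm(V, axis=1, keepdims=True)
        return V / np.maximum(norms, eps)

Once both the weights and the inputs are normalized like this, the dot-product activations in the example above become directly comparable.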

Better Weight Update Rule for k-Means Neural Network


Normalize the inputs too. Then update the winning neuron by moving it towards the (normalized) input, e.g. Δwb = η (x − wb).

The On-Line k-Means Algorithm


Initialization
- choose a value for k, which corresponds to the number of output nodes
- initialize the weights to small random values

Learning
- normalize the data so that all the points lie on the unit sphere
- repeat:
  * for each datapoint:
    - compute the activations of all the nodes
    - pick the winner as the node with the highest activation
    - update the weights of the winning node, moving it towards the input (e.g. Δwb = η (x − wb))
- until the number of iterations is above a threshold

The On-Line k-Means Algorithm (contd..)


Usage
- for each test point:
  * compute the activations of all the nodes
  * pick the winner as the node with the highest activation
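Putting the pieces together, here is a minimal on-line k-means sketch; the learning rate, epoch count, and the exact update Δw = η(x − w) followed by re-normalization are assumptions:

    import numpy as np

    def online_kmeans(X, k, eta=0.1, n_epochs=10, rng=None):
        """On-line k-means sketch: winner-take-all updates on unit-length data."""
        rng = np.random.default_rng(rng)
        X = X / np.linalg.norm(X, axis=1, keepdims=True)         # data on the unit sphere
        W = rng.normal(scale=0.1, size=(k, X.shape[1]))          # small random weights
        W /= np.linalg.norm(W, axis=1, keepdims=True)            # also placed on the sphere
        for _ in range(n_epochs):
            for x in X[rng.permutation(len(X))]:
                activations = W @ x                      # dot product = similarity on the sphere
                winner = activations.argmax()            # winner-take-all
                W[winner] += eta * (x - W[winner])       # move the winner towards the input
                W[winner] /= np.linalg.norm(W[winner])   # keep the winner on the unit sphere
        return W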

Vector Quantization (VQ)


Think about the problem of data compression:
- We want to store a set of data (say, sensor readings) in as small an amount of memory as possible
- We don't mind some loss of accuracy

We could make a codebook of typical data and index each data point by reference to a codebook entry. Thus, VQ is a coding method that maps each data point x to the closest codeword, i.e., we encode x by replacing it with that closest codeword.
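A sketch of the encode/decode step, assuming the codebook is simply an array of codewords (for example, the centers learned by k-means):

    import numpy as np

    def vq_encode(X, codebook):
        """Encode each point as the index of its nearest codeword."""
        dists = np.linalg.norm(X[:, None, :] - codebook[None, :, :], axis=2)
        return dists.argmin(axis=1)

    def vq_decode(indices, codebook):
        """Lossy reconstruction: look each index up in the codebook."""
        return codebook[indices]

Only the integer indices (plus the codebook itself) need to be stored, which is where the compression comes from.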

Vector Quantization (contd..)


Voronoi tessellation:
- Join neighboring points
- Draw lines equidistant to each pair of points
- These lines are perpendicular to the lines joining the points

Vector Quantization (contd..)


Learning Vector Quantization: how do we select the prototype vectors?
The prototype vectors are chosen so that they are as close as possible to all of the possible inputs that we might encounter.
The k-means algorithm can be used to solve this problem: the learned cluster centers serve as the prototype vectors (codewords).

Self-Organizing map
- Self-organizing maps (SOMs) are a data visualization technique invented by Professor Teuvo Kohonen
- Also called Kohonen networks, competitive learning, or winner-take-all learning
- They generally reduce the dimensionality of the data through the use of self-organizing neural networks
- Useful for data visualization: humans cannot visualize high-dimensional data, so this is often a useful technique for making sense of large data sets

Feature Map
- Sounds that are similar (close together) excite neurons that are near to each other
- Sounds that are very different excite neurons that are a long way apart
- This is known as topology preservation: the ordering of the inputs is preserved, if possible (a perfectly topology-preserving map)

Topology Preservation

[Figure: inputs mapped onto output neurons with their ordering (topology) preserved]

Self-Organizing Maps (Kohonen Maps)


Common output layer structures:
- One-dimensional (completely interconnected for determining the winning unit)
- Two-dimensional (connections omitted, only neighborhood relations shown)

[Figure: the neighborhood of neuron i in a two-dimensional grid]

The Self-Organizing Map


[Figure: the inputs feeding into the grid of map neurons]

The Self-Organizing Map (contd..)


- The weight vectors are randomly initialized
- Input vectors are presented to the network
- The neurons are activated in proportion to the Euclidean distance between the input and the weight vector
- The winning node has its weight vector moved closer to the input
- So do the neighbors of the winning node
- Over time, the network self-organizes so that the input topology is preserved

Self Organization
Global ordering from local interactions:
- Each neuron sees its neighbors
- The whole network becomes ordered
Understanding self-organization is part of complexity science; it appears all over the place.

Self-Organizing Feature Map Algorithm


Initialization:
- choose a size (number of neurons) and a number of dimensions d for the map
- Either:
  * choose random values for the weight vectors so that they are all different, OR
  * set the weight values to increase in the direction of the first d principal components of the dataset

Self-Organizing Feature Map Algorithm (contd..)


Learning
- repeat:
  * for each datapoint:
    - select the best-matching neuron nb using the minimum Euclidean distance between the weights and the input: nb = argmin_j ||x − wj||
  * update the weight vector of the best-matching node using:
    wb ← wb + η(t) (x − wb)
    where η(t) is the learning rate.

Self-Organizing Feature Map Algorithm (contd..)


* update the weight vectors of all other neurons using:
  wj ← wj + ηn(t) h(nb, t) (x − wj)
  where ηn(t) is the learning rate for neighborhood nodes, and h(nb, t) is the neighborhood function, which decides whether each neuron should be included in the neighborhood of the winning neuron (so h = 1 for neighbors and h = 0 for non-neighbors)

* Reduce the learning rates and adjust the neighborhood function, typically by η(t+1) = α η(t)^(k/kmax), where 0 ≤ α ≤ 1 decides how fast the size decreases, k is the number of iterations the algorithm has been running for, and kmax is when you want the learning to stop. The same equation is used for both learning rates (η, ηn) and the neighborhood function h(nb, t).

Until the map stops changing or some maximum number of iterations is exceeded
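A compact sketch of the whole loop; the grid shape, learning rate, neighborhood width, and the simple geometric decay used here are assumptions (the slides decay the rates with an exponent k/kmax instead):

    import numpy as np

    def train_som(X, map_shape=(10, 10), n_epochs=20, eta=0.3, sigma=2.0, alpha=0.9, rng=None):
        """Minimal SOM sketch with a Gaussian neighborhood on a 2-D grid."""
        rng = np.random.default_rng(rng)
        rows, cols = map_shape
        # Grid coordinates of each neuron, used to measure distance on the map
        coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
        W = rng.uniform(X.min(0), X.max(0), size=(rows * cols, X.shape[1]))
        for _ in range(n_epochs):
            for x in X[rng.permutation(len(X))]:
                # Best-matching neuron: minimum Euclidean distance in weight space
                bmu = np.linalg.norm(W - x, axis=1).argmin()
                # Gaussian neighborhood in map space around the winner
                d2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
                h = np.exp(-d2 / (2 * sigma ** 2))
                # Pull every neuron towards the input, weighted by its neighborhood strength
                W += eta * h[:, None] * (x - W)
            # Shrink the learning rate and neighborhood size (ordering, then convergence)
            eta *= alpha
            sigma *= alpha
        return W.reshape(rows, cols, -1)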

Self-Organizing Feature Map Algorithm (contd..)


Usage:
- for each test point:
* select the best-matching neuron nb using the minimum Euclidean distance between the weights and the input: nb = argmin_j ||x − wj||

Result of Algorithm
- Initially, some output nodes will randomly be a little closer to some particular type of input
- These nodes become winners, and the weight updates move them even closer to the inputs
- Over time, nodes in the output layer become representative prototypes for examples in the input

Note there is no supervised training here.

Classification: given a new input, the class is the output node that is the winner.

Weight Initialization
Generally, the weights are randomly selected, as in the MLP. Principal Components Analysis can also be used to initialize the weights in the network: it finds the two largest directions of variation in the data and initializes the weights so that they increase along these two directions (batch mode).
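A sketch of such a PCA-based initialization (the function name and the way the grid is scaled are illustrative choices):

    import numpy as np

    def pca_init_weights(X, map_shape=(10, 10)):
        """Initialize SOM weights so they increase along the first two principal components."""
        mean = X.mean(axis=0)
        # Principal directions from the SVD of the centered data
        _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
        pcs = Vt[:2]                              # two largest directions of variation
        scale = s[:2] / np.sqrt(len(X))           # data spread along each direction
        rows, cols = map_shape
        grid = np.stack(np.meshgrid(np.linspace(-1, 1, rows),
                                    np.linspace(-1, 1, cols),
                                    indexing="ij"), axis=-1)    # (rows, cols, 2)
        # Each neuron's weight vector varies smoothly across the map grid
        return mean + grid @ (scale[:, None] * pcs)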

Neighborhood
The neighborhood size is another parameter that we need to control: how large should the neighborhood of a neuron be?
- Initially it is large, because the network is unordered (two nodes that are very close in weight space could be on opposite sides of the map, and vice versa)
- However, once the network has been learning for a while, the rough ordering has already been created, and the algorithm starts to fine-tune the individual local regions of the network; at this stage the neighborhood size reduces
Therefore the network's neighborhood size reduces as it adapts. These two phases of learning are known as ordering and convergence.

Selecting the Neighborhood


Typically, a sombrero (Mexican hat) function or a Gaussian function is used.
[Figure: neighborhood strength plotted as a function of distance from the winning neuron]
The neighborhood size usually decreases over time, to allow initial jockeying for position and then fine-tuning as the algorithm proceeds.
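Two possible neighborhood functions, taking the squared map-space distance d2 from the winner; the difference-of-Gaussians form of the sombrero is just one illustrative construction:

    import numpy as np

    def gaussian_neighborhood(d2, sigma):
        """Strength falls off smoothly with (squared) distance from the winning neuron."""
        return np.exp(-d2 / (2 * sigma ** 2))

    def sombrero_neighborhood(d2, sigma, k=1.5):
        """Mexican-hat shape: excitatory center, mildly inhibitory surround."""
        return np.exp(-d2 / (2 * sigma ** 2)) - 0.5 * np.exp(-d2 / (2 * (k * sigma) ** 2))

Shrinking sigma over time gives the decreasing neighborhood size described above.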

Neighborhood
Before training (large neighborhood)

Neighborhood (contd..)
After training (small neighborhood)

Network Dimensionality
It depends on the intrinsic dimensionality (the number of dimensions that you actually need to represent the data). Noise and other inaccuracies in the data often lead to the data being represented in more dimensions than are actually required, so finding the intrinsic dimensionality can help reduce the noise.

Network Boundary
In some cases we can strictly define the boundary (e.g. if we are arranging sounds from low pitch to high pitch, then the lowest and highest pitches we can hear are obvious end points). However, there are cases where we can't exactly define the boundary, and then we might want to remove the boundary conditions. We do this by tying the ends of the map together.
In 1D we turn the line into a circle, while in 2D we turn the rectangle into a torus. Generally this means there are no neurons on the edge of the feature map.
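Tying the ends together only changes how distance on the map is measured; a sketch of a wrap-around (toroidal) grid distance that could replace the plain grid distance in the neighborhood calculation:

    import numpy as np

    def torus_sq_distance(coords, bmu_coord, map_shape):
        """Squared grid distance with wrap-around, so the map has no edges (a torus)."""
        diff = np.abs(coords - bmu_coord)
        diff = np.minimum(diff, np.array(map_shape) - diff)   # shorter way round each axis
        return np.sum(diff ** 2, axis=1)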

Network Size
We have to predetermine the network size.
Big network:
- Each neuron represents an exact feature
- Not much generalization
Small network:
- Too much generalization
- No differentiation
Try different sizes and pick the best.
