
INDEX

Unsupervised learning, its applications, and how it differs from supervised learning
K-Means clustering algorithm (history)
The algorithm and the steps involved
The distortion function and its relation to the centroid steps
Random initialization
The elbow method
Images
References

IMAGE COMPRESSION

The objective of image compression is to reduce irrelevance and redundancy in the image
data so that it can be stored or transmitted in an efficient form.
Image compression may be lossy or lossless.
Lossless compression is preferred for archival purposes and often for medical imaging,
technical drawings, clip art, or comics.
Lossy compression methods, especially when used at low bit rates, introduce compression
artifacts. Lossy methods are especially suitable for natural images such as photographs, in
applications where a minor (sometimes imperceptible) loss of fidelity is acceptable in exchange
for a substantial reduction in bit rate. Lossy compression whose differences are imperceptible
may be called visually lossless.
Methods for lossless image compression include:

Run-length encoding: used as the default method in PCX and as one of the possible methods in BMP, TGA, and TIFF
Entropy encoding
Adaptive dictionary algorithms such as LZW: used in GIF and TIFF

Methods for lossy compression include:

Reducing the color space to the most common colors in the image. The selected
colors are specified in the color palette in the header of the compressed image.
Each pixel then just references the index of a color in the color palette
(for example, via K-Means clustering).
Chroma subsampling. This takes advantage of the fact that the human eye
perceives spatial changes of brightness more sharply than changes of color, by
averaging or dropping some of the chrominance information in the image
(for example, JPEG).

Other methods also exist.

Properties
The best image quality at a given bit rate (or compression rate) is the main goal of
image compression; however, there are other important properties of image compression
schemes.
For the K-Means based palette approach above, the achievable quality depends on K (the number of clusters).
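
To make the color-palette idea concrete, here is a minimal sketch of K-Means color
quantization. It assumes NumPy, Pillow, and scikit-learn are available; the file names
and the choice K = 16 are illustrative, not taken from these notes.

# Minimal sketch: reduce an image's color space with K-Means
# (assumes numpy, Pillow and scikit-learn; 'photo.jpg' is a placeholder).
import numpy as np
from PIL import Image
from sklearn.cluster import KMeans

img = np.asarray(Image.open("photo.jpg").convert("RGB"), dtype=np.float64)
h, w, _ = img.shape
pixels = img.reshape(-1, 3)                 # one row per pixel (R, G, B)

K = 16                                      # size of the color palette
kmeans = KMeans(n_clusters=K, n_init=10, random_state=0).fit(pixels)

# Replace every pixel by the palette color (centroid) of its cluster.
palette = kmeans.cluster_centers_
labels = kmeans.predict(pixels)
compressed = palette[labels].reshape(h, w, 3).astype(np.uint8)
Image.fromarray(compressed).save("photo_16_colors.png")

Only the K palette colors and one small index per pixel then need to be stored, which is
where the reduction in bit rate comes from.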

MACHINE LEARNING

Machine learning is the science of getting computers to act without being explicitly
programmed.
A computer program is said to learn from experience E with respect to some class of tasks
T and performance measure P if its performance at tasks in T, as measured by P, improves
with experience E.
Example: playing checkers.

E = the experience of playing many games of checkers


T = the task of playing checkers.
P = the probability that the program will win the next game.

TYPES OF PROBLEMS
Machine learning tasks are typically classified into two broad categories, depending on the
nature of the learning "signal" or "feedback" available to a learning system.

[Diagram: Machine Learning branches into Supervised and Unsupervised learning]

Supervised learning: The computer is presented with example inputs and their
desired outputs, given by a "teacher", and the goal is to learn a general rule that
maps inputs to outputs.

Unsupervised learning: No labels are given to the learning algorithm, leaving it on its
own to find structure in its input. Unsupervised learning can be a goal in itself
(discovering hidden patterns in data) or a means towards an end.

Supervised Learning
In supervised learning, we are given a data set and already know what our correct output
should look like, having the idea that there is a relationship between the input and the output.
Supervised learning problems are categorized into "regression" and "classification"
problems. In a regression problem, we are trying to predict results within a continuous
output, meaning that we are trying to map input variables to some continuous function. In a
classification problem, we are instead trying to predict results in a discrete output. In other
words, we are trying to map input variables into discrete categories.

Given a set of m training examples of the form (x(1), y(1)), ..., (x(m), y(m)), where x(i) is
the feature vector of the i-th example and y(i) is its label, a learning algorithm seeks a
function h : X → Y, where X is the input space and Y is the output space.
Example 1:
Given data about the size of houses on the real estate market, try to predict their price. Price
as a function of size is a continuous output, so this is a regression problem.
Another example of regression: given a picture of a person, predict their age from the picture.
Example 2: Classification. Given a picture of a person, predict whether they are of high
school, college, or graduate age.
Another example of classification: a bank has to decide whether or not to give someone a loan
on the basis of their credit history.
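
To make the regression/classification distinction concrete, here is a minimal sketch using
scikit-learn; the house sizes, prices, and labels below are invented purely for illustration.

# Minimal sketch of regression vs. classification (assumes numpy and
# scikit-learn; the data below is invented purely for illustration).
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Regression: continuous output (house size in m^2 -> price).
sizes = np.array([[50.0], [80.0], [120.0], [200.0]])
prices = np.array([150_000, 240_000, 360_000, 600_000])
reg = LinearRegression().fit(sizes, prices)
print(reg.predict([[100.0]]))        # a continuous prediction

# Classification: discrete output (house size -> "small"=0 / "large"=1).
labels = np.array([0, 0, 1, 1])
clf = LogisticRegression().fit(sizes, labels)
print(clf.predict([[100.0]]))        # one of the discrete classes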

/* include types of algorithms*/

Unsupervised Learning
Unsupervised learning, on the other hand, allows us to approach problems with little or no
idea what our results should look like. We can derive structure from data where we don't
necessarily know the effect of the variables.
Unsupervised learning is contrasted with supervised learning because it uses an unlabelled
training set rather than a labelled one.
In other words, we don't have the vector y of expected results, we only have a dataset of
features where we can find structure.
We can derive this structure by clustering the data based on relationships among the
variables in the data.
With unsupervised learning there is no feedback based on the prediction results, i.e., there is
no teacher to correct you.
Example:
Take a collection of 1,000 essays written on the US economy, and find a way to automatically
group these essays into a small number of groups that are somehow similar or related by different
variables, such as word frequency, sentence length, page count, and so on.
Recommender systems and search are other examples.
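
As a minimal sketch of the essay-grouping idea, the snippet below groups a handful of toy
documents by word-frequency features using scikit-learn; the tiny corpus and the choice of
two groups are invented for illustration.

# Minimal sketch: group documents by word-frequency features (assumes
# scikit-learn; the corpus and n_clusters=2 are invented for illustration).
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

essays = [
    "tax policy and federal spending",
    "interest rates set by the central bank",
    "spending cuts and tax reform",
    "the bank raised interest rates again",
]
X = TfidfVectorizer().fit_transform(essays)   # word-frequency features
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)   # essays with the same label fall in the same group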

CLUSTERING
Cluster analysis or clustering is the task of grouping a set of objects in such a way that
objects in the same group (called a cluster) are more similar (in some sense or another) to
each other than to those in other groups (clusters).

[Figure: the result of a cluster analysis, shown as the coloring of the squares into three clusters]

Clustering is good for:

Market segmentation: market researchers use cluster analysis to partition the general
population of consumers into market segments. Clustering can also be used to group all the
shopping items available on the web into a set of unique products; for example, all the items
on eBay can be grouped into unique products.

Social network analysis: in the study of social networks, clustering may be used to recognize
communities within large groups of people.

Astronomical data analysis.

Crime analysis: cluster analysis can be used to identify areas where there are greater
incidences of particular types of crime. By identifying these distinct areas or "hot spots"
where a similar crime has happened over a period of time, it is possible to manage law
enforcement resources more effectively.

Recommender systems: recommender systems are designed to recommend new items based on a
user's tastes. They sometimes use clustering algorithms to predict a user's preferences based
on the preferences of other users in the user's cluster.

K-Means Algorithm
The K-Means Algorithm is the most popular and widely used algorithm for automatically
grouping data into coherent subsets.

1. Randomly initialize k points in the dataset called the cluster centroids.


2. Cluster assignment: Assign all examples into one of the k groups based on
which cluster centroid the example is closest to.
3. Move centroid: Compute the averages for all the points inside each of the k
cluster centroid groups, then move the cluster centroid points to those
averages.
4. Re-run (2) and (3) until we have found our k-clusters.

Our main variables are:

a. K (the number of clusters)
b. The training set x(1), x(2), ..., x(m), where each x(i) ∈ ℝⁿ
Pseudo Code:

Randomly initialize K cluster centroids mu(1), mu(2), ..., mu(K)
Repeat:
    for i = 1 to m:
        c(i) := index (from 1 to K) of cluster centroid closest to x(i)
    for k = 1 to K:
        mu(k) := average (mean) of points assigned to cluster k

The first for-loop is the 'Cluster Assignment' step. We make a vector c where c(i)
represents the centroid assigned to example x(i).
We can write the operation of the Cluster Assignment step more mathematically as follows:
c(i) := argmin over k of ‖x(i) − μ(k)‖²
That is, each c(i) contains the index of the centroid that has minimal distance to x(i).
By convention, we square the right-hand-side, which makes the function we are trying to
minimize more sharply increasing. It is mostly just a convention.
The second for-loop is the 'Move Centroid' step where we move each centroid to the
average of its group.
More formally, the equation for this loop is as follows:
μ(k) = (1/n) [ x(k1) + x(k2) + ... + x(kn) ]

where x(k1), x(k2), ..., x(kn) are the n training examples currently assigned to cluster k.
If you have a cluster centroid with 0 points assigned to it, it can be randomly re-initialized
to a new point. Or one can also simply eliminate that cluster group.
After a number of iterations the algorithm will converge, at which point further iterations no
longer change the clusters.
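
The cluster assignment and move centroid steps above translate almost directly into code.
Below is a minimal NumPy sketch of the whole loop; the function name, its parameters, and
the made-up data in the usage example are illustrative, not from any particular library.

# Minimal NumPy sketch of the K-Means loop described above.
import numpy as np

def k_means(X, K, num_iters=100, seed=0):
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Random initialization: pick K distinct training examples as centroids.
    mu = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(num_iters):
        # Cluster assignment step: c[i] = index of the closest centroid.
        dists = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)
        c = np.argmin(dists, axis=1)
        # Move centroid step: each centroid moves to the mean of its points.
        for k in range(K):
            if np.any(c == k):               # skip empty clusters
                mu[k] = X[c == k].mean(axis=0)
    return c, mu

# Example usage on made-up 2-D data with two obvious groups.
X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
c, mu = k_means(X, K=2)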

Optimization Objective
Recall some of the parameters we used in our algorithm:
c(i) = index of the cluster (1, 2, ..., K) to which example x(i) is currently assigned
μ(k) = cluster centroid k (μ(k) ∈ ℝⁿ)
μ(c(i)) = cluster centroid of the cluster to which example x(i) has been assigned
Using these variables we can define our cost function:

J(c(1), ..., c(m), μ(1), ..., μ(K)) = (1/m) Σ from i = 1 to m of ‖x(i) − μ(c(i))‖²

Our optimization objective is to minimize the above cost function over all of our parameters:

min over c, μ of J(c, μ)

That is, we are finding the values in the set c, representing all our cluster assignments, and
in the set μ, representing all our centroids, that minimize the average squared distance from
every training example to its corresponding cluster centroid.
The above cost function is often called the distortion of the training examples.
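
As a sketch, the distortion can be computed directly from the assignments c and centroids mu
returned by the illustrative k_means function above:

# Distortion J: mean squared distance from each example to its assigned centroid.
def distortion(X, c, mu):
    diffs = X - mu[c]            # mu[c] lines each example up with its centroid
    return np.mean(np.sum(diffs ** 2, axis=1))

print(distortion(X, c, mu))      # using c, mu from the usage example above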
In the cluster assignment step, our goal is to:

Minimize J with respect to c(1), ..., c(m), holding μ(1), ..., μ(K) fixed.

In the move centroid step, our goal is to:

Minimize J with respect to μ(1), ..., μ(K).

With k-means, the cost function can never increase from one iteration to the next; it should
always decrease (or stay the same).

Random Initialization
There's one particular recommended method for randomly initializing our cluster centroids.
1. Have K<m. That is, make sure the number of your clusters is less than the number of
your training examples.
2. Randomly pick K training examples.
3. Set μ(1), ..., μ(K) equal to these K examples.

K-means can get stuck in local optima. To decrease the chance of this happening, one can run the
algorithm on many different random initializations.

//
for i = 1 to 100:
    randomly initialize k-means
    run k-means to get 'c' and 'mu'
    compute the cost function (distortion) J(c, mu)
pick the clustering that gave us the lowest cost

//
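
Made runnable with the illustrative k_means and distortion sketches above, the restart loop
might look like this (100 restarts, as in the pseudocode):

# Run K-Means from many random initializations and keep the lowest-cost result.
best_J, best = np.inf, None
for seed in range(100):
    c, mu = k_means(X, K=2, seed=seed)
    J = distortion(X, c, mu)
    if J < best_J:
        best_J, best = J, (c, mu)
c, mu = best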

Choosing the Number of Clusters


Choosing K can be quite arbitrary and ambiguous.

The elbow method: plot the cost J against the number of clusters K. The cost function should
decrease as we increase the number of clusters, and then flatten out. Choose K at the point
where the cost function starts to flatten out.
However, fairly often the curve is very gradual, so there is no clear elbow.
Note: J will always decrease as K is increased. The one exception is if k-means gets stuck at
a bad local optimum.
Another way to choose K is to observe how well k-means performs on a downstream purpose. In
other words, choose the K that proves to be most useful for some goal you are trying to
achieve from using these clusters.
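
A minimal sketch of the elbow plot, reusing the illustrative k_means and distortion functions
from above (matplotlib is assumed to be available):

# Elbow method: plot the distortion J for a range of values of K.
import matplotlib.pyplot as plt

Ks = range(1, 11)
costs = []
for K in Ks:
    c, mu = k_means(X, K)
    costs.append(distortion(X, c, mu))

plt.plot(list(Ks), costs, marker="o")
plt.xlabel("K (number of clusters)")
plt.ylabel("J (distortion)")
plt.title("Elbow method")
plt.show()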
//

IMAGES AND REFERENCES
