
Input normalization (Preprocessing)

• Initialize the weights to small random values to avoid saturation.
• The connection weights from the inputs to a hidden unit
determine the orientation of the hyperplane. The bias
determines the distance of the hyperplane from the
origin.
• If the data are not centered at the origin, the hyperplane
may fail to pass through the data cloud.
• If all the inputs have a small coefficient of variation, it is quite possible that all the initial hyperplanes will miss the data entirely.
• To avoid saturation:
• If the bias terms are all small random numbers, then all the decision surfaces will pass close to the origin. If the data are not centered at the origin, the decision surfaces will not pass through the data points.
The ‘prestd’ or ‘mapstd’ command in MATLAB normalizes the inputs to zero mean and unit standard deviation.
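A minimal sketch of this normalization (made-up numbers), assuming the inputs are stored the way the toolbox expects, with one row per variable and one column per sample:

% Hypothetical raw inputs: 2 variables (rows) x 6 samples (columns)
p = [100 120  90 110 130 105;   % feature on a large scale
       2   3   1   2   4   2];  % feature on a small scale

% Normalize each row to zero mean and unit standard deviation
[pn, ps] = mapstd(p);           % older toolboxes: [pn, meanp, stdp] = prestd(p)

% Reuse the same settings on new data, or undo the mapping
pnew  = mapstd('apply', [95; 3], ps);
porig = mapstd('reverse', pn, ps);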
Consider an MLP with two inputs (X and Y) and 100 hidden units. With the inputs normalized, it will be easy to learn a hyperplane passing through any part of the data region at any angle.
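This is easy to see by plotting the initial decision lines of the hidden units. The sketch below is an illustration of my own (not taken from the slides): with a centred, unit-variance data cloud and small random weights and biases, every line w1*x + w2*y + b = 0 passes close to the origin and therefore through the data.

rng(0);                               % illustrative data and weights only
data = randn(200, 2);                 % centred, unit-variance 2-D data cloud
W = 0.1*randn(100, 2);                % small random input-to-hidden weights
b = 0.1*randn(100, 1);                % small random biases

figure; hold on;
plot(data(:,1), data(:,2), 'k.');
xs = linspace(-3, 3, 50);
for k = 1:100
    % decision line of hidden unit k: W(k,1)*x + W(k,2)*y + b(k) = 0
    plot(xs, -(W(k,1)*xs + b(k)) / W(k,2), 'b-');
end
axis([-3 3 -3 3]); xlabel('X'); ylabel('Y');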
Curse of Dimensionality

Example: The Fisher Iris problem is a 3-class pattern recognition problem.

Assume that we are taking only one feature (x1), say sepal length.
If we are forced to work with a limited quantity of data, then increasing the dimensionality of the space rapidly leads to the point where the data are very sparse, in which case they provide a very poor representation of the mapping.
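One rough way to make the sparsity concrete: if every input axis is divided into M intervals, the number of cells the data must populate grows as M^d with the dimensionality d, while the number of samples stays fixed. A small sketch (the sample count of 150 matches the Iris set; M = 10 is an arbitrary choice):

N = 150;                 % number of available samples (size of the Iris set)
M = 10;                  % intervals per input axis
for d = 1:4
    cells = M^d;
    fprintf('d = %d: %6d cells, %.4f samples per cell\n', d, cells, N/cells);
end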
Principal Component Analysis (PCA)

• Reduce the dimensionality of a data set which consists of a large number of interrelated variables by linearly transforming the original data set to a new set of usually fewer uncorrelated variables (PCs), while retaining as much as possible of the variation present in the original data set.
• A PC that accounts for more of the variation has more impact on the observations and is thus, intuitively, more informative.
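In terms of the eigen-decomposition used later in these notes, the share of variation attributed to each PC can be read off the eigenvalues of the covariance matrix. A sketch with placeholder data:

X = randn(100, 3);                    % placeholder data, one row per sample
[v, d] = eig(cov(X));                 % eigenvectors v, eigenvalues on diag(d)
lambda = sort(diag(d), 'descend');    % largest eigenvalue = first PC
explained = lambda / sum(lambda);     % fraction of total variation per PC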
Mean, Standard Deviation and Variance
The standard deviation is, roughly, the average distance from the mean of the data set to a point; the variance is the standard deviation squared.
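For a small sample vector these quantities can be computed directly; a sketch with example numbers:

x    = [0 8 12 20];                        % small example data set
xbar = mean(x);                            % mean = 10
s2   = sum((x - xbar).^2) / (numel(x)-1);  % sample variance, same as var(x)
s    = sqrt(s2);                           % standard deviation, same as std(x)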

Covariance

Covariance is always measured between 2 dimensions. If we have a data set with more than 2 dimensions, there is more than one covariance that can be calculated. For example, from a 3-dimensional data set (dimensions x, y and z) we could calculate cov(x,y), cov(x,z) and cov(y,z).

The covariance matrix collects all of these values:

C = ( cov(x,x)  cov(x,y)  cov(x,z)
      cov(y,x)  cov(y,y)  cov(y,z)
      cov(z,x)  cov(z,y)  cov(z,z) )
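In MATLAB the whole matrix can be obtained with one call to cov, which expects one column per dimension; a sketch with random placeholder data:

X = randn(10, 3);    % 10 samples of a 3-dimensional variable (columns x, y, z)
C = cov(X);          % 3x3 symmetric covariance matrix
% C(1,2) = cov(x,y), C(1,3) = cov(x,z), C(2,3) = cov(y,z)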
Original data set: mean of x = 1.81, mean of y = 1.91

[Plot of the mean-adjusted data, both axes running from -1.5 to 1.5]

Projecting the mean-adjusted data onto the principal eigenvector only:

v(:,2)' = -0.6779  -0.7352

datared = (v(:,2)'*[xadj yadj]')'

datared =
   -0.8280
    1.7776
   -0.9922
   -0.2742
   -1.6759
   -0.9130
    0.0991
    1.1446
    0.4381
    1.2239

Projecting onto both eigenvectors instead: datatrans = (v'*[xadj yadj]')'
Step 1: Get some data
Step 2: Subtract the mean
Step 3: Calculate the covariance matrix
Step 4: Calculate the eigenvectors and eigenvalues of the covariance matrix
Step 5: Choosing components and forming a feature vector
Step 6: Deriving the new data set

Recovering the original data from the transformed data:

dataorig = (v*datatrans')' + [xmean*ones(10,1) ymean*ones(10,1)]
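Putting the six steps together, a minimal MATLAB sketch of the whole procedure. The variable names (xadj, yadj, v, datared, datatrans, dataorig) follow the fragments above; the sample values are the widely used two-dimensional tutorial data set, which appears to be the one on these slides since its means (1.81 and 1.91) match:

% Step 1: get some data (10 two-dimensional samples)
x = [2.5 0.5 2.2 1.9 3.1 2.3 2.0 1.0 1.5 1.1]';
y = [2.4 0.7 2.9 2.2 3.0 2.7 1.6 1.1 1.6 0.9]';

% Step 2: subtract the mean from each dimension
xmean = mean(x);   ymean = mean(y);
xadj  = x - xmean; yadj  = y - ymean;

% Step 3: calculate the covariance matrix
C = cov([xadj yadj]);

% Step 4: eigenvectors (columns of v) and eigenvalues (diagonal of d)
[v, d] = eig(C);        % for this matrix the largest eigenvalue comes last

% Step 5: choose components -- keep only the principal eigenvector v(:,2)
datared = (v(:,2)' * [xadj yadj]')';    % data expressed in one dimension

% Step 6: derive the new data set with all components, then reconstruct
datatrans = (v' * [xadj yadj]')';
dataorig  = (v * datatrans')' + [xmean*ones(10,1) ymean*ones(10,1)];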
A principal component is a projection, defined by a linear combination of the original variables, that encapsulates the maximum amount of variation in a dataset while being orthogonal (and therefore uncorrelated) to the previous principal components of the same dataset.

The blue lines represent two consecutive principal components. Note that they are orthogonal (at right angles) to each other.
