
Lecture 1
• What is soft computing
• Techniques used in soft computing

What is Soft Computing?
(adapted from L.A. Zadeh)
• Soft computing differs from conventional (hard) computing in that, unlike hard computing, it is tolerant of imprecision, uncertainty, partial truth, and approximation. In effect, the role model for soft computing is the human mind.


What is Hard Computing?
• Hard computing, i.e., conventional computing, requires a precisely stated analytical model and often a lot of computation time.
• Many analytical models are valid for ideal cases.
• Real world problems exist in a non-ideal environment.

What is Soft Computing? (Continued)
• The principal constituents, i.e., tools and techniques, of Soft Computing (SC) are
– Fuzzy Logic (FL), Neural Networks (NN), Support Vector Machines (SVM), Evolutionary Computation (EC), and
– Machine Learning (ML) and Probabilistic Reasoning (PR)

Premises of Soft Computing
• Real world problems are pervasively imprecise and uncertain
• Precision and certainty carry a cost

Guiding Principles of Soft Computing
• The guiding principle of soft computing is:
– Exploit the tolerance for imprecision, uncertainty, partial truth, and approximation to achieve tractability, robustness and low solution cost.


Hard Computing
• Premises and guiding principles of Hard Computing are
– Precision, Certainty, and rigor.
• Many contemporary problems do not lend themselves to precise solutions, such as
– Recognition problems (handwriting, speech, objects, images)
– Mobile robot coordination, forecasting, combinatorial problems, etc.

Implications of Soft Computing
• Soft computing employs NN, SVM, FL, etc., in a complementary rather than a competitive way.
• One example of a particularly effective combination is what has come to be known as "neurofuzzy systems."
• Such systems are becoming increasingly visible as consumer products, ranging from air conditioners and washing machines to photocopiers, camcorders, and many industrial applications.

Unique Property of Soft Computing
• Learning from experimental data
• Soft computing techniques derive their power of generalization from approximating or interpolating to produce outputs for previously unseen inputs by using outputs from previously learned inputs
• Generalization is usually done in a high-dimensional space.

Current Applications using Soft Computing
• Application of soft computing to handwriting recognition
• Application of soft computing to automotive systems and manufacturing
• Application of soft computing to image processing and data compression
• Application of soft computing to architecture
• Application of soft computing to decision-support systems
• Application of soft computing to power systems
• Neurofuzzy systems
• Fuzzy logic control


Future of Soft Computing
(adapted from L.A. Zadeh)
• Soft computing is likely to play an especially important role in science and engineering, but eventually its influence may extend much farther.
• Soft computing represents a significant paradigm shift in the aims of computing
– a shift which reflects the fact that the human mind, unlike present day computers, possesses a remarkable ability to store and process information which is pervasively imprecise, uncertain and lacking in categoricity.

Overview of Techniques in Soft Computing
• Neural networks
• Support Vector Machines
• Fuzzy Logic
• Genetic Algorithms in Evolutionary Computation

Neural Networks Overview
• Neural Network Definition
• Some Examples of Neural Network Algorithms and Architectures
• Successful Applications

Definitions of Neural Networks
• According to the DARPA Neural Network Study (1988, AFCEA International Press, p. 60):
• ... a neural network is a system composed of many simple processing elements operating in parallel whose function is determined by network structure, connection strengths, and the processing performed at computing elements or nodes.


Definitions of Neural Networks
• According to Haykin (1994), p. 2:
• A neural network is a massively parallel distributed processor that has a natural propensity for storing experiential knowledge and making it available for use. It resembles the brain in two respects:
– Knowledge is acquired by the network through a learning process.
– Interneuron connection strengths known as synaptic weights are used to store the knowledge.

Definitions of Neural Networks
• According to Nigrin (1993), p. 11:
• A neural network is a circuit composed of a very large number of simple processing elements that are neurally based. Each element operates only on local information. Furthermore each element operates asynchronously; thus there is no overall system clock.

Definitions of Neural Networks
• According to Zurada (1992), p. xv:
• Artificial neural systems, or neural networks, are physical cellular systems which can acquire, store, and utilize experiential knowledge.

A Simple Neural Network
The network has 2 inputs and one output. All are binary.
The output is
1 if W0*I0 + W1*I1 + Wb > 0
0 if W0*I0 + W1*I1 + Wb <= 0
We want it to learn simple OR: output a 1 if either I0 or I1 is 1.


A Simple Neural Network
• The simple Perceptron:
• The network adapts as follows: change the weight by an amount proportional to the difference between the desired output and the actual output. As an equation:
• ∆Wi = η * (D - Y) * Ii
• where η is the learning rate, D is the desired output, and Y is the actual output.
• This is called the Perceptron Learning Rule, and goes back to the early 1960's (a small code sketch of this rule follows the pattern table below).

A Simple Neural Network
We expose the net to the patterns:

I0   I1   Desired output
0    0    0
0    1    1
1    0    1
1    1    1

A Simple Neural Network
• We train the network on these examples, recording the weights after each epoch (exposure to the complete set of patterns).
• At this point (epoch 8) the network has finished learning. Since (D - Y) = 0 for all patterns, the weights cease adapting.

A Simple Perceptron Network
Single perceptrons are limited in what they can learn:
• If we have two inputs, the decision surface is a line; its equation is I1 = -(W0/W1)*I0 - (Wb/W1)
• In general, they implement a simple hyperplane decision surface
• This restricts the possible mappings available.
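
The line above follows from setting the weighted sum in the earlier threshold rule to zero; the rearrangement below is an added check, not part of the original slide:

```latex
% Decision boundary of the two-input unit: the weighted sum equals zero.
W_0 I_0 + W_1 I_1 + W_b = 0
\quad\Longrightarrow\quad
I_1 = -\frac{W_0}{W_1}\, I_0 - \frac{W_b}{W_1}
% a straight line in the (I_0, I_1) plane with slope -W_0/W_1 and intercept -W_b/W_1
```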


Developments from the simple perceptron
• Multi-layer perceptrons (MLPs) and Radial Basis Function Networks (RBF) are both well-known developments of the Perceptron Learning Rule.
• Both can learn arbitrary mappings or classifications. Further, the inputs (and outputs) can have real values.

Multilayer Perceptron (MLP)
• The network topology is constrained to be feedforward, i.e. loop-free: generally, connections are allowed from the input layer to the first (and possibly only) hidden layer, from the first hidden layer to the second, ..., and from the last hidden layer to the output layer.
Multilayer Perceptron
• The hidden layer learns to recode (or to provide a representation for) the inputs. More than one hidden layer can be used.
• The architecture is more powerful than single-layer networks: it can be shown that any mapping can be learned, given two hidden layers (of units).

Multilayer Perceptron
• The units are a little more complex than those in the original perceptron: their input/output graph is as shown on the left.
• Output Y as a function: Y = 1 / (1 + exp(-k * Σ (Win * Xin)))
• The graph shows the output for k = 0.5, 1, and 10, as the activation varies from -10 to 10.
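
A small Python sketch of that unit output function and of the three curves described on the slide; the function and variable names are illustrative, not from the slides:

```python
import math

def unit_output(weights, inputs, k=1.0):
    """Sigmoid unit: Y = 1 / (1 + exp(-k * sum(Win * Xin))); k controls the steepness."""
    activation = sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-k * activation))

# The curves described on the slide: k = 0.5, 1 and 10, activation from -10 to 10.
for k in (0.5, 1.0, 10.0):
    curve = [round(1.0 / (1.0 + math.exp(-k * a)), 3) for a in range(-10, 11)]
    print(k, curve[0], curve[10], curve[20])   # ~0 at -10, 0.5 at 0, ~1 at +10
```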


Training MLP Networks
• The weight change rule is a development of the perceptron learning rule. Weights are changed by an amount proportional to the error at that unit times the output of the unit feeding into the weight.

Training the MLP
• Running the network consists of:
• Forward pass:
– the outputs are calculated and the error at the output units is calculated.
• Backward pass:
– The output unit error is used to alter the weights on the output units. Then the error at the hidden nodes is calculated (by back-propagating the error at the output units through the weights), and the weights on the hidden nodes are altered using these values.
• For each data pair to be learned, a forward pass and a backward pass are performed. This is repeated over and over again until the error is at a low enough level (or we give up).
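
A compact sketch of one such training loop for a single-hidden-layer network with sigmoid units, assuming NumPy is available; the layer sizes, learning rate, and squared-error measure are illustrative assumptions rather than details from the slides:

```python
# Each iteration is a forward pass (compute outputs and the output error) followed
# by a backward pass (alter the output weights, back-propagate the error to the
# hidden units, then alter their weights).
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(2, 3))   # input -> hidden weights
W2 = rng.normal(scale=0.5, size=(3, 1))   # hidden -> output weights
eta = 0.5                                  # learning rate (illustrative)

x = np.array([0.0, 1.0])                   # one training input
d = np.array([1.0])                        # its desired output

for _ in range(1000):
    # Forward pass
    h = sigmoid(x @ W1)
    y = sigmoid(h @ W2)
    # Backward pass
    delta_out = (d - y) * y * (1 - y)                # error at the output unit
    delta_hid = (delta_out @ W2.T) * h * (1 - h)     # error back-propagated to hidden units
    W2 += eta * np.outer(h, delta_out)               # change ~ error * output of feeding unit
    W1 += eta * np.outer(x, delta_hid)
```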
Radial Basis Function Networks
• Radial basis function networks are also feedforward, but have only one hidden layer.
• Like MLP, RBF nets can learn arbitrary mappings: the primary difference is in the hidden layer.
• RBF hidden layer units have a receptive field which has a centre: that is, a particular input value at which they have a maximal output. Their output tails off as the input moves away from this point.

Training RBF Networks
• Generally, the hidden unit function is a Gaussian (Gaussians with three different standard deviations are shown in the figure on the left).
• RBF networks are trained by
– deciding on how many hidden units there should be
– deciding on their centers and the sharpness (standard deviation) of their Gaussians


Training up the output layer of RBF Networks
• Generally, the centers and SDs are decided on first by examining the vectors in the training data. The output layer weights are then trained using the Delta rule.
• The Back Propagation Network, i.e., MLP, is the most widely applied neural network technique. RBFs are gaining in popularity.
• Nets can be:
– trained on classification data (each output represents one class), and then used directly as classifiers of new data.
– trained on (x, f(x)) points of an unknown function f, and then used to interpolate.

RBF Networks
• RBFs have the advantage that one can add extra units with centers near parts of the input which are difficult to classify.
• Both MLPs and RBFs can also be used for processing time-varying data: one can consider a window on the data.
• Networks of this form (finite-impulse response) have been used in many applications.
• There are also networks whose architectures are specialised for processing time-series.
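
A hedged Python sketch of the RBF training recipe described above (Gaussian hidden units whose centers and SDs are fixed first, with only the output-layer weights trained by the Delta rule), assuming NumPy is available; the toy data, number of hidden units, and learning rate are illustrative assumptions:

```python
# Centers and standard deviations are chosen first (here spread over the input
# range); then only the output-layer weights are adapted with the Delta rule.
import numpy as np

def hidden_outputs(x, centers, sigma):
    """Gaussian hidden units: maximal response at the centre, tailing off with distance."""
    return np.exp(-((x - centers) ** 2) / (2.0 * sigma ** 2))

X = np.linspace(0.0, 1.0, 20)          # toy (x, f(x)) interpolation data
T = np.sin(2.0 * np.pi * X)

centers = np.linspace(0.0, 1.0, 5)     # chosen by examining the training inputs
sigma = 0.2                            # "sharpness" (standard deviation) of the Gaussians
w = np.zeros(len(centers))             # output-layer weights
eta = 0.1                              # learning rate (illustrative)

for _ in range(500):
    for x, t in zip(X, T):
        h = hidden_outputs(x, centers, sigma)
        y = w @ h
        w += eta * (t - y) * h         # Delta rule on the output layer only
```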

Architectures for Processing Time-series
• Simple Perceptrons, MLP, and RBF networks need a teacher to tell the network what the desired output should be. These are supervised networks.

Unsupervised networks
• In an unsupervised net, the network adapts purely in response to its inputs. Such networks can learn to pick out structure in their input.
• Applications for unsupervised nets:
• clustering data:
– exactly one of a small number of output units comes on in response to an input.
• reducing the dimensionality of data:
– data with high dimension (a large number of input units) is compressed into a lower dimension (small number of output units).
• Although learning in these nets can be slow, running the trained net is very fast - even on a computer simulation of a neural net.

Kohonen Clustering Algorithm
• Takes a high-dimensional input and clusters it, but retains some topological ordering of the output.
• After training, an input will cause the output units in some area of the map to become active.
• Such clustering (and dimensionality reduction) is very useful as a preprocessing stage, whether for further neural network data processing, or for more traditional techniques.
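
A minimal sketch of the idea behind Kohonen's self-organising map, assuming NumPy is available; the map size, data, learning rate, and neighbourhood radius are illustrative assumptions, not values from the slides:

```python
# The winning output unit and its map neighbours are pulled toward each input,
# so nearby units come to respond to similar inputs (topological ordering).
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((200, 3))          # high-dimensional inputs (here 3-D)
weights = rng.random((10, 3))        # 10 output units arranged on a line

eta, radius = 0.5, 2.0
for epoch in range(50):
    for x in data:
        winner = np.argmin(np.linalg.norm(weights - x, axis=1))
        # Neighbourhood function: units close to the winner on the map move most
        dist = np.abs(np.arange(len(weights)) - winner)
        h = np.exp(-(dist ** 2) / (2.0 * radius ** 2))
        weights += eta * h[:, None] * (x - weights)
    eta *= 0.95        # shrink the learning rate and neighbourhood over time
    radius *= 0.95
```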
Where are Neural Networks applicable?
• In signature analysis:
– as a mechanism for comparing signatures made (e.g. in a bank) with those stored. This is one of the first large-scale applications of neural networks in the USA, and is also one of the first to use a neural network chip.
• In process control:
– there are clearly applications to be made here: most processes cannot be determined as computable algorithms.
• In monitoring:
– networks have been used to monitor the state of aircraft engines. By monitoring vibration levels and sound, early warning of engine problems can be given.
– British Rail have also been testing a similar application monitoring diesel engines.
Neural Network Applications
• Pen PCs
– PCs where one can write on a tablet, and the writing will be recognised and translated into (ASCII) text.
• Speech and vision recognition systems
– Not new, but Neural Networks are becoming increasingly part of such systems. They are used as a system component, in conjunction with traditional computers.

Support Vector Machines and Neural Networks
The learning machine that uses data to find the APPROXIMATING FUNCTION (in regression problems) or the SEPARATION BOUNDARY (in classification, pattern recognition problems) is the same in high-dimensional situations. It is either the so-called SVM or the NN.

SVMs and NNs

Introduction of SVM

Background Knowledge

Optimal Hyperplane

Support Vectors

Optimization Problem

Characteristics of Solutions

Geometrical Interpretation

Some Properties of SVM

Linearly Nonseparable Patterns

Nonlinear Separation

SVM: A Hyperplane Classifier
Some SVM Features
SVMs combine three important ideas (illustrated in the sketch below):
• Applying optimization algorithms from Operations Research (Linear Programming and Quadratic Programming)
• Implicit feature transformation using kernels
• Control of overfitting by maximizing the margin
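
A brief, hedged illustration of these three ideas using scikit-learn (assumed to be available); the toy data, kernel choice, and C value are illustrative, not from the slides:

```python
# SVC solves the underlying quadratic program (idea 1), kernel="rbf" applies an
# implicit feature transformation via the kernel trick (idea 2), and C trades
# margin width against training errors, controlling overfitting (idea 3).
from sklearn.svm import SVC

X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y = [0, 1, 1, 1]

clf = SVC(kernel="rbf", C=1.0)
clf.fit(X, y)
print(clf.support_vectors_)   # the training points that define the separating boundary
```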