
Chapter 1
Soft Computing and
its Applications
Siddhartha Bhattacharyya
The University of Burdwan, India

Ujjwal Maulik
Jadavpur University, India

Sanghamitra Bandyopadhyay
Indian Statistical Institute, India

ABSTRACT

Soft Computing is a relatively new computing paradigm bestowed with tools and techniques for handling real world problems. The main components of this computing paradigm are neural networks, fuzzy logic and evolutionary computation. Each and every component of the soft computing paradigm operates either independently or in coalition with the other components for addressing problems related to modeling, analysis and processing of data. An overview of the essentials and applications of the soft computing paradigm is presented in this chapter with reference to the functionalities and operations of its constituent components. Neural networks are made up of interconnected processing nodes/neurons, which operate on numeric data. These networks possess the capabilities of adaptation and approximation. The varied amount of uncertainty and ambiguity in real world data is handled in a linguistic framework by means of fuzzy sets and fuzzy logic. Hence, this component is efficient in understanding vagueness and imprecision in real world knowledge bases. Genetic algorithms, the simulated annealing algorithm and the ant colony optimization algorithm are representative evolutionary computation techniques, which are efficient in deducing an optimum solution to a problem, thanks to the inherent exhaustive search methodologies adopted. Of late, rough sets have evolved to improve upon the performances of either of these components by way of approximation techniques. These soft computing techniques have been put to use in a wide variety of problems ranging from scientific to industrial applications. Notable among these applications are image processing, pattern recognition, Kansei information processing, data mining, web intelligence etc.

DOI: 10.4018/978-1-61692-797-4.ch001


1. INTRODUCTION

The field of Soft Computing is a synergistic integration of essentially three computing paradigms, viz. neural networks, fuzzy logic and evolutionary computation entailing probabilistic reasoning (belief networks, genetic algorithms and chaotic systems), to provide a framework for flexible information processing applications designed to operate in the real world. Bezdek [Bezdek92] referred to this synergism as computational intelligence [Kumar2004]. Soft computing technologies are robust by design, and operate by trading off precision for tractability. Since they can handle uncertainty with ease, they conform better to real world situations and provide lower cost solutions.

The three components of soft computing differ from one another in more than one way. Neural networks operate in a numeric framework, and are well known for their learning and generalization capabilities. Fuzzy systems [Zadeh65] operate in a linguistic framework, and their strength lies in their capability to handle linguistic information and perform approximate reasoning. The evolutionary computation techniques provide powerful search and optimization methodologies. All the three facets of soft computing differ from one another in their time scales of operation and in the extent to which they embed a priori knowledge.

Of late, rough set theory has come up as a new mathematical approach to model imperfect knowledge, crucial to addressing problems in areas of artificial intelligence. Apart from the fuzzy set theory pointed out in the previous paragraph, rough set theory proposed by Pawlak [Pawlak82] presents still another attempt to handle real world uncertainties. The theory has attracted the attention of many researchers and practitioners all over the world, who have contributed essentially to its development and applications. The rough set approach seems to be of fundamental importance to artificial intelligence and cognitive sciences, especially in the areas of machine learning, knowledge acquisition, decision analysis, knowledge discovery from databases, expert systems, inductive reasoning and pattern recognition. The main advantage of rough set theory in data analysis is that it does not need any preliminary or additional information about data – like probability in statistics, or basic probability assignment in Dempster-Shafer theory, grade of membership or the value of possibility in fuzzy set theory.

2. NEURAL NETWORKS

A neural network is a powerful data-modeling tool that is able to capture and represent complex input/output relationships similar to a human brain. Artificial neural networks resemble the human brain in the following two ways:

• A neural network acquires knowledge through learning.
• A neural network's knowledge is stored within inter-neuron connection strengths known as synaptic weights.

The true power and advantage of neural networks lie in their ability to represent both linear and non-linear relationships and in their ability to learn these relationships directly from the data being modeled.

Artificial Neural Network

An artificial neural network [Haykin99], as the name suggests, is a parallel and layered interconnected structure of a large number of artificial neurons, each of which constitutes an elementary computational primitive. The distributed representation of the interconnections, through massive parallelism achieved out of the inherent network structure, bestows upon such networks properties of graceful degradation and fault tolerance. These network structures differ from one to another in the topology of the underlying interconnections as well as in the target problem they are put to.


Since the essence of neural network operation is based on the behavior of the human brain, these networks require a form of training or learning ability. Once these are trained with the different aspects of the problem at hand, they can be used to solve similar problems given the immense generalization capabilities embedded therein. Depending on the type of learning procedure adopted, different neural network architectures have evolved from time to time [Haykin99, Kumar2004].

In the most general form, an artificial neural network is a layered structure of neurons. It comprises seven essential components [Kumar2004], viz., (i) neurons, (ii) activation state vector, (iii) activation function, (iv) connection topology, (v) activity aggregation rule, (vi) learning rule and (vii) environment. These components are discussed in the following sections.

Neurons

Neurons are the processing units of a neural network. There are basically three types of neurons viz. input, hidden and output. The input neurons are designated to accept stimuli from the external world. The output neurons generate the network outputs. The hidden neurons, which are shielded from the external world, are entrusted with the computation of intermediate functions necessary for the operation of the network.

Activation State Vector

Neural network models operate in a real n-dimensional vector space Rn. The activation state vector, X = (x1, x2, …, xn)T ∈ Rn, is a vector of the activation levels xi of the individual neurons of the network. This state vector acts as the driving force for a neural network.

Activation Function

The characteristic activation functions are used to supplement the learning process of a neural network. These functions recognize a specific range of input signals and selectively tune the neurons to respond to the input signals according to some learning algorithm. Most of these activation functions take inputs over an infinite range of activations (-∞, +∞) and squash/transform them into the finite range [0, 1] or [-1, 1] [Leondes98]. Thus, these functions are able to map the input information into bipolar excitations. Though these functions may vary from neuron to neuron within the network, most network architectures are field-homogeneous, i.e. all the neurons within a layer are characterized by the same signal function. Some of the common neural network signal functions [Kumar2004, Haykin99] include (i) binary threshold, (ii) bipolar threshold, (iii) linear, (iv) linear threshold, (v) sigmoid, (vi) hyperbolic tangent, (vii) Gaussian, (viii) stochastic [Kumar2004] etc.

Connection Topology

This refers to the interconnection topology of the neural network architectures. These connections may be either excitatory (+) or inhibitory (-) or absent (0). These connections, or synapses, basically house the memory of the network. The behavior of a neural network architecture is decided by its connection topology.

Activity Aggregation Rule

This rule aggregates the activities of the neurons at a given layer. It is usually computed as the inner product of the input vector and the neuron fan-in interconnection strength (weight) vector. An activation rule thereby determines the new activation level of a neuron based on its current activation and external inputs.
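As a small illustration of the aggregation rule and of two of the signal functions listed above, the following Python sketch computes a neuron's activation as the inner product of its fan-in weight vector and the incoming activations, and then squashes the result. The weight and input values are made-up examples, not taken from the chapter.

import numpy as np

def aggregate(weights, inputs):
    # Activity aggregation: inner product of the fan-in weight vector
    # and the activation state (input) vector.
    return float(np.dot(weights, inputs))

def binary_threshold(net, theta=0.0):
    # Binary threshold signal function: 1 above the threshold, else 0.
    return 1 if net > theta else 0

def sigmoid(net):
    # Sigmoid signal function, squashing (-inf, +inf) into (0, 1).
    return 1.0 / (1.0 + np.exp(-net))

w = np.array([0.4, -0.7, 0.1])     # fan-in interconnection weights (illustrative)
x = np.array([1.0, 0.5, -1.0])     # activations of the preceding neurons
net = aggregate(w, x)
print(net, binary_threshold(net), sigmoid(net), np.tanh(net))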


Learning Rule

The neural network learning rules define an architecture-dependent procedure to encode pattern information into inter-neuron interconnections. This is a data driven process executed by modifying the connection weights. Two types of learning are in vogue, viz. supervised learning and unsupervised learning.

• Supervised learning: Supervised learning encodes a behavioristic pattern into a neural network by attempting to approximate the function that best describes the data set employed. For an input vector, Xk ∈ Rn (a real n-dimensional vector space) related to an output vector Dk ∈ Rp (in a real p-dimensional vector space), a supervised learning algorithm aims at deriving an unknown mapping function f: Rn → Rp. The algorithm tries to reduce the error (Dk - Sk) in the system response, where Sk is the actual response of the system, by employing the desired output response of the system Dk (also referred to as the teaching input), the associate of Xk. Thus, input-output sample pairs are used to train/teach the network through a simple form of error correction learning or gradient descent weight adaptation. Hence, the system generates an output Dk in response to an input Xk. The learning process achieves an association between Dk and Xk, when a stimulus X'k close to Xk elicits a response S'k sufficiently close to Dk.
• Unsupervised learning: This paradigm simply provides the system with an input Xk, and allows it to self organize/self supervise its parameters to generate internal prototypes of the sample vectors. Such a paradigm attempts to represent the entire data set by employing a smaller number of prototypical vectors. These prototypes are in a state of continual updating as newer system inputs enter into the system. This is often driven by a complex competitive-cooperative process where the individual neurons compete and cooperate with each other to update their interconnection weights during the process of self-organization.

Environment

The operational environment of neural networks can be either deterministic (noiseless) or stochastic (noisy). A neural network N is a weighted directed graph, where the nodes are connected as either

• a feedforward architecture, in which the network has no loops. Examples include the perceptron, multilayer perceptron [Duda73], support vector machines [Cortes95], radial basis function networks [Broomhead88], Kohonen's Self Organizing Feature Map (SOFM) [Kohonen95] etc., or
• a feedback (recurrent) architecture, in which loops occur in the network because of feedback connections. Examples include the Hopfield network [Hopfield84], the BSB model [Hui92], Boltzmann machines [Ackley85], bidirectional associative memories [Kosko88], adaptive resonance theory [Carpenter95] etc.

The following sections discuss the basic philosophy of the supervised learning environment with reference to the simple artificial neuron and the multilayer perceptron.

Simple Artificial Neuron

The basic computational element of an artificial neural network model is often referred to as a node or unit. It receives inputs from some other units, or perhaps from an external source. The basic function of a single neuron is to add up its inputs, and to produce an output if this sum is greater than some value, known as the threshold value. The basic neuron model, also known as the "perceptron" after Frank Rosenblatt [Haykin99, Rosenblatt58], is a binary classifier that maps real-valued vectored inputs to a single binary output value.

It generally possesses the following features:

1. The output from a neuron is either on or off.
2. The output depends only on the inputs. A certain number of neurons must be on (or activated) at any one time in order to make a neuron fire. The efficiency of the synapses at coupling the incoming signal into a neuron can be modeled by having a multiplicative factor on each of the inputs to the neuron. This multiplicative factor, often referred to as the interconnection weight w, can be modified so as to model synaptic learning. A more efficient synapse, which transmits more of the signal, has a correspondingly larger weight, whilst a weak synapse has a smaller weight. Its output, in turn, can serve as input to other units. The basic neuron can be represented as shown in Figure 1.

Figure 1. A basic neuron

Expressed mathematically, if there are n inputs with n associated weights on the input lines, then the ith unit computes some function f of the weighted sum of its inputs, given by

$y_i = f\left(\sum_{j=1}^{n} w_{ij} y_j\right)$  (1)

where wij refers to the weight from the jth to the ith unit. The function f is the unit's activation function. In the simplest case, if f is the identity function, the unit's output is just its net input. Thus, the node acts as a linear unit.

A variant of this architecture compares the computed sum to a certain value in the neuron called the threshold value. The thresholding process is accomplished by comparison of the computed sum to a predefined threshold value. If the sum is greater than the threshold value, then the network outputs a 1. On the other hand, if the sum is lesser than the threshold value, the network generates an output of 0. An example threshold function is shown graphically in Figure 2.

Figure 2. A threshold function

Equivalently, the threshold value can be subtracted from the weighted sum, and the resulting value then compared to zero. Depending on whether the difference yields a positive or negative result, the network outputs a 1 or 0. The output yi of the ith neuron can then be written as:

$y_i = f\left(\sum_{j=1}^{n} w_{ij} y_j - k\right)$  (2)

where k is the neuron's bias or offset, and f is a step function (also known as the Heaviside function) given by

$f(x) = \begin{cases} 1 & \text{for } x > 0 \\ 0 & \text{for } x \leq 0 \end{cases}$  (3)

Thus the threshold function produces only a 1 or a 0 as the output, so that the neuron is either activated or not.

Learning Algorithm

Since the single layer perceptron is a supervised neural network architecture, it requires learning of some a priori knowledge base for its operation. The following algorithm (Table 1) illustrates the learning paradigm for a single layer perceptron. It comprises the following steps.


• Initialization of the interconnection weights and thresholds randomly.
• Calculating the actual outputs by taking the thresholded value of the weighted sum of the inputs.
• Altering the weights to reinforce correct decisions and discourage incorrect decisions, i.e. reducing the error.

The weights are, however, unchanged if the network makes the correct decision. Also, the weights are not adjusted on input lines which do not contribute to the incorrect response, since each weight is adjusted by the value of the input on that line xi, which would be zero (see Table 1).

In order to predict the expected outputs, a loss (also called objective or error) function E can be defined over the model parameters to ascertain the error in the prediction process. A popular choice for E is the sum-squared error given by

$E = \sum_{i} (y_i - d_i)^2$  (4)

In words, it is the sum of the squared difference between the target value di and the perceptron's prediction yi (calculated from the input value xi), computed over all points i in the data set. For a linear model, the sum-squared error is a quadratic function of the model parameters. The loss function E provides an objective measure of predictive error for a specific choice of perceptron model parameters. Minimizing the loss function would yield more accurate predicted outputs by the perceptron. This is solved by an iterative numerical technique called gradient descent [Haykin99, Snyman2005]. It comprises the following steps.

Choose some (random) initial values for the network parameters
Repeat until G := 0
    Calculate the gradient G of the error function with respect to each parameter
    Change the parameters slowly in the direction of the greatest rate of decrease of the error, i.e., in the negative direction of G, i.e., -G
End

Computation of the Gradient

During the training process of neural networks by the gradient descent mechanism, the gradient G of the loss function (E) with respect to each weight wij of the network is computed [Haykin99, Snyman2005]. This gradient indicates how small changes in the network weights affect the overall error E.


Table 1.

Begin
    Initialize interconnection weights and threshold
        Set wi(t=0), (0 <= i <= n), to small random values    (wi(t) is the interconnection weight from input i at time t)
        Set w0 := k    (k is the bias in the output node)
        Set x0 := 1    (x0 is the bias input to the network)
    Present the inputs and the desired output to the network
        Present x0, x1, x2, x3, ..., xn    (the inputs to the network)
        Present d(t)    (the desired output)
    Calculate the actual output
        $y(t) := f\left(\sum_{i=1}^{n} w_i(t)\, x_i(t)\right)$
    Adapt the interconnection weights
        $w_i(t+1) := w_i(t) \pm x_i(t)$
End
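A minimal Python sketch of the learning loop of Table 1 is given below, applied to a toy, linearly separable problem (the logical AND function). The learning-rate constant, the data and the function names are illustrative assumptions; the update uses the conventional (d - y)x form of the "reinforce correct / discourage incorrect" rule described above.

import numpy as np

def heaviside(x):
    # Step (Heaviside) activation of equation (3): 1 if x > 0, else 0.
    return 1 if x > 0 else 0

def train_perceptron(samples, targets, epochs=50, lr=0.1):
    # Single layer perceptron trained as in Table 1.
    # samples: list of input vectors; targets: desired 0/1 outputs.
    # The bias is handled by augmenting each input with a constant x0 = 1.
    n = len(samples[0])
    w = np.random.uniform(-0.5, 0.5, n + 1)               # weights plus bias w0
    for _ in range(epochs):
        for x, d in zip(samples, targets):
            x_aug = np.insert(np.asarray(x, float), 0, 1.0)  # x0 := 1
            y = heaviside(np.dot(w, x_aug))                  # equation (2)
            w += lr * (d - y) * x_aug                        # adapt weights
    return w

# Toy usage: learn the AND function
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
D = [0, 0, 0, 1]
w = train_perceptron(X, D)
print([heaviside(np.dot(w, np.insert(np.asarray(x, float), 0, 1.0))) for x in X])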

Let the loss function be represented, for each training sample p, as

$E = \sum_{p} E_p$  (5)

where

$E_p = \frac{1}{2}\sum_{o} (d_{op} - y_{op})^2$  (6)

and o refers to the range of the output units of the network. Then,

$G = \frac{\partial E}{\partial w_{oi}} = \frac{\partial}{\partial w_{oi}} \sum_{p} E_p = \sum_{p} \frac{\partial E_p}{\partial w_{oi}}$  (7)

Generalizing for all training samples and decomposing the gradient into two factors using the chain rule, one gets

$\frac{\partial E}{\partial w_{oi}} = \frac{\partial E}{\partial y_o} \frac{\partial y_o}{\partial w_{oi}}$  (8)

The first factor of equation 8 can be obtained by differentiating equation 6:

$\frac{\partial E}{\partial y_o} = -(d_o - y_o)$  (9)

Using $y_o = \sum_{j} w_{oj} y_j$, the second factor becomes

$\frac{\partial y_o}{\partial w_{oi}} = \frac{\partial}{\partial w_{oi}} \sum_{j} w_{oj} y_j = y_i$  (11)

Hence,

$\frac{\partial E}{\partial w_{oi}} = -(d_o - y_o)\, y_i$  (12)

The gradient G for the entire data set can be obtained by summing, at each weight, the contribution given by equation 12 over all the data points. Then, a small proportion μ (called the learning rate) of G is subtracted from the weights to achieve the required gradient descent.


The Gradient Descent Algorithm

The loss function minimization procedure during the training of a neural network involves the computation of the gradient of the loss function with time. The algorithm shown in Table 2 illustrates the steps in determining the gradient for attaining the minimum of the loss function [Haykin99, Snyman2005].

The algorithm terminates once the minimum of the error function, i.e., G = 0, is reached. At this point the algorithm is said to have converged. An important consideration is the learning rate (μ), which determines by how much the weights are changed at each step. If μ is too small, the algorithm will take a long time to converge. Conversely, if μ is too large, the algorithm diverges, leading to imprecise learning.

However, the single layer perceptrons suffer from several limitations. They learn a solution if there is a possibility of finding it. They can separate linearly separable classes easily enough, but in situations where the division between the classes is much more complex, the single layer model fails abruptly.

Multilayer Perceptron

In order to overcome the shortcomings of the single layer perceptron model, the first and foremost way out is to resort to a multilayer model with the threshold function slightly smoothed out to provide some information about the nonlinearly separable inputs [Haykin99]. This means that the network will be able to adjust the weights as and when required. A possible multilayer neural network model comprising three layers of nodes, viz., the input layer node, the hidden layer node and the output layer node, along with their characteristic activation functions and inter-connection weights, is shown in Figure 3. In the figure, an extra node with a nonlinear activation function has been inserted between input and output. Since such a node is "hidden" inside the network, it is commonly referred to as a hidden unit. The hidden unit also has a weight from the bias unit. In general, all non-input neural network units have such bias weights. For simplicity, however, the bias unit and weights are usually omitted from neural network diagrams. The sole output layer node shown in the figure is characterized by a linear activation function. Since the input layer node acts only as a receptor of information from the external world, it is not driven by such an activation mechanism.

Table 2.

Begin
    Initialize wij to small random values    (wij are the interconnection weights)
    Repeat until done
        For each weight wij set Δwij := 0
        For each data point (x, d)p
            Set input units to x
            Compute value of output units
            For each weight wij set Δwij := Δwij + (di - yi) yj
        For each weight wij set wij := wij + μ Δwij    (μ is the learning rate)
End
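A compact Python version of the Table 2 loop for a single linear output unit might look as follows. The dataset, learning rate and number of epochs are illustrative assumptions; the accumulated weight change (d - y)x follows equation (12) with the sign absorbed as in the table.

import numpy as np

def gradient_descent(data, lr=0.05, epochs=200):
    # Delta-rule training of a linear unit, as sketched in Table 2.
    # data: list of (x, d) pairs, x an input vector and d the target scalar.
    n = len(data[0][0])
    w = np.random.uniform(-0.1, 0.1, n)
    for _ in range(epochs):
        dw = np.zeros(n)
        for x, d in data:
            x = np.asarray(x, float)
            y = np.dot(w, x)                 # linear unit output
            dw += (d - y) * x                # negative gradient, equation (12)
        w += lr * dw                         # batch update with learning rate mu
    return w

# Toy usage: fit y = 2*x1 - x2 from a few noiseless samples
samples = [((1.0, 0.0), 2.0), ((0.0, 1.0), -1.0), ((1.0, 1.0), 1.0), ((2.0, 1.0), 3.0)]
print(gradient_descent(samples))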


Figure 3. Schematic diagram of a multilayer perceptron

As stated before, neural networks are often driven by nonlinear activation functions. The typical standard nonlinear hyperbolic tangent characteristic activation function is shown in Figure 4. Such a network model exhibits nonlinear behavior, and theoretical results indicate that, given enough hidden units, a multilayer neural network architecture like the one shown in Figure 5 can approximate any nonlinear function to any required degree of accuracy. Hence, a multilayer neural network model is able to classify nonlinearly separable datasets. However, too many hidden layers can degrade the network's performance [Kumar2004].

Figure 4. Hyperbolic tangent function

Figure 5. Schematic diagram of a multilayer perceptron with multiple hidden nodes

Learning Algorithm

Since the target values for the hidden layers of multilayer networks are not known, the gradient descent algorithm used for the training of linear networks (discussed earlier) cannot be applied to train multilayer neural networks. This inherent problem led to the sudden downfall of the neural networking paradigm after the 1950s, until the error backpropagation algorithm [Haykin99, Rumelhart86, Chauvin95], or in short, backprop, came to the rescue.

In principle, backprop provides a way to train networks with any number of hidden units arranged in any number of layers. The basic requirement of the backprop algorithm is to ensure that the network connection pattern must not contain any cycles.


Networks that respect this constraint are called feedforward networks, and their connection pattern forms a directed acyclic graph or dag.

For the purpose of training a multilayer feedforward neural network by gradient descent, a training dataset consisting of pairs (x, d) is considered. Here, vector x represents an input pattern to the network and vector d is the corresponding target output. The corresponding learning algorithm employing backprop is illustrated in Table 3.

Neural networks are powerful and robust tools for the retrieval of incomplete data, for finding patterns in datasets, and for mimicking human behavior when it comes to the analysis and interpretation of data. As such, they find wide use in processing, retrieval and recognition of data patterns.

3. FUZZY SETS AND FUZZY LOGIC

It may be pointed out that much of the information available in the real world exhibits vagueness, imprecision and uncertainty. In fact, the fuzzy sets approach fits in with the linguistic modes of reasoning that are natural to human beings. The fuzzy set theory, introduced by Zadeh [Zadeh65], explains the varied nature of ambiguity and uncertainty that exist in the real world. This is in sheer contradiction to the concept of crisp sets, where information is more often expressed in quantifying propositions.

The underlying concept behind the notion of fuzzy sets [Zadeh65] is that each and every observation exists with a varied degree of containment in the universe of discourse. This degree of containment is referred to as the membership value of the observation. A fuzzy set is a mapping from an input universe of discourse into the interval [0, 1] that describes the membership of the input variable. This is referred to as fuzzification. The reverse mechanism to revert to the crisp world is termed as defuzzification. Thus, it can be inferred that whereas crisp sets quantify quantities, fuzzy sets qualify qualities.

Fuzzy logic is a collection of conditional statements or fuzzy rules, which form the basis of the linguistic reasoning framework, and which embodies representation of shallow knowledge. The fundamental atomic terms in this linguistic or natural language-reasoning framework are often modified with adjectives or "linguistic hedges".


Table 3.

Begin
    Define $\delta_j := \dfrac{\partial E}{\partial net_j}$, the error signal for unit j, and $\Delta w_{ij} := -\dfrac{\partial E}{\partial w_{ij}}$, the (negative) gradient for weight wij.
    Let $A_i = \{j : \exists\, w_{ij}\}$ represent the set of nodes preceding unit i and $P_j = \{i : \exists\, w_{ij}\}$ represent the set of nodes succeeding unit j in the network.

    Computation of the gradient
        $\Delta w_{ij} := -\dfrac{\partial E}{\partial net_i}\,\dfrac{\partial net_i}{\partial w_{ij}}$    (the first factor is the error of unit i)
        The second factor is $\dfrac{\partial net_i}{\partial w_{ij}} := \dfrac{\partial}{\partial w_{ij}} \sum_{k \in A_i} w_{ik} y_k := y_j$
        So, $\Delta w_{ij} := \delta_i\, y_j$

    Forward activation
        Remark: The activity of the input units is determined by the network's external input x. For all other units, the activity is propagated forward as
        $y_i := f_i\left(\sum_{j \in A_i} w_{ij} y_j\right)$
        Remark: The activity of all the preceding nodes Ai of unit i must be known before calculating the activity of i.

    Calculation of output error
        $E := \dfrac{1}{2}\sum_{o} (d_o - y_o)^2$; so, $\delta_o := d_o - y_o$, the error for output unit o.

    Error backpropagation
        Remark: The output error is propagated back for deriving the errors of the hidden units in terms of the succeeding nodes.
        $\delta_j := -\sum_{i \in P_j} \dfrac{\partial E}{\partial net_i}\,\dfrac{\partial net_i}{\partial y_j}\,\dfrac{\partial y_j}{\partial net_j}$    (the first factor is the error of node i)
        The second factor is $\dfrac{\partial net_i}{\partial y_j} := \dfrac{\partial}{\partial y_j} \sum_{k \in A_i} w_{ik} y_k := w_{ij}$; the third factor is $\dfrac{\partial y_j}{\partial net_j} := \dfrac{\partial f_j(net_j)}{\partial net_j} := f'(net_j)$
        If the hidden units are characterized by the tanh activation function, then
        $f'(net_h) := 1 - y_h^2$; so, $\delta_j := f'(net_j) \sum_{i \in P_j} \delta_i w_{ij}$
End
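The derivation in Table 3 translates almost line by line into code. The sketch below trains a one-hidden-layer network with tanh hidden units and a linear output by backpropagation; the network size, the data (the XOR problem), the learning rate and the number of epochs are illustrative assumptions, not values prescribed by the chapter.

import numpy as np

def train_mlp(X, D, hidden=4, lr=0.1, epochs=5000, seed=0):
    # Backpropagation for a one-hidden-layer MLP (tanh hidden units, linear
    # output), following the forward activation, output error and error
    # backpropagation steps of Table 3. Bias weights are realised by
    # augmenting the input and hidden activities with a constant 1.
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.asarray(X, float), np.ones(len(X))])
    D = np.asarray(D, float).reshape(len(X), -1)
    W1 = rng.uniform(-0.5, 0.5, (X.shape[1], hidden))       # input -> hidden
    W2 = rng.uniform(-0.5, 0.5, (hidden + 1, D.shape[1]))   # hidden -> output
    for _ in range(epochs):
        # forward activation: y_j = f_j(sum_k w_jk y_k)
        H = np.column_stack([np.tanh(X @ W1), np.ones(len(X))])
        Y = H @ W2
        # output error: delta_o := d_o - y_o
        delta_o = D - Y
        # backpropagated hidden error: delta_j := f'(net_j) * sum_i delta_i w_ij
        delta_h = (1.0 - H[:, :-1] ** 2) * (delta_o @ W2[:-1].T)
        # weight changes: delta_w_ij := delta_i * y_j
        W2 += lr * H.T @ delta_o
        W1 += lr * X.T @ delta_h
    return W1, W2

# Toy usage: XOR, which a single layer perceptron cannot learn; the trained
# outputs should approach 0, 1, 1, 0.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
D = [0, 1, 1, 0]
W1, W2 = train_mlp(X, D)
Xa = np.column_stack([np.asarray(X, float), np.ones(4)])
print(np.round(np.column_stack([np.tanh(Xa @ W1), np.ones(4)]) @ W2, 2))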

These linguistic hedges have the effect of modifying the membership function for a basic atom. The general form of a fuzzy rule [Zadeh65], which is similar to natural language expressions, can be written as


IF premise (antecedent) THEN conclusion (consequent)

It is generally referred to as the IF-THEN rule based form. It typically expresses an inference such that if a fact (premise, hypothesis or antecedent) is known, then another fact (conclusion or consequent) can be derived.

Fuzzy Set Theoretic Concepts

As already stated, all the elements in the universe of discourse X exhibit varying degrees of membership. This membership or containment of an element in a fuzzy set A is decided by a characteristic membership function, μA(x) ∈ [0, 1]. The closer is the membership value of an element to unity, the stronger is the containment of the element within the fuzzy set. Similarly, a lower membership value implies a weaker containment of the element within the set. A fuzzy set A, characterized by a membership function μA(xi) and comprising elements xi, i = 1, 2, 3, ..., n, is mathematically expressed as

$A = \sum_{i} \frac{\mu_A(x_i)}{x_i}; \quad i = 1, 2, 3, \ldots, n$  (13)

where $\sum_i$ represents a collection of elements.

The resolution of a fuzzy set A is determined by the α-cut (or α-level set) of the fuzzy set. It is a crisp set Aα containing all the elements of the universal set U that have a membership in A greater than or equal to α, i.e.

$A_\alpha = \{x_i \in U \mid \mu_A(x_i) \geq \alpha\}, \quad \alpha \in [0, 1]$  (14)

If $A_\alpha = \{x \in U \mid \mu_A(x) > \alpha\}$, then Aα is referred to as a strong α-cut. The set of all levels α ∈ [0, 1] that represents distinct α-cuts of a given fuzzy set A is called a level set of A, i.e.,

$\Lambda_A = \{\alpha \mid \mu_A(x) = \alpha,\ x \in U\}$  (15)

The support SA ∈ [0, 1] of such a fuzzy set A is defined as

$S_A = \left\{\sum_{i=1}^{n} \frac{\mu_A(x_i)}{x_i} : x_i \in X\ \forall\, \mu_A(x_i) > 0\right\}$  (16)

The core (CA) of a fuzzy set A represents all those constituent elements whose membership values are equal to unity, i.e.,

$C_A = \{x_i \in U \mid \mu_A(x_i) = 1\}$  (17)

The bandwidth (BWA) of a fuzzy set A is expressed as

$BW_A = \{x_i \in U \mid \mu_A(x_i) \geq 0.5\}$  (18)

Figure 6 provides a graphical representation of the core, bandwidth, α-level and support of a fuzzy set.

Figure 6. Representation of fuzzy set concepts

The maximum of all the membership values in a fuzzy set A is referred to as the height (hgtA) of the fuzzy set. If hgtA is equal to 1, then the fuzzy set is referred to as a normal fuzzy set. If hgtA is less than 1, then it is referred to as a subnormal fuzzy set. A normal fuzzy set is a superset of several nonempty subnormal fuzzy subsets.

A subnormal fuzzy subset (As) can be converted to its normalized equivalent by means of the normalization operator given by [Bhattacharyya2008]

$Norm_{A_s}(x) = \frac{A_s(x)}{hgt_{A_s}}$  (19)

The corresponding denormalization operation is given by [Bhattacharyya2008]

$DeNorm_{A_s}(x) = hgt_{A_s}(x) \times Norm_{A_s}(x)$  (20)

In general, for a subnormal fuzzy subset with support $S_{A_s} \in [L, U]$, the normalization and the denormalization operators are expressed as [Bhattacharyya2008]

$Norm_{A_s}(x) = \frac{A_s(x) - L}{U - L}; \quad DeNorm_{A_s}(x) = L + (U - L) \times Norm_{A_s}(x)$  (21)
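A small Python sketch of the set-theoretic quantities defined in equations (14)-(18) is given below, using a discrete fuzzy set stored as a dictionary of element/membership pairs. The example memberships are made up for illustration, and the support is returned here as the crisp set of elements with positive membership (cf. equation (16)).

# A discrete fuzzy set A over a small universe, stored as {element: membership}
A = {"x1": 0.1, "x2": 0.4, "x3": 0.7, "x4": 1.0, "x5": 0.5}

def alpha_cut(fs, alpha):
    # Crisp alpha-cut of equation (14): elements with membership >= alpha.
    return {x for x, mu in fs.items() if mu >= alpha}

def support(fs):
    # Support: all elements with strictly positive membership.
    return {x for x, mu in fs.items() if mu > 0}

def core(fs):
    # Core of equation (17): elements with full membership.
    return {x for x, mu in fs.items() if mu == 1.0}

def bandwidth(fs):
    # Bandwidth of equation (18): elements with membership >= 0.5.
    return alpha_cut(fs, 0.5)

def height(fs):
    # Height: the largest membership value; the set is normal if this is 1.
    return max(fs.values())

print(alpha_cut(A, 0.4))               # {'x2', 'x3', 'x4', 'x5'}
print(support(A), core(A), bandwidth(A), height(A))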


Fuzzy Set Theoretic Operations

The fuzzy union, intersection and complement operations [Zadeh65] on two fuzzy sets A, B, for an element x in the universe of discourse X, are defined as

$Union:\ \mu_{A \cup B}(x) = \max[\mu_A(x), \mu_B(x)]$  (22)

$Intersection:\ \mu_{A \cap B}(x) = \min[\mu_A(x), \mu_B(x)]$  (23)

$Complement:\ \mu_{\bar{A}}(x) = 1 - \mu_A(x)$  (24)

Fuzzy Cardinality

The scalar cardinality of a fuzzy set A is the summation of the membership grades of all elements x in A. It is given by [Zadeh65, Ross95]

$|A| = \sum_{x \in U} \mu_A(x)$  (25)

where U is the universe of discourse. When a fuzzy set A has a finite support, its cardinality can be defined as a fuzzy set. This fuzzy cardinality is denoted by |Af| and is defined by Zadeh as [Zadeh65, Ross95]

$|A_f| = \sum_{\alpha \in \Lambda_A} \frac{\alpha}{|A_\alpha|}$  (26)

where α is the cut-off value, Aα is the α-level set of the fuzzy set and ΛA is the corresponding level set.

Fuzzy Operators

Several operators are used to form the "linguistic hedges" in fuzzy logic. These are the (i) Concentration, (ii) Dilation and (iii) Intensification operators [Ross95, Bhattacharyya2006].

• Concentration: This operator tends to concentrate the elements of a fuzzy set by reducing the degree of membership of those elements that are "partly" in the set. It is expressed as

$\mu_A(x) = \mu_A(x)^2 \quad \text{for } 0 \leq \mu_A(x) \leq 1$  (27)

• Dilation: This operator dilates or stretches a fuzzy set by increasing the membership of elements that are "partly" in the set. It is expressed as


$\mu_A(x) = d \times \mu_A(x) \quad \text{for } 0 \leq \mu_A(x) \leq 1$  (28)

where d is the amount of dilation.

• Intensification: This operator acts as a combination of the concentration and dilation operators. It is expressed as

$INT(A) = \begin{cases} 2\mu_A(x)^2 & \text{for } 0 \leq \mu_A(x) \leq 0.5 \\ 1 - 2[1 - \mu_A(x)]^2 & \text{for } 0.5 < \mu_A(x) \leq 1 \end{cases}$  (29)

Thus, intensification increases the contrast between the elements which have more than half-membership and those elements which have less than half-membership. Figure 7 illustrates the operations of concentration, dilation and intensification for the fuzzy linguistic hedges [Short, Medium, Long] on a typical fuzzy set A.

Figure 7. Fuzzy operators

The resultant hedges are represented by [Somewhat Short, Indeed Medium, Very Long]. It is evident from the figure that the fuzzy operators tend to smooth out the hedges represented by the membership functions [Short, Medium, Long]. The steeper slopes of the hedges [Short, Medium, Long] are smoothed out to gradual variations. This change is attributed to the manipulation of the membership values of these hedges.

Due to their ability to handle uncertainties, fuzzy sets and fuzzy logic play a crucial role in decision-making processes. Fuzzy sets and fuzzy logic find application in modeling imprecise datasets, designing intelligently controlled systems, quantifying the varied amount of ambiguities in the domains of signal processing and computer vision, etc. They are also used to quantify the inherent vagueness/ambiguities often encountered in real life situations.
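The pointwise operations of equations (22)-(24) and the hedging operators of equations (27) and (29) can be sketched directly over a discrete fuzzy set; the membership values below are illustrative, and dilation (equation (28)) would follow the same pattern with its own scaling factor.

# Pointwise fuzzy set operations over a shared universe, following
# equations (22)-(24), (27) and (29); membership values are illustrative.
A = {"x1": 0.2, "x2": 0.6, "x3": 0.9}
B = {"x1": 0.5, "x2": 0.3, "x3": 0.7}

def fuzzy_union(a, b):          # equation (22): max of the memberships
    return {x: max(a[x], b[x]) for x in a}

def fuzzy_intersection(a, b):   # equation (23): min of the memberships
    return {x: min(a[x], b[x]) for x in a}

def fuzzy_complement(a):        # equation (24): 1 - membership
    return {x: 1.0 - mu for x, mu in a.items()}

def concentration(a):           # equation (27): squares the memberships
    return {x: mu ** 2 for x, mu in a.items()}

def intensification(a):         # equation (29): sharpens the contrast at 0.5
    return {x: 2 * mu ** 2 if mu <= 0.5 else 1 - 2 * (1 - mu) ** 2
            for x, mu in a.items()}

print(fuzzy_union(A, B), fuzzy_intersection(A, B))
print(concentration(A), intensification(A))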


4. HEURISTIC SEARCH TECHNIQUES

This computing paradigm employs several search and optimization algorithms based on the Darwinian laws of biological evolution. Several evolutionary algorithms are in vogue. These include (a) genetic programming (GP), which evolves programs, (b) evolutionary programming (EP), which focuses on optimizing continuous functions without recombination, (c) evolutionary strategies (ES), which focuses on optimizing continuous functions with recombination, and (d) genetic algorithms (GAs), which focuses on optimizing general combinatorial problems. Most of these evolutionary algorithms are characterized by a population of trial solutions and a collection of operators to act on the population. The basic philosophy behind these algorithms is to search the population space by the application of the embedded operators so as to arrive at an optimal solution space. Generally, two types of operators are used, viz. reproduction and evolution. The reproduction operator is guided by a selection mechanism. The evolution operator includes the crossover and mutation operators. The search technique is implemented through a series of iterations, whereby the different operators are applied in a loop on the initial population. Each iteration is referred to as a generation. Each generation produces a new solution space of parent individuals, which are selectively chosen for participating in the next generation of the optimization procedure. The selection of the participating parents for the next generation is decided by a figure of merit, often referred to as the objective function. This objective function is entrusted with the evaluation of the fitness of the candidate solutions in a particular generation to qualify for the next generation of operations. Other notable and related search techniques include:

• Quantum annealing [Apolloni89, Das2005], which uses "quantum fluctuations" instead of thermal fluctuations to get through high but thin barriers in the target function.
• Tabu search [Glover97], which normally moves to neighboring states of lower energy, but takes uphill moves when it finds itself stuck in a local minimum, and avoids cycles by keeping a "taboo list" of solutions already seen.
• Ant colony optimization (ACO) [Colorni91, Dorigo92], which uses many ants (or agents) to traverse the solution space and find locally productive areas.
• Harmony search, which mimics musicians in the improvisation process, where each musician plays a note in search of the best overall harmony.
• Stochastic optimization, which is an umbrella set of methods that includes simulated annealing and numerous other approaches.

The following sections discuss the operational procedures of three popular heuristic search techniques, viz., genetic algorithms, simulated annealing and ant colony optimization.

Genetic Algorithms

Genetic algorithms (GAs) [Goldberg89, Davis91, Michal92, Bandyopadhyay2007a] are efficient, adaptive and robust multi-point search and optimization techniques guided by the principles of evolution and natural genetics. They provide parallel, near optimal solutions of an objective or fitness function in complex, large and multimodal landscapes. GAs are modeled on the principles of natural genetic systems, where the genetic information of each individual or potential solution is encoded in structures called chromosomes. They use some domain or problem dependent knowledge for directing the search in more promising areas of the solution space; this is known as the fitness function. Each individual or chromosome has an associated fitness function, which indicates its degree of goodness with respect to the solution it represents. Various biologically inspired operators like selection, crossover and mutation are applied on the chromosomes to yield potentially better solutions.

Basic Principles and Features

A GA essentially comprises a set of individual solutions or chromosomes (called the population) and some biologically inspired operators that create a new (and potentially better) population from an old one. The different steps of a GA can be represented as follows.


Initialize the population
Do
    Encode the strings and compute their fitness values
    Reproduce/select strings to create new mating pool
    Generate new population by crossover and mutation
Loop while (not_termination)

The components of GAs are described in the following sections.

Encoding Strategy and Population

To solve an optimization problem, GAs start with the chromosomal representation of a parameter set, which is to be encoded as a finite size string over an alphabet of finite length. For example, the string

1 0 0 1 1 0 1 0

is a binary chromosome (string of 0's and 1's) of length 8. Each chromosome actually refers to a coded possible solution. A set of such chromosomes in a generation is called a population, the size of which may be constant or may vary from one generation to another. The chromosomes in the initial population are either generated randomly or using domain specific information.

Evaluation Technique

The fitness function is chosen depending on the problem to be solved, in such a way that the strings (possible solutions) representing good points in the search space have high fitness values. This is the only information (also known as the payoff information) that GAs use while searching for possible solutions.

Genetic Operators

The frequently used genetic operators are the selection, crossover and mutation operators. These are applied to a population of chromosomes to yield potentially new offspring. The operators are described in the following sections.

Selection

The selection/reproduction process copies individual strings (called parent chromosomes) into a tentative new population (known as the mating pool) for genetic operations. The number of copies that an individual receives for the next generation is usually taken to be directly proportional to its fitness value, thereby mimicking the natural selection procedure to some extent. This scheme is commonly called the proportional selection scheme. Roulette wheel parent selection, stochastic universal selection, and binary tournament selection [Goldberg89, Michal92] are some of the most frequently used selection procedures. In the commonly used elitist model of GAs, the best chromosome seen up to the last generation is retained either in the population, or in a location outside it.

Crossover

The main purpose of crossover is to exchange information between randomly selected parent chromosomes by recombining parts of their genetic information. It combines parts of two parent chromosomes to produce offspring for the next generation. Single point crossover is one of the most commonly used schemes. Here, first of all, the members of the selected strings in the mating pool are paired at random. Then each pair of chromosomes is subjected to crossover with a probability μc, where an integer position k (known as the crossover point) is selected uniformly at random between 1 and l-1 (l > 1 is the string length). Two new strings are created by swapping all characters from position (k+1) to l.


For example, let the two parents and the crossover point be as shown below.

1 0 0 1 1 | 0 1 0
0 0 1 0 1 | 1 0 0

After crossover the offspring will be the following:

1 0 0 1 1 1 0 0
0 0 1 0 1 0 1 0

Some other common crossover techniques are two-point crossover, multiple point crossover, shuffle-exchange crossover and uniform crossover [Davis91].

Mutation

The main objective of mutation is to introduce genetic diversity into the population. It may so happen that the optimal solution resides in a portion of the search space which is not represented in the population's genetic structure. Hence, the algorithm will be unable to attain the global optima. In such a scenario, only mutation can possibly direct the population to the optimal section of the search space by randomly altering the information in a chromosome. Mutating a binary gene involves simple negation of the bit, while mutation for real coded genes is defined in a variety of ways [Eshelman93, Michal92].

For example, in binary bit-by-bit mutation, every bit in a chromosome is subject to mutation with a probability μm. The result of applying the bit-by-bit mutation on positions 3 and 7 of a chromosome is shown below.

1 0 0 1 1 0 1 0
1 0 1 1 1 0 0 0

Parameters of a Genetic Algorithm

There are several parameters in GAs that have to be tuned by the user. Some among these are the population size, the probabilities of performing crossover (usually kept in the range 0.6 to 0.8) and mutation (usually kept below 0.1), and the termination criteria. Moreover, one must decide whether to use the generational replacement strategy, where the entire population is replaced by the new population, or the steady state replacement policy, where only the less fit individuals are replaced. Most of such parameters in GAs are problem dependent, and no guidelines for their choice exist in the literature. Therefore, several researchers have also kept some of the GA parameters variable and/or adaptive [Baker85, Srinivas94].

The cycle of selection, crossover and mutation is repeated a number of times till one of the following occurs:

1. average fitness of a population becomes more or less constant over a specified number of generations,
2. desired objective function value is attained by at least one string in the population,
3. number of generations is greater than some predefined threshold.
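The three operators described above can be sketched in a few lines of Python for binary chromosomes: fitness-proportional (roulette wheel) selection, single point crossover and bit-by-bit mutation. The fitness function (a simple "count the ones" measure), the probability values and the population size are illustrative assumptions.

import random

def select(population, fitness):
    # Roulette-wheel (fitness proportional) selection of one parent.
    total = sum(fitness(c) for c in population)
    pick, running = random.uniform(0, total), 0.0
    for c in population:
        running += fitness(c)
        if running >= pick:
            return c
    return population[-1]

def crossover(p1, p2, pc=0.7):
    # Single point crossover with probability pc; swaps tails after point k.
    if random.random() < pc:
        k = random.randint(1, len(p1) - 1)
        return p1[:k] + p2[k:], p2[:k] + p1[k:]
    return p1, p2

def mutate(chromosome, pm=0.05):
    # Bit-by-bit mutation: each bit is flipped with probability pm.
    return [1 - b if random.random() < pm else b for b in chromosome]

# Toy usage: maximise the number of 1s in an 8-bit string
fitness = lambda c: sum(c)
pop = [[random.randint(0, 1) for _ in range(8)] for _ in range(20)]
for _ in range(50):                       # generations
    new_pop = []
    while len(new_pop) < len(pop):
        c1, c2 = crossover(select(pop, fitness), select(pop, fitness))
        new_pop += [mutate(c1), mutate(c2)]
    pop = new_pop
print(max(pop, key=fitness))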


Simulated Annealing

Simulated annealing (SA) [Kirkpatrick83, Cerny85] is a probabilistic metaheuristic search technique useful for finding a good approximation to the global minimum of a given function in a large search space. It is generally efficient with a discrete search space. In situations which demand an acceptably good solution in a fixed amount of time, SA has been found to be more effective than other exhaustive search techniques.

The term "simulated annealing" is derived from the common annealing process in metallurgy, where subsequent heating and controlled cooling of a material are performed to increase the size of its crystals. Heating excites the material atoms and causes them to wander randomly in higher energy states. Subsequent slow cooling settles them to configurations with lower internal energy than the initial one.

Similarly, in the SA algorithm, each step replaces the current solution by a random neighboring solution. The probability of choosing such a neighboring solution depends on the difference between the corresponding energy function values and a global parameter T referred to as the temperature. This temperature is gradually decreased during the cooling process. Thus, the current solution changes almost randomly when T is large, but the rate of change of states goes down as T is reduced.

Overview of SA

The basic objective of SA is to minimize the system internal energy function F(ρ), where ρ corresponds to each point of the search space. Thus, it aims to bring the system, from an arbitrary initial state, to a state with the minimum possible energy.

For this purpose, SA considers some user specified neighbors ρ' of the current state ρ, and probabilistically decides between migrating the system to state ρ' or retaining it in state ρ. The probabilities are chosen such that the system ultimately migrates to lower energy states. This step is repeated until the system reaches a state that is a good approximation to the required one, or until a prespecified limit to the approximation has been reached. The following sections highlight the important aspects of simulated annealing.

Acceptance Probabilities

This is the probability of migrating from the current state ρ to a candidate new state ρ'. It is decided by an acceptance probability function P(g, g', T), where g = F(ρ), g' = F(ρ') and T is the temperature (mentioned earlier). The acceptance probability is usually chosen such that the probability of allowing a transition decreases when the difference (g' − g) increases. This means that smaller uphill migrations are more likely than the larger ones. P must be nonzero when g' > g, which implies that the system may migrate to the new state even when it has a higher energy than the current one. This prevents the process from becoming stuck in a local minimum. When T goes to zero, the acceptance probability tends to zero if g' > g; P, however, attains a positive value if g' < g.

Thus, the system favors transitions that go to lower energy values, and avoids those that go higher, for sufficiently small values of T. When T becomes 0, the procedure will make a migration only if it goes to lower energy. Thus, it is clear that the evolution of the state depends crucially on the temperature T. Roughly speaking, the evolution of a state ρ is sensitive to coarser energy variations when T is large, and to finer energy variations when T is small.

The Annealing Schedule

Another essential feature of the SA method is that the temperature (T) should be gradually reduced as the simulation proceeds [Kirkpatrick83, Cerny85]. Initially, T is set to a high value (i.e. ∞), and then it is decreased at each step according to some annealing schedule. The user generally specifies this schedule for the decrement of T. However, it must be ensured that the schedule is such that it ends up with T = 0 towards the end of the annealing process.

Thus, the system is expected to migrate initially towards a broader region of the search space containing good solutions, ignoring smaller features of the energy function in the process. Subsequently, it would drift towards the lower energy regions that become narrower and narrower. Finally, the system migrates downhill according to the steepest descent heuristic. However, the pure version of SA does not keep track of the best solution obtained in terms of the lower energy levels attained at any point of time.

SA Pseudocode

The pseudocode shown in Table 4 implements the simulated annealing heuristic, as described above, starting from state ρ0 and continuing to a maximum of kmax steps or until a state with energy gmax or less is found.


Selection of Operating Parameters of SA

Several parameters need to be specified for the application of SA. These include the state space, the energy function F, the candidate generator procedure, the acceptance probability function P and the annealing schedule. The performance of SA depends on the suitability of the choice of these parameters. The following sections throw some light on the selection of these parameters.

Search Graph Diameter

Considering all the possible states of a simulated annealing process to be the vertices of a graph, with the edges representing the candidate transitions, simulated annealing may be modeled as a search graph which aims to provide a sufficiently short path from the initial state to any intermediate state or the global optimum state. This implies that the search space must be small enough for an efficient implementation of the algorithm. In other words, the diameter of the search graph must be small to facilitate faster transitions between states. Hence, the choice of the search graph diameter is an essential criterion for the successful operation of the simulated annealing algorithm.

Transition Probabilities

The migration from the current state ρ to the state ρ' is governed by another probability, viz., the transition probability. The transition probability depends on the current temperature (Tc), on the order of candidate transitions and on the acceptance probability function P.

Efficient Candidate Generation

It is evident that the current state is expected to have much lower energy than a random state after a few iterations of the SA algorithm. This observation is important in the selection of the candidate generator function. In practice, the generator function should favor those candidate migrations where the energy of the destination state ρ' is likely to be similar to that of the current state. This implies that those candidate states ρ' for which P[F(ρ), F(ρ'), T] is large should be opted for first.

Table 4.

Initialize state, energy and "best" solutions
    ρ := ρ0; g := F(ρ); ρbest := ρ; gbest := g    (initial and best states, initial and best energies)
    k := 0    (starting point)
Do
    Neighborhood selection and energy computation
        ρnew := neighbor(ρ)    (new neighboring state)
        gnew := F(ρnew)    (new energy)
    if gnew < gbest then
        ρbest := ρnew; gbest := gnew    (best state and energy so far)
    if P(g, gnew, Tc(k/kmax)) > j then    (Tc is the current temperature and j is a random number)
        ρ := ρnew; g := gnew    (updated state and energy)
    k := k + 1    (next step)
Loop while k < kmax and g > gmax
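A direct Python transcription of the Table 4 pseudocode, applied to a toy one-dimensional energy function, might look as follows. The neighbour move, the exponential (Metropolis-style) acceptance probability exp(-(gnew - g)/T) and the linear cooling schedule are illustrative choices; as the text above notes, the chapter leaves these to the user.

import math, random

def simulated_annealing(f, s0, neighbor, k_max=10000, g_max=-float("inf"), t0=1.0):
    # Simulated annealing as in Table 4: keep the best state seen so far, and
    # accept a neighbouring state with probability P(g, g_new, T).
    s, g = s0, f(s0)
    s_best, g_best = s, g
    for k in range(k_max):
        t = t0 * (1 - k / k_max)                 # current temperature Tc
        s_new = neighbor(s)
        g_new = f(s_new)
        if g_new < g_best:
            s_best, g_best = s_new, g_new        # best state and energy so far
        # always accept downhill moves, sometimes accept uphill ones
        if g_new < g or (t > 0 and random.random() < math.exp(-(g_new - g) / t)):
            s, g = s_new, g_new
        if g <= g_max:
            break
    return s_best, g_best

# Toy usage: minimise a bumpy one-dimensional energy function
energy = lambda x: x ** 2 + 10 * math.sin(3 * x)
step = lambda x: x + random.uniform(-0.5, 0.5)
print(simulated_annealing(energy, s0=5.0, neighbor=step))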


Avoidance of Getting Stuck in Local Minima

Another aspect of the selection of the candidate generator function is to reduce the number of local minima which may come up during the annealing process. Otherwise, the SA algorithm may be trapped in these minima with a high probability for a very long time. The probability of occurrence of such traps is proportional to the number of states achieved in the local minimal state of SA. The time of trapping is exponential in the energy difference between the local minimal state and its surrounding states. These requirements, however, can be met by resorting to slight changes to the candidate generator function.

Cooling Schedule

The simulated annealing algorithm assumes that the cooling rate is always low enough such that the probability distribution of the current state remains near the thermodynamic equilibrium at all times. But the time required for attaining the equilibrium state (referred to as the relaxation time) after a change in temperature strongly depends on the nature of the energy function, the current temperature (Tc) as well as on the candidate generator.

Hence, there is no basis for selecting an ideal cooling rate for the algorithm. It should be estimated and adjusted empirically for a particular problem.

However, this problem has been taken care of by the thermodynamic simulated annealing algorithm, which adjusts the temperature at each step based on the energy difference between the two states according to the laws of thermodynamics, instead of applying any cooling schedule.

Restarting of SA

Sometimes it is better to revert back to a solution that was significantly better rather than always migrate from the current state. This is called restarting. To do this, ρ and g are set to ρbest and gbest, respectively, and the annealing schedule is restarted. The decision to restart could be based on a fixed number of steps, or based on the current energy being too high compared to the best energy obtained so far.

Ant Colony Optimization

The ant colony optimization algorithm (ACO) is another probabilistic computational search technique useful for finding the best possible paths in search graphs. It is a member of the family of ant colony algorithms, which are referred to as swarm intelligence methods. Initially proposed by Marco Dorigo in 1992 in his PhD thesis [Colorni91, Dorigo92], the first algorithm was aimed at searching for an optimal path in a graph, based on the behaviors exhibited by ants while searching for food out of their colony.

Overview of ACO

In their quest for food, ants generally start searching randomly through all possible paths that lead to food. Once food is found, they return to their colony leaving behind pheromone trails. The ants that follow do not wander randomly along the path; instead, they follow the trail to find food.

This pheromone trail, however, starts to evaporate with time, thereby reducing its attractive strength. The more time it takes for an ant to travel down the path and back again, the more time the pheromones have to evaporate. A short path, by comparison, gets marched over faster, and thus the pheromone density remains high, as it is laid on the path as fast as it can evaporate.

The phenomenon of pheromone evaporation prevents the convergence of the algorithm to a local optimum. In the absence of any pheromone evaporation, the following ants would always be attracted to the paths traversed by the first/leading ant, thereby leading to a constrained solution space.

This procedure adopted by real world ants is adapted in implementing the ant colony optimization algorithm, which always leads to the shortest one [Goss89, Deneubourg90] between two unequal length paths.


This self-organized system adopted by ants, referred to as "Stigmergy", is characterized by both a positive feedback resulting from the deposit of pheromone that attracts other ants, and a negative one resulting from the dissipation of the pheromone due to evaporation. A pseudocode for the ACO algorithm is listed below.

Do
    Generate_TrialSolutions()
    Update_Pheromone()
    Search_Paths()
Loop while (not_termination)

The generation of the trial solutions in the ACO algorithm by the Generate_TrialSolutions() procedure involves the selection of subsequent nodes in the search space. This process is referred to as edge selection. An ant will switch from node i to node j with probability

$p_{ij} = \frac{\psi_{ij}^{\gamma}(t)\, \kappa_{ij}^{\delta}(t)}{\sum \psi_{ij}^{\gamma}(t)\, \kappa_{ij}^{\delta}(t)}$  (30)

where ψij(t) is the amount of pheromone on edge(i, j) at time t, κij(t) is the acceptability of edge(i, j) at time t, and γ and δ control the influence of the parameters ψij and κij, respectively.

The Update_Pheromone() procedure on a given edge(i, j) yields the amount of pheromone at the next instant of time (t+1). It is determined as

$\psi_{ij}(t+1) = (1 - \varepsilon)\,\psi_{ij}(t) + \Delta\psi_{ij}$  (31)

where ψij(t+1) is the amount of pheromone on that edge at time (t+1), ε is the rate of pheromone evaporation and Δψij is the amount of pheromone deposited. For the kth ant traveling on edge(i, j), $\Delta\psi_{ij}^{k} = \frac{1}{C}$, where C is the cost/length of the kth ant's path. For all other ants, $\Delta\psi_{ij}^{k} = 0$.

Several variations of the ACO algorithm are in vogue. These include the Elitist Ant System, the Max-Min Ant System (MMAS) [Stutzle2000], the proportional pseudo-random rule [Dorigo97] and the Rank-Based Ant System (ASrank).

The evolutionary algorithms provide a platform for deriving the global optimal solution to a problem by means of their searching capabilities. Notable application areas include the traveling salesman problem, the determination of optimal clusters in a clustering algorithm etc.

possibly classified as belonging to P with respect to R, and
• the boundary region of P with respect to R (i.e. the rough region) is the set of all objects which can neither be classified as belonging to P nor as not belonging to P with respect to R.

Given the aforestated concepts, it can be inferred that P is a crisp set, i.e. it is exact with respect to R, if the boundary region of P is empty. On the other hand, if the boundary region of P is nonempty, P is a rough set, i.e. it is inexact with respect to R. Thus, a set is defined as a rough set if it has a nonempty boundary region; otherwise it is a crisp set.

The indiscernibility relation (R), as stated above, explains the lack of precise knowledge about the elements of the universe. Equivalence classes of R are referred to as the granules. The granules represent the elementary portions of knowledge perceivable due to R.

For R(x), an equivalence class of R determined by element x, the following definitions of the approximations and the boundary region hold.

• The R-lower approximation of P is given as

    R_*(x) = ⋃_{x∈U} { R(x) : R(x) ⊆ P }    (32)

• The R-upper approximation of P is given as

    R^*(x) = ⋃_{x∈U} { R(x) : R(x) ∩ P ≠ ∅ }    (33)

• The R-boundary region of P is given as

    RN_R(x) = R^*(x) − R_*(x)    (34)
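Equations (32)-(34) translate directly into set operations over the granules. The following minimal Python sketch assumes the granules are given explicitly as a partition of U; the universe and the target set in the example are made up purely for illustration.

    def approximations(granules, P):
        """Return (lower, upper, boundary) of a target set P.

        granules : iterable of sets, the equivalence classes R(x) partitioning U
        P        : set, the concept to be approximated
        """
        lower, upper = set(), set()
        for g in granules:
            if g <= P:                 # granule entirely included in P  -> Eq. (32)
                lower |= g
            if g & P:                  # granule intersecting P          -> Eq. (33)
                upper |= g
        return lower, upper, upper - lower     # boundary region, Eq. (34)

    # Hypothetical universe of six objects partitioned into three granules
    granules = [{1, 2}, {3, 4}, {5, 6}]
    P = {1, 2, 3}
    lower, upper, boundary = approximations(granules, P)
    # lower = {1, 2}, upper = {1, 2, 3, 4}, boundary = {3, 4}  -> P is a rough set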

From these definitions it is seen that, expressed in terms of granules of knowledge [Polkowski2002, Polkowski2001], the lower approximation of a set is the union of all granules entirely included in the set. On the other hand, the upper approximation is the union of all granules having non-empty intersection with the set. The difference between the upper and lower approximations is the rough/boundary region of the set. Figure 8 shows a generic representation of a rough set.

Figure 8. Rough set representation

Rough sets can also be defined in terms of the rough membership function (µ_R^P) [Pawlak94] instead of the lower and upper approximations. The rough membership function indicates the degree with which x belongs to P given the knowledge of x expressed by R. It is defined as

    µ_R^P(x) = |P ∩ R(x)| / |R(x)|,    µ_R^P(x) ∈ [0, 1]    (35)

where |P| denotes the cardinality of P.

The lower and upper approximations and the boundary region of a set are then defined as

    R_*(x) = { x ∈ U : µ_R^P(x) = 1 }    (36)

    R^*(x) = { x ∈ U : µ_R^P(x) > 0 }    (37)

    RN_R(x) = { x ∈ U : 0 < µ_R^P(x) < 1 }    (38)
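The membership-based definitions of Eqs. (35)-(38) can be sketched in the same spirit; reusing the hypothetical granules from the previous example, they recover exactly the same lower and upper approximations and boundary region.

    def rough_membership(x, granules, P):
        """µ_R^P(x) = |P ∩ R(x)| / |R(x)|  (Eq. 35)."""
        Rx = next(g for g in granules if x in g)       # equivalence class containing x
        return len(P & Rx) / len(Rx)

    def approximations_by_membership(U, granules, P):
        mu = {x: rough_membership(x, granules, P) for x in U}
        lower    = {x for x in U if mu[x] == 1}            # Eq. (36)
        upper    = {x for x in U if mu[x] > 0}             # Eq. (37)
        boundary = {x for x in U if 0 < mu[x] < 1}         # Eq. (38)
        return lower, upper, boundary

    U = {1, 2, 3, 4, 5, 6}
    granules = [{1, 2}, {3, 4}, {5, 6}]
    P = {1, 2, 3}
    # lower = {1, 2}, upper = {1, 2, 3, 4}, boundary = {3, 4}, as before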


Rough sets have flourished over the years, thanks to their approximating capabilities and the non-requirement of any a priori knowledge regarding the problem domain. This feature of rough sets has envisaged the use of these set theoretic concepts in several fields of engineering and scientific applications, both with and without the conjunction of fuzzy set theory.

6. APPLICATIONS OF SOFT COMPUTING

The field of soft computing has been successfully applied in a variety of real life applications. Notable among these are image preprocessing and enhancement, pattern recognition, image segmentation, image analysis and understanding, image mining, Kansei information processing, networking, VLSI system design and testing, engineering design, information retrieval etc. The following sections illustrate the notable applications of the soft computing paradigm to image processing, pattern recognition and Kansei information processing.

Soft Computing Applications to Image Processing and Pattern Recognition

Neural networks have often been employed by researchers for dealing with the daunting tasks of extraction [Forrest88, Pham98, Hertz91, Lippmann87, Haykin99] and classification [Chua88a, Chua88b, Antonucci94, Egmont2002, Lippmann89, Pao89] of relevant object specific information from redundant image information bases, segmentation of image data [Tsao93, Bilbro88] and identification and recognition of objects from an image [Carpenter89, Parsi95, Tang96, Perlovsky97, Abdallah95, Egmont99, Roth90, Scott97]. Several attempts [Amari88, Fukushima80] have also been reported where self-organizing neural network architectures are used for object extraction and pattern recognition. Chiu et al. [Chiu90] applied artificial neural network architectures for the processing of photogrammetric targets. Carpenter et al. [Carpenter91] applied self-organizing neural networks for the recognition of patterns from images.

Neural networks of varying topology, assisted by fuzzy set theory, have also been widely used to deal with the problem of segmentation and clustering of image data, given their inherent susceptibility to dynamic environments [Baraldi2001, Leondes98, Tzeng98]. Tatem et al. [Tatem2001] applied a Hopfield neural network for the identification of super-resolution targets from remotely sensed images. Charalampidis et al. [Charalampidis2001] used a fuzzy ARTMAP to classify different noisy signals. Details of image and pattern classification approaches using neural networks are available in [Zhang2000].

Lin [Lin96] proposed an unsupervised parallel medical image segmentation technique using a fuzzy Hopfield neural network. In this approach, fuzzy clustering is embedded into a Hopfield neural network architecture for the purpose of segmentation of the images. In [Chen2003], a fuzzy neural network is used to classify synthetic aperture radar (SAR) images using the statistical properties of the polarimetric data of the images. The images are clustered by the fuzzy c-means clustering algorithm based on the Wishart data distribution. The clustered data are finally incorporated into the neural network for the purpose of classification. Boskovitz et al. [Boskovitz2002] developed an autoadaptive multilevel image segmentation and edge detection system using a neural network architecture similar to a multilayer perceptron. A fuzzy clustering technique is involved in selecting the labels required for the thresholding operation on the image.

Efforts have also been made by inducing multivalued logical reasoning into the existing neural networks for the purpose of classification of multidimensional data. A multilevel thresholding ability on the input data has been induced in discrete-time neural networks by means of the synthesis of multi-level threshold functions [Si91]. Other soft computing techniques, like genetic algorithms, have also been widely used in the domain of medical image segmentation. An extensive review is available in [Maulik2009].

Of late, the processing of multichannel information has assumed great importance, mainly due to the evolving fields of remote sensing, GIS, biomedical imaging and multispectral data management. Color image segmentation is a classical example of multichannel information processing. The complexity of the problem of color image segmentation and object extraction is mainly due to the variety of the color intensity gamut. Due to their inherent parallelism and abilities of approximation, adaptation and graceful degradation, neural networks are also suited for addressing the problem of color image processing. Lee et al. [Lee96] employed a CNN multilayer neural network structure for the processing of color images following the RGB color model. In this approach, each primary color is assigned to a unique CNN layer, allowing parallel processing of the color component information. Zhu [Zhu94], however, transferred the RGB color space to the Munsell color space and used an adaptive resonance theory (ART) neural network to classify objects into recognition categories. Roska et al. [Roska93] also applied a multilayer CNN structure for handling the enormous amount of data involved in the processing of color images. Self-organizing neural network architectures [Moreira96, Wu2000] have also been used for the segmentation and classification of color images.

The vagueness in image information arising out of the admixture of the color components has often been dealt with by the soft computing paradigm. In [Chen95], Chen et al. applied fuzzy set theory for the proper analysis of uncertainty and vagueness in color image information. Color image segmentation techniques involving fuzzy set theory and fuzzy logic are also available in the literature [Gillet2002, Chung2003, Ito95, Choi95].

Soft Computing Applications to Kansei Information Processing

Another noteworthy application area where soft computing has made a mark is Kansei information processing [Schutte2004]. Kansei engineering, invented in the 1970s by Professor Mitsuo Nagamachi of Hiroshima International University, is a method for translating feelings and impressions into product parameters. Kansei engineering essentially deals with the processing and manipulation of nonverbal information such as voice pitch, facial expressions and gestures, as well as verbal information. Kansei encompasses the total concept of senses, consciousness and feelings that relate to human behavior in social living.

Since Kansei information is subjective, human linguistic understanding plays a central role in Kansei information processing. Fuzzy set theory [Zadeh65] has been effectively applied for explaining the ambiguity and vagueness in Kansei information [Yan2008]. In [Hayashida2002], Hayashida and Takagi proposed an evolutionary computation (EC) and interactive EC (IEC) system for visualizing individuals in a multi-dimensional searching space, thereby enabling the envisioning of the landscape of an n-dimensional searching space. The proposed system has been found to be more efficient than the conventional genetic algorithm. Unehara and Yamada [Unehara2008] introduced an Interactive Genetic Algorithm (IGA) for the designing of products' shapes through evaluation by Kansei. Other notable applications of soft computing in Kansei information processing can be found in the literature [Onisawa2005, Nagata94].

Other Soft Computing Applications

Recently, soft computing tools are being widely used in the domains of data mining and bioinformatics. In data mining, genetic algorithms, neural networks and fuzzy logic are widely used in solving many problems of optimization, feature selection, classification and clustering [Bigus96, Cox2005, Bandyopadhyay2005]. In the domain of bioinformatics, soft computing tools have been used for sequence alignment, fragment assembly, gene and promoter identification, phylogenetic tree analysis, prediction of gene regulatory networks, protein structure and function prediction, protein classification, and molecule design and docking, to name just a few [Bandyopadhyay2007a, Bandyopadhyay2007b]. Web intelligence is another recent area that has seen a spurt of successful applications of soft computing tools [Bandyopadhyay2007a].

7. CONCLUSION

A brief overview of the essence and applications of the soft computing paradigm is presented in this chapter. This computing framework is essentially built around several intelligent tools and techniques. Notable among them are neural networks, fuzzy sets, fuzzy logic, genetic algorithms and rough sets. These tools form the backbone of this widely used computing paradigm meant for the analysis and understanding of knowledge bases.

Neural networks are used for the analysis of numeric data. These networks comprise interconnected processing units or nodes in different topologies depending on the problem at hand. Fuzzy sets and fuzzy logic deal with the uncertainties inherent in the data under consideration by representing the degree of vagueness in terms of linguistic information. The genetic algorithm, the simulated annealing algorithm and the ant colony optimization algorithm are random search techniques meant for arriving at an optimum probable solution to a problem. Rough sets also handle the underlying uncertainties in data without even requiring any a priori information regarding the data content and distribution. These techniques are applied extensively in a wide variety of engineering and scientific problems with astounding results.

REFERENCES

Abdallah, M. A., Samu, T. I., & Grisson, W. A. (1995). Automatic target identification using neural networks. SPIE Proceedings on Intelligent Robots and Computer Vision XIV, 2588, 556–565.

Ackley, D. H., Hinton, G. E., & Sejnowski, T. J. (1985). A learning algorithm for Boltzmann Machines. Cognitive Science, 9, 147–169. doi:10.1207/s15516709cog0901_7

Amari, S. (1988). Mathematical theory of self-organization in neural nets. In Seelen, W. V., Shaw, G., & Leinhos, U. M. (Eds.), Organization of Neural Networks: Structures and Models. New York: Academic Press.

Antonucci, M., Tirozzi, B., & Yarunin, N. D. (1994). Numerical simulation of neural networks with translation and rotation invariant pattern recognition. International Journal of Modern Physics B, 8(11-12), 1529–1541. doi:10.1142/S0217979294000658

Apolloni, B., Caravalho, C., & De Falco, D. (1989). Quantum stochastic optimization. Stochastic Processes and their Applications, 33, 233–244. doi:10.1016/0304-4149(89)90040-9

Baker, J. E. (1985). Adaptive selection methods for genetic algorithms. In J. J. Grefenstette (Ed.), Proceedings of the 1st International Conference on Genetic Algorithms (pp. 101-111). Hillsdale, NJ: Lawrence Erlbaum Associates.

Bandyopadhyay, S., Maulik, U., Holder, L., & Cook, D. (Eds.). (2005). Advanced Methods for Knowledge Discovery from Complex Data. London: Springer.

Bandyopadhyay, S., Maulik, U., & Wang, J. T. L. (Eds.). (2007). Analysis of Biological Data: A Soft Computing Approach. Singapore: World Scientific.

Bandyopadhyay, S., & Pal, S. K. (2007). Classification and Learning Using Genetic Algorithms: Application in Bioinformatics and Web Intelligence. Germany: Springer.

Baraldi, A., Binaghi, E., Blonda, P., Brivio, P. A., & Rampini, A. (2001). Comparison of the multilayer perceptron with neuro-fuzzy techniques in the estimation of cover class mixture in remotely sensed data. IEEE Transactions on Geoscience and Remote Sensing, 39(5), 994–1005. doi:10.1109/36.921417

Bezdek, J. C. (1992). On the relationship between neural networks, pattern recognition and intelligence. International Journal of Approximate Reasoning, 6, 85–107. doi:10.1016/0888-613X(92)90013-P

Bhattacharyya, S., & Dutta, P. (2006). Designing pruned neighborhood neural networks for object extraction from noisy background. Journal of Foundations of Computing and Decision Sciences, 31(2), 105–134.

Bhattacharyya, S., Dutta, P., & Maulik, U. (2008). Self Organizing Neural Network (SONN) based gray scale object extractor with a Multilevel Sigmoidal (MUSIG) activation function. Journal of Foundations of Computing and Decision Sciences, 33(2), 131–165.

Bigus, J. P. (1996). Data Mining With Neural Networks: Solving Business Problems from Application Development to Decision Support. New York: McGraw-Hill.

Bilbro, G. L., White, M., & Synder, W. (n.d.). Image segmentation with neurocomputers. In Eckmiller, R., & Malsburg, C. V. D. (Eds.), Neural computers. New York: Springer-Verlag.

Boskovitz, V., & Guterman, H. (2002). An adaptive neuro-fuzzy system for automatic image segmentation and edge detection. IEEE Transactions on Fuzzy Systems, 10(2), 247–262. doi:10.1109/91.995125

Broomhead, D. S., & Lowe, D. (1988). Multivariate functional interpolation and adaptive networks. Complex Systems, 2, 321–355.

Carpenter, G. A. (1989). Neural network models for pattern recognition and associative memory. Neural Networks, 2(4), 243–258. doi:10.1016/0893-6080(89)90035-X

Carpenter, G. A., & Grossberg, S. (1991). Pattern recognition by self-organizing neural networks. Cambridge, MA: MIT Press.

Carpenter, G. A., & Ross, W. D. (1995). ART-EMAP: A neural network architecture for object recognition by evidence accumulation. IEEE Transactions on Neural Networks, 6(4), 805–818. doi:10.1109/72.392245

Cerny, V. (1985). A thermodynamical approach to the traveling salesman problem: an efficient simulation algorithm. Journal of Optimization Theory and Applications, 45, 41–51. doi:10.1007/BF00940812

Charalampidis, D., Kasparis, T., & Georgiopoulos, M. (2001). Classification of noisy signals using fuzzy ARTMAP neural networks. IEEE Transactions on Neural Networks, 12(5), 1023–1036. doi:10.1109/72.950132

Chauvin, Y., & Rumelhart, D. E. (1995). Backpropagation: theory, architectures, and applications. Hillsdale, NJ: L. Erlbaum Associates Inc.

Chen, C. T., Chen, K. S., & Lee, J. S. (2003). Fuzzy neural classification of SAR images. IEEE Transactions on Geoscience and Remote Sensing, 41(9), 2089–2100. doi:10.1109/TGRS.2003.813494

Chen, Y., Hwang, H., & Chen, B. (1995). Color image analysis using fuzzy set theory. Proceedings of the International Conference on Image Processing (pp. 242-245).

Chiu, W. C., Hines, E. L., Forno, C., Hunt, R., & Oldfield, S. (1990). Artificial neural networks for photogrammetric target processing. SPIE Proceedings on Close-Range Photogrammetry Meets Machine Vision, 1395(2), 794–801.

Choi, Y., & Krishnapuran, R. (1995). Image enhancement based on fuzzy logic. Proceedings of the International Conference on Image Processing (pp. 167-170).

Chua, L. O., & Yang, L. (1988). Cellular network: Applications. IEEE Transactions on Circuits and Systems, 35(10), 1273–1290. doi:10.1109/31.7601

Chua, L. O., & Yang, L. (1988). Cellular network: Theory. IEEE Transactions on Circuits and Systems, 35(10), 1257–1282. doi:10.1109/31.7600

Chung, F., & Fung, B. (2003). Fuzzy color quantization and its application to scene change detection. Proceedings of MIR, 03, 157–162.

Colorni, A., Dorigo, M., & Maniezzo, V. (1991). Distributed optimization by ant colonies. Proceedings of the First European Conference on Artificial Life (pp. 134–142). France: Elsevier Publishing.

Cortes, C., & Vapnik, V. N. (1995). Support vector networks. Machine Learning, 20, 273–297. doi:10.1007/BF00994018

Cox, E. (2005). Fuzzy Modeling and Genetic Algorithms for Data Mining and Exploration. San Francisco: Morgan Kaufmann.

Das, A., & Chakrabarti, B. K. (Eds.). (2005). Quantum Annealing and Related Optimization Methods. Lecture Notes in Physics (Vol. 679). Heidelberg: Springer. doi:10.1007/11526216

Davis, L. (Ed.). (1991). Handbook of Genetic Algorithms. New York: Van Nostrand Reinhold.

Deneubourg, J.-L., Aron, S., Goss, S., & Pasteels, J.-M. (1990). The self-organizing exploratory pattern of the Argentine ant. Journal of Insect Behavior, 3, 159. doi:10.1007/BF01417909

Dorigo, M. (1992). Optimization, Learning and Natural Algorithms. PhD thesis, Politecnico di Milano, Italy.

Dorigo, M., & Gambardella, L. M. (1997). Ant Colony System: A cooperative learning approach to the traveling salesman problem. IEEE Transactions on Evolutionary Computation, 1(1), 53–66. doi:10.1109/4235.585892

Duda, R. O., & Hart, P. E. (1973). Pattern classification and scene analysis. New York: Wiley.

Egmont-Petersen, M., & Arts, T. (1999). Recognition of radiopaque markers in X-ray images using a neural network as nonlinear filter. Pattern Recognition Letters, 20(5), 521–533. doi:10.1016/S0167-8655(99)00024-0

Egmont-Petersen, M., de Ridder, D., & Handels, H. (2002). Image processing using neural networks - a review. Pattern Recognition, 35(10), 2279–2301. doi:10.1016/S0031-3203(01)00178-9

Eshelman, L. J., & Schaffer, J. D. (1993). Real-coded genetic algorithms and interval schemata. In L. Whitley (Ed.), Foundations of Genetic Algorithms (Vol. 2, pp. 187-202). San Mateo, CA: Morgan Kaufmann.

Forrest, B. M. (1988). Neural network models. Parallel Computing, 8, 71–83. doi:10.1016/0167-8191(88)90110-X

Frege, G. (1893). Grundlagen der Arithmetik (Vol. 2). Jena: Verlag von Herman Pohle.

Fukushima, K. (1980). Neocognitron: A self-organizing multilayer neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36, 193–202. doi:10.1007/BF00344251

Gillet, A., Macaire, L., Lecocq, C. B., & Postaire, J. G. (2002). Color image segmentation by analysis of 3D histogram with fuzzy morphological filters. Studies in Fuzziness and Soft Computing, 122, 153–177.

Glover, F., & Laguna, M. (1997). Tabu Search. Norwell, MA: Kluwer.

Goldberg, D. E. (1989). Genetic Algorithms: Search, Optimization and Machine Learning. New York: Addison-Wesley.

Goss, S., Aron, S., Deneubourg, J.-L., & Pasteels, J.-M. (1989). The self-organized exploratory pattern of the Argentine ant. Naturwissenschaften, 76, 579–581. doi:10.1007/BF00462870

Hayashida, N., & Takagi, H. (2002). Acceleration of EC convergence with landscape visualization and human intervention. doi:10.1016/S1568-4946(01)00023-0

Haykin, S. (1999). Neural networks: A comprehensive foundation (2nd ed.). Upper Saddle River, NJ: Prentice Hall.

Hertz, J., Krogh, A., & Palmer, R. G. (1991). Introduction to the theory of neural computation. Reading, MA: Addison-Wesley.

Hopfield, J. J. (1984). Neurons with graded response have collective computational properties like those of two state neurons. Proceedings of the National Academy of Sciences of the USA, 81, 3088–3092.

Hui, S., & Zak, S. H. (1992). Dynamical analysis of the Brain-State-in-a-Box (BSB) neural model. IEEE Transactions on Neural Networks, 3, 86–94. doi:10.1109/72.105420

Ito, N., Shimazu, Y., Yokoyama, T., & Matushita, Y. (1995). Fuzzy logic based non-parametric color image segmentation with optional block processing (pp. 119–126). ACM.

Kamgar-Parsi, B. (1995). Automatic target extraction in infrared images (pp. 143–146). NRL Review.

Kirkpatrick, S., Gelatt, C. D., & Vecchi, M. P. (1983). Optimization by simulated annealing. Science, New Series, 220(4598), 671–680.

Kohonen, T. (1995). Self-organizing maps. Springer Series in Information Sciences, 30.

Kosko, B. (1988). Bidirectional associative memories. IEEE Transactions on Systems, Man, and Cybernetics, 18(1), 49–60. doi:10.1109/21.87054

Kumar, S. (2004). Neural networks: A classroom approach. New Delhi: Tata McGraw-Hill.

Lee, C.-C., & de Gyvez, J. P. (1996). Color image processing in a cellular neural-network environment. IEEE Transactions on Neural Networks, 7(5), 1086–1098. doi:10.1109/72.536306

Leondes, C. T. (1998). Neural network techniques and applications. In Image processing and pattern recognition. New York: Academic Press.

Lin, J.-S., Cheng, K.-S., & Mao, C.-W. (1996). A fuzzy Hopfield neural network for medical image segmentation. IEEE Transactions on Nuclear Science, 43(4), 2389–2398. doi:10.1109/23.531787

Lippmann, R. P. (1987). An introduction to computing with neural nets. IEEE ASSP Magazine, 3–22.

Lippmann, R. P. (1989). Pattern classification using neural networks. IEEE Communications Magazine, 27, 47–64. doi:10.1109/35.41401

Maulik, U. (2009). Medical image segmentation using genetic algorithms. IEEE Transactions on Information Technology in Biomedicine, 13, 166–173. doi:10.1109/TITB.2008.2007301

Michalewicz, Z. (1992). Genetic Algorithms + Data Structures = Evolution Programs. New York: Springer-Verlag.

Moreira, J., & Costa, L. D. F. (1996). Neural-based color image segmentation and classification using self-organizing maps (pp. 47–54). Anais do IX SIBGRAPI.

Nagata, T., Kakihara, K., Ohkawa, T., & Tobita, N. (1994). Concept space generation oriented design using Kansei by individual subjectivity. Journal of IEEJ, 116(4).

Onisawa, T., & Unehara, M. (2005). Application of Interactive Genetic Algorithm toward human centered system. Journal of SICE, 44(1), 50–57.

Pao, Y. H. (1989). Adaptive pattern recognition and neural networks. New York: Addison-Wesley.

Pawlak, Z. (1982). Rough sets. International Journal of Computer and Information Sciences, 11, 341–356. doi:10.1007/BF01001956

Pawlak, Z., & Skowron, A. (1994). Rough membership function. In Yeager, R. E., Fedrizzi, M., & Kacprzyk, J. (Eds.), Advances in the Dempster-Schafer Theory of Evidence (pp. 251–271). New York: Wiley.

Perlovsky, L. I., Schoendor, W. H., & Burdick, B. J. (1997). Model-based neural network for target detection in SAR images. IEEE Transactions on Image Processing, 6(1), 203–216. doi:10.1109/83.552107

Pham, D. T., & Bayro-Corrochano, E. J. (1998). Neural computing for noise filtering, edge detection and signature extraction. Journal of Systems Engineering, 2(2), 666–670.

Polkowski, L. (2002). Rough Sets: Mathematical Foundations. Advances in Soft Computing. Physica-Verlag, A Springer-Verlag Company.

Polkowski, L., & Skowron, A. (2001). Rough mereological calculi granules: a rough set approach to computation. International Journal of Computational Intelligence, 17, 472–479.

Rosenblatt, F. (1958). The Perceptron: A probabilistic model for information storage and organization in the brain. Cornell Aeronautical Laboratory. Psychological Review, 65(6), 386–408. doi:10.1037/h0042519

Roska, T., Zarandy, A., & Chua, L. O. (1993). Color image processing using multilayer CNN structure. In Didiev, H. (Ed.), Circuit theory and design. New York: Elsevier.

Ross, T. J., & Ross, T. (1995). Fuzzy logic with engineering applications. New York: McGraw-Hill College Div.

Roth, M. W. (1990). Survey of neural network technology for automatic target recognition. IEEE Transactions on Neural Networks, 1(1), 28–43. doi:10.1109/72.80203

Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323, 533–536. doi:10.1038/323533a0

Schutte, S., Eklund, J., Axelsson, J. R. C., & Nagamachi, M. (2004). Concepts, methods and tools in Kansei Engineering. Theoretical Issues in Ergonomics Science, 5(3), 214–232. doi:10.1080/1463922021000049980

Scott, P. D., Young, S. S., & Nasrabadi, N. M. (1997). Object recognition using multilayer Hopfield neural network. IEEE Transactions on Image Processing, 6(3), 357–372. doi:10.1109/83.557336

Si, J., & Michel, A. N. (1991). Analysis and synthesis of discrete-time neural networks with multi-level threshold functions. Proceedings of the IEEE International Symposium on Circuits and Systems.

Snyman, J. A. (2005). Practical Mathematical Optimization: An Introduction to Basic Optimization Theory and Classical and New Gradient-Based Algorithms. New York: Springer Publishing.

Srinivas, M., & Patnaik, L. M. (1994). Adaptive probabilities of crossover and mutation in genetic algorithm. IEEE Transactions on Systems, Man, and Cybernetics, 24, 656–667. doi:10.1109/21.286385

Stutzle, T., & Hoos, H. H. (2000). MAX-MIN Ant System. Future Generation Computer Systems, 16, 889–914. doi:10.1016/S0167-739X(00)00043-1

Tang, H. W., Srinivasan, V., & Ong, S. H. (1996). Invariant object recognition using a neural template classifier. Image and Vision Computing, 14(7), 473–483. doi:10.1016/0262-8856(95)01065-3

Tatem, A. J., Lewis, H. G., Atkinson, P. M., & Nixon, M. S. (2001). Super-resolution target identification from remotely sensed images using a Hopfield neural network. IEEE Transactions on Geoscience and Remote Sensing, 39(4), 781–796. doi:10.1109/36.917895

Tsao, E. C. K., Lin, W. C., & Chen, C.-T. (1993). Constraint satisfaction neural networks for image recognition. Pattern Recognition, 26(4), 553–567. doi:10.1016/0031-3203(93)90110-I

Tzeng, Y. C., & Chen, K. S. (1998). A fuzzy neural network to SAR image classification. IEEE Transactions on Geoscience and Remote Sensing, 36(1), 301–307. doi:10.1109/36.655339

Unehara, M., & Yamada, K. (2008). Interactive conceptual design support system using human evaluation with Kansei. Proceedings of the 2nd International Conference on Kansei Engineering and Affective Systems (pp. 175-180).

Wu, Y., Liu, Q., & Huang, T. S. (2000). An adaptive self-organizing color segmentation algorithm with application to robust real-time human hand localization. Proceedings of the Asian Conference on Computer Vision.

Yan, H.-B., Huynh, V.-N., Murai, T., & Nakamori, Y. (2008). Kansei evaluation based on prioritized multi-attribute fuzzy target-oriented decision analysis. International Journal of Information Sciences, 178(21), 4080–4093.

Zadeh, L. A. (1965). Fuzzy sets. Information and Control, 8, 338–353. doi:10.1016/S0019-9958(65)90241-X

Zhang, G. P. (2000). Neural networks for classification: a survey. IEEE Transactions on Systems, Man and Cybernetics, Part C, Applications and Reviews, 30(4), 451–462. doi:10.1109/5326.897072

Zhu, Z. (1994). Color pattern recognition in an image system with chromatic distortion. Optical Engineering (Redondo Beach, Calif.), 33(9), 3047–3051. doi:10.1117/12.177509
