
Soft Computing

2
WHAT IS SOFT COMPUTING?

Usually the primary considerations of traditional (“hard”)


computing are:
 precision,
 certainty, and
 rigor
The challenge is to exploit the tolerance for imprecision by
devising methods of computation that lead to an acceptable,
approximate solution at low cost.

The principal notion in soft computing is that precision and


certainty carry a cost; and that computation, reasoning, and
decision-making should exploit the tolerance for imprecision,
uncertainty, approximate reasoning, and partial truth for
obtaining low-cost solutions.
3
WHAT IS SOFT COMPUTING?

 Soft computing is a consortium of methodologies that


provide flexible information processing capability for
handling real-life ambiguous situations.

 This leads to the remarkable human ability of understanding


distorted speech, deciphering sloppy handwriting,
comprehending the nuances of natural language,
summarizing text, recognizing and classifying images,
driving a vehicle in dense traffic, and, more generally, making
rational decisions in an environment of uncertainty and
imprecision.

4
WHAT IS SOFT COMPUTING?

 Soft Computing became a formal Computer Science area of study in


the early 1990's.

 Earlier computational approaches could model and precisely analyse


only relatively simple systems.

 More complex systems arising in biology, medicine,


humanities, management sciences, and similar fields often
remained intractable to conventional mathematical and
analytical methods.

 Soft computing deals with imprecision, uncertainty, partial


truth, and approximation to achieve tractability, robustness and
low solution cost.

5
WHAT IS SOFT COMPUTING?

 Generally speaking, soft computing techniques resemble biological


processes more closely than traditional techniques, which are largely
based on formal logical systems, such as sentential
logic and predicate logic, or rely heavily on computer-aided
numerical analysis (as in finite element analysis).

 Soft computing techniques are intended to complement each other.

 Unlike hard computing schemes, which strive for exactness


and full truth, soft computing techniques exploit the given
tolerance of imprecision, partial truth, and uncertainty for a
particular problem.

6
Soft Computing
 The main constituents of soft computing include:
 fuzzy logic
 neural networks
 genetic algorithms
 rough sets and
 signal processing tools such as wavelets.

 Each of them contributes a distinct methodology for addressing


problems in its domain.

 An intelligent and robust system that provides


 human-interpretable,
 low-cost &
 approximate solution
7
Soft Computing
There are ongoing efforts to integrate artificial neural networks
(ANNs), fuzzy set theory, genetic algorithms (GAs), rough set
theory and other methodologies in the soft computing
paradigm.

Hybridizations exploiting the characteristics of these theories


include:
 neuro-fuzzy,
rough-fuzzy,
neuro-genetic,
fuzzy-genetic,
neuro-rough,
rough-neuro-fuzzy
approaches.
8
Soft Computing
 Fuzzy sets provide a natural framework for dealing with
uncertainty and imprecise data, and are suitable for handling
issues related to understanding patterns in incomplete and
noisy data.

 Neural networks are robust and exhibit good learning and


generalization capabilities in data-rich environments.

 Genetic algorithms (GAs) provide efficient search


algorithms to optimally select a model, from mixed media
data, based on some preference criterion or objective
function.

9
Soft Computing
 Rough sets are suitable for handling different types of
uncertainty in data.
Neural networks and rough sets are widely used for classification and rule
generation.

 Application of wavelet-based signal processing techniques


is new in the area of soft computing.
 Wavelet transformation of a signal results in
decomposition of the original signal in different multi-
resolution sub-bands.
 This is useful in dealing with compression and
retrieval of data, particularly images.

10
Relevance?

11
Machine Learning

Arthur Samuel (1959):

Field of study that gives computers the ability to learn without


being explicitly programmed

Tom Mitchell (1998):

A computer program is said to learn from experience E w.r.t.


some task T and some performance measure P, if its
performance on T, as measured by P, improves with
experience E

12
Machine Learning:
An Indispensable Tool in
Bioinformatics

13
Introduction
 The development of high-throughput data acquisition
technologies in biological sciences in the last 5 to 10 years, together
with advances in digital storage, computing, and information
and communication technologies in the 1990s, has begun to
transform biology from a data-poor into a data-rich science.

 This phenomenon is gradually transforming biology from classic


hypothesis-driven approaches, in which a single answer to a single
question is provided, to a data-driven research, in which many
answers are given at a time and we have to seek the hypothesis that
best explains them.

 As a reaction to the exponential growth in the amount of biological


data to handle, the discipline of bioinformatics stores, retrieves,
analyzes and assists in understanding biological information.
14
Introduction (Cont.)
 The development of methods for the analysis of this massive (and
constantly increasing) amount of information is one of the key
challenges in bioinformatics.

 This analysis step – also known as computational biology – faces the


challenge of extracting biological knowledge from all the in-house and
publicly available data.

 Furthermore, the knowledge should be formulated in a transparent


and coherent way if it is to be understood and studied by bio-experts.

 The term “data mining” in bioinformatics refers to the set of


techniques aimed at discovering useful relationships and patterns in
biological data that were previously undetected.

15
Data mining techniques provide a robust means to evaluate the
generalization power of extracted patterns on unseen data, although
these must be further validated and interpreted by the domain
expert.
16
Machine Learning
 Machine learning methods are essentially computer programs that
make use of sampled data or past experience information to
provide solutions to a given problem.

 A wide spectrum of algorithms, commonly based on the artificial


intelligence and statistics fields, have been proposed by the machine
learning community in the last decades.

 Machine learning is able to deal with the huge volumes of data


generated by novel high-throughput devices, in order to extract
hidden relationships that exist and that are not noticeable to
experts.

 As new data and novel concept types are generated every day in
molecular biology research, it is essential to apply techniques able to fit
this fast-evolving nature - Machine learning can be adapted efficiently
to these changing environments.
17
Machine Learning (Cont.)
 Machine learning is able to deal with the abundance of missing and
noisy data from many biological scenarios.

 In several biological scenarios, experts can only specify input–


output data pairs, and they are not able to describe the general
relationships between the different features that could serve to
further describe how they interrelate.

 Machine learning is able to adjust its internal structure to the existing


data, producing approximate models and results.
 Machine learning methods are used to investigate the
underlying mechanisms and the interactions between
biological molecules in many diseases.
 They are also essential for the biomarker discovery process.
18
Machine Learning (Cont.)
 Mainly due to the availability of novel types of biology throughput
data, the set of biology problems on which machine learning is
applied is constantly growing.

 Two practical realities severely condition many bioinformatics


applications:

 a limited number of samples, and


 several thousands of features characterizing each sample

 The development of machine learning techniques capable of


dealing with these problems is currently a challenge for the
bioinformatics community.

19
Machine Learning Applications

20
Machine Learning (Cont.)
Machine learning algorithms have been taxonomized in the following way:

• Supervised learning:
 Starting from a database of training data that consists of pairs of
input cases and desired outputs, its goal is to construct a function
(or model) to accurately predict the target output of future cases
whose output value is unknown.

 When the target output is a continuous-value variable, the task is


known as regression.

 Otherwise, when the output (or label) is defined as a finite set of


discrete values, the task is known as classification.
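
A minimal sketch to make the distinction concrete (assuming scikit-learn is available; the data and model choices below are illustrative, not part of the slides):

import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

X = np.array([[0.1], [0.4], [0.5], [0.9]])   # input cases
y_class = np.array([0, 0, 1, 1])             # discrete labels -> classification
y_reg = np.array([1.2, 2.1, 2.6, 3.9])       # continuous targets -> regression

clf = LogisticRegression().fit(X, y_class)
reg = LinearRegression().fit(X, y_reg)
print(clf.predict([[0.7]]))   # predicted class of an unseen case
print(reg.predict([[0.7]]))   # predicted continuous value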

21
• Unsupervised learning/ Clustering:
 Starting from a database of training data that consists of input
cases, its goal is to partition the training samples into subsets
(clusters) so that the data in each cluster show a high level of
proximity.
 In contrast to supervised learning, the labels for the data are not
used or are not available in clustering.

• Semi-supervised learning:
 Starting from a database of training data that combines both
labeled and unlabeled examples, the goal is to construct a model
able to accurately predict the target output of future cases for
which its output value is unknown.
 Typically, this database contains a small amount of labeled data
together with a large amount of unlabeled data.

22
• Reinforcement learning:
 These algorithms are aimed at finding a policy that maps states
of the world to actions.
 The actions are chosen among the options that an agent ought to
take under those states, with the aim of maximizing some notion
of long-term reward.
 Its main difference regarding the previous types of machine
learning techniques is that input–output pairs are not present
in a database, and its goal resides in online performance.

• Optimization:
 The task of searching for an optimal solution in a space of
multiple possible solutions.
 As the process of learning from data can be regarded as
searching for the model that best fits the data, optimization
methods can be considered an ingredient in modeling.
23
Supervised and unsupervised classification are the most
broadly applied machine learning types in most application
areas, including bioinformatics.

24
Supervised learning Algorithms
 Averaged One-Dependence Estimators (AODE)
 Artificial neural network: Backpropagation
 Bayesian statistics
 Case-based reasoning
 Decision trees
 Inductive logic programming
 Gaussian process regression
 Learning automata
 Minimum message length (decision trees, decision graphs, etc.)
 Lazy learning
 Instance-based learning: Nearest Neighbor Algorithm
 Probably approximately correct (PAC) learning
 Ripple down rules, a knowledge acquisition methodology
 Statistical classification: Hidden Markov models
 Symbolic machine learning algorithms
 Sub-symbolic machine learning algorithms
 Support vector machines
 Random Forests
 Ensembles of classifiers
 Bootstrap aggregating (bagging)
 Boosting
 Ordinal classification
 Regression analysis
 Information fuzzy networks (IFN)
25
Fuzzy sets
 We are continuously having to recognize
 people,
 objects,
 handwriting,
 voice, images, and other patterns,
using :

 Distorted
 unfamiliar,
 incomplete,
 occluded,
 fuzzy, and
 inconclusive data,
where a pattern should be allowed to have membership or
belongingness to more than one class.
26
Fuzzy sets
 This is also very significant in medical diagnosis, where a
patient afflicted with a certain set of symptoms can be
simultaneously suffering from more than one disease.

 Again, the symptoms need not necessarily be strictly


numerical.

 This is how the concept of fuzziness comes into the


picture.

Example?

27
Fuzzy control system

A fuzzy control system is a control system based


on fuzzy logic—a mathematical system that
analyzes input values in terms of variables that take
continuous values between 0 and 1, in contrast to
classical or digital logic, which operates on discrete values
of either 0 or 1 (true or false).

28
Overview
 Fuzzy logic is widely used in machine control.

 The term itself inspires a certain skepticism, sounding equivalent to


"half-baked logic" or "bogus logic", but the "fuzzy" part does not refer
to a lack of rigour in the method, rather to the fact that the logic
involved can deal with fuzzy concepts—concepts that cannot be
expressed as "true" or "false" but rather as "partially true".

 Although genetic algorithms and neural networks can perform just as


well as fuzzy logic in many cases, fuzzy logic has the advantage that the
solution to the problem can be cast in terms that human
operators can understand, so that their experience can be used in
the design of the controller.

 This makes it easier to mechanize tasks that are already successfully


performed by humans.
29
History and Applications
 Fuzzy logic was first proposed by Lotfi A. Zadeh of the University of
California at Berkeley in 1965.

 He elaborated on his ideas in a 1973 paper that introduced the concept


of "linguistic variables", which equate to a variable defined as a fuzzy
set.

 Other research followed, with the first industrial application, a


cement kiln built in Denmark, coming on line in 1975.

 Interest in fuzzy systems was sparked by Seiji Yasunobu and Soji


Miyamoto of Hitachi, who in 1985 provided simulations that
demonstrated the superiority of fuzzy control systems for
the Sendai railway.

 Their ideas were adopted, and fuzzy systems were used to control
accelerating, braking, and stopping when the line opened in 1987.
30
History and Applications

Following such demonstrations, Japanese engineers developed a


wide range of fuzzy systems for both industrial and consumer
applications.

In 1988 Japan established the Laboratory for International


Fuzzy Engineering (LIFE), a cooperative arrangement
between 48 companies to pursue fuzzy research.

Japanese consumer goods often incorporate fuzzy systems


(next slide)

31
Japanese consumer goods involving fuzzy systems

 Matsushita vacuum cleaners use microcontrollers running fuzzy


algorithms to interrogate dust sensors and adjust suction power
accordingly.

 Hitachi washing machines use fuzzy controllers to read load-weight,


fabric-mix, and dirt sensors and automatically set the wash cycle for
the best use of power, water, and detergent.

 Canon developed an autofocusing camera that uses a charge-


coupled device (CCD) to measure the clarity of the image in six
regions of its field of view and uses the information provided to
determine if the image is in focus.

 It also tracks the rate of change of lens movement during


focusing, and controls its speed to prevent overshoot.
32
Japanese consumer goods involving fuzzy systems (Cont.)

An industrial air conditioner designed by Mitsubishi uses 25


heating rules and 25 cooling rules.

 A temperature sensor provides input, with control


outputs fed to an inverter, a compressor valve, and a
fan motor.

 Compared to the previous design, the fuzzy controller


heats and cools five times faster, reduces power
consumption by 24%, increases temperature stability by
a factor of two, and uses fewer sensors.

33
Japanese consumer goods involving fuzzy systems (Cont.)

The enthusiasm of the Japanese for fuzzy logic is reflected in the


wide range of other applications they have investigated or
implemented:

 character and handwriting recognition;


 optical fuzzy systems;
 robots, including one for making Japanese flower
arrangements;
 voice-controlled robot helicopters
 elevator systems; etc.

34
Fuzzy systems in US & Europe

Work on fuzzy systems is also proceeding in the US and


Europe, though not with the same enthusiasm shown in Japan.

The US Environmental Protection Agency has investigated


fuzzy control for energy-efficient motors, and NASA has
studied fuzzy control for automated space docking:

simulations show that a fuzzy control system can greatly


reduce fuel consumption.

Firms such as Boeing, General Motors, Allen-Bradley, Chrysler,


Eaton, and Whirlpool have worked on fuzzy logic for use in
low-power refrigerators, improved automotive
transmissions, and energy-efficient electric motors.
35
Fuzzy systems in US & Europe (Cont.)
 In 1995 Maytag introduced an "intelligent" dishwasher based on a
fuzzy controller and a "one-stop sensing module" that combines:

 a thermistor, for temperature measurement;


 a conductivity sensor, to measure detergent level from the ions
present in the wash;
 a turbidity sensor that measures scattered and transmitted light to
measure the soiling of the wash; and
 a magnetostrictive sensor to read spin rate.

 The system determines the optimum wash cycle for any load to obtain
the best results with the least amount of energy, detergent, and water.

 It even adjusts for dried-on foods by tracking the last time the door
was opened, and estimates the number of dishes by the number of
times the door was opened.
36
Research and development is also continuing on
fuzzy applications in software, as opposed
to firmware, design, including fuzzy expert
systems and integration of fuzzy logic
with neural-network and so-called adaptive
"genetic" software systems, with the ultimate
goal of building "self-learning" fuzzy control
systems.

37
Fuzzy sets
The input variables in a fuzzy control system are in general
mapped by sets of membership functions, known as "fuzzy
sets".

The process of converting a crisp input value to a fuzzy value is


called "fuzzification".

A control system may also have various types of switch, or


"ON-OFF", inputs along with its analog inputs, and such switch
inputs will always have a truth value equal to either 1 or 0, but
the scheme can deal with them as simplified fuzzy functions that
happen to be either one value or another.

38
Fuzzy sets
Given "mappings" of input variables into membership functions
and truth values, the microcontroller then makes decisions for
what action to take based on a set of "rules", each of the form:

IF brake temperature IS warm AND speed IS not very fast


THEN brake pressure IS slightly decreased.

In this example, the two input variables are "brake temperature"
and "speed" that have values defined as fuzzy sets.

The output variable, "brake pressure", is also defined by a fuzzy


set that can have values like "static", "slightly increased", "slightly
decreased", and so on.
39
Fuzzy sets (Cont.)
The decision is based on a set of rules:

All the rules that apply are invoked, using the membership
functions and truth values obtained from the inputs, to
determine the result of the rule.

This result in turn will be mapped into a membership function


and truth value controlling the output variable.

These results are combined to give a specific ("crisp") answer,


the actual brake pressure, a procedure known as
"defuzzification".

This combination of fuzzy operations and rule-


based "inference" describes a "fuzzy expert system“. 40
Fuzzy control in detail
Fuzzy controllers are very simple conceptually.

They consist of an input stage, a processing stage, and an


output stage.

The input stage maps sensor or other inputs to the appropriate


membership functions and truth values.

The processing stage invokes each appropriate rule and


generates a result for each, then combines the results of the
rules.

Finally, the output stage converts the combined result back


into a specific control output value.
41
Fuzzy control in detail (Cont.)
The most common shape of membership functions is
triangular, although trapezoidal and bell curves are also used,
but the shape is generally less important than the number of
curves and their placement.

From three to seven curves are generally appropriate to cover


the required range of an input value, or the "universe of
discourse" in fuzzy jargon.

As discussed earlier, the processing stage is based on a


collection of logic rules in the form of IF-THEN
statements, where the IF part is called the "antecedent"
and the THEN part is called the "consequent".

 Typical fuzzy control systems have dozens of rules.
42


Fuzzy control in detail (Cont.)
Consider a rule for a thermostat:
IF (temperature is "cold") THEN (heater is "high")

 This rule uses the truth value of the "temperature" input, which is
some truth value of "cold", to generate a result in the fuzzy set for
the "heater" output, which is some value of "high".

 This result is used with the results of other rules to finally generate
the crisp composite output.

 Obviously, the greater the truth value of "cold", the higher the truth
value of "high", though this does not necessarily mean that the output
itself will be set to "high", since this is only one rule among many.

43
Fuzzy control in detail (Cont.)
 In some cases, the membership functions can be modified by
"hedges" that are equivalent to adjectives.

 Common hedges include "about", "near", "close to", "approximately",


"very", "slightly", "too", "extremely", and "somewhat".

 These operations may have precise definitions, though the definitions


can vary considerably between different implementations.

 "Very", for example, squares membership functions; since the


membership values are always less than 1, this narrows the
membership function.

 "Extremely" cubes the values to give greater narrowing, while


"somewhat" broadens the function by taking the square root.
44
Fuzzy control in detail (Cont.)

 In practice, the fuzzy rule sets usually have several antecedents


that are combined using fuzzy operators, such as AND, OR,
and NOT, though again the definitions tend to vary:

 AND uses the minimum weight of all the antecedents, while


 OR uses the maximum value.
 NOT subtracts a membership function from 1 to give the
"complementary" function.

 There are several different ways to define the result of a rule, but
one of the most common and simplest is the "max-min"
inference method, in which the output membership function is
given the truth value generated by the premise.
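
The pieces above (membership functions, hedges, fuzzy AND/OR/NOT, max-min inference, defuzzification) can be sketched in a few lines of Python. This is an illustrative toy thermostat, not the slides' own system; the universes of discourse and the triangular membership functions are assumptions:

import numpy as np

def tri(x, a, b, c):
    # triangular membership function with feet at a and c and peak at b
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def very(mu):     return mu ** 2        # "very" squares the membership value
def somewhat(mu): return np.sqrt(mu)    # "somewhat" takes the square root

def f_and(*mu): return min(mu)          # AND -> minimum of the antecedents
def f_or(*mu):  return max(mu)          # OR  -> maximum
def f_not(mu):  return 1.0 - mu         # NOT -> complement

temp = 12.0                             # crisp input (degrees C)
mu_cold = tri(temp, 0, 5, 20)           # fuzzification: truth value of "cold"

heater = np.linspace(0, 100, 201)       # universe of discourse for heater power (%)
mu_high = tri(heater, 50, 100, 150)     # output fuzzy set "high"

# max-min inference for IF temperature IS cold THEN heater IS high:
clipped = np.minimum(mu_high, mu_cold)  # consequent clipped at the antecedent's truth value

# centroid defuzzification gives the crisp control output
setting = np.sum(heater * clipped) / np.sum(clipped)
print(round(float(mu_cold), 2), round(float(setting), 1))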

45
Neural Networks

46
Human Brain

47
Human Brain

48
Neural Networks in Brain

49
50
51
Brain as an information processing system

• Consists of ~10 billion nerve cells or neurons.

• ~60 trillion inter-connections

Neural Network?

52
Introduction to Artificial Neural Networks
Usefulness and Capabilities

1. Non-linearity
2. Input-Output Mapping
- associations
- Auto-associations
3. Adaptivity
- “free parameters”
4. Evidential Response
- Decision with a measure of confidence
5. Fault Tolerance
- graceful degradation
6. VLSI implementability
7. Neurobiological analogy

53
Introduction

Computer science has been widely adopted by modern medicine.

One reason is that an enormous amount of data has to be


gathered and analysed which is very hard or even impossible
without making use of computer systems.

The majority of medical tools are able to send results of their


work directly to a computer, significantly facilitating the collection of
necessary information.

A large number of such tools already exists and they provide an


aid to the doctors in their everyday work.

54
ANN use in medicine

 ANNs’ effectiveness in recognizing patterns and relations is


a reason why they are being used to aid doctors in solving
medical problems.

 They have shown large efficiency not only in diagnosis but


also in modelling parts of the human body.

 One of the most important dermatological problems is


melanoma diagnosis.

 Dermatologists achieve accuracy in recognizing malignant


melanoma between 65 and 85%, whereas early detection
decreases mortality.
55
ANN use in medicine

 Diagnostic and neural analysis of skin cancer (DANAOS)


showed the results comparable to results of dermatologists.

 It was also found that images hard to recognize by DANAOS


differed from those causing problems to dermatologists.

 Cooperation between humans and computers could therefore


lower the probability of mistakes.

 Results obtained are also dependent on the size and quality of


the database used.

56
ANN use in medicine

 ANNs have also been adopted in pharmaceutical


research and in many other different clinical applications
using pattern recognition; for example:

- diagnosis of breast cancer,

- interpreting ECGs,

- diagnosing dementia,

- predicting prognosis and survival rates.

57
Neural networks
 The ANNs consist of many connected neurons simulating a
brain at work.
 A basic feature which distinguishes an ANN from an
algorithmic program is the ability to generalize the knowledge
of new data which was not presented during the learning
process.
 Expert systems need to gather actual knowledge of their
designated area.
 However, ANNs only need one training and show tolerance
for discontinuity, accidental disturbances or even defects in the
training data set.
 This allows for usage of ANNs in solving problems which
cannot be solved by other means effectively.
58
Neural networks

 These features and advantages are the reason why the area
of ANN’s application is very wide and includes for
example:
– Pattern recognition,
– Object classification,
– Medical diagnosis,
– Forecast of economical risk, market prices
changes, need for electrical power, etc.,
– Selection of employees

59
Biological neural networks

 The human brain consists of around 10^11 nerve cells called neurons

60
What is a neural network (NN)?

• Neural networks are a branch of "Artificial Intelligence".

• An Artificial Neural Network is a system loosely modeled


on the human brain.

• The field goes by many names, such as connectionism, parallel


distributed processing, neuro-computing, natural intelligent
systems, machine learning algorithms, and artificial neural
networks.

61
Description

• Most neural networks have some sort of "training" rule


whereby the weights of connections are adjusted on the
basis of presented patterns.

• In other words, neural networks "learn" from examples,


just like children learn to recognize dogs from examples of
dogs, and exhibit some structural capability for
generalization.

• Neural networks normally have great potential for


parallelism, since the computations of the components are
independent of each other

62
Introduction
• Neural networks are a powerful technique to solve many real
world problems.
• They have the ability to learn from experience in order to
improve their performance and to adapt themselves to
changes in the environment.
• In addition to that they are able to deal with incomplete
information or noisy data and can be very effective
especially in situations where it is not possible to define the
rules or steps that lead to the solution of a problem.

• They typically consist of many simple processing units, which


are wired together in a complex communication network.
63
The Brain

The Brain as an Information


Processing System

The human brain contains


about 10 billion nerve cells, or
neurons. On average, each
neuron is connected to other
neurons through about 10000
synapses.

64
Computation in the brain
• The brain's network of neurons forms a massively parallel
information processing system. This contrasts with conventional
computers, in which a single processor executes a single series of
instructions.

• Against this, consider the time taken for each elementary


operation: neurons typically operate at a maximum rate of about
100 Hz, while a conventional CPU carries out several hundred
million machine-level operations per second. Despite being
built with very slow hardware, the brain has quite remarkable
capabilities:

– Its performance tends to degrade gracefully under partial damage. In


contrast, most programs and engineered systems are brittle: if you
remove some arbitrary parts, very likely the whole will cease to function.
65
Computation in the brain
• It can learn (reorganize itself) from experience.
• This means that partial recovery from damage is possible if healthy
units can learn to take over the functions previously carried out by
the damaged areas.
• It performs massively parallel computations extremely efficiently.
For example, complex visual perception occurs within less than
100 ms, that is, 10 processing steps!
• It supports our intelligence and self-awareness.
• As a discipline of Artificial Intelligence, Neural Networks attempt
to bring computers a little closer to the brain's capabilities by
imitating certain aspects of information processing in the brain, in
a highly simplified way.

66
Introduction to ANNs

Structure of an artificial neuron, worked out by McCulloch


and Pitts in 1943, is similar to biological neuron.

The weighted sum of the inputs is transformed by the


activation function to give the final output

67
Introduction to ANNs (Cont.)
 It consists of two modules: a summation module Σ and an
activation module F.
 Roughly, the summation module corresponds to the biological
nucleus.
 There, an algebraic summation of the weighted input signals is
realised and the output signal is generated.

 The output signal can be calculated using the formula:

x = Σ wi ui , i = 1, …, m

where w - vector of weights (synapses equivalent),
u - vector of input signals (dendrites equivalent),
m - number of inputs.
68
Introduction to ANNs (Cont.)

Signal is processed by the activation module F, which can


be specified by different functions according to needs.

 A simple linear function can be used; the output signal y then has
the form

y = k·x

where k is a coefficient.

Networks using this function are called Madaline and their


neurons are called Adaline (ADAptive LINear Element).

They are the simplest networks, which have found practical


application.
69
ADALINE

 ADALINE (Adaptive Linear Neuron or later Adaptive


Linear Element) is a single layer neural network.

 It was developed by Bernard Widrow and his graduate


student Ted Hoff at Stanford University in 1960.

 It is based on the McCulloch–Pitts neuron.

 It consists of a weight, a bias and a summation function.

70
ADALINE (Cont.)

 The difference between Adaline and the standard


(McCulloch-Pitts) perceptron is that in the learning phase
the weights are adjusted according to the weighted sum of
the inputs (the net).

 In the standard perceptron, the net is passed to the


activation (transfer) function and the function's output is
used for adjusting the weights.

71
ADALINE - Definition

Adaline is a single layer neural network with multiple nodes where


each node accepts multiple inputs and generates one output.

Given the following variables:


• x is the input vector
• w is the weight vector
• n is the number of inputs
• θ is some constant
• y is the output

 Then we find that the output is y = Σ xj wj + θ , j = 1, …, n.

72
ADALINE - Learning Algorithm
Let us assume:

• η is the learning rate (some constant)


• d is the desired output
• o is the actual output

then the weights are updated as follows:

w ← w + η (d − o) x

 The ADALINE converges to the least squares error, which is

E = (d − o)²
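
A minimal runnable sketch of the update rule above (the toy data, learning rate and number of epochs are arbitrary assumptions):

import numpy as np

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])   # toy inputs
d = np.array([-1., -1., -1., 1.])                        # desired outputs (AND-like targets)

w = np.zeros(2)      # weight vector
theta = 0.0          # bias term
eta = 0.1            # learning rate

for epoch in range(100):
    for x, target in zip(X, d):
        o = np.dot(w, x) + theta        # linear output (the "net")
        w += eta * (target - o) * x     # delta rule: weights follow the raw error
        theta += eta * (target - o)

print(np.sign(X @ w + theta))           # thresholded predictions after training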

73
Madaline

 Madaline (Multiple Adaline) is a two layer neural network


with a set of ADALINEs in parallel as its input layer and
a single PE (processing element) in its output layer.

 The madaline network is useful for problems which involve


prediction based on multiple inputs, such as:

- weather forecasting

74
Introduction to ANNs (Cont.)

 Another type of activation module function is a threshold
function:

y = 1 if x ≥ T, 0 otherwise,

where T is a constant threshold value.

75
Introduction to ANNs (Cont.)
However, functions which describe the non-linear profile of a
biological neuron more precisely are:

A sigmoid function:

y = 1 / (1 + e^(−βx))

where β is a given parameter, and

A tangensoid (hyperbolic tangent) function:

y = tanh(αx)

where α is a given parameter.

76
Introduction to ANNs (Cont.)

 Information capacity and processing ability of a single neuron


is relatively small.

 However, it can be raised by the appropriate connection of


many neurons.

 In 1958, the first ANN, called perceptron, was developed by


Rosenblatt.

 It was used for alphanumerical character recognition.

77
Introduction to ANNs (Cont.)
Neurons in the multilayer ANNs are grouped into 3 different
types of layers: input, output, and hidden layer

There can be one or more hidden layers in the network but


only one output and one input layer.
78
Introduction to ANNs (Cont.)

 The number of neurons in the input layer is specified by the


type and amount of data which will be given to the input.

 The number of output neurons corresponds to the type of


answer of the network.

 The amount of hidden layers and their neurons is more


difficult to determine.

 A network with one hidden layer suffices to solve most tasks.

 None of the known problems needs a network with more than


three hidden layers in order to be solved.
79
Introduction to ANNs (Cont.)
Selection of the number of layers for solving different problems.

More complicated networks can solve more complicated issues.


80
Introduction to ANNs (Cont.)

 There is no good recipe for selecting the number of hidden
neurons.

 One of the methods is described by formula

where Nh is the number of neurons in the hidden layer, and


Ni and No are the corresponding numbers for the
input and output layers, respectively.

However, usually the quantity of hidden neurons is


determined empirically
81
Introduction to ANNs (Cont.)

 Two types of a multilayer ANNs can be distinguished with


regards to the architecture:

- feed-forward and
- feedback networks.

82
Introduction to ANNs (Cont.)
 In feed-forward networks, a signal can move in one
direction only and cannot move between neurons in the same
layer.

Multilayer feed-forward ANN

Such networks can be used in the pattern recognition.


83
Introduction to ANNs (Cont.)

 Feedback networks are more complicated, because a signal can


be sent back to the input of the same layer with a changed value.

Signals can move in these loops until the proper state is achieved.

 These networks are also called interactive or recurrent.
84


Training of the ANN
 The process of training of the ANN consists in changing the
weights assigned to connections of neurons until the achieved
result is satisfactory.

 Two main kinds of learning can be distinguished:


- supervised and
- unsupervised learning.

 In supervised learning, an external teacher is used to


correct the answers given by the network.

 ANN is considered to have learned when computed errors are


minimized.

85
Training of the ANN (Cont.)

 Unsupervised learning does not use a teacher.

 ANN has to distinguish patterns using the information


given to the input without external help.

 This learning method is also called self-organisation.

 It works like a brain which uses sensory impressions to


recognise the world without any instructions

86
Training of the ANN (Cont.)
 One of the best known learning algorithms is the Back-
Propagation Algorithm (BPA).

 This basic, supervised learning algorithm for multilayered


feed-forward networks gives a recipe for changing the weights
of the elements in neighbouring layers.

 It consists in minimization of the sum-of-squares errors,


known as least squares.

87
Back-Propagation Algorithm (BPA)
To teach ANN using BPA the following steps have to be carried
out for each pattern in the learning set:

1. Insert the learning vector u^μ as an input to the network.

2. Evaluate the output values u_jm^μ of each element for all layers
using the formula

3. Evaluate error values for the output layer using the


formula

88
Back-Propagation Algorithm (BPA) (Cont.)

4. Evaluate sum-of-squares errors from

5. Carry out the back-propagation of output layer error


to all elements of hidden layers calculating their errors
from

89
Back-Propagation Algorithm (BPA) (Cont.)
6. Update the weights of all elements between output and hidden
layers and then between all hidden layers moving towards the
input layer.

Changes of the weights can be obtained from

Back-propagation of error values

90
Back-Propagation Algorithm (BPA) (Cont.)
The above steps have to be repeated until a satisfactory minimum of
the complete error function is achieved:

91
Back-Propagation Algorithm (BPA) (Cont.)

 Every iteration of these instructions is called an epoch.

 After the learning process is finished another set of patterns


can be used to verify the knowledge of the ANN.

 For complicated networks and large sets of patterns the


learning procedure can take a lot of time.

 Usually it is necessary to repeat the learning process many


times with different coefficients selected by trial and error.

 There are a variety of optimisation methods that can be used


to accelerate the learning process.

92
Back-Propagation Algorithm (BPA) (Cont.)

 One of them is the momentum technique, which consists in
calculating the changes of the weights for pattern (k + 1) using the
formula

Δw(k + 1) = η δ u + α Δw(k)

where α is a constant value which determines the influence of
the previous change of weights on the current change.
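
A compact sketch of the back-propagation loop with the momentum term, under assumptions not taken from the slides (one hidden layer, sigmoid activations, batch updates on the XOR problem, bias handled by appending a constant input of 1):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([[0.], [1.], [1.], [0.]])                 # XOR targets
Xb = np.hstack([X, np.ones((4, 1))])                   # constant 1 acts as a bias input

W1 = rng.normal(scale=0.5, size=(3, 3))                # (2 inputs + bias) -> 3 hidden units
W2 = rng.normal(scale=0.5, size=(4, 1))                # (3 hidden + bias) -> 1 output
dW1, dW2 = np.zeros_like(W1), np.zeros_like(W2)        # previous weight changes
eta, alpha = 0.5, 0.9                                  # learning rate and momentum

for epoch in range(10000):
    H = sigmoid(Xb @ W1)                               # hidden-layer outputs
    Hb = np.hstack([H, np.ones((4, 1))])
    Y = sigmoid(Hb @ W2)                               # network outputs
    delta2 = (T - Y) * Y * (1 - Y)                     # output-layer error terms
    delta1 = (delta2 @ W2[:3].T) * H * (1 - H)         # errors propagated back to the hidden layer
    dW2 = eta * Hb.T @ delta2 + alpha * dW2            # momentum: add alpha times the previous change
    dW1 = eta * Xb.T @ delta1 + alpha * dW1
    W2 += dW2
    W1 += dW1

print(np.round(Y.ravel(), 2))   # outputs after training; ideally close to [0, 1, 1, 0]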

93
A Simple Artificial Neuron
•The basic computational element (model neuron) is often called a node or unit. It
receives input from some other units, or perhaps from an external source. Each
input has an associated weight w, which can be modified so as to model synaptic
learning. The unit computes some function f of the weighted sum of its inputs

•Its output, in turn, can serve as input to other


units.

•The weighted sum is called the net input to


unit i, often written neti.

•Note that wij refers to the weight from unit j to


unit i (not the other way around).

•The function f is the unit's activation function.


In the simplest case, f is the identity function,
and the unit's output is just its net input. This is
called a linear unit.
94
Applications:
Neural Network Applications can be grouped in following categories:

• Clustering:

A clustering algorithm explores the similarity between patterns and


places similar patterns in a cluster. Best known applications include
data compression and data mining.

• Classification/Pattern recognition:

The task of pattern recognition is to assign an input pattern (like


handwritten symbol) to one of many classes. This category includes
algorithmic implementations such as associative memory.

95
Applications:
• Function approximation:

The tasks of function approximation is to find an estimate of


the unknown function f() subject to noise. Various engineering
and scientific disciplines require function approximation.

• Prediction/Dynamical Systems:

The task is to forecast some future values of a time-sequenced


data. Prediction has a significant impact on decision support
systems. Prediction differs from Function approximation by
considering time factor.

Here the system is dynamic and may produce different results


for the same input data based on system state (time).

96
Types of Neural Networks
Neural Network types can be classified based on following attributes:

• Applications
-Classification
-Clustering
-Function approximation
-Prediction
• Connection Type
- Static (feedforward)
- Dynamic (feedback)
• Topology
- Single layer
- Multilayer
- Recurrent
- Self-organized
• Learning Methods
- Supervised
- Unsupervised
97
The McCulloch-Pitts Model of Neuron
• The early model of an artificial neuron is introduced by Warren McCulloch
and Walter Pitts in 1943. The McCulloch-Pitts neural model is also known as
linear threshold gate. It is a neuron with a set of inputs I1, I2, I3, …, Im and one
output y. The linear threshold gate simply classifies the set of inputs into two
different classes. Thus the output y is binary. Such a function can be described
mathematically using these equations:

•W1,W2…Wm are weight values


normalized in the range of either (0,1) or (-
1,1) and associated with each input line,
Sum is the weighted sum, and T is a
threshold constant. The function f is a
linear step function at threshold T as
shown in figure
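
A minimal sketch of the linear threshold gate described above (the weights and threshold below are illustrative choices that make the gate compute a logical AND):

def mcp_neuron(inputs, weights, T):
    s = sum(i * w for i, w in zip(inputs, weights))   # weighted sum of the inputs
    return 1 if s >= T else 0                         # binary output: step function at threshold T

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", mcp_neuron([x1, x2], [1, 1], T=2))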
98
The Perceptron
• In late 1950s, Frank Rosenblatt introduced a network composed of
the units that were enhanced version of McCulloch-Pitts Threshold
Logic Unit (TLU) model. Rosenblatt's model of neuron, a perceptron,
was the result of merger between two concepts from the 1940s,
McCulloch-Pitts model of an artificial neuron and Hebbian learning
rule of adjusting weights. In addition to the variable weight values,
the perceptron model added an extra input that represents bias.
Thus, the modified equation is now as follows:

Sum = Σ Ii Wi + b

where b represents the bias value.

99
The McCulloch-Pitts Model of Neuron
Symbolic Illustration of Linear Threshold Gate

• The McCulloch-Pitts model of a neuron is simple yet has substantial computing potential.
It also has a precise mathematical definition. However, this model is so simplistic that it
only generates a binary output and also the weight and threshold values are fixed. The
neural computing algorithm has diverse features for various applications. Thus, we need to
obtain a neural model with more flexible computational features.

100
Artificial Neuron with Continuous
Characteristics
• Based on the McCulloch-Pitts model described previously, the
general form an artificial neuron can be described in two stages
shown in figure. In the first stage, the linear combination of inputs is
calculated. Each value of input array is associated with its weight
value, which is normally between 0 and 1. Also, the summation
function often takes an extra input value Theta with weight value
of 1 to represent threshold or bias of a neuron. The summation
function will be then performed as

• The sum-of-product value is then passed into the second stage to


perform the activation function which generates the output from
the neuron. The activation function "squashes" the amplitude of the
output into the range [0,1] or alternatively [-1,1]. The behavior of
the activation function will describe the characteristics of an
artificial neuron model.
101
Artificial Neuron with Continuous
Characteristics

•The signals generated by actual biological neurons are action-potential spikes, and
biological neurons send signals in patterns of spikes rather than the simple absence or
presence of a single spike pulse. For example, the signal could be a continuous stream of
pulses with various frequencies. With this kind of observation, we should consider a signal to
be continuous with a bounded range. The linear threshold function should be "softened".

•One convenient form of such "semi-linear" function is the logistic sigmoid function, or in
short, the sigmoid function, as shown in the figure. As the input x tends to a large positive value, the
output value y approaches 1. Similarly, the output gets close to 0 as x goes negative.
However, the output value is neither close to 0 nor 1 near the threshold point.
102
Artificial Neuron with Continuous
Characteristics
• This function is expressed
mathematically as follows:

y = 1 / (1 + e^(−x))
• Additionally, the sigmoid


function describes the
``closeness" to the threshold
point by the slope. As x
approaches −infinity or +infinity,
the slope is zero; the slope
increases as x approaches 0.
increases as x approaches to 0.
This characteristic often plays an
important role in learning of
neural networks.

103
Support Vector Machines

104
Support Vector Machines
 SVMs are a set of related supervised learning methods used for
classification and regression.

 In simple words, given a set of training examples, each marked as


belonging to one of two categories, a SVM training algorithm
builds a model that predicts whether a new example falls into
one category or the other.

 An SVM model is a representation of the examples as points in


space, mapped so that the examples of the separate categories
are divided by a clear gap that is as wide as possible.

 New examples are then mapped into that same space and
predicted to belong to a category based on which side of the gap
they fall on.
105
Support Vector Machines

A linear support vector machine is composed of a set of


given support vectors z and a set of weights w.

The computation for the output of a given SVM with N


support vectors z1, z2, … , zN and weights w1, w2, … , wN is
then given by:

F(x) = Σ wi <zi, x> + b , i = 1, …, N
106
Support Vector Machines

 SVMs map input samples into a higher-dimensional space


where a maximal separating hyperplane among the instances
of different classes is constructed.

107
Support Vector Machines (Cont.)

 The method works by constructing another two parallel


hyperplanes on each side of this hyperplane.

 The SVM method tries to find the separating hyperplane that


maximizes the area of separation between the two parallel
hyperplanes.

 A larger separation between these parallel hyperplanes


implies a better predictive accuracy of the classifier.

108
Support Vector Machines (Cont.)

 As the widest area of separation is, in fact, determined by a few


samples that are close to both parallel hyperplanes, these
samples are called support vectors.

 They are also the most difficult samples to be correctly


classified.

109
Support Vector Machines

 SVM proposed by Vapnik was originally designed for


classification and regression tasks.

 Essence of SVM method is construction of optimal


hyperplane, which can separate data from opposite
classes using the biggest possible margin.

 Margin is a distance between optimal hyperplane and a


vector which lies closest to it.

110
Support Vector Machines

 SVMs are a set of related supervised learning methods


which analyse data and recognize patterns, used for statistical
classification and regression analysis.

 Given a set of training examples, each marked as belonging to


one of two categories, an SVM training algorithm builds a
model that predicts whether a new example falls into one
category or the other.

 An SVM model is a representation of the examples as points in


space, mapped, so that the examples of the separate categories
are divided by a clear gap that is as wide as possible.

111
Support Vector Machines
 New examples are then mapped into that same space and
predicted to belong to a category based on which side of the
gap they fall on.

 More formally, SVM constructs a hyperplane or set of


hyperplanes in a high or infinite dimensional space, which can
be used for classification, regression or other tasks.

 A good separation is achieved by the hyperplane that has the


largest distance to the nearest training datapoints of any class
(so-called functional margin), since in general the larger the
margin the lower the generalization error of the classifier.

112
Motivation
 Classifying data is a common task in machine learning.

 Suppose some given data points each belong to one of two


classes, and the goal is to decide which class a new data point will
be in.

 There are many hyperplanes that might classify the data.

 One reasonable choice as the best hyperplane is the one that


represents the largest separation, or margin, between the two
classes.

113
Motivation
 So we choose the hyperplane so that the distance from it to the
nearest data point on each side is maximized.

 If such a hyperplane exists, it is known as the maximum-


margin hyperplane and the linear classifier it defines is
known as a maximum margin classifier.

114
Optimal hyperplane separating two classes

115
Classification with Large Margin
 Whenever a dataset is linearly separable, i.e. there exists a
hyperplane that correctly classifies all data points, there exist
many such separating hyperplanes.

 We are thus faced with the question of which hyperplane to


choose, ensuring that not only the training data, but also future
examples, unseen by the classifier at training time, are classified
correctly.

 Our intuition as well as statistical learning theory suggests that


hyperplane classifiers will work better if the hyperplane not only
separates the examples correctly, but does so with a large margin.

 Here, the margin of a linear classifier is defined as the distance


of the closest example to the decision boundary.
116
Hard Margin
 Let us adjust b such that the hyperplane is half way in between
the closest positive and negative example, respectively.

 This "hard margin" SVM, applicable to linearly separable data,


is the classifier with maximum margin among all classifiers
that correctly classify all the input examples.

117
Soft Margin
In practice, data is often not linearly separable; and even if it is, a
greater margin can be achieved by allowing the classifier to
misclassify some points.

118
Theory and experimental results show that the
resulting larger margin will generally provide better
performance than the hard margin SVM.

119
Left: The two points closest to the hyperplane strongly affect its
orientation, leading to a hyperplane that comes close to several
other data points.
Right: Here, those points move inside the margin, and the
hyperplane's orientation is changed, leading to a much larger
margin for the rest of the data.
120
Soft Margin

The constant C > 0 sets the relative importance of maximizing


the margin and minimizing the amount of slack.

This formulation is called the soft-margin SVM.

For a large value of C a large penalty is assigned to errors.

121
Non-linear classification
 The original optimal hyperplane algorithm proposed by Vladimir
Vapnik in 1963 was a linear classifier.

 In 1992, Bernhard Boser, Isabelle Guyon and Vapnik suggested a


way to create non-linear classifiers by applying the kernel trick to
maximum-margin hyperplanes.

 This allows the algorithm to fit the maximum-margin hyperplane


in a transformed feature space.

122
Non-linear classification (Cont.)

 The transformation may be non-linear and the transformed


space high dimensional; thus though the classifier is a
hyperplane in the high-dimensional feature space, it may be
non-linear in the original input space.

 If the kernel used is a Gaussian radial basis function, the


corresponding feature space is a Hilbert space of infinite
dimension.

 Maximum margin classifiers are well regularized, so the


infinite dimension does not spoil the results.

123
Kernel Support Vector Machines

 Using kernels, the original formulation for the output of an SVM
with support vectors z1, z2, … , zN and weights w1, w2, … , wN is
now given by:

F(x) = Σ wi k(zi, x) + b , i = 1, …, N
124
The Kernel trick

 The Kernel trick is a very interesting and powerful tool.

 It is powerful because it provides a bridge from linearity to non-


linearity to any algorithm that solely depends on the dot
product between two vectors.

 It comes from the fact that, if we first map our input data into a
higher-dimensional space, a linear algorithm operating in this
space will behave non-linearly in the original input space.

 Now, the Kernel trick is really interesting because that mapping


does not need to be ever computed.

125
The Kernel trick

 If our algorithm can be expressed only in terms of an inner


product between two vectors, all we need to do is replace this inner
product with the inner product from some other suitable
space.

That is where resides the "trick": wherever a dot product is


used, it is replaced with a Kernel function.

The kernel function denotes an inner product in feature space


and is usually denoted as:

K(x,y) = <φ(x),φ(y)>

126
Kernel functions (Cont.)
We are now looking for solution in other space, but the problem
is linearly separable, so it is more effective, even if the problem
was linearly non-separable in the input space

127
Kernel functions (Cont.)

128
The Kernel trick

Using the Kernel function, the algorithm can then be carried


into a higher-dimension space without explicitly mapping
the input points into this space.

This is highly desirable, as sometimes our higher-


dimensional feature space could even be infinite-dimensional
and thus infeasible to compute.

129
Kernel functions

 The possibility of linear non-separability in the input space is the
reason why, in the SVM approach, the optimal hyperplane is
constructed not in the input space but rather in a high-dimensional,
so-called feature space Z.

130
Kernels: from Linear to Non-Linear
Classifiers

In many applications a non-linear classifier provides better


accuracy.

And yet, linear classifiers have advantages, one of them being


that they often have simple training algorithms that scale well
with the number of examples.

Can the machinery of linear classifiers be
extended to generate non-linear decision boundaries?

131
Kernels: from Linear to Non-Linear
Classifiers
There is a straightforward way of turning a linear classifier non-
linear, or making it applicable to non-vectorial data.

It consists of mapping our data to some vector space, which we


will refer to as the feature space, using a function φ.

The discriminant function then is f(x) = <w, φ(x)> + b.

132
Kernels for Real-valued Data

 Real-valued data, i.e. data where the examples are vectors of a


given dimensionality, is common in bioinformatics and other
areas.

 A few examples of applying SVM to real-valued data include:


- prediction of disease state from microarray data
- prediction of protein function from a set of
features that include amino acid composition and
various properties of the amino acids in the protein.

 The two most commonly used kernel functions for real-valued


data are the polynomial and the Gaussian kernel.
133
Kernels for Real-valued Data
The polynomial kernel of degree d is defined as:

k(x, x′) = (<x, x′> + κ)^d

The kernel with d = 1 and κ = 0, denoted by k_linear, is


the linear kernel leading to a linear discriminant function.

134
Kernels for Real-valued Data
The degree of the polynomial kernel controls the flexibility of
the resulting classifier.

The lowest degree polynomial is the linear kernel, which is not


sufficient when a non-linear relationship between features exists.

In some cases, a degree 2 polynomial may be flexible enough to


discriminate between the two classes with a good margin.

The degree 5 polynomial yields a similar decision boundary, with


greater curvature.

Normalization can help to improve performance and numerical


stability for large d.
135
Kernels for Real-valued Data
Nonlinear kernels such as the polynomial kernel provide
additional flexibility.

136
Standard Kernels

Other common kernel functions include:

 Linear

 Polynomial

 Radial Basis Function

 Gaussian Radial basis function

 Hyperbolic tangent

137
Gaussian Kernel

 The Gaussian kernel is by far one of the most versatile


Kernels.

 It is a radial basis function kernel, and is the preferred Kernel


when we don’t know much about the data we are trying to
model.
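
A small sketch of the kernels named above together with the kernelized decision function F(x) = Σ wi k(zi, x) + b; the support vectors, weights and parameter values are hypothetical, chosen only to show the calls:

import numpy as np

def linear_kernel(x, y):
    return np.dot(x, y)

def polynomial_kernel(x, y, degree=3, kappa=1.0):
    return (np.dot(x, y) + kappa) ** degree            # degree controls flexibility

def gaussian_kernel(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))   # radial basis function

def svm_decision(x, support_vectors, weights, b, kernel):
    # F(x) = sum_i w_i * k(z_i, x) + b ; the sign gives the predicted class
    return sum(w * kernel(z, x) for z, w in zip(support_vectors, weights)) + b

Z = [np.array([0.0, 1.0]), np.array([1.0, 0.0])]       # hypothetical support vectors
w = [1.0, -1.0]                                        # hypothetical weights
x_new = np.array([0.2, 0.9])
print(np.sign(svm_decision(x_new, Z, w, b=0.0, kernel=gaussian_kernel)))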

138
Learning Algorithms

The algorithm is governed by extra parameters besides the


Kernel function and the data points:

The parameter C controls the trade off between allowing some


training errors and forcing rigid margins.

- Increasing the value of C increases the cost of


misclassifications but may result in models that do not
generalize well to points outside the training set.
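
A hedged usage sketch (assuming scikit-learn; the synthetic data is only for illustration) showing how the kernel and the C parameter are set in practice:

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1).astype(int)      # non-linearly separable toy labels

soft = SVC(kernel="rbf", C=1.0).fit(X, y)              # wider margin, tolerates some training errors
rigid = SVC(kernel="rbf", C=1000.0).fit(X, y)          # heavily penalizes misclassifications

print(soft.n_support_, rigid.n_support_)               # support vectors per class for each setting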

139
Example: Splice Site Recognition
 It is a problem arising in computational gene finding and concerns
the recognition of splice sites that mark the boundaries between
exons and introns in eukaryotes.

 Introns are excised from premature mRNAs in a processing step


after transcription.

 The vast majority of all splice sites are characterized by the


presence of specific dimers on the intronic side of the splice site:

- GT for donor and


- AG for acceptor sites.

 However, only about 0.1-1% of all GT and AG occurrences in the


genome represent true splice sites.
140
Example: Splice Site Recognition

141
Example: Splice Site Recognition

 There are two different splice sites: the exon-intron boundary,


referred to as the donor site or 5’ site (of the intron) and the
intron-exon boundary, that is the acceptor or 3’ site.

 Splice sites have quite strong consensus sequences, i.e. almost


each position in a small window around the splice site is
representative of the most frequently occurring nucleotide
when many existing sequences are compared in an alignment,

142
Performance of SVM
 To evaluate the classifier performance, receiver operating
characteristic (ROC) curves are used, which show the true positive
rates (y-axis) over the full range of false positive rates (x-axis).

 Different values are obtained by using different thresholds on the


value of the discriminant function for assigning the class
membership.

 The area under the curve quantifies the quality of the classifier,
and a larger value indicates better performance.

 Research has shown that it is a better measure of classifier


performance, in particular when the fraction of examples in one
class is much smaller than the other.
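
A short illustration of how such a curve is computed (assuming scikit-learn; labels and discriminant scores below are made up):

import numpy as np
from sklearn.metrics import roc_curve, auc

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])                       # hypothetical class labels
scores = np.array([-1.2, 0.3, 0.8, 1.5, -0.1, -0.7, 2.0, 0.1])    # hypothetical discriminant values

fpr, tpr, thresholds = roc_curve(y_true, scores)   # sweep thresholds over the scores
print("AUC =", auc(fpr, tpr))                      # larger area under the curve -> better classifier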
143
Sequence logo for acceptor splice sites:

Splice sites have quite strong consensus sequences, i.e. almost each
position in a small window around the splice site is representative
of the most frequently occurring nucleotide when many existing
sequences are compared in an alignment.

The sequence logo shows the region around the intron/ exon
boundary—the acceptor splice site.

144
Multi-class classification

 Although SVM method is naturally adapted for separating


data from two classes, it can be easily transformed into
very useful tool for the classification of more than two
classes.

 There are two basic ways of solving the N-class problem:

– Solving N two-class classification tasks,

– Pairwise classification

145
Multi-class classification (Cont.)

 The first method consists in training many classifiers using the
one-versus-the-rest method.

 It means that while solving every i-th task (i = 1, 2, ..., N) we
separate the current class from all the other classes, and every
time a new hyperplane comes into being.

146
Multi-class classification (Cont.)

 Support vectors, which belong to class i satisfy y(x;wi, bi) = 1,


whereas the other ones satisfy condition

y(x;wi, bi) = −1.

 If for a new vector x we have y(x; wi, bi) > 0,
then the vector is assigned to class i. However, it may happen
that this is true for many i or is not true for any of them.

 For such cases the classification is unfeasible.

147
Multi-class classification (Cont.)

 In pairwise classification N-class problem is replaced with


N(N−1)/2 differentiation tasks between two classes.

 Although the number of classifiers is greater than in the
previous method, individual classifiers can be trained faster,
and depending on the dataset this results in time savings.

 Unambiguous classification is not always possible, since it


may happen that more than one class will get the same
number of votes.
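
A sketch of the two decision schemes, with the classifier internals left abstract (the dummy discriminants at the end are hypothetical, only to show the call pattern):

import numpy as np
from collections import Counter

def one_vs_rest_predict(x, classifiers):
    # classifiers[i](x) returns y(x; wi, bi); choose the class with the largest value
    return int(np.argmax([f(x) for f in classifiers]))

def pairwise_predict(x, pair_classifiers):
    # pair_classifiers[(i, j)](x) > 0 votes for class i, otherwise for class j
    votes = Counter()
    for (i, j), f in pair_classifiers.items():
        votes[i if f(x) > 0 else j] += 1
    return votes.most_common(1)[0][0]

clfs = [lambda x: x[0], lambda x: x[1], lambda x: -x[0] - x[1]]   # dummy linear discriminants
print(one_vs_rest_predict(np.array([0.2, 0.9]), clfs))            # -> class 1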

148
Evaluation and Comparison of the
Model Predictive Power (Cont.)
 Receiver operating characteristic (ROC) curves are an
interesting tool for representing the accuracy of a classifier.

 The ROC analysis evaluates the accuracy of an algorithm


over a range of possible operating (or tuning) scenarios.

 A ROC curve is a plot of a model’s true-positive rate


against its false-positive rate: sensitivity versus 1 − specificity.

 The ROC curve represents a plot of these two concepts


for a number of values of a parameter (operating
scenarios) of the classification algorithm.
149
Genetic Algorithms

150
Genetic Algorithms
 Genetic algorithms are based on evolutionary principles
wherein a particular function or definition that best fits the
constraints of an environment survives to the next generation,
and the other functions are eliminated.

 This iterative process continues indefinitely, allowing the


algorithm to adapt dynamically to the environment as needed.

 Genetic algorithms evaluate a large number of solutions to a


problem that are generated at random.

 The members with the highest fitness scores are allowed to "mate" with


crossovers and mutations, creating the next generation.

151
Genetic Algorithm - Concept

152
Genetic Algorithm
• Inspired by natural evolution
• Population of individuals
– Individual is feasible solution to problem
• Each individual is characterized by a Fitness function
– Higher fitness is better solution
• Based on their fitness, parents are selected to reproduce offspring
for a new generation
– Fitter individuals have more chance to reproduce
– New generation has same size as old generation; old generation
dies
• Offspring has combination of properties of two parents
• If well designed, population will converge to optimal solution
153
Example of convergence

154
Genetic Algorithm
{
initialize population;
evaluate population;
while TerminationCriteriaNotSatisfied
{
select parents for reproduction;
perform recombination and mutation;
evaluate population;
}
}
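
A runnable Python sketch of the loop above; the bit-string encoding, fitness function (maximize -(x - 3)^2 on [0, 10]), tournament selection, one-point crossover and mutation rate are all illustrative choices:

import random

random.seed(0)
BITS, POP, GENS = 16, 30, 60

def decode(bits):                      # map a bit string to a real value in [0, 10]
    return int("".join(map(str, bits)), 2) / (2 ** BITS - 1) * 10.0

def fitness(bits):                     # higher is better; peak at x = 3
    return -(decode(bits) - 3.0) ** 2

def tournament(pop):                   # fitter of two randomly drawn individuals reproduces
    a, b = random.sample(pop, 2)
    return a if fitness(a) > fitness(b) else b

population = [[random.randint(0, 1) for _ in range(BITS)] for _ in range(POP)]

for gen in range(GENS):
    new_population = []
    while len(new_population) < POP:
        p1, p2 = tournament(population), tournament(population)
        cut = random.randrange(1, BITS)                           # one-point crossover
        child = p1[:cut] + p2[cut:]
        child = [b ^ (random.random() < 0.01) for b in child]     # bit-flip mutation
        new_population.append(child)
    population = new_population                                   # the old generation dies

print(round(decode(max(population, key=fitness)), 3))             # should be close to 3.0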

155
