WHAT IS SOFT COMPUTING?
Soft Computing
The main constituents of soft computing include:
– fuzzy logic,
– neural networks,
– genetic algorithms,
– rough sets, and
– signal processing tools such as wavelets.
Soft Computing
Rough sets are suitable for handling different types of
uncertainty in data.
Neural networks and rough sets are widely used for classification and rule
generation.
Relevance?
Machine Learning
Machine Learning:
An Indispensable Tool in
Bioinformatics
Introduction
The development of high-throughput data acquisition
technologies in biological sciences in the last 5 to 10 years, together
with advances in digital storage, computing, and information
and communication technologies in the 1990s, has begun to
transform biology from a data-poor into a data-rich science.
Data mining techniques provide a robust means to evaluate the
generalization power of extracted patterns on unseen data, although
these must be further validated and interpreted by the domain
expert.
Machine Learning
Machine learning methods are essentially computer programs that
make use of sampled data or past experience information to
provide solutions to a given problem.
As new data and novel concept types are generated every day in
molecular biology research, it is essential to apply techniques able to fit
this fast-evolving nature; machine learning can be adapted efficiently
to these changing environments.
Machine Learning (Cont.)
Machine learning is able to deal with the abundance of missing and
noisy data from many biological scenarios.
Machine Learning Applications
Machine Learning (Cont.)
Machine learning algorithms have been taxonomized in the following way:
• Supervised learning:
Starting from a database of training data that consists of pairs of
input cases and desired outputs, its goal is to construct a function
(or model) to accurately predict the target output of future cases
whose output value is unknown.
• Unsupervised learning/ Clustering:
Starting from a database of training data that consists of input
cases, its goal is to partition the training samples into subsets
(clusters) so that the data in each cluster show a high level of
proximity.
In contrast to supervised learning, the labels for the data are not
used or are not available in clustering.
• Semi-supervised learning:
Starting from a database of training data that combines both
labeled and unlabeled examples, the goal is to construct a model
able to accurately predict the target output of future cases for
which its output value is unknown.
Typically, this database contains a small amount of labeled data
together with a large amount of unlabeled data.
• Reinforcement learning:
These algorithms are aimed at finding a policy that maps states
of the world to actions.
The actions are chosen among the options that an agent ought to
take under those states, with the aim of maximizing some notion
of long-term reward.
Its main difference regarding the previous types of machine
learning techniques is that input–output pairs are not present
in a database, and its goal resides in online performance.
• Optimization:
The task of searching for an optimal solution in a space of
multiple possible solutions.
As the process of learning from data can be regarded as
searching for the model that best fits the data, optimization
methods can be considered an ingredient in modeling.
Supervised and unsupervised classification are the most
broadly applied machine learning types in most application
areas, including bioinformatics.
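As a concrete sketch of the supervised setting described above, the following toy example uses a hypothetical 1-nearest-neighbour classifier: the training database consists of invented input/output pairs, and the model predicts the label of future cases whose output is unknown.

```python
import math

# Toy training database: (input case, desired output) pairs, as in supervised learning.
train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"),
         ((4.0, 4.2), "B"), ((3.8, 4.0), "B")]

def predict(x):
    """1-nearest-neighbour: return the label of the closest training case."""
    return min(train, key=lambda pair: math.dist(pair[0], x))[1]

print(predict((1.1, 0.9)))  # near the "A" cluster
print(predict((4.1, 4.1)))  # near the "B" cluster
```

An unsupervised method would instead receive only the input cases, without the "A"/"B" labels, and would have to discover the two clusters itself.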
Supervised learning Algorithms
– Averaged One-Dependence Estimators (AODE)
– Artificial neural network: Backpropagation
– Bayesian statistics
– Case-based reasoning
– Decision trees
– Inductive logic programming
– Gaussian process regression
– Learning automata
– Minimum message length (decision trees, decision graphs, etc.)
– Lazy learning
– Instance-based learning: Nearest Neighbor Algorithm
– Probably approximately correct (PAC) learning
– Ripple down rules, a knowledge acquisition methodology
– Statistical classification: Hidden Markov models
– Symbolic machine learning algorithms
– Sub-symbolic machine learning algorithms
– Support vector machines
– Random Forests
– Ensembles of classifiers
– Bootstrap aggregating (bagging)
– Boosting
– Ordinal classification
– Regression analysis
– Information fuzzy networks (IFN)
Fuzzy sets
We continuously have to recognize people, objects, handwriting,
voice, images, and other patterns using distorted, unfamiliar,
incomplete, occluded, fuzzy, and inconclusive data, where a
pattern should be allowed to have membership or belongingness
to more than one class.
Fuzzy sets
This is also very significant in medical diagnosis, where a
patient afflicted with a certain set of symptoms can be
simultaneously suffering from more than one disease.
Example?
Fuzzy control system
Overview
Fuzzy logic is widely used in machine control.
A landmark application was the automatic train operation of the
Sendai subway in Japan: the engineers' ideas were adopted, and fuzzy
systems were used to control accelerating, braking, and stopping
when the line opened in 1987.
History and Applications
Japanese consumer goods involving fuzzy systems
Fuzzy systems in US & Europe
The system determines the optimum wash cycle for any load to obtain
the best results with the least amount of energy, detergent, and water.
It even adjusts for dried-on foods by tracking the last time the door
was opened, and estimates the number of dishes by the number of
times the door was opened.
Research and development is also continuing on
fuzzy applications in software, as opposed
to firmware, design, including fuzzy expert
systems and integration of fuzzy logic
with neural-network and so-called adaptive
"genetic" software systems, with the ultimate
goal of building "self-learning" fuzzy control
systems.
Fuzzy sets
The input variables in a fuzzy control system are in general
mapped by sets of membership functions, known as "fuzzy sets".
Fuzzy sets
Given "mappings" of input variables into membership functions
and truth values, the microcontroller then makes decisions for
what action to take based on a set of "rules", each of the form:
IF brake temperature IS warm AND speed IS not very fast
THEN brake pressure IS slightly decreased.
In this example, the two input variables are "brake temperature"
and "speed", which have values defined as fuzzy sets.
All the rules that apply are invoked, using the membership
functions and truth values obtained from the inputs, to
determine the result of the rule.
For instance, a rule such as "IF temperature IS cold THEN heater IS
high" uses the truth value of the "temperature" input, which is
some truth value of "cold", to generate a result in the fuzzy set for
the "heater" output, which is some value of "high".
This result is used with the results of other rules to finally generate
the crisp composite output.
Obviously, the greater the truth value of "cold", the higher the truth
value of "high", though this does not necessarily mean that the output
itself will be set to "high", since this is only one rule among many.
Fuzzy control in detail (Cont.)
In some cases, the membership functions can be modified by
"hedges" that are equivalent to adjectives.
There are several different ways to define the result of a rule, but
one of the most common and simplest is the "max-min"
inference method, in which the output membership function is
given the truth value generated by the premise.
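The max-min inference just described can be sketched in a few lines. The triangular membership functions and the temperature/heater values below are illustrative assumptions, not taken from the text:

```python
def tri(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Hypothetical fuzzy sets for the rule "IF temperature IS cold THEN heater IS high".
def cold(t):
    return tri(t, -10.0, 0.0, 15.0)

def high(h):
    return tri(h, 60.0, 100.0, 140.0)

def rule_output(t, h):
    """Max-min inference: the output membership is clipped by the
    truth value generated by the premise."""
    return min(cold(t), high(h))

truth = cold(5.0)                  # truth value of "cold" at 5 degrees
clipped = rule_output(5.0, 100.0)  # never exceeds the premise's truth value
print(truth, clipped)
```

In a full controller, the clipped output sets of all invoked rules would be combined and defuzzified into a crisp composite output.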
Neural Networks
Human Brain
Human Brain
Neural Networks in Brain
Brain as an information processing system
Neural Network?
Introduction to Artificial Neural Networks
Usefulness and Capabilities
1. Non-linearity
2. Input-Output Mapping
- associations
- Auto-associations
3. Adaptivity
- “free parameters”
4. Evidential Response
- Decision with a measure of confidence
5. Fault Tolerance
- graceful degradation
6. VLSI implementability
7. Neurobiological analogy
Introduction
ANN use in medicine
- interpreting ECGs,
- diagnosing dementia,
Neural networks
The ANNs consist of many connected neurons simulating a
brain at work.
A basic feature which distinguishes an ANN from an
algorithmic program is the ability to generalize acquired
knowledge to new data that was not presented during the
learning process.
Expert systems need to gather the actual knowledge of their
designated area.
ANNs, by contrast, need to be trained only once and show
tolerance for discontinuities, accidental disturbances, or even
defects in the training data set.
This allows ANNs to be used for solving problems which
cannot be solved effectively by other means.
Neural networks
These features and advantages are the reason why the area
of ANN’s application is very wide and includes for
example:
– Pattern recognition,
– Object classification,
– Medical diagnosis,
– Forecasting of economic risk, market price changes,
demand for electrical power, etc.,
– Selection of employees.
Biological neural networks
The human brain consists of around 10^11 (a hundred billion) nerve cells called neurons.
What is a neural network (NN)?
Description
Introduction
• Neural networks are a powerful technique for solving many real-world
problems.
• They have the ability to learn from experience in order to
improve their performance and to adapt themselves to
changes in the environment.
• In addition, they are able to deal with incomplete
information or noisy data and can be very effective,
especially in situations where it is not possible to define the
rules or steps that lead to the solution of a problem.
Computation in the brain
• The brain's network of neurons forms a massively parallel
information processing system. This contrasts with conventional
computers, in which a single processor executes a single series of
instructions.
Introduction to ANNs
Introduction to ANNs (Cont.)
It consists of two modules: a summation module Σ and an
activation module F.
Roughly, the summation module corresponds to the biological
nucleus.
There, the algebraic summation of weighted input signals is
realised and the output signal is generated,
where k is a coefficient.
ADALINE (Cont.)
ADALINE - Definition
ADALINE - Learning Algorithm
Let us assume:
E = (d − o)
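A minimal sketch of the ADALINE delta (LMS) rule built on the error E = (d − o); the toy target function, the learning rate, and the data are assumptions for illustration.

```python
# Toy task: learn o = 2*x1 - x2 with a single linear unit (ADALINE's output is linear).
data = [((x1, x2), 2 * x1 - x2)
        for x1 in (-1.0, 0.0, 1.0) for x2 in (-1.0, 0.0, 1.0)]

w = [0.0, 0.0]
eta = 0.1  # assumed learning rate

for _ in range(200):
    for (x1, x2), d in data:
        o = w[0] * x1 + w[1] * x2   # linear output of the unit
        e = d - o                   # error E = (d - o), as above
        w[0] += eta * e * x1        # delta (LMS) rule: w <- w + eta * E * x
        w[1] += eta * e * x2

print([round(v, 3) for v in w])  # converges towards [2.0, -1.0]
```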
Madaline
- weather forecasting
Introduction to ANNs (Cont.)
Introduction to ANNs (Cont.)
However, functions which describe the non-linear profile of a
biological neuron more precisely are:
A sigmoid function:
A hyperbolic tangent (tanh) function:
Introduction to ANNs (Cont.)
Neurons in multilayer ANNs are grouped into three different
types of layers: input, output, and hidden layers. Two basic
topologies are distinguished:
- feed-forward and
- feedback networks.
Introduction to ANNs (Cont.)
In feed-forward networks, signals can move in one
direction only and cannot move between neurons in the same
layer.
In feedback networks, by contrast, signals can circulate in loops
until the proper state is achieved.
Training of the ANN (Cont.)
Training of the ANN (Cont.)
One of the best known learning algorithms is the Back-
Propagation Algorithm (BPA).
Back-Propagation Algorithm (BPA)
To train an ANN using the BPA, the following steps have to be
carried out for each pattern μ in the learning set:
1. Present the input pattern to the input layer of the network.
2. Evaluate the output values u_jm^μ of each element for all layers
using the formula
Back-Propagation Algorithm (BPA) (Cont.)
6. Update the weights of all elements between output and hidden
layers and then between all hidden layers moving towards the
input layer.
Back-Propagation Algorithm (BPA) (Cont.)
The above steps have to be repeated until a satisfactory minimum
of the complete error function is achieved:
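The steps above can be sketched compactly with NumPy. The XOR task, the hidden-layer size, and the learning rate are illustrative assumptions; the loop performs the forward pass, computes the error terms of the output layer first, and updates the weights moving towards the input layer:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # XOR inputs
T = np.array([[0], [1], [1], [0]], dtype=float)              # desired outputs

def sig(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer of 8 sigmoid units (an assumed architecture).
W1, b1 = rng.normal(0, 1, (2, 8)), np.zeros(8)
W2, b2 = rng.normal(0, 1, (8, 1)), np.zeros(1)
eta = 0.5  # assumed learning rate

def forward():
    H = sig(X @ W1 + b1)   # evaluate outputs of the hidden layer
    O = sig(H @ W2 + b2)   # evaluate outputs of the output layer
    return H, O

mse_before = float(np.mean((forward()[1] - T) ** 2))
for _ in range(5000):
    H, O = forward()
    dO = (O - T) * O * (1 - O)        # error term of the output layer
    dH = (dO @ W2.T) * H * (1 - H)    # error propagated back to the hidden layer
    W2 -= eta * H.T @ dO; b2 -= eta * dO.sum(axis=0)  # update output weights first,
    W1 -= eta * X.T @ dH; b1 -= eta * dH.sum(axis=0)  # then move towards the input layer
mse_after = float(np.mean((forward()[1] - T) ** 2))
print(mse_before, "->", mse_after)  # the complete error decreases as training proceeds
```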
A Simple Artificial Neuron
• The basic computational element (model neuron) is often called a node or unit. It
receives input from some other units, or perhaps from an external source. Each
input has an associated weight w, which can be modified so as to model synaptic
learning. The unit computes some function f of the weighted sum of its inputs.
Applications:
• Clustering
• Classification/Pattern recognition
• Function approximation
• Prediction/Dynamical Systems
Types of Neural Networks
Neural Network types can be classified based on following attributes:
• Applications
-Classification
-Clustering
-Function approximation
-Prediction
• Connection Type
- Static (feedforward)
- Dynamic (feedback)
• Topology
- Single layer
- Multilayer
- Recurrent
- Self-organized
• Learning Methods
- Supervised
- Unsupervised
The McCulloch-Pitts Model of Neuron
• The early model of an artificial neuron was introduced by Warren McCulloch
and Walter Pitts in 1943. The McCulloch-Pitts neural model is also known as
the linear threshold gate. It is a neuron with a set of inputs I1, I2, I3, …, Im and one
output y. The linear threshold gate simply classifies the set of inputs into two
different classes; thus the output y is binary. Such a function can be described
mathematically using these equations:
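A minimal sketch of the linear threshold gate: with suitably fixed weights and threshold (chosen here to realise a two-input AND, purely for illustration), the binary output y follows directly.

```python
def mcculloch_pitts(inputs, weights, threshold):
    """Linear threshold gate: y = 1 iff the weighted sum of the
    inputs reaches the threshold, else y = 0."""
    s = sum(w * i for w, i in zip(weights, inputs))
    return 1 if s >= threshold else 0

# With fixed weights and threshold, the gate realises simple logic functions,
# e.g. a two-input AND:
AND = lambda a, b: mcculloch_pitts((a, b), weights=(1, 1), threshold=2)
print([AND(a, b) for a in (0, 1) for b in (0, 1)])  # [0, 0, 0, 1]
```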
The McCulloch-Pitts Model of Neuron
Symbolic Illustration of Linear Threshold Gate
• The McCulloch-Pitts model of a neuron is simple yet has substantial computing potential.
It also has a precise mathematical definition. However, this model is so simplistic that it
only generates a binary output, and the weight and threshold values are fixed. Neural
computing algorithms need diverse features for various applications. Thus, we need a
neural model with more flexible computational features.
Artificial Neuron with Continuous
Characteristics
• Based on the McCulloch-Pitts model described previously, the
general form of an artificial neuron can be described in two stages,
shown in the figure. In the first stage, the linear combination of inputs is
calculated. Each value of the input array is associated with its weight
value, which is normally between 0 and 1. Also, the summation
function often takes an extra input value θ (theta) with a weight value
of 1 to represent the threshold or bias of a neuron. The summation
function is then performed as
• The signals generated by actual biological neurons are action-potential spikes, and
biological neurons send signals in patterns of spikes rather than by the simple absence or
presence of a single spike pulse. For example, the signal could be a continuous stream of
pulses with various frequencies. With this kind of observation, we should consider a signal
to be continuous with a bounded range, and the linear threshold function should be "softened".
• One convenient form of such a "semi-linear" function is the logistic sigmoid function, or in
short, the sigmoid function, as shown in the figure. As the input x tends to a large positive value, the
output value y approaches 1. Similarly, the output gets close to 0 as x goes negative.
However, the output value is neither close to 0 nor to 1 near the threshold point.
Artificial Neuron with Continuous
Characteristics
• This function is expressed mathematically as follows:
y = 1 / (1 + e^(−x)), defined for all x from − infinity to + infinity.
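A quick numerical check of this limiting behaviour:

```python
import math

def sigmoid(x):
    """Logistic sigmoid: output in (0, 1), approaching 1 for large
    positive x and 0 for large negative x."""
    return 1.0 / (1.0 + math.exp(-x))

print(round(sigmoid(10), 4))   # ≈ 1.0
print(round(sigmoid(-10), 4))  # ≈ 0.0
print(sigmoid(0))              # 0.5 at the threshold point
```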
Support Vector Machines
Support Vector Machines
SVMs are a set of related supervised learning methods used for
classification and regression.
New examples are then mapped into that same space and
predicted to belong to a category based on which side of the gap
they fall on.
Motivation
Classifying data is a common task in machine learning.
Motivation
So we choose the hyperplane so that the distance from it to the
nearest data point on each side is maximized.
Optimal hyperplane separating two classes
Classification with Large Margin
Whenever a dataset is linearly separable, i.e. there exists a
hyperplane that correctly classifies all data points, there exist
many such separating hyperplanes.
Soft Margin
In practice, data is often not linearly separable; and even if it is, a
greater margin can be achieved by allowing the classifier to
misclassify some points.
Theory and experimental results show that the
resulting larger margin will generally provide better
performance than the hard margin SVM.
The two points closest to the hyperplane strongly affect its
orientation, leading to a hyperplane that comes close to several
other data points. With a soft margin, those points move inside the
margin, and the hyperplane's orientation is changed, leading to a
much larger margin for the rest of the data.
Non-linear classification
The original optimal hyperplane algorithm proposed by Vladimir
Vapnik in 1963 was a linear classifier.
Kernel Support Vector Machines
The Kernel trick
It comes from the fact that, if we first map our input data into a
higher-dimensional space, a linear algorithm operating in this
space will behave non-linearly in the original input space.
The Kernel trick
K(x,y) = <φ(x),φ(y)>
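The identity K(x,y) = &lt;φ(x),φ(y)&gt; can be verified numerically for the homogeneous degree-2 polynomial kernel in two dimensions, whose explicit feature map is known in closed form (the test vectors below are arbitrary):

```python
import math

def K(x, y):
    """Homogeneous degree-2 polynomial kernel: K(x, y) = (<x, y>)^2."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2

def phi(x):
    """Explicit feature map for this kernel: phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

x, y = (1.0, 2.0), (3.0, 0.5)
lhs = K(x, y)                                   # kernel evaluated in input space
rhs = sum(a * b for a, b in zip(phi(x), phi(y)))  # inner product in feature space
print(lhs, rhs)  # both 16.0: the kernel computes <phi(x), phi(y)> implicitly
```

This is the point of the trick: K is evaluated directly on the inputs, without ever forming the higher-dimensional vectors φ(x).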
Kernel functions (Cont.)
We are now looking for a solution in another space. In that space
the problem is linearly separable, so the method is more effective,
even if the problem was linearly non-separable in the input space.
Kernels: from Linear to Non-Linear
Classifiers
There is a straightforward way of turning a linear classifier non-
linear, or making it applicable to non-vectorial data.
Kernels for Real-valued Data
The degree of the polynomial kernel controls the flexibility of
the resulting classifier.
Standard Kernels
Linear
Polynomial
Hyperbolic tangent
Gaussian Kernel
Learning Algorithms
Example: Splice Site Recognition
Splice site recognition is a problem arising in computational gene
finding; it concerns the recognition of splice sites that mark the
boundaries between exons and introns in eukaryotes.
Performance of SVM
To evaluate the classifier performance, receiver operating
characteristic (ROC) curves are used, which show the true positive
rates (y-axis) over the full range of false positive rates (x-axis).
The area under the curve quantifies the quality of the classifier,
and a larger value indicates better performance.
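The area under the ROC curve can equivalently be computed as a rank statistic: the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one. A minimal sketch, with invented labels and classifier scores:

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation:
    ties between a positive and a negative score count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical classifier output: higher score = more confident "positive".
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
print(roc_auc(labels, scores))  # 8/9: one negative outranks one positive
```

A perfect ranking gives an AUC of 1.0, while random scoring gives about 0.5, matching the interpretation that a larger area indicates a better classifier.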
Splice sites have quite strong consensus sequences, i.e. almost every
position in a small window around the splice site is representative
of the most frequently occurring nucleotide when many existing
sequences are compared in an alignment.
The sequence logo shows the region around the intron/ exon
boundary—the acceptor splice site.
Multi-class classification
– Pairwise classification
Evaluation and Comparison of the
Model Predictive Power (Cont.)
Receiver operating characteristic (ROC) curves are an
interesting tool for representing the accuracy of a classifier.
Genetic Algorithms
Genetic algorithms are based on evolutionary principles
wherein a particular function or definition that best fits the
constraints of an environment survives to the next generation,
and the other functions are eliminated.
Genetic Algorithm - Concept
Genetic Algorithm
• Inspired by natural evolution
• Population of individuals
– Individual is feasible solution to problem
• Each individual is characterized by a Fitness function
– Higher fitness is better solution
• Based on their fitness, parents are selected to reproduce offspring
for a new generation
– Fitter individuals have more chance to reproduce
– New generation has same size as old generation; old generation
dies
• Offspring has combination of properties of two parents
• If well designed, population will converge to optimal solution
Example of convergence
Genetic Algorithm
{
    initialize population;
    evaluate population;
    while TerminationCriteriaNotSatisfied
    {
        select parents for reproduction;
        perform recombination and mutation;
        evaluate population;
    }
}
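The pseudocode above maps directly onto a runnable sketch. The OneMax fitness function, tournament selection, one-point crossover, and the mutation rate are illustrative choices, not prescribed by the text:

```python
import random

random.seed(1)

def fitness(ind):
    """OneMax: number of 1-bits; higher fitness is a better solution."""
    return sum(ind)

def ga(n_bits=20, pop_size=30, generations=60, p_mut=0.02):
    # initialize population
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):  # termination criterion: fixed generation count
        def parent():
            # select parents for reproduction: fitter individuals win tournaments
            a, b = random.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b
        nxt = []
        while len(nxt) < pop_size:  # new generation has same size; old one dies
            p1, p2 = parent(), parent()
            cut = random.randrange(1, n_bits)    # recombination: one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [b ^ (random.random() < p_mut) for b in child]  # mutation
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

best = ga()
print(fitness(best))  # converges to (or near) the all-ones optimum of 20
```

With a well-designed selection pressure and mutation rate, the population converges towards the optimal solution, as stated above.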