
Soft Computing

Assignment

BY
D.Mohanraj
20052336
Genetic Algorithms
Introduction

Genetic Algorithms (GAs) are adaptive heuristic search algorithms premised on the evolutionary ideas
of natural selection and genetics. The basic concept of GAs is to simulate the processes in natural
systems necessary for evolution, specifically those that follow the principle of survival of the fittest
first laid down by Charles Darwin. As such, they represent an intelligent exploitation of a random
search within a defined search space to solve a problem.

First pioneered by John Holland in the 1960s, Genetic Algorithms have been widely studied,
experimented with and applied in many fields of engineering. Not only do GAs provide an
alternative method of solving problems, they consistently outperform traditional methods on many
of the problems to which they are applied. Many real-world problems involve finding optimal
parameters, which might prove difficult for traditional methods but ideal for GAs. However, because
of their outstanding performance in optimisation, GAs have often been wrongly regarded as mere
function optimisers. In fact, there are many ways to view genetic algorithms. Perhaps most users
come to GAs looking for a problem solver, but this is a restrictive view.

Herein, we will examine GAs as a number of different things:

a. GAs as problem solvers
b. GAs as a challenging technical puzzle
c. GAs as a basis for competent machine learning
d. GAs as a computational model of innovation and creativity
e. GAs as a computational model of other innovating systems
f. GAs as a guiding philosophy

However, due to various constraints, we will only look at GAs as problem solvers and as a basis for
competent machine learning here. We will also examine how GAs are applied to completely
different fields.

Many scientists have tried to create living programs. These programs do not merely simulate life but
try to exhibit the behaviours and characteristics of real organisms in an attempt to exist as a form of
life. It has been suggested that such artificial life could eventually evolve into real life. The
suggestion may sound absurd at the moment, but it is certainly not implausible if technology
continues to progress at present rates. It is therefore worth, in our opinion, taking a paragraph to
discuss how artificial life is connected with GAs and to see whether such a prediction is far-fetched
and groundless.
Brief Overview

GAs were introduced as a computational analogy of adaptive systems. They are modelled loosely on
the principles of evolution via natural selection, employing a population of individuals that
undergo selection in the presence of variation-inducing operators such as mutation and
recombination (crossover). A fitness function is used to evaluate individuals, and reproductive
success varies with fitness.

The Algorithm

1. Randomly generate an initial population M(0).

2. Compute and save the fitness u(m) for each individual m in the current population M(t).

3. Define selection probabilities p(m) for each individual m in M(t) so that p(m) is proportional to
u(m).

4. Generate M(t+1) by probabilistically selecting individuals from M(t) to produce offspring via
genetic operators.

5. Repeat from step 2 until a satisfactory solution is obtained.

The paradigm of GAs described above is the one usually applied to most of the problems presented
to GAs, though it might not find the best solution. More often than not, it comes up with a partially
optimal solution.
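The five steps above can be sketched as a minimal GA in Python. The OneMax fitness function (count the 1-bits), the population size, and the mutation rate below are illustrative choices, not taken from the text:

```python
import random

# Toy fitness: OneMax -- the number of 1-bits in the chromosome.
def fitness(chromosome):
    return sum(chromosome)

def evolve(pop_size=20, length=16, generations=50, p_mut=0.05, seed=1):
    rng = random.Random(seed)
    # Step 1: randomly generate the initial population M(0)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        # Step 2: compute the fitness u(m) of each individual
        scores = [fitness(m) for m in pop]
        total = sum(scores)
        # Step 3: selection probabilities p(m) proportional to u(m)
        weights = [s / total for s in scores]
        new_pop = []
        # Step 4: produce M(t+1) via selection, crossover and mutation
        while len(new_pop) < pop_size:
            a, b = rng.choices(pop, weights=weights, k=2)
            cut = rng.randrange(1, length)            # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            new_pop.append(child)
        pop = new_pop
    # Step 5 (here: a fixed generation budget) -- return the fittest individual
    return max(pop, key=fitness)

best = evolve()
print(fitness(best))   # close to the maximum of 16 for this easy problem
```

As the text notes, there is no guarantee of the optimum: the loop simply stops when its budget runs out and reports the best individual found so far.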

Who can benefit from GA

Nearly everyone can benefit from Genetic Algorithms, provided they can encode solutions of a given
problem as chromosomes and compare the relative performance (fitness) of solutions. An
effective GA representation and a meaningful fitness evaluation are the keys to success in GA
applications. The appeal of GAs comes from their simplicity and elegance as robust search
algorithms as well as from their power to discover good solutions rapidly for difficult high-
dimensional problems. GAs are useful and efficient when:

a. The search space is large, complex or poorly understood.


b. Domain knowledge is scarce or expert knowledge is difficult to encode to narrow the search
space.
c. No mathematical analysis is available.
d. Traditional search methods fail.

The advantage of the GA approach is the ease with which it can handle arbitrary kinds of constraints
and objectives: all such things can be handled as weighted components of the fitness function,
making it easy to adapt a GA scheduler to the particular requirements of a very wide range of
possible overall objectives.
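As a sketch of this idea, a fitness function for a hypothetical scheduler might simply sum weighted penalty terms, one per constraint or objective. The terms and weights below are invented for illustration:

```python
# Illustrative sketch: constraints and objectives combined as weighted
# components of a single fitness score (all terms and weights are assumptions).
def fitness(schedule_cost, tardiness, overtime,
            w_cost=1.0, w_tardy=5.0, w_over=2.0):
    # Lower is better: each violation adds a weighted penalty, so a new
    # constraint is handled by appending one more term.
    return w_cost * schedule_cost + w_tardy * tardiness + w_over * overtime

print(fitness(100, 0, 0))   # feasible plan: the score is the cost alone
print(fitness(100, 3, 2))   # infeasible plan scores worse
```

Changing the overall objective then means only re-weighting the terms, not redesigning the search.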
GAs have been used for problem-solving and for modelling. They have been applied to many
scientific and engineering problems, in business and in entertainment, including:

Optimisation: GAs have been used in a wide variety of optimisation tasks, including
numerical optimisation, and combinatorial optimisation problems such as the travelling salesman
problem (TSP), circuit design [Louis 1993], job shop scheduling [Goldstein 1991] and video and sound
quality optimisation.

Automatic programming: GAs have been used to evolve computer programs for specific
tasks, and to design other computational structures, for example cellular automata and sorting
networks.

Machine and robot learning: GAs have been used for many machine-learning applications,
including classification and prediction, and protein structure prediction. GAs have also been used to
design neural networks, to evolve rules for learning classifier systems or symbolic production
systems, and to design and control robots.

Economic models: GAs have been used to model processes of innovation, the development of
bidding strategies, and the emergence of economic markets.

Immune system models: GAs have been used to model various aspects of the natural
immune system, including somatic mutation during an individual's lifetime and the discovery of
multi-gene families over evolutionary time.

Ecological models: GAs have been used to model ecological phenomena such as biological
arms races, host-parasite co-evolution, symbiosis and resource flow in ecologies.

Population genetics models: GAs have been used to study questions in population genetics,
such as "under what conditions will a gene for recombination be evolutionarily viable?"

Interactions between evolution and learning: GAs have been used to study how individual
learning and species evolution affect one another.

Models of social systems: GAs have been used to study evolutionary aspects of social
systems, such as the evolution of cooperation [Chughtai 1995], the evolution of communication, and
trail-following behaviour in ants.

Applications of Genetic Algorithms

GA on optimisation and planning: Travelling Salesman Problem


The TSP is interesting not only from a theoretical point of view: many practical applications can be
modelled as a travelling salesman problem or as variants of it, for example pen movement of a
plotter, drilling of printed circuit boards (PCBs), and real-world routing of school buses, airlines,
delivery trucks and postal carriers. Researchers have used TSPs to study biomolecular pathways, to
route parallel processing in computer networks, to advance cryptography, to determine the order of
thousands of exposures needed in X-ray crystallography, and to determine routes when searching for
forest fires (a multiple-salesman problem partitioned into single TSPs). There is therefore a
tremendous need for efficient algorithms.

In the last two decades enormous progress has been made with respect to solving travelling
salesman problems to optimality which, of course, is the ultimate goal of every researcher. One of
the landmarks in the search for optimal solutions is a 3038-city problem. This progress is only partly
due to the increasing hardware power of computers. Above all, it was made possible by the
development of mathematical theory and of efficient algorithms. Here, the GA approach is discussed.

There are strong relations between the constraints of the problem, the representation adopted and
the genetic operators that can be used with it. The goal of the Travelling Salesman Problem is to
devise a travel plan (a tour) which minimises the total distance travelled. The TSP is NP-hard (NP
stands for non-deterministic polynomial time): it is generally believed that it cannot be solved
exactly in polynomial time. The TSP is constrained:

a. The salesman can only be in one city at any given time.

b. Each city has to be visited once and only once.

When GAs are applied to very large problems, they fail in two respects:

a. They scale rather poorly (in terms of time complexity) as the number of cities increases.
b. The solution quality degrades rapidly.

Failure of Standard Genetic Algorithm


To use a standard GA, the following problems have to be solved:

a. A binary representation for tours must be found such that a tour can be easily translated into a
chromosome.
b. An appropriate fitness function must be designed, taking the constraints into account.

Non-permutation matrices represent unrealistic solutions; that is, the GA can generate
chromosomes that do not represent valid solutions. This happens:

a. In the random initialisation step of the GA.


b. As a result of genetic operators (mutation and crossover).

Thus, permutation matrices are used. Two tours including the same cities in the same order but with
different starting points or different directions are represented by different matrices and hence by
different chromosomes; for example:

Tour (2 3 5 4 1) = tour (1 2 3 5 4)

A proper fitness function is obtained using the penalty-function method to enforce the constraints.

However, the ordinary genetic operators generate too many invalid solutions, leading to poor
results. Alternative solutions to the TSP require new representations (position dependent
representations) and new genetic operators.
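One such permutation-preserving operator is order crossover (OX), which guarantees that every child is a valid tour. The sketch below assumes cities are labelled 0-5; the tours and seed are invented for illustration:

```python
import random

# Order crossover (OX) for TSP tours: copy a slice from one parent, then
# fill the gaps with the remaining cities in the other parent's order.
def order_crossover(parent1, parent2, rng):
    n = len(parent1)
    i, j = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[i:j] = parent1[i:j]                        # slice from parent 1
    fill = [c for c in parent2 if c not in child]    # rest, in parent-2 order
    for k in range(n):
        if child[k] is None:
            child[k] = fill.pop(0)
    return child

rng = random.Random(0)
tour_a = [0, 1, 2, 3, 4, 5]
tour_b = [5, 4, 3, 2, 1, 0]
child = order_crossover(tour_a, tour_b, rng)
print(sorted(child))   # always a valid permutation: [0, 1, 2, 3, 4, 5]
```

Because the child is a permutation by construction, no penalty term for invalid tours is needed, unlike with ordinary binary crossover.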
Evolutionary Divide and Conquer (EDAC)
This approach, EDAC [Valenzuela 1995], has potential for any search problem in which knowledge of
good solutions to sub-problems can be exploited to improve the solution of the problem itself. The
idea is to use the Genetic Algorithm to explore the space of problem subdivisions rather than the
space of solutions themselves, and thus capitalise on the near-linear scaling qualities generally
inherent in the divide-and-conquer approach.

The basic mechanisms for dissecting a TSP into sub-problems, solving the sub-problems and then
patching the sub-tours together to form a global tour have been obtained from the cellular
dissection algorithms of Richard Karp. Although solution quality tends to be rather poor, Karp's
algorithms possess an attractively simple geometrical approach to dissection and offer reasonable
guarantees of performance. Moreover, the EDAC approach is intrinsically parallel.

The EDAC approach has lifted the application of GAs to the TSP by an order of magnitude in terms of
problem size compared with permutation representations. Experimental results demonstrate the
successful properties of EDAC on uniform random points and PCB problems in the range of 500 to
5000 cities.

GA in Business and Their Supportive Role in Decision Making

Genetic Algorithms have been used to solve many different types of business problems in functional
areas such as finance, marketing, information systems, and production/operations. Within these
functional areas, GAs have been applied to a variety of tasks such as tactical asset allocation, job
scheduling, machine-part grouping, and computer network design.

Finance Applications
Models for tactical asset allocation and international equity strategies have been improved with the
use of GAs. Researchers report an 82% improvement in cumulative portfolio value over a passive
benchmark model and a 48% improvement over a non-GA model designed to improve on the
passive benchmark.

Genetic algorithms are particularly well-suited for financial modelling applications for three reasons:

1. They are payoff driven. Payoffs can be improvements in predictive power or returns over a
benchmark. There is an excellent match between the tool and the problems addressed.

2. They are inherently quantitative, and well-suited to parameter optimisation (unlike most
symbolic machine learning techniques).

3. They are robust, allowing a wide variety of extensions and constraints that cannot be
accommodated in traditional methods.

Information Systems Applications


Distributed computer network topologies can be designed by a GA, using three different objective
functions to optimise network reliability parameters, namely diameter, average distance, and
computer network reliability. The GA has successfully designed networks with on the order of 100
nodes. A GA has also been used to determine file allocation for a distributed system. The objective is
to maximise the programs' ability to reference files located on remote nodes. The problem is
solved with the following three different constraint sets:

1. There is exactly one copy of each file to be distributed.

2. There may be any number of copies of each file subject to a finite memory constraint at each
node.

3. The number of copies and the amount of memory are both limited.

Production/Operation Applications
A Genetic Algorithm has been used to schedule jobs in a sequence-dependent setup environment for
minimal total tardiness. All jobs are scheduled on a single machine; each job has a processing time
and a due date. The setup time of each job depends upon the job which immediately precedes
it. The GA is able to find good, but not necessarily optimal, schedules fairly quickly.

A GA has also been used to schedule jobs in a non-sequence-dependent setup environment. The jobs
are scheduled on one machine with the objective of minimising the total weighted penalty for
earliness or tardiness from the jobs' due dates. However, this does not guarantee that optimal
solutions will be generated for all schedules.

A GA has been developed for solving the machine-component grouping problem required for cellular
manufacturing systems. It provides a collection of satisfactory solutions for a two-objective
environment (minimising cell load variation and minimising the volume of inter-cell movement),
allowing the decision maker to then select the best alternative.

Role in Decision Making


Applying the well-established decision processing phase model of Simon (1960), Genetic Algorithms
appear to be very well suited to supporting the design and choice phases of decision making.

a. In solving a single-objective problem, a GA designs many solutions until no further
improvement (no increase in fitness) can be achieved, some predetermined number of
generations has evolved, or the allotted processing time is complete. The fittest solution
in the final generation is the one that maximises or minimises the objective (fitness)
function; this solution can be thought of as the GA's recommended choice. Therefore, with
single-objective problems, the user of a GA is assisted in the choice phase of decision
processing.
b. When solving multi-objective problems, a GA produces many satisfactory solutions in terms of
the objectives and then allows the decision maker to select the best alternative. Therefore
GAs assist with the design phase of decision processing for multi-objective problems.

GAs can be of great assistance in examining alternatives, since they are designed to evaluate
existing potential solutions as well as to generate new (and better) solutions for evaluation. Thus
GAs can improve the quality of decision making.
Learning Robot Behaviour Using Genetic Algorithms

Robots have become such prominent tools that they have increasingly taken on more important
roles in many different industries. As such, a robot has to operate with great efficiency and accuracy.
This may not sound very difficult if the environment in which the robot operates remains
unchanged, since the behaviours of the robot could then be pre-programmed. However, if the
environment is ever-changing, it becomes extremely difficult, if not impossible, for programmers to
anticipate every possible behaviour of the robot. Applying robots in changing environments is not
only inevitable in modern technology, but is also becoming more frequent. This has naturally led to
the development of learning robots.

The approach to learning behaviours described here, which leads the robot to its goal, reflects a
particular methodology for learning via a simulation model. The motivation is that making mistakes
on a real system can be costly and dangerous.

[Diagram: the on-line system (target environment, rule interpreter, active behaviour) runs alongside
the off-line system (simulation environment, rule interpreter, learning module), with the best test
behaviour transferred from the off-line system to the on-line system.]

In addition, time constraints may limit the extent of learning in the real world, since learning
requires experimenting with behaviours that might occasionally produce undesirable results if
applied there. Therefore, as shown in the diagram, the current best behaviour can be placed in the
real, on-line system, while learning continues in the off-line system.

Previous studies have shown that knowledge learned under simulation is robust and might be
applicable to the real world if the simulation is made more general (by adding more noise and
distortion). If this is not possible, the differences between the real world and the simulation have to
be identified.

GAs' Role
Genetic Algorithms are adaptive search techniques that can learn high-performance knowledge
structures. The genetic algorithm's strength comes from the implicitly parallel search of the solution
space that it performs via a population of candidate solutions, and this population is manipulated in
the simulation. The candidate solutions represent possible behaviours of the robot and, based on
the overall performance of the candidates, each can be assigned a fitness value. Genetic operators
can then be applied to improve the performance of the population of behaviours. One cycle of
testing all of the competing behaviours is defined as a generation, and it is repeated until a good
behaviour evolves. The good behaviour is then applied to the real world. Also, because of the nature
of GAs, the initial knowledge does not have to be very good.
Conclusion and Future Work
The system described has been used to learn behaviours for controlling simulated autonomous
underwater vehicles, missile evasion, and other simulated tasks. Future work will continue
examining the process of building robotic systems through evolution. We want to know how the
multiple behaviours required for a higher-level task interact, and how multiple behaviours can be
evolved simultaneously. We are also examining additional ways to bias the learning, both with initial
rule sets and by modifying the rule set during evolution through human interaction. Other open
problems include how to evolve hierarchies of skills and how to enable the robot to evolve new
fitness functions as the need for new skills arises.

Genetic Algorithms for Object Localisation in a Complex Scene

In order to provide machines with the ability to interact in complex, real-world environments,
sensory data must be presented to the machine. One such module dealing with sensory input is the
visual data processing module, also known as the computer vision module. A central task of this
computer vision module is to recognise objects from images of the environment.

There are two different parts to computer vision modules, namely segmentation and recognition.
Segmentation is the process of finding objects of interest, while recognition checks whether a
located object matches predefined attributes. Since objects cannot be recognised until they have
been located and separated from the background, it is of paramount importance that the vision
module is able to locate different objects of interest for different systems with great efficiency.

GA parameters
The task of locating a particular object of interest in a complex scene is quite simple when cast in the
framework of genetic algorithms. The brute-force method for finding an object in a complex scene is
to examine all positions and sizes, with varying degrees of occlusion of the objects, to determine
whether the extracted sub-image matches a rough notion of what is being sought. This method is
immediately dismissed as far too computationally expensive. The use of genetic methodology,
however, can raise the brute-force setup to an elegant solution to this complex problem. Since the
GA approach does well in very large search spaces by working only with a sample of the available
population, the computational limitation of the brute-force method, with its full enumeration of the
search space, does not apply.
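A minimal sketch of this idea: instead of enumerating every window position, a small population of candidate (x, y) window positions is evolved toward the position with the strongest response. The 20x20 synthetic image, 4x4 window, intensity-sum fitness, and truncation selection below are all assumptions made for illustration:

```python
import random

# Toy scene: a 20x20 grid of zeros with a bright 4x4 "object" at (10, 10).
SIZE, WIN = 20, 4
image = [[0] * SIZE for _ in range(SIZE)]
for r in range(10, 14):
    for c in range(10, 14):
        image[r][c] = 9

# Fitness of a candidate window position: total intensity inside the window.
def fitness(pos):
    x, y = pos
    return sum(image[r][c] for r in range(y, y + WIN) for c in range(x, x + WIN))

rng = random.Random(0)
LIMIT = SIZE - WIN                      # largest valid top-left coordinate
pop = [(rng.randrange(LIMIT + 1), rng.randrange(LIMIT + 1)) for _ in range(30)]
for _ in range(40):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]                  # truncation selection: keep the top third
    pop = parents + [
        (min(LIMIT, max(0, px + rng.randint(-2, 2))),   # mutate x
         min(LIMIT, max(0, py + rng.randint(-2, 2))))   # mutate y
        for px, py in rng.choices(parents, k=20)
    ]
best = max(pop, key=fitness)
print(best, fitness(best))
```

The GA only ever evaluates a few hundred windows, versus the 289 positions per scale (and far more with occlusion and size variation) that full enumeration would require on even this toy grid.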

Conclusion and Future Work


It has been shown that the genetic algorithm performs well in finding areas of interest even in a
complex, real-world scene. Genetic Algorithms are adaptive to their environments, and as such this
type of method is appealing to the vision community, which must often work in a changing
environment. However, several improvements must be made in order for GAs to be more generally
applicable. Gray-coding the fields would greatly improve the mutation operation, and combining
segmentation with recognition would allow the object of interest to be evaluated at once.
Finally, timing improvements could be made by utilising the implicit parallelism of multiple
independent generations evolving at the same time.

Conclusion

If the conception of a computer algorithm being based on the evolution of organisms is
surprising, the extensiveness with which this algorithm is applied in so many areas is no less
astonishing. These applications, be they commercial, educational or scientific, are increasingly
dependent on this algorithm, the Genetic Algorithm. Its usefulness and grace in solving
problems have made it the favoured choice over traditional methods, namely gradient
search, random search and others. GAs are very helpful when the developer does not have precise
domain expertise, because GAs possess the ability to explore and learn from their domain.

In this report, we have placed more emphasis on explaining the use of GAs in many areas of
engineering and commerce. We believe that, by working through these interesting examples, one
can grasp the idea of GAs with greater ease. We have also discussed the uncertainties about whether
computer-generated life could exist as a real life form. The discussion is far from conclusive and,
whether artificial life will become real life, will remain to be seen.

In the future, we will witness the development of variants of GAs tailored to some very specific
tasks. This might defy the very principle of GAs, that they are ignorant of the problem domain when
used to solve a problem, but this practice could make GAs even more powerful.
Support Vector Machines
Introduction to Support Vector Machines

A support vector machine (SVM) is a supervised learning technique from the field of machine
learning applicable to both classification and regression.

Rooted in the Statistical Learning Theory developed by Vladimir Vapnik and co-workers at AT&T Bell
Laboratories in 1995, SVMs are based on the principle of Structural Risk Minimization.

1. Non-linearly map the input space into a very high dimensional feature space (the "kernel trick");
then:

a. in the case of classification, construct an optimal separating hyperplane in this space (a
maximal margin classifier); or
b. in the case of regression, perform linear regression in this space, but without penalising
small errors. Sewell (2005)

"The support vector machine (SVM) is a universal constructive learning procedure based on the
statistical learning theory (Vapnik, 1995)." Cherkassky and Mulier (1998)

"The support-vector network is a new learning machine for two-group classification problems. The
machine conceptually implements the following idea: input vectors are non-linearly mapped to a
very high-dimensional feature space. In this feature space a linear decision surface is
constructed." Cortes and Vapnik (1995)

"Support Vector Machines (SVM) are learning systems that use a hypothesis space of linear functions
in a high dimensional feature space, trained with a learning algorithm from optimisation theory that
implements a learning bias derived from statistical learning theory." Cristianini and Shawe-Taylor
(2000)

"These techniques are then generalized to what is known as the support vector machine, which
produces nonlinear boundaries by constructing a linear boundary in a large, transformed version of
the feature space." Hastie, Tibshirani and Friedman (2001)

"Support Vector Machines have been developed recently [34]. Originally it was worked out for linear
two-class classification with margin, where margin means the minimal distance from the separating
hyperplane to the closest data points. The SVM learning machine seeks an optimal separating
hyperplane, where the margin is maximal. An important and unique feature of this approach is that
the solution is based only on those data points which are at the margin. These points are called
support vectors. The linear SVM can be extended to a nonlinear one when first the problem is
transformed into a feature space using a set of nonlinear basis functions. In the feature space -
which can be very high dimensional - the data points can be separated linearly. An important
advantage of the SVM is that it is not necessary to implement this transformation and to determine
the separating hyperplane in the possibly very-high-dimensional feature space; instead a kernel
representation can be used, where the solution is written as a weighted sum of the values of a
certain kernel function evaluated at the support vectors." Horváth (2003) in Suykens et al.

"With their introduction in 1995, Support Vector Machines (SVMs) marked the beginning of a new
era in the learning from examples paradigm. Rooted in the Statistical Learning Theory developed by
Vladimir Vapnik at AT&T, SVMs quickly gained attention from the pattern recognition community
due to a number of theoretical and computational merits. These include, for example, the simple
geometrical interpretation of the margin, uniqueness of the solution, statistical robustness of the
loss function, modularity of the kernel function, and over fit control through the choice of a single
regularization parameter." Lee and Verri (2002)

"Support Vector machines (SVM) are a new statistical learning technique that can be seen as a new
method for training classifiers based on polynomial functions, radial basis functions, neural
networks, splines or other functions. Support Vector machines use a linear separating hyperplane to
create a classifier. For problems that cannot be linearly separated in the input space, this machine
offers a possibility to find a solution by making a non-linear transformation of the original input
space into a high dimensional feature space, where an optimal separating hyperplane can be found.
Those separating planes are optimal, which means that a maximal margin classifier with respect to
the training data set can be obtained." Rychetsky (2001)

"A learning machine that is based on the principle of Structural Risk Minimization described above is
the Support Vector Machine (SVM). The SVM has been developed by Vapnik and co-workers at AT&T
Bell Laboratories [9, 115, 116, and 19]." Rychetsky (2001)

"The support vector network implements the following idea [21]: Map the input vectors into a very
high dimensional feature space Z through some non-linear mapping chosen a priori. Then construct
an optimal separating hyperplane in this space." Vapnik (2003) in Suykens et al.

"The support vector machine (SVM) is a supervised learning method that generates input-output
mapping functions from a set of labelled training data." Wang (2005)

"Support vector machines (SVMs) are a set of related supervised learning methods, applicable to
both classification and regression." Wikipedia (2004)

Support Vector Machines vs. Artificial Neural Networks

The development of ANNs followed a heuristic path, with applications and extensive
experimentation preceding theory. In contrast, the development of SVMs involved sound theory
first, then implementation and experiments. A significant advantage of SVMs is that whilst ANNs can
suffer from multiple local minima, the solution to an SVM is global and unique. Two more
advantages of SVMs are that that have a simple geometric interpretation and give a sparse solution.
Unlike ANNs, the computational complexity of SVMs does not depend on the dimensionality of the
input space. ANNs use empirical risk minimization, whilst SVMs use structural risk minimization. The
reason that SVMs often outperform ANNs in practice is that they deal with the biggest problem with
ANNs, SVMs are less prone to over fitting.
"They differ radically from comparable approaches such as neural networks: SVM training always
finds a global minimum, and their simple geometric interpretation provides fertile ground for further
investigation." Burges (1998)

"Most often Gaussian kernels are used, when the resulted SVM corresponds to an RBF network with
Gaussian radial basis functions. As the SVM approach “automatically” solves the network complexity
problem, the size of the hidden layer is obtained as the result of the QP procedure. Hidden neurons
and support vectors correspond to each other, so the centre problems of the RBF network are also
solved, as the support vectors serve as the basis function centres."
Horváth (2003) in Suykens et al.

"In problems when linear decision hyperplanes are no longer feasible (section 2.4.3), an input space
is mapped into a feature space (the hidden layer in NN models), resulting in a nonlinear classifier."
Kecman p 149

Formalization
We are given some training data D, a set of points of the form

D = { (x_i, c_i) | x_i ∈ R^p, c_i ∈ {−1, 1} },  i = 1, …, n,

where each c_i is either 1 or −1, indicating the class to which the point x_i belongs, and each x_i is a
p-dimensional real vector. We want to find the maximum-margin hyperplane which divides the
points having c_i = 1 from those having c_i = −1. Any hyperplane can be written as the set of points
x satisfying

w · x − b = 0,

where · denotes the dot product. The vector w is a normal vector: it is perpendicular to the
hyperplane. The parameter b/‖w‖ determines the offset of the hyperplane from the origin along the
normal vector w.

[Figure: maximum-margin hyperplane and margins for an SVM trained with samples from two
classes. Samples on the margin are called the support vectors.]

We want to choose w and b to maximize the margin, the distance between the parallel hyperplanes
that are as far apart as possible while still separating the data. These hyperplanes can be described
by the equations

w · x − b = 1

and

w · x − b = −1.

Note that if the training data are linearly separable, we can select the two hyperplanes of the margin
in such a way that there are no points between them, and then try to maximize their distance. By
using geometry, we find that the distance between these two hyperplanes is 2/‖w‖, so we want to
minimize ‖w‖. As we also have to prevent data points from falling into the margin, we add the
following constraint: for each i, either

w · x_i − b ≥ 1 for x_i of the first class, or

w · x_i − b ≤ −1 for x_i of the second.

This can be rewritten as:

c_i (w · x_i − b) ≥ 1, for all 1 ≤ i ≤ n.

We can put this together to get the optimization problem:

Minimize (in w, b)    ‖w‖

subject to (for any i = 1, …, n)    c_i (w · x_i − b) ≥ 1.

Primal form
The optimization problem presented in the preceding section is difficult to solve because it depends
on ‖w‖, the norm of w, which involves a square root. Fortunately it is possible to alter the equation
by substituting ‖w‖ with ½‖w‖² without changing the solution (the minima of the original and the
modified equations have the same w and b). This is a quadratic programming (QP) optimization
problem. More clearly:

Minimize (in w, b)    ½‖w‖²

subject to (for any i = 1, …, n)    c_i (w · x_i − b) ≥ 1.

The factor of ½ is used for mathematical convenience. This problem can now be solved by standard
quadratic programming techniques and programs.
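The primal constraints and the resulting margin can be checked numerically on toy data. The four points and the separator w = (1, 0), b = 0 below are assumptions chosen so that the constraints hold; the check itself follows the formulation directly:

```python
import math

# Toy linearly separable data: (point, class label c_i in {-1, +1}).
points = [((2.0, 0.5), 1), ((3.0, -1.0), 1),
          ((-2.0, 0.3), -1), ((-1.5, 1.0), -1)]
w, b = (1.0, 0.0), 0.0          # an assumed candidate separator

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

# Primal feasibility: every point must satisfy c_i * (w . x_i - b) >= 1,
# i.e. it lies on or outside its class's margin hyperplane.
feasible = all(c * (dot(w, x) - b) >= 1 for x, c in points)

# Distance between the hyperplanes w.x - b = 1 and w.x - b = -1 is 2 / ||w||.
margin = 2 / math.sqrt(dot(w, w))

print(feasible, margin)   # True 2.0
```

A QP solver would search over all (w, b) for the feasible pair with the smallest ½‖w‖², i.e. the largest such margin.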

Dual form
Writing the classification rule in its unconstrained dual form reveals that the maximum-margin
hyperplane, and therefore the classification task, is only a function of the support vectors: the
training data that lie on the margin. The dual of the SVM can be shown to be the following
optimization problem:

Maximize (in αi)

Σi αi − (1/2) Σi,j αi αj ci cj (xi · xj)

subject to (for any i = 1, …, n)

αi ≥ 0

and

Σi αi ci = 0

The α terms constitute a dual representation for the weight vector in terms of the training set:

w = Σi αi ci xi
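As a sketch, the dual can also be solved with a general-purpose optimizer by minimizing the negated dual objective under the constraints αi ≥ 0 and Σi αi ci = 0; the toy dataset and the use of scipy are illustrative assumptions. The weight vector is then recovered from the dual representation w = Σi αi ci xi.

```python
import numpy as np
from scipy.optimize import minimize

# Toy linearly separable dataset (illustrative).
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]])
c = np.array([1.0, 1.0, -1.0, -1.0])

# G_ij = c_i c_j (x_i . x_j): the quadratic part of the dual objective.
G = (c[:, None] * c[None, :]) * (X @ X.T)

def neg_dual(a):
    # Negate the dual objective so that "maximize" becomes "minimize".
    return 0.5 * a @ G @ a - a.sum()

res = minimize(neg_dual, x0=np.zeros(len(c)), method="SLSQP",
               bounds=[(0.0, None)] * len(c),
               constraints=[{"type": "eq", "fun": lambda a: a @ c}])
alpha = res.x

# Dual representation of the weight vector: w = sum_i alpha_i c_i x_i.
w = (alpha * c) @ X
support = np.where(alpha > 1e-6)[0]   # support vectors have alpha_i > 0
print("support vector indices:", support, "w =", w.round(3))
```

Only the points on the margin end up with non-zero αi, which is exactly the sparseness property described above: the solution depends on the support vectors alone.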

Biased and unbiased hyperplanes


For reasons of simplicity, it is sometimes required that the hyperplane pass through the origin of the
coordinate system. Such hyperplanes are called unbiased, whereas general hyperplanes not
necessarily passing through the origin are called biased. An unbiased hyperplane can be enforced by
setting b = 0 in the primal optimization problem. The corresponding dual is identical to the dual
given above without the equality constraint Σi αi ci = 0.

Transductive support vector machines


Transductive support vector machines extend SVMs in that they also take into account structural
properties (e.g. correlational structures) of the data set to be classified. Here, in addition to the
training set D, the learner is also given a set

D* = {xj* | xj* ∈ Rp, j = 1, …, k}

of test examples to be classified. Formally, a transductive support vector machine is defined by the
following primal optimization problem:

Minimize (in w, b, y*)

(1/2)||w||^2

subject to (for any i = 1, …, n and any j = 1, …, k)

ci(w · xi − b) ≥ 1

yj*(w · xj* − b) ≥ 1

and

yj* ∈ {−1, 1}

Transductive support vector machines were introduced by Vladimir Vapnik in 1998.

Properties

SVMs belong to a family of generalized linear classifiers. They can also be considered a special case
of Tikhonov regularization. A special property is that they simultaneously minimize the empirical
classification error and maximize the geometric margin; hence they are also known as maximum
margin classifiers.

Extensions to the linear SVM

Soft margin
In 1995, Corinna Cortes and Vladimir Vapnik suggested a modified maximum-margin idea that allows
for mislabelled examples. If there exists no hyperplane that can split the "yes" and "no" examples,
the Soft Margin method will choose a hyperplane that splits the examples as cleanly as possible,
while still maximizing the distance to the nearest cleanly split examples. The method introduces
slack variables, ξi, which measure the degree of misclassification of the datum xi:

ci(w · xi − b) ≥ 1 − ξi,   ξi ≥ 0

The objective function is then increased by a function which penalizes non-zero ξi, and the
optimization becomes a trade-off between a large margin and a small error penalty. If the penalty
function is linear, the optimization problem transforms to:

Minimize (in w, ξ, b)

(1/2)||w||^2 + C Σi ξi

subject to (for any i = 1, …, n)

ci(w · xi − b) ≥ 1 − ξi,   ξi ≥ 0

This constraint, along with the objective of minimizing ||w||, can be solved using Lagrange
multipliers. The key advantage of a linear penalty function is that the slack variables vanish from the
dual problem, with the constant C appearing only as an additional constraint on the Lagrange
multipliers. Non-linear penalty functions have been used, particularly to reduce the effect of outliers
on the classifier, but unless care is taken, the problem becomes non-convex, and thus it is
considerably more difficult to find a global solution.
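The soft-margin objective can equivalently be written as an unconstrained hinge-loss minimization, (1/2)||w||^2 + C Σi max(0, 1 − ci(w · xi − b)), which lends itself to a simple subgradient-descent sketch. The synthetic blobs, the value of C, the learning rate, and the iteration count below are all illustrative assumptions, not a tuned implementation.

```python
import numpy as np

# Two Gaussian blobs as a mostly separable toy dataset (illustrative).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2, 1, (20, 2)), rng.normal(-2, 1, (20, 2))])
c = np.array([1.0] * 20 + [-1.0] * 20)

w, b, C, lr = np.zeros(2), 0.0, 1.0, 0.01
for _ in range(2000):
    margins = c * (X @ w - b)
    viol = margins < 1                    # points inside or beyond the margin
    # Subgradient of (1/2)||w||^2 + C * sum_i max(0, 1 - c_i (w.x_i - b)).
    grad_w = w - C * (c[viol, None] * X[viol]).sum(axis=0)
    grad_b = C * c[viol].sum()
    w -= lr * grad_w
    b -= lr * grad_b

acc = np.mean(np.sign(X @ w - b) == c)
print("training accuracy:", acc)
```

Points that are cleanly classified outside the margin contribute nothing to the loss gradient, mirroring the fact that only margin violators acquire non-zero slack ξi.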

Non-linear classification
The original optimal hyperplane algorithm proposed by Vladimir Vapnik in 1963 was a linear
classifier. However, in 1992, Bernhard Boser, Isabelle Guyon and Vapnik suggested a way to create
non-linear classifiers by applying the kernel trick (originally proposed by Aizerman et al.) to
maximum-margin hyperplanes. The resulting algorithm is formally similar, except that every dot
product is replaced by a non-linear kernel function. This allows the algorithm to fit the maximum-
margin hyperplane in the transformed feature space. The transformation may be non-linear and the
transformed space high-dimensional; thus, though the classifier is a hyperplane in the high-
dimensional feature space, it may be non-linear in the original input space.

If the kernel used is a Gaussian radial basis function, the corresponding feature space is a Hilbert
space of infinite dimension. Maximum margin classifiers are well regularized, so the infinite
dimension does not spoil the results. Some common kernels include:

 Polynomial (homogeneous): k(xi, xj) = (xi · xj)^d

 Polynomial (inhomogeneous): k(xi, xj) = (xi · xj + 1)^d

 Radial basis function: k(xi, xj) = exp(−γ||xi − xj||^2), for γ > 0

 Gaussian radial basis function: k(xi, xj) = exp(−||xi − xj||^2 / (2σ^2))

 Hyperbolic tangent: k(xi, xj) = tanh(κ xi · xj + c), for some (not every) κ > 0 and c < 0
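The kernels listed above can be written directly as functions of two input vectors; the parameter values d, γ, σ, κ and c below are illustrative choices, not canonical defaults.

```python
import numpy as np

def poly_homogeneous(x, y, d=2):
    return np.dot(x, y) ** d

def poly_inhomogeneous(x, y, d=2):
    return (np.dot(x, y) + 1) ** d

def rbf(x, y, gamma=0.5):
    return np.exp(-gamma * np.sum((x - y) ** 2))

def gaussian_rbf(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

def tanh_kernel(x, y, kappa=0.1, c=-1.0):
    return np.tanh(kappa * np.dot(x, y) + c)

x, y = np.array([1.0, 2.0]), np.array([3.0, 0.5])
print(rbf(x, x))   # an RBF kernel evaluates to 1 at zero distance
```

In a kernelized SVM, each dot product xi · xj in the dual problem is simply replaced by one of these k(xi, xj) evaluations.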

Multiclass SVM

Multiclass SVM aims to assign labels to instances by using support vector machines, where the labels
are drawn from a finite set of several elements. The dominant approach for doing so is to reduce the
single multiclass problem into multiple binary problems. Each of the problems yields a binary
classifier, which is assumed to produce an output function that gives relatively large values for
examples from the positive class and relatively small values for examples belonging to the negative
class. Two common methods to build such binary classifiers are to have each classifier distinguish
between (i) one of the labels and the rest (one-versus-all) or (ii) every pair of classes (one-
versus-one). Classification of new instances in the one-versus-all case is done by a winner-takes-all
strategy, in which the classifier with the highest output function assigns the class. Classification
in the one-versus-one case is done by a max-wins voting strategy, in which every classifier assigns
the instance to one of its two classes, the vote for the assigned class is increased by one, and
finally the class with the most votes determines the instance's classification.
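The two voting schemes just described can be sketched independently of any trained SVM; the hard-coded classifier outputs below are stand-ins for real decision-function values.

```python
import numpy as np

def one_vs_all_predict(scores):
    # scores[k] is the output of the "class k versus the rest" classifier;
    # winner-takes-all picks the class with the highest output.
    return int(np.argmax(scores))

def one_vs_one_predict(pairwise, n_classes):
    # pairwise[(i, j)] > 0 means the (i vs j) classifier chose class i;
    # max-wins voting tallies one vote per pairwise decision.
    votes = np.zeros(n_classes, dtype=int)
    for (i, j), output in pairwise.items():
        votes[i if output > 0 else j] += 1
    return int(np.argmax(votes))

print(one_vs_all_predict([0.2, 1.5, -0.3]))                             # class 1
print(one_vs_one_predict({(0, 1): -1.0, (0, 2): 1.0, (1, 2): 1.0}, 3))  # class 1
```

Note that one-versus-all needs K binary classifiers for K classes, while one-versus-one needs K(K−1)/2 of them, each trained on a smaller two-class subset of the data.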

Structured SVM

Support vector machines have been generalized to Structured SVM, where the label space is
structured and of possibly infinite size.

Regression

A version of an SVM for regression was proposed in 1996 by Vladimir Vapnik, Harris Drucker, Chris
Burges, Linda Kaufman and Alex Smola. This method is called support vector regression (SVR). The
model produced by support vector classification (as described above) only depends on a subset of
the training data, because the cost function for building the model does not care about training
points that lie beyond the margin. Analogously, the model produced by SVR only depends on a
subset of the training data, because the cost function for building the model ignores any training
data that are close (within a threshold ε) to the model prediction.

Implementation

The parameters of the maximum-margin hyperplane are derived by solving the optimization. There
exist several specialized algorithms for quickly solving the QP problem that arises from SVMs, mostly
reliant on heuristics for breaking the problem down into smaller, more manageable chunks. A
common method for solving the QP problem is Platt's Sequential Minimal Optimization (SMO)
algorithm, which breaks the problem down into 2-dimensional sub-problems that may be solved
analytically, eliminating the need for a numerical optimization algorithm such as conjugate gradient
methods.

Another approach is to use an interior point method that uses Newton-like iterations to find a
solution of the Karush-Kuhn-Tucker conditions of the primal and dual problems. Instead of solving a
sequence of broken-down problems, this approach directly solves the problem as a whole. To avoid
solving a linear system involving the large kernel matrix, a low-rank approximation to the matrix is
often used in the kernel trick.
