Part A (2 Marks)
Example: Optimization algorithm in deep learning – finds the values of the parameters (weights)
that minimize the error when mapping inputs to outputs.
2. What are the types of optimization and give examples of each optimization method?
Continuous Optimization versus Discrete Optimization
Unconstrained Optimization versus Constrained Optimization
Deterministic Optimization versus Stochastic Optimization
3. Distinguish between continuous optimization and Discrete optimization.
Continuous optimization – the variables used in the objective function can assume real values, e.g.,
values from intervals of the real line. Examples: analytical optimization methods, gradient descent,
linear and nonlinear programming.
Discrete optimization – the variables used in the mathematical program are restricted to
assume only discrete values, such as the integers. Examples: combinatorial optimization, integer programming.
4. What are the additional test functions in Unconstrained optimization?
Test problems such as Rosenbrock’s function, Wood’s function, the quadratic function, and so forth
are taken, on which different solution methods are tested. The performance of each method is compared
in terms of computational time. The additional test functions are:
• Rosenbrock’s function
• Wood’s function
• Quadratic function
• Nonlinear function
5. Define Multi-Objective optimization.
The multiobjective optimization problem (also known as multiobjective programming problem)
is a branch of mathematics used in multiple criteria decision-making, which deals with optimization
problems involving two or more objective functions to be optimized simultaneously.
6. What are the solution techniques in Constrained and Unconstrained optimization?
Unconstrained optimization
Most optimization problems are constrained; unconstrained optimization problems are few. One example
of an unconstrained optimization problem is data fitting, where one fits a curve on the measured data. The
solution methods for unconstrained optimization problems can be broadly classified into gradient-based
and non–gradient-based search methods. As the name suggests, gradient-based methods require gradient
information in determining the search direction. The gradient-based methods discussed are the steepest
descent, Davidon–Fletcher–Powell (DFP), Broyden–Fletcher–Goldfarb–Shanno (BFGS), Newton, and
Levenberg–Marquardt methods. The search direction computed by these methods uses the gradient
information, Hessian information, or a combination of these two. Some methods also make an
approximation of the Hessian matrix. Once the search direction is identified, one needs to evaluate how
much to move in that direction so as to minimize the function. This is a one-dimensional problem.
Unidirectional Search
The unidirectional search refers to minimizing the value of a multivariable function along a specified
direction. For example, if xi is the initial starting point of the design variables for minimizing a multivariable
function and S is the search direction, then we need to determine a scalar quantity α such that the function
f(α) = f(xi + αSi)
is minimized. The value of α at which this function reaches a minimum is denoted by α*. This is a one-
dimensional optimization problem, and we can use the golden section technique to minimize this function.
The golden section method is modified to handle multivariable functions.
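The unidirectional search above can be sketched in code. The following is a minimal golden-section line search; the quadratic function and starting point are hypothetical, chosen only to illustrate the method.

```python
import math

def golden_section(f, a, b, tol=1e-6):
    """Minimize a one-dimensional function f on [a, b] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2          # 1/phi ~ 0.618
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:
            a, c = c, d
            d = a + inv_phi * (b - a)
    return (a + b) / 2

def line_search(f, x, s, a=0.0, b=2.0):
    """Unidirectional search: find alpha minimizing f(x + alpha*s)."""
    phi = lambda alpha: f([xi + alpha * si for xi, si in zip(x, s)])
    return golden_section(phi, a, b)

# Minimize f(x1, x2) = x1^2 + x2^2 from x = (1, 1) along S = (-1, -1):
f = lambda x: x[0]**2 + x[1]**2
alpha = line_search(f, [1.0, 1.0], [-1.0, -1.0])   # optimum at alpha = 1
```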
Test Problem
Let us define a spring system as a test problem on which we will apply multivariable optimization
algorithms such as the steepest descent, DFP, BFG, Newton, and Levenberg–Marquardt methods. Consider
two springs of unit length and with stiffness k1 and k2, joined at the origin. The other two ends of the
springs are fixed on a wall. On applying a force, the spring system will deflect to an equilibrium position,
which we are interested in determining. The potential of the spring system is given by its strain energy
minus the work done by the applied force F, where F is the force applied at the origin due to which it moves to a position (x1, x2).
Solution Techniques
Solution techniques for multivariable, unconstrained optimization problems can be grouped into gradient-
and non–gradient-based methods. Gradient-based methods require derivative information of the function
in constituting a search. The first and second derivatives can be computed using the central difference
formulas
f′(x) ≈ [f(x + h) − f(x − h)] / (2h),  f″(x) ≈ [f(x + h) − 2f(x) + f(x − h)] / h²
In successive iterations, the design variables can be updated using the steepest descent equation
xk+1 = xk − α ∇f(xk)
where α is a positive scalar parameter that can be determined using a line search algorithm such as the
golden section method.
The steepest descent method ensures a reduction in the function value at every iteration. If the starting
point is far away from the minimum, the gradient will be large and the reduction in the function value in
each iteration will be large. Because the gradient of the function decreases to a small value near the
optimum, the function reduction is uneven and the method becomes sluggish (slow convergence) near the
minimum. The method can therefore be utilized as a starter for other gradient-based algorithms.
Advantage:
The advantage of the steepest descent method is that it reaches close to the minimum of the function in a
few iterations even when the starting guess is far away from the optimum.
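A minimal sketch of steepest descent with central-difference gradients. A fixed step size α stands in for the golden-section line search purely to keep the example short; the test function is hypothetical.

```python
def grad_central(f, x, h=1e-6):
    """Gradient by the central difference formula."""
    g = []
    for i in range(len(x)):
        xp = list(x); xm = list(x)
        xp[i] += h; xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

def steepest_descent(f, x, iters=100, alpha=0.1):
    """Update x_{k+1} = x_k - alpha * grad f(x_k)."""
    for _ in range(iters):
        g = grad_central(f, x)
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
    return x

f = lambda x: (x[0] - 3)**2 + (x[1] + 1)**2
xmin = steepest_descent(f, [0.0, 0.0])    # converges toward (3, -1)
```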
Newton’s Method
The search direction in this method is based on the first and second derivative information and is given by
Si = −[H]⁻¹∇f(xi)
where [H] is the Hessian matrix. If this matrix is positive definite, then Si will be a descent direction. Though
Newton’s method is known for converging in a single iteration for a quadratic function, we seldom
find functions in practical problems that are quadratic. However, Newton’s method is often used as a hybrid
method in conjunction with other methods.
Advantage:
Newton’s method shows a faster convergence if the starting guess is close to the minimum point.
Disadvantage:
Newton’s method may not converge if the starting point is far away from the optimum point.
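Newton's method can be sketched as follows. The quadratic test function is hypothetical; it illustrates the single-iteration convergence on quadratics mentioned above.

```python
import numpy as np

def newton(grad, hess, x, iters=20):
    """Newton search direction S = -H^{-1} grad f, taken with a full step."""
    for _ in range(iters):
        s = np.linalg.solve(hess(x), -grad(x))  # descent direction if H is positive definite
        x = x + s
    return x

# Quadratic test function f = (x1-1)^2 + 10*(x2-2)^2: Newton converges in one step.
grad = lambda x: np.array([2 * (x[0] - 1), 20 * (x[1] - 2)])
hess = lambda x: np.array([[2.0, 0.0], [0.0, 20.0]])
x_star = newton(grad, hess, np.array([5.0, -3.0]), iters=1)   # -> [1., 2.]
```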
Levenberg–Marquardt Method
The Levenberg–Marquardt method is a kind of hybrid method that combines the strength of both the
steepest descent and Newton’s methods. The search direction in this method is given by
Si = −[H + λI]⁻¹∇f(xi)
where I is an identity matrix and λ is a scalar that is set to a high value at the start of the algorithm. The
value of λ is altered during every iteration depending on whether the function value is decreasing or not. If
the function value decreases in an iteration, λ is decreased by a factor (less weightage on the steepest descent
direction). On the other hand, if the function value increases in an iteration, λ is increased by a factor (more
weightage on the steepest descent direction).
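A sketch of the Levenberg–Marquardt update described above, assuming the gradient and Hessian are supplied analytically. The factor of 10 for adjusting λ and the test function are illustrative choices, not prescribed by the text.

```python
import numpy as np

def levenberg_marquardt(f, grad, hess, x, lam=1000.0, iters=50):
    """S = -[H + lam*I]^{-1} grad f; lam shrinks when f decreases (more Newton-like)
    and grows when f increases (more steepest-descent-like)."""
    n = len(x)
    for _ in range(iters):
        s = np.linalg.solve(hess(x) + lam * np.eye(n), -grad(x))
        if f(x + s) < f(x):
            x = x + s
            lam /= 10.0        # decreasing: weight toward Newton
        else:
            lam *= 10.0        # increasing: weight toward steepest descent
    return x

f = lambda x: (x[0] - 1)**2 + 5 * (x[1] + 2)**2
grad = lambda x: np.array([2 * (x[0] - 1), 10 * (x[1] + 2)])
hess = lambda x: np.array([[2.0, 0.0], [0.0, 10.0]])
x_star = levenberg_marquardt(f, grad, hess, np.array([10.0, 10.0]))   # -> near (1, -2)
```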
DFP Method
In the DFP method, the inverse of the Hessian is approximated by a matrix [A] that is updated in every
iteration. The information stored in the matrix [A] is called the metric, and because it changes with every
iteration, the DFP method is known as a variable metric method. Because this method uses first-order
derivatives and has the property of quadratic convergence, it is referred to as a quasi-Newton method.
BFGS Method
In the BFGS method, the Hessian is approximated using the variable metric matrix [A] given by the equation
It is important to note that whereas the matrix [A] converges to the inverse of the Hessian in the DFP
method, the matrix [A] converges to the Hessian itself in the BFGS method. As the BFGS method needs fewer
restarts as compared to the DFP method, it is more popular than the DFP method.
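A sketch of the quasi-Newton (BFGS) idea. Note that this uses the common implementation form in which the metric matrix approximates the *inverse* Hessian directly (the text's [A] converges to the Hessian itself; the underlying idea is the same). The backtracking line search and test function are illustrative assumptions.

```python
import numpy as np

def bfgs(f, grad, x, iters=50):
    n = len(x)
    A = np.eye(n)                      # approximation of the inverse Hessian
    g = grad(x)
    for _ in range(iters):
        s_dir = -A @ g                 # quasi-Newton search direction
        alpha = 1.0                    # crude backtracking line search
        while f(x + alpha * s_dir) > f(x) and alpha > 1e-10:
            alpha /= 2.0
        x_new = x + alpha * s_dir
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        if y @ s > 1e-12:              # curvature condition keeps A positive definite
            rho = 1.0 / (y @ s)
            I = np.eye(n)
            A = (I - rho * np.outer(s, y)) @ A @ (I - rho * np.outer(y, s)) \
                + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x

f = lambda x: (x[0] - 2)**2 + (x[0] - x[1])**2
grad = lambda x: np.array([2 * (x[0] - 2) + 2 * (x[0] - x[1]), -2 * (x[0] - x[1])])
x_star = bfgs(f, grad, np.array([0.0, 0.0]))   # minimum at (2, 2)
```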
Powell Method
The Powell method is a direct search method (no gradient computation is required) with the property of
quadratic convergence. Previous search directions are stored in this method and they form a basis for the
new search direction. The method makes a series of unidirectional searches along these search directions.
The last search direction replaces the first one in the new iteration and the process is continued until the
function value shows no improvement.
Nelder–Mead Algorithm
The Nelder–Mead algorithm is a direct search method and uses function information alone (no gradient
computation is required) to move from one iteration to another. The objective function is computed at each
vertex of the simplex. Using this information, the simplex is moved in the search space. Again, the objective
function is computed at each vertex of the simplex. The process of moving the simplex is continued until
the optimum value of the function is reached. Three basic operations are required to move the simplex in
the search space: reflection, contraction, and expansion The centroid point xc is computed using all the
points but with the exclusion of xworst. That is
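A minimal Nelder–Mead sketch using reflection, expansion, and contraction (plus a shrink step) about the centroid excluding x_worst, applied here to the Rosenbrock function from the test problems. The coefficients (1, 2, 0.5) are the conventional choices, not taken from the text.

```python
import numpy as np

def nelder_mead(f, x0, step=1.0, iters=600):
    """Minimal Nelder–Mead simplex: reflection, expansion, contraction, shrink."""
    n = len(x0)
    simplex = [np.array(x0, float)]
    for i in range(n):                       # initial simplex around x0
        v = np.array(x0, float); v[i] += step
        simplex.append(v)
    for _ in range(iters):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        xc = np.mean(simplex[:-1], axis=0)   # centroid excluding x_worst
        xr = xc + (xc - worst)               # reflection
        if f(xr) < f(best):
            xe = xc + 2.0 * (xc - worst)     # expansion
            simplex[-1] = xe if f(xe) < f(xr) else xr
        elif f(xr) < f(simplex[-2]):
            simplex[-1] = xr
        else:
            xcon = xc + 0.5 * (worst - xc)   # contraction
            if f(xcon) < f(worst):
                simplex[-1] = xcon
            else:                            # shrink toward the best vertex
                simplex = [best + 0.5 * (v - best) for v in simplex]
    simplex.sort(key=f)
    return simplex[0]

rosen = lambda x: 100.0 * (x[1] - x[0]**2)**2 + (1.0 - x[0])**2
x_star = nelder_mead(rosen, [-1.2, 1.0])    # minimum of the banana valley is at (1, 1)
```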
Additional test problems such as Rosenbrock’s function, Wood’s function, quadratic function, and so forth
are taken, on which different solution methods will be tested. The performance of each method is compared
in terms of the computational time.
Rosenbrock Function
The two-variable Rosenbrock function is f(x1, x2) = 100(x2 − x1²)² + (1 − x1)². The minimum of this
“banana valley” function is zero and occurs at (1, 1).
Quadratic Function
Nonlinear Function
Wood’s Function
The two-variable function is given by
Constrained optimization
Almost all optimization problems carry constraints. The supply of a product is constrained by the capacity of a
machine. The trajectory of a rocket is constrained by the final target as well as the maximum aerodynamic
load it can carry. The range of an aircraft is constrained by its payload, fuel capacity, and its aerodynamic
characteristics.
In constrained optimization problems, the feasible region gets restricted because of the presence of
constraints. This is more challenging because for a multivariable problem with several nonlinear
constraints, arriving at any feasible point itself is a daunting task. The constrained optimization problem
can be mathematically stated as:
minimize f(x)
subject to gi(x) ≤ 0, i = 1, 2, …, m
hj(x) = 0, j = 1, 2, …, p
xl ≤ x ≤ xu
The functions f, gi, and hj are all differentiable. The design variables are bounded by xl and xu. The
constraints gi are called inequality constraints and hj are called equality constraints.
Optimality Conditions
Let us define the Lagrange function for the constrained optimization problem with the equality and
inequality constraints:
L(x, λ, β) = f(x) + Σj λj hj(x) + Σi βi gi(x)
where λj and βi are the Lagrange multipliers.
Solution Techniques
For a simple optimization problem (say, with two variables) with one equality constraint, the simplest
approach would be to use a variable substitution method. In this method, one variable is written in the form
of another variable using the equality constraint. Then it is substituted in the objective function to make it
an unconstrained optimization problem that is easier to solve. For instance, consider the optimization
problem,
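The specific example is elided above; as an illustration of variable substitution with hypothetical numbers, consider:

```latex
\begin{aligned}
&\text{Minimize } f(x_1, x_2) = x_1^2 + x_2^2
 \quad \text{subject to } h(x_1, x_2) = x_1 + x_2 - 2 = 0.\\
&\text{From the equality constraint, } x_2 = 2 - x_1. \text{ Substituting,}\\
&f(x_1) = x_1^2 + (2 - x_1)^2, \qquad
 \frac{df}{dx_1} = 2x_1 - 2(2 - x_1) = 4x_1 - 4 = 0\\
&\Rightarrow\ x_1^* = 1,\quad x_2^* = 1,\quad f^* = 2,
\end{aligned}
```

so the constrained problem reduces to an unconstrained one-variable problem that is easy to solve.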
Penalty Function Method
The motivation of the penalty function method is to solve the constrained optimization problem using
algorithms for unconstrained problems. As the name suggests, the algorithm penalizes the objective
function in case constraints are violated. The modified objective function with penalty terms is written as
F(x) = f(x) + rk Σi ⟨gi(x)⟩², where ⟨gi(x)⟩ = max(0, gi(x))
In case constraints are satisfied (gi(x) ≤ 0), ⟨gi(x)⟩ will be zero and there will be no penalty on the objective
function. In case constraints are violated (gi(x) > 0), ⟨gi(x)⟩ will be a positive value, resulting in a penalty
on the objective function. The penalty will be higher for the higher infeasibility of the constraints. The
function F(x) can be optimized using the algorithms for unconstrained problems. The penalty function
method of this form is called the exterior penalty function method.
• The function becomes ill-conditioned as the value of the penalty terms is increased. Owing to abrupt
changes in the function value, the gradient value may become large and the algorithm may show
divergence.
• As this method does not satisfy the constraints exactly, it is not suitable for optimization problems where
feasibility must be ensured in all iterations.
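A sketch of the exterior penalty method: F(x) = f(x) + r·Σ⟨gi(x)⟩² is minimized by an unconstrained method (plain gradient descent here), with r increased each outer iteration. The test problem, learning-rate schedule, and factor of 10 are illustrative assumptions.

```python
import numpy as np

def exterior_penalty(f, gs, x, r=1.0, outer=6, inner=300):
    """Minimize F(x) = f(x) + r * sum(<g_i(x)>^2), where <g> = max(0, g),
    raising the penalty parameter r after each outer iteration."""
    def F(z, r):
        return f(z) + r * sum(max(0.0, g(z))**2 for g in gs)
    h = 1e-6
    for _ in range(outer):
        lr = 0.1 / (1.0 + r)              # smaller steps as the penalty steepens
        for _ in range(inner):            # inner loop: plain gradient descent
            grad = np.zeros(len(x))
            for i in range(len(x)):
                e = np.zeros(len(x)); e[i] = h
                grad[i] = (F(x + e, r) - F(x - e, r)) / (2 * h)
            x = x - lr * grad
        r *= 10.0                         # heavier penalty on infeasibility
    return x

# Minimize f = (x1-3)^2 + (x2-3)^2 subject to g: x1 + x2 - 4 <= 0
# (true constrained minimum at (2, 2), approached from the infeasible side).
f = lambda x: (x[0] - 3)**2 + (x[1] - 3)**2
g = lambda x: x[0] + x[1] - 4.0
x_star = exterior_penalty(f, [g], np.array([0.0, 0.0]))
```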
As the name suggests, the augmented Lagrange multipliers (ALM) method combines both Lagrange
multipliers and penalty function methods. For an optimization problem with both equality and inequality
constraints, the augmented Lagrangian function is given by
where λj and βi are the Lagrange multipliers, and rk is a penalty parameter fixed at the start of the iteration.
If the objective function also has to be reduced, then the following inequality must also be satisfied:
S^T ∇f(x) < 0, i.e., S must be a descent (usable) direction.
Zoutendijk’s method of feasible directions and Rosen’s gradient projection method are two popular
methods of feasible directions.
2. Explain a brief note about Gradient-based methods and Direct Search methods.
A discrete optimization problem seeks to determine the best possible solution from a finite set of
possibilities.
From a computer science perspective, combinatorial optimization seeks to improve an algorithm by using
mathematical methods either to reduce the size of the set of possible solutions or to make the search itself
faster.
Example:
Combinatorial optimization refers primarily to the methods used to approach such problems and, for the
most part, does not provide guidelines on how to turn real-world problems into abstract mathematical
questions, or vice versa.
It is a subfield of mathematical optimization that consists of finding an optimal object from a finite set of
objects, where the set of feasible solutions is discrete or can be reduced to a discrete set.
Typical combinatorial optimization problems are the travelling salesman problem ("TSP"), the minimum
spanning tree problem ("MST"), and the knapsack problem.
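The combinatorial flavor can be seen in a brute-force TSP solver: the feasible set is a finite set of tours, and the whole point of combinatorial optimization methods is to avoid enumerating it as done here. The 4-city distance matrix is made up for illustration.

```python
from itertools import permutations

# Symmetric distances between 4 cities (hypothetical data).
dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]

def tour_length(tour):
    legs = zip(tour, tour[1:] + tour[:1])          # close the loop back to the start
    return sum(dist[a][b] for a, b in legs)

# Fix city 0 as the start and enumerate the remaining (n-1)! orderings.
best = min(permutations(range(1, 4)), key=lambda p: tour_length((0,) + p))
best_tour = (0,) + best                            # shortest closed tour
```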
Applications of Combinatorial Optimization
Logistics
Deciding which taxis in a fleet to route to pick up fares
Determining the optimal way to deliver packages
Allocating jobs to people optimally
Designing water distribution networks
Earth science problems (e.g., reservoir flow-rates)
4. With any example, explain the significance and steps involved in Combinatorial Optimization.
Summarize the general procedure of branch-and-bound for integer-programming maximization
with a flow chart.
With any example explain the significance and steps involved in Combinatorial Optimization (refer to the
answer for Question 3).
General procedure of branch-and-bound for integer-programming maximization with a flow chart.
5. Illustrate the road map for MOOP in detail.
Absolute maximum/absolute minimum (also called global max/min): Specify a region R contained in the
domain of the function f. If the value at (a, b) is bigger than or equal to the value at any other point in R,
then f(a, b) is called the global maximum.
In the weighted sum approach, we scale our set of objectives into a single objective by multiplying each of
our objectives by a user-supplied weight. This method is one of the most widely used approaches. A question
that arises when using the weighted sum approach is what weight to assign to each objective.
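A sketch of the weighted-sum approach on a hypothetical two-objective problem, showing how the user-supplied weights steer the single-objective solution along the trade-off:

```python
# Two conflicting objectives over one design variable x (illustrative choices):
# f1 = x^2 (best at x = 0) and f2 = (x - 2)^2 (best at x = 2).
def weighted_sum(w1, w2, xs):
    f1 = lambda x: x**2
    f2 = lambda x: (x - 2)**2
    combined = lambda x: w1 * f1(x) + w2 * f2(x)   # single scalarized objective
    return min(xs, key=combined)

xs = [i / 100.0 for i in range(0, 201)]     # grid over [0, 2]
a = weighted_sum(1.0, 1.0, xs)    # equal weights -> x = 1.0, the midpoint trade-off
b = weighted_sum(1.0, 9.0, xs)    # heavier weight on f2 pulls the solution toward 2
```

Analytically the scalarized minimum is x* = 2·w2/(w1 + w2), so different weight choices sweep out different trade-off solutions.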
ε-Constraint Method:
The ε-constraint method is one of the classical methods that is used to handle multi-objective
optimization problems (MOPs) by converting a MOP into single objective optimization problems (SOPs).
This method depends on the epsilon value, which represents a boundary of the objective.
Goal Programming:
In this method a utility function U is defined that combines all the objective functions of the multi
objective optimization problem. The utility function then becomes the objective function of the
optimization problem that can be solved along with the constraints.
UNIT 2 – Approximations
Stochastic or probabilistic programming deals with situations where some or all of the parameters of the
optimization problem are described by stochastic (or random or probabilistic) variables rather than by
deterministic quantities. The sources of random variables may be several, depending on the nature and the
type of problem. For instance, in the design of concrete structures, the strength of concrete is a random
variable since the compressive strength of concrete varies considerably from sample to sample.
Depending on the nature of equations involved (in terms of random variables) in the problem, a stochastic
optimization problem is called a stochastic linear, geometric, dynamic, or nonlinear programming problem.
The basic idea used in stochastic programming is to convert the stochastic problem into an equivalent
deterministic problem. The resulting deterministic problem is then solved by using familiar techniques
such as linear, geometric, dynamic, and nonlinear programming.
If X1, X2, . . . , Xn are n mutually independent random variables with finite mean and variance (they may follow
different distributions), the sum
Sn = X1 + X2 + · · · + Xn
tends to a normal variable if no single variable contributes significantly to the sum as n tends to infinity.
Because of this theorem, we can approximate most of the physical phenomena as normal random variables.
Physically, Sn may represent, for example, the tensile strength of a fiber-reinforced material, in which case
the total tensile strength is given by the sum of the tensile strengths of individual fibers. In this case the
tensile strength of the material may be represented as a normally distributed random variable.
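The central limit theorem statement above can be checked numerically: summing independent uniform variables (mean 1/2, variance 1/12 each) gives sums whose distribution looks normal. The sample sizes here are arbitrary.

```python
import random

random.seed(0)
n, trials = 50, 20000
# Each sum S_n of 50 uniforms has mean n*0.5 = 25 and variance n/12 ~ 4.17.
sums = [sum(random.random() for _ in range(n)) for _ in range(trials)]
mean = sum(sums) / trials
var = sum((s - mean)**2 for s in sums) / trials
sd = var**0.5
# For a normal distribution, about 68% of samples fall within one sd of the mean:
within = sum(abs(s - mean) <= sd for s in sums) / trials
```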
In tree-based GP, the computer programs are represented in tree structures that are evaluated recursively
to produce the resulting multivariate expressions. Traditional nomenclature states that a tree node (or just
node) is an operator [+,-,*,/] and a terminal node (or leaf) is a variable [a,b,c,d].
Tree-based GP was the first application of Genetic Programming. There are several other types, such as
linear, Cartesian, and stack-based GP, which are typically more efficient in their execution of the genetic
operators. However, tree-based GP provides a visual means to engage new users of Genetic Programming,
and remains viable when built upon a fast programming language or underlying suite of libraries.
Genetic algorithms and similar EAs are powerful optimization techniques, but they have an inherent
limitation: they incorporate the assumed solution structure in the representation of their candidate
solutions. However, we may not know which parameters need to be optimized in a given problem. Also, we
may not know the structure of the parameters that need to be optimized. Are the parameters real numbers,
or state space machines, or computer programs, or complex arrays, or time schedules, or something else?
Genetic programming (GP) is an attempt to generalize EAs to an algorithm that can learn not only the best
solution to a problem given a specific structure, but that can also learn the optimal structure. GP evolves
computer programs to solve optimization problems. This is the distinctive feature of GP compared with
other EAs; other EAs evolve solutions, while GP evolves programs that can compute solutions. In fact, this
was one of the original goals of the artificial intelligence community.
                 Player B
  Player A    3    8    4    4
             -7    2   10    2
2. Solve the optimization problem using geometric programming:
Minimize f(x) = 3x1^-1 x2^-3 + 4x1^2 x2 x3^-2 + 5x1 x2^4 x3^-1 + 6x3
3. Formulate the objective function in the form of a posynomial form in detail.
4. Explain in detail about integer nonlinear programming with relevant examples.
5. Explain DUALITY in non-linear programming. Explain the concept of local and global minima.
6. Explain CONVEXITY in non-linear programming and explain the impact of minima and convexity
in NLP.
7. List out the GP steps with an example and point out the common operators used for GP.
Termination Criteria
What is the termination criterion? This question needs to be answered for all EAs, but it may be especially
important for GP. This is because the fitness measure is usually more computationally demanding in GP
than in other EAs. The choice of the termination criterion could determine whether or not the GP is
successful. As with other EAs, the termination criterion for GP could include factors such as number of
iterations, number of fitness evaluations, run time, best fitness value, change in best fitness over several
generations, or standard deviation of fitness values over the entire population.
Terminal Set
What is the terminal set for the evolving computer programs? This set describes the symbols that can
appear at the leaves of the syntax trees. The terminal set is the set of all possible inputs to the evolving
computer programs. This set includes variables that are input to the computer program, along with
constants that we think might be important. The constants could include basic integers like 0 and 1, and
also constants that may be important for the particular optimization problem
(π, e, and so on).
The syntax trees have three terminals: x, y, and z. Some constants can be obtained implicitly; for example,
x − x = 0, and x/x = 1. So as long as we have a subtraction and division function, we do not really need the
0 and 1 constants. However, most GP implementations should include constants in their terminal sets.
We can also use random numbers in the terminal set, but usually we do not want a random number to
change after it is generated. These types of random numbers are called ephemeral random constants.
Ephemeral random constants are obtained by specifying a quantity denoted as R in the terminal set. If R is
chosen as a terminal during population initialization, we generate a random number r1 between given
limits, and insert r1 into the GP individual. From that point on, that particular value r1 does not change.
However, if R is chosen again for initialization of another individual, or for mutation, then we generate a
new random constant r2 for that realization. The choice of the limits within which to generate ephemeral
random constants is another GP design decision.
Defining the terminal set for a GP application is a balancing act. If we use a set that is too small, then the
GP will not be able to effectively solve our problem. However, if we use a terminal set that is too large, then
it may be too difficult for the GP to find a good solution in a reasonable time.
Function Set
What is the function set for the evolving computer programs? This set describes the functions that can
appear at the non-terminal nodes of the syntax trees, such as the following.
• Standard mathematical operators can be included in the function set (for example, addition, subtraction,
multiplication, division, absolute value).
• Problem-specific functions that we think are important for our particular optimization problem can be
included in the function set (for example, exponential functions, logarithmic functions, trigonometric
functions, filters, integrators, differentiators).
• Conditional tests can be included in the function set (for example, greater than, less than, equal to).
• Logic functions can be included in the function set, if we think that they could be applicable to the solution
of our particular optimization problem (for example, and, nand, or, xor, nor, not).
• Variable assignment functions can be included in the function set.
• Loop statements can be included in the function set (for example, while loops, for loops).
• Subroutine calls can be included in the function set, if we have a set of predefined functions that we have
created for our problem.
The syntax trees in Figure include five functions: addition, subtraction, multiplication, division, and
absolute value. We need to find the right balance in our definition of the function set and the terminal set.
The sets need to be large enough to be able to represent a solution to our problem, but if they are too large,
then the search space will be so large that the GP will have a hard time finding a good
solution.
Some functions need to be modified for GP because the syntax trees we evolve might not have legal function
arguments. For example, GP could evolve the s-expression (/ x 0), which is division by zero. This would
result in a Lisp error, which would cause the GP to terminate. Therefore, instead of using the standard
division operator in Lisp, we can define a division operator DIV that protects against division by zero, and
that also protects against overflow due to division by a very small number:
DIV(a, b) = a/b if |b| ≥ ϵ, and 1 otherwise
where ϵ is a very small positive constant, like 10^-20. The DIV function returns 1 if the divisor has a very
small magnitude. We may need to
redefine other functions in a similar way (logarithm functions, inverse trigonometric functions, and so on)
to make sure that the functions in our function set can handle all possible inputs.
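A protected division operator in the spirit described above, written here in Python rather than Lisp, with ϵ = 10⁻²⁰ as in the text:

```python
def DIV(a, b, eps=1e-20):
    """Protected division: returns 1 when the divisor's magnitude is below eps,
    guarding against division by zero and overflow from tiny divisors."""
    return 1.0 if abs(b) < eps else a / b

print(DIV(6.0, 3.0))   # 2.0
print(DIV(5.0, 0.0))   # 1.0 (division by zero is protected)
```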
Initialization
How should we generate the initial population of computer programs? We have two basic options for
initialization, which are referred to as the full method and the grow method. We can also combine these
options to get a third option, which is referred to as the ramped half-and-half method.
The full method creates programs such that the number of nodes from each terminal node to the top-level
node is Dc, a user-specified constant. Dc is called the depth of the syntax tree. As an example, Parent 1 in
Figure 7.3 has a depth of three, while Parent 2 has a depth of four. Parent 1 in Figure 7.3 is a full syntax tree
because there are three nodes from each terminal node to the top-level addition node. However, Parent 2
is not a full syntax tree because some of the program branches have a depth of four while others only have
a depth of three.
We can use recursion to generate random syntax trees. For example, if we want to generate a syntax tree
with a structure like Parent 2, we first generate the subtraction node at the top level and note that it
requires two arguments. For the first argument, we generate the multiplication node and note that it
requires two arguments. This process continues for each node and each argument until we have generated
enough levels to reach the desired depth. When we reach the desired depth, we generate a random terminal
node to complete that branch of the syntax tree. The figure below illustrates the concept for a recursive
algorithm that generates random computer programs. We can generate a random syntax tree by calling
routine GrowProgramFull(Dc, 1), where Dc is our desired syntax tree depth. GrowProgramFull calls itself
each time it needs to add another layer in its growing
syntax tree.
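Full-method initialization can be sketched as a recursive routine in the spirit of GrowProgramFull(Dc, 1). The function/terminal sets and the nested-list tree representation are illustrative assumptions.

```python
import random

FUNCTIONS = ['+', '-', '*', 'DIV', 'ABS']    # function set (hypothetical choice)
TERMINALS = ['x', 'y', 'z']                  # terminal set
ARITY = {'+': 2, '-': 2, '*': 2, 'DIV': 2, 'ABS': 1}

def grow_program_full(Dc, depth=1):
    """Full-method initialization: every branch reaches exactly depth Dc.
    A tree is a nested list [function, arg1, ...]; leaves are terminal symbols."""
    if depth == Dc:                          # reached desired depth: emit a terminal
        return random.choice(TERMINALS)
    func = random.choice(FUNCTIONS)          # otherwise emit a function node and recurse
    return [func] + [grow_program_full(Dc, depth + 1) for _ in range(ARITY[func])]

def tree_depth(t):
    """Number of nodes on the longest path from the top-level node to a leaf."""
    if not isinstance(t, list):
        return 1
    return 1 + max(tree_depth(arg) for arg in t[1:])

random.seed(1)
tree = grow_program_full(3)    # every branch of this tree has depth exactly 3
```

The grow method would differ only in allowing a terminal to be chosen at depths less than Dc, so branches may stop early.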
The grow method of initialization creates programs such that the number of nodes from each terminal node
to the top-level node is less than or equal to Dc. If the parents in Figure 7.3 were created by random
initialization, then Parent 1 might have been generated with either the full method or the grow method,
while Parent 2 was definitely generated with the grow method since it is not a full syntax tree. The grow
method can be implemented the same way as the full method, except that when we generate a random node
at depths less than Dc , either a function or terminal node can be generated. If a function node is generated,
the syntax tree continues to grow. As with the full method, when we reach the maximum depth Dc, we
generate a random terminal to complete that branch of the syntax tree. The figure below illustrates the
concept for a recursive algorithm that generates random computer programs with the grow method.
The ramped half-and-half method generates half of the initial population with the full method, and half with
the grow method. Also, it generates an equal number of syntax trees for each value of depth between 2 and
Dc , which is the maximum allowable depth specified by the user. Figure 7.8 illustrates the concept of
ramped half-and-half syntax tree initialization.
Koza experimented with the three different types of initializations described above for some simple GP
problems. He found a difference in the probability of GP success depending on which initialization method
was used, as shown in Table 7.2. The table shows that the ramped half-and-half initialization method is
generally much better than the other two initialization methods.
The large population size that is used in GP, along with the large number of possible nodes at which
crossover can occur, usually means that good GP results do not depend on mutation. Often we can get good
results with pm = 0. However, mutation may still be desirable just in case an important terminal or function
is lost from the population. If that occurs, mutation is the only way that it could re-enter the population.
5. We need to specify the crossover probability pc. This is similar to GAs. After selecting two parents in
Figure 7.5, we can either use crossover to combine them, or we can instead clone them for the next
generation. The line “Mate p1 and p2 to create children c1 and c2” in Figure 7.5 would then be replaced with
something like the following:
Most experience suggests that crossover is an important aspect of GP and should be used with a probability
pc ≥ 0.9.
6. We need to decide whether or not to use elitism. As with any other EA, we can save the best m computer
programs in GP from one generation to the next to make sure they are not lost in the following generation.
The parameter m is called the elitism parameter. Elitism can be implemented in several different ways. For
example, we could archive the best m individuals at the end of a generation, create the children for the next
generation as usual, and then replace the worst m children with the elites from the previous generation.
Alternatively, we could copy the m elites to the first m children each generation, and then create only (N −
m) additional children each generation (where N is the population size).
7. We need to specify Di, the maximum program size of the initial population. A program's size can be
quantified by its depth, which measures the maximum number of nodes between the highest level and the
lowest level (inclusive). For example, Parent 1 in Figure 7.3 has a depth of three, while Parent 2 has a depth
of four.
8. We also need to specify Dc, the maximum depth of child programs. During GP operation, child programs
can grow larger and larger with each succeeding generation. If a maximum depth is not enforced, then child
programs can become unreasonably long, wasting space and execution time; this is called GP bloat. The
maximum depth Dc can be enforced in several ways. One way is to replace a child with one of its parents if
the child's depth exceeds Dc. Another way is to redo the crossover operation if the child's depth exceeds Dc.
Yet another way is to examine the parent syntax trees before choosing their crossover points, and constrain
the randomly selected crossover points so that Dc will not be exceeded by the children's depths.
9. We need to decide whether or not we want to allow a terminal node in a syntax tree to be replaced with
a subtree during crossover. Figure 7.4 shows that the z terminal in Parent 1 is selected for crossover, and
is replaced with a subtree in Child 1. We use pi to denote the probability of crossover at an internal node.
When selecting a crossover point, we generate a random number r uniformly distributed on [0, 1]. If r is
less than pi, then we select an s-expression for crossover; that is, we select a subtree that is surrounded by
matching left and right parentheses. However, if r is greater than pi, then we select a terminal node for
crossover; that is, we select a symbol in the syntax tree that is not immediately preceded by a left
parenthesis.
10. We need to decide whether or not to worry about duplicate individuals in the population. Duplicate
individuals are a waste of computer resources. In EAs with relatively small search spaces or small
populations, duplicates can arise quite often, and dealing with duplicates can be an important aspect of the
EA. However, in GP, the search space is so large that duplicates rarely occur. Therefore, we usually do not
need to worry about duplicate individuals in GP.
• Knowledge: The functioning of the colony cannot be understood from knowledge of the functioning of
a single agent.
• Sensitivity: Even a small change in the simple rules results in different group-level behavior.
Benefits of swarming
• More transparency. Swarming makes for a nicer experience for all parties involved.
• Developing new skills. Swarming opens new ways to collaborate: it thrives on the diverse skill sets
in your team.
• Employee empowerment.
• The goal is to efficiently explore the search space in order to find (near-)optimal solutions.
We call the parameter w the inertia weight constant. It lies between 0 and 1 and determines how much
the particle keeps of its previous velocity (i.e., the speed and direction of the search). The parameters c1
and c2 are called the cognitive and the social coefficients, respectively. They control how much weight is
given to refining the search result of the particle itself versus recognizing the search result of the swarm.
These parameters can be viewed as controlling the trade-off between exploration and exploitation.
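The role of w, c1, and c2 can be shown in a minimal one-dimensional velocity-update sketch (an illustration of the inertia-weight form only, not a full PSO implementation; the default parameter values are illustrative):

```python
import random

def update_velocity(v, x, pbest, gbest, w=0.7, c1=2.0, c2=2.0):
    """One-dimensional inertia-weight velocity update:
    v <- w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x),
    with r1, r2 drawn uniformly from [0, 1]."""
    r1, r2 = random.random(), random.random()
    return w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
```

With w near 1 the particle largely keeps exploring along its previous direction; with w near 0 it is pulled almost entirely toward pbest and gbest.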
The positions pbesti and gbest are updated in each iteration to reflect the best positions found thus far.
One interesting property of this algorithm that distinguishes it from other optimization algorithms is that it
does not depend on the gradient of the objective function. In gradient descent, for example, we look for the
minimum of a function f(X) by moving X in the direction of −∇f(X), since that is where the function
decreases fastest. For a particle at position X, how it moves does not depend on which direction is downhill
but only on where pbest and gbest are. This makes PSO particularly suitable when differentiating f(X) is
difficult.
Another property of PSO is that it can be parallelized easily. Since we are manipulating multiple particles to
find the optimal solution, each particle can be updated in parallel, and we only need to collect the updated
value of gbest once per iteration. This makes the map-reduce architecture a natural candidate for
implementing PSO.
Computational Implementation of PSO
Consider an unconstrained maximization problem: Maximize f(X) subject to X(l) ≤ X ≤ X(u), where X(l)
and X(u) denote the lower and upper bounds on X, respectively. The PSO procedure can be implemented
through the following steps.
1. Assume the size of the swarm (number of particles) is N. To reduce the total number of function
evaluations needed to find a solution, we should assume a small swarm size. But with too small a swarm it
is likely to take longer to find a solution or, in some cases, we may not be able to find a solution at all.
Usually a size of 20 to 30 particles is assumed for the swarm as a compromise.
2. Generate the initial population of X in the range X(l) to X(u) randomly as X1, X2, . . . , XN. Hereafter,
for convenience, the position of particle j and its velocity in iteration i are denoted Xj(i) and Vj(i),
respectively. Thus the particles generated initially are denoted X1(0), X2(0), . . . , XN(0). The vectors Xj(0)
(j = 1, 2, . . . , N) are called particles or vectors of coordinates of particles (similar to chromosomes in
genetic algorithms). Evaluate the objective function values corresponding to the particles as
f[X1(0)], f[X2(0)], . . . , f[XN(0)].
3. Find the velocities of particles. All particles will be moving to the optimal point with a velocity. Initially,
all particle velocities are assumed to be zero. Set the iteration number as i = 1.
4. In the ith iteration, find the following two important parameters used by a typical particle j: (a) The
historical best position of particle j, Pbest,j, that is, the coordinates of particle j with the highest value of
the objective function f[Xj(i)] encountered by particle j in all the previous iterations; and the historical
best position over the whole swarm, Gbest, that is, the coordinates with the highest value of the objective
function encountered in all the previous iterations by any of the N particles. (b) Find the velocity of particle
j in the ith iteration as follows:
Vj (i) = Vj (i − 1) + c1r1[Pbest,j − Xj (i − 1)] + c2r2[Gbest − Xj (i − 1)]; j = 1, 2, . . . , N where c1 and c2 are the
cognitive (individual) and social (group) learning rates, respectively, and r1 and r2 are uniformly
distributed random numbers in the range 0 and 1. The parameters c1 and c2 denote the relative importance
of the memory (position) of the particle itself to the memory (position) of the swarm. The values of c1 and
c2 are usually assumed to be 2 so that c1r1 and c2r2 ensure that the particles would overfly the target
about half the time. (c) Find the position or coordinate of the j th particle in ith iteration as
Xj (i) = Xj (i − 1) + Vj (i); j = 1, 2, . . . , N
where a time step of unity is assumed in the velocity. Evaluate the objective function values corresponding
to the particles as
f[X1(i)], f[X2(i)], . . . , f[XN(i)].
5. Check the convergence of the current solution. If the positions of all particles converge to the same set
of values, the method is assumed to have converged. If the convergence criterion is not satisfied, step 4 is
repeated by updating the iteration number as i = i + 1, and by computing the new values of Pbest,j and
Gbest. The iterative process is continued until all particles converge to the same optimum solution.
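Steps 1–5 can be sketched for a one-dimensional problem as follows (a minimal Python sketch using the text's update rule without an inertia term; a fixed iteration budget stands in for the convergence check, and the bound clipping is an added assumption):

```python
import random

def pso_maximize(f, lower, upper, n_particles=20, n_iters=100, c1=2.0, c2=2.0):
    """Steps 1-5 for a one-dimensional maximization on [lower, upper]."""
    # Steps 1-2: choose the swarm size and random initial positions.
    x = [random.uniform(lower, upper) for _ in range(n_particles)]
    # Step 3: all initial velocities are zero.
    v = [0.0] * n_particles
    pbest = x[:]                        # Pbest,j: best position of each particle
    gbest = max(x, key=f)               # Gbest: best position of the whole swarm
    for _ in range(n_iters):            # Step 5 simplified to a fixed budget
        for j in range(n_particles):
            # Step 4(b): velocity update with c1 = c2 = 2.
            r1, r2 = random.random(), random.random()
            v[j] += c1 * r1 * (pbest[j] - x[j]) + c2 * r2 * (gbest - x[j])
            # Step 4(c): position update with a unit time step, clipped to bounds.
            x[j] = min(max(x[j] + v[j], lower), upper)
            if f(x[j]) > f(pbest[j]):   # Step 4(a): refresh Pbest,j
                pbest[j] = x[j]
        gbest = max(pbest, key=f)       # refresh Gbest
    return gbest
```

For example, maximizing f(x) = −(x − 3)² on [0, 10] returns a point near x = 3.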
The main advantages of the PSO algorithm can be summarized as: simple concept, easy
implementation, robustness of control parameters, and computational efficiency when compared with
mathematical algorithms and other heuristic optimization techniques.
PSO can be applied to various optimization problems, for example, energy-storage optimization.
PSO can also simulate the movement of a particle swarm, and has been applied to visual effects such as
the special effects in Hollywood films.
Ant Searching Behavior An ant k, when located at node i, uses the pheromone trail τij to compute the
probability of choosing j as the next node:
p(k)ij = (τij)^α / Σl∈N(k)i (τil)^α if j ∈ N(k)i, and p(k)ij = 0 otherwise,
where α denotes the degree of importance of the pheromones and N(k)i indicates the set of neighborhood
nodes of ant k when located at node i. The neighborhood of node i contains all the nodes directly connected
to node i except the predecessor node (i.e., the last node visited before i). This will prevent the ant from
returning to the same node visited immediately before node i. An ant travels from node to node until it
reaches the destination (food) node.
Path Retracing and Pheromone Updating
Before returning to the home node (on the backward path), the kth ant deposits an amount Δτ(k) of
pheromone on the arcs it has visited. The pheromone value τij on each traversed arc (i, j) is updated as:
τij ← τij + Δτ(k)
Because of the increase in the pheromone, the probability of this arc being selected by the forthcoming ants
will increase.
Pheromone Trail Evaporation
When an ant k moves to the next node, the pheromone evaporates from all the arcs ij according to the
relation
τij ← (1 − p)τij, ∀(i, j) ∈ A
where p ∈ (0, 1] is a parameter and A denotes the segments or arcs traveled by ant k in its path from home
to destination. The decrease in pheromone intensity favors the exploration of different paths during the
search process. This favors the elimination of poor choices made in the path selection. This also helps in
bounding the maximum value attained by the pheromone trails. An iteration is a complete cycle involving
ant’s movement, pheromone evaporation and pheromone deposit.
Algorithm
ACO was used to solve graph problems by investigating possible paths on the graph. ACO is
inspired by the behavior of ants, which are able to find the shortest path between their nest and a food
source by means of pheromone. As time progresses, the ants increasingly choose the shortest way while
searching for food.
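The behavior described above can be sketched as follows (a simplified Python illustration of pheromone-biased node choice, deposit, and evaporation on a small directed graph; the graph, parameter values, and function names are illustrative):

```python
import random

def next_node(current, prev, tau, alpha=1.0):
    """Pick the next node with probability proportional to tau[(i, j)]**alpha,
    excluding the predecessor node (the node visited immediately before)."""
    neighbors = [j for (i, j) in tau if i == current and j != prev]
    weights = [tau[(current, j)] ** alpha for j in neighbors]
    return random.choices(neighbors, weights=weights)[0]

def ant_colony(tau, home, food, n_ants=100, p=0.1, deposit=1.0):
    """Send ants from home to food one at a time: each traveled arc
    evaporates as tau <- (1 - p)*tau and then receives a deposit of
    deposit/len(path), so shorter paths get more pheromone per arc."""
    for _ in range(n_ants):
        path, node, prev = [], home, None
        while node != food:
            nxt = next_node(node, prev, tau)
            path.append((node, nxt))
            prev, node = node, nxt
        for arc in set(path):
            tau[arc] *= (1 - p)              # evaporation on traveled arcs
            tau[arc] += deposit / len(path)  # deposit, rewarding short paths
    return tau
```

On a graph with a two-arc and a three-arc route from home to food, the shorter route tends to accumulate more pheromone over the iterations, which is the positive feedback the text describes.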
5. Explain the idea behind Harmony Search Algorithms
Geem formalized these three options into a quantitative optimization process with three corresponding
components:
1. Harmony memory (HM)
2. Pitch adjusting
3. Randomization
In order to use the harmony memory effectively, it is typically assigned a parameter called the harmony
memory considering rate (HMCR ∈ [0, 1]), which plays a role similar to the crossover rate in GAs.
If this rate is low (near 0), only a few best harmonies are utilized, and convergence of the algorithm is slow.
If this rate is very high (near 1), the harmonies in the HM are heavily exploited and the solution space
is not explored properly, potentially leading to inefficient solutions.
The second component is pitch adjustment, determined by a pitch bandwidth (BW) (also referred to as fret
width [15]) and a pitch adjusting rate (PAR); it corresponds to generating a slightly different solution in the
HS algorithm.
Pitch can be adjusted linearly or nonlinearly; however, most often linear adjustment is used:
Hinew = Hiold + BW × ri, where ri ∈ [−1, 1] and 1 ≤ i ≤ D
where Hiold is the ith component of the existing harmony or solution, Hinew is the ith component of the
new harmony after the pitch-adjusting action, and BW is the bandwidth.
Algorithm:
1. Initialize the optimization problem and algorithm parameters.
2. Initialize the harmony memory (HM).
3. Improvise a new harmony.
4. Update the HM.
5. Termination.
HS creates one child (a single new harmony) in each generation.
Applications include games such as Sudoku.
The contribution of HS lies in two areas. First, the way that HS combines these ideas is novel. Second, the
musical motivation of HS is novel.
Furthermore, the HS algorithm is a population-based metaheuristic; this means that multiple harmony
groups can be used in parallel.
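The three components can be sketched together as follows (a minimal Python sketch for a minimization problem; the parameter values and test function are illustrative):

```python
import random

def harmony_search(f, lower, upper, dim=2, hms=10, hmcr=0.9,
                   par=0.3, bw=0.1, n_iters=1000):
    """Minimize f over [lower, upper]^dim using harmony memory (HM),
    pitch adjusting, and randomization."""
    # Initialize the harmony memory with random solutions.
    hm = [[random.uniform(lower, upper) for _ in range(dim)]
          for _ in range(hms)]
    for _ in range(n_iters):
        new = []
        for i in range(dim):
            if random.random() < hmcr:          # harmony memory considering
                value = random.choice(hm)[i]
                if random.random() < par:       # pitch adjusting with bandwidth BW
                    value += bw * random.uniform(-1, 1)
            else:                               # randomization
                value = random.uniform(lower, upper)
            new.append(min(max(value, lower), upper))
        worst = max(hm, key=f)                  # update the HM: the new harmony
        if f(new) < f(worst):                   # replaces the worst one if better
            hm[hm.index(worst)] = new
    return min(hm, key=f)
```

HMCR controls how often components come from memory, PAR how often a remembered component is nudged by up to BW, and the remaining cases supply fresh random values, matching the slow-convergence versus over-exploitation trade-off noted above.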
➔ Network optimization refers to the tools, techniques, and best practices used to monitor and
enhance network performance.
➔ The first step of the optimization process is to measure a series of network performance metrics
and identify any issues.
➔ Network performance monitoring includes measuring traffic, bandwidth, jitter, and latency caused
by the likes of insufficient infrastructure or inadequate network security.
➔ Dynamic network optimization deals with problems whose parameters change over time (i.e., are
time-varying rather than static).
2. Define Ranking
➔ The simplest form of job evaluation method.
➔ The method involves ranking each job relative to all other jobs, usually based on some overall factor
like 'job difficulty'.
➔ Each job as a whole is compared with the others, and this comparison continues until all the jobs have
been evaluated and ranked.
➔ Example: AssetRank was proposed to rank any dependency attack graph using a random walk model.
AssetRank is a generalization of PageRank extending it to handle both conjunctive and disjunctive nodes.
AssetRank is supported by an underlying probabilistic interpretation based on a random walk.
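AssetRank itself is not reproduced here, but the random-walk idea it builds on can be illustrated with a minimal power-iteration PageRank sketch (the graph and parameter values are illustrative, and dangling nodes are not redistributed in this simplified version):

```python
def pagerank(links, d=0.85, n_iters=50):
    """Power-iteration PageRank on a directed graph given as (src, dst) pairs:
    rank(j) = (1 - d)/N + d * sum over in-links (i, j) of rank(i)/outdeg(i)."""
    nodes = sorted({n for edge in links for n in edge})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    outdeg = {v: sum(1 for (i, _) in links if i == v) for v in nodes}
    for _ in range(n_iters):
        new = {v: (1 - d) / n for v in nodes}   # teleportation share
        for (i, j) in links:                    # random-walk share along links
            new[j] += d * rank[i] / outdeg[i]
        rank = new
    return rank
```

For example, in a graph with links b→a, c→a, and a→b, node a receives two in-links and ends up ranked highest.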