Part A (2 Marks)
Example: Optimization algorithm in deep learning – finds the values of the parameters (weights)
that minimize the error when mapping inputs to outputs.
2. What are the types of optimization and give examples of each optimization method?
Continuous Optimization versus Discrete Optimization
Unconstrained Optimization versus Constrained Optimization
Deterministic Optimization versus Stochastic Optimization
3. Distinguish between continuous optimization and Discrete optimization.
Continuous optimization – the variables used in the objective function can assume real values, e.g.,
values from intervals of the real line. Examples: analytical optimization methods, gradient descent,
linear and nonlinear programming.
Discrete optimization – the variables used in the mathematical program are restricted to
assume only discrete values, such as the integers. Examples: combinatorial optimization, integer programming.
4. What are the additional test functions in Unconstrained optimization?
Test problems such as Rosenbrock’s function, Wood’s function, the quadratic function, and so forth
are taken, on which different solution methods are tested. The performance of each method is compared
in terms of computational time. The additional test functions are:
• Rosenbrock’s function
• Wood’s function
• Quadratic function
• Nonlinear function
5. Define Multi-Objective optimization.
The multiobjective optimization problem (also known as multiobjective programming problem)
is a branch of mathematics used in multiple criteria decision-making, which deals with optimization
problems involving two or more objective functions to be optimized simultaneously.
6. What are the solution techniques in Constrained and Unconstrained optimization?
Unconstrained optimization
Most optimization problems are constrained; unconstrained optimization problems are few. One example
of an unconstrained optimization problem is data fitting, where one fits a curve on the measured data. The
solution methods for unconstrained optimization problems can be broadly classified into gradient-based
and non–gradient-based search methods. As the name suggests, gradient-based methods require gradient
information in determining the search direction. The gradient-based methods discussed are the steepest
descent, Davidon–Fletcher–Powell (DFP), Broyden–Fletcher–Goldfarb–Shanno (BFGS), Newton, and
Levenberg–Marquardt methods. The search direction computed by these methods uses the gradient
information, Hessian information, or a combination of these two. Some methods also make an
approximation of the Hessian matrix. Once the search direction is identified, one needs to evaluate how
much to move in that direction so as to minimize the function. This is a one-dimensional problem.
Unidirectional Search
The unidirectional search refers to minimizing the value of a multivariable function along a specified
direction. For example, if xi is the initial starting point of the design variables for minimizing a multivariable
function and S is the search direction, then we need to determine a scalar quantity α such that the function
f(α) = f(xi + αSi)
is minimized. The value of α at which this function reaches a minimum is denoted by α*. This is a one-
dimensional optimization problem, and we can use the golden section technique to minimize this function.
The golden section method is modified to handle multivariable functions.
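The unidirectional search above can be sketched in code. The following is a minimal golden-section line search; the quadratic function and starting point are hypothetical, chosen only to illustrate the method.

```python
import math

def golden_section(f, a, b, tol=1e-6):
    """Minimize a one-dimensional function f on [a, b] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2          # 1/phi ~ 0.618
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:
            a, c = c, d
            d = a + inv_phi * (b - a)
    return (a + b) / 2

def line_search(f, x, s, a=0.0, b=2.0):
    """Unidirectional search: find alpha minimizing f(x + alpha*s)."""
    phi = lambda alpha: f([xi + alpha * si for xi, si in zip(x, s)])
    return golden_section(phi, a, b)

# Minimize f(x1, x2) = x1^2 + x2^2 from x = (1, 1) along S = (-1, -1):
f = lambda x: x[0]**2 + x[1]**2
alpha = line_search(f, [1.0, 1.0], [-1.0, -1.0])   # optimum at alpha = 1
```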
Test Problem
Let us define a spring system as a test problem on which we will apply multivariable optimization
algorithms such as the steepest descent, DFP, BFG, Newton, and Levenberg–Marquardt methods. Consider
two springs of unit length and with stiffness k1 and k2, joined at the origin. The other two ends of the
springs are fixed on a wall. On applying a force, the spring system will deflect to an equilibrium position,
which we are interested in determining. The potential of the spring system is given by its strain energy
minus the work done by the applied force F, where F is the force applied at the origin due to which it moves to a position (x1, x2).
Solution Techniques
Solution techniques for multivariable, unconstrained optimization problems can be grouped into gradient-
and non–gradient-based methods. Gradient-based methods require derivative information of the function
in constituting a search. The first and second derivatives can be computed using the central difference
formulas
f′(x) ≈ [f(x + h) − f(x − h)] / (2h),  f″(x) ≈ [f(x + h) − 2f(x) + f(x − h)] / h²
In successive iterations, the design variables can be updated using the steepest descent equation
xk+1 = xk − α ∇f(xk)
where α is a positive scalar parameter that can be determined using a line search algorithm such as the
golden section method.
The steepest descent method ensures a reduction in the function value at every iteration. If the starting
point is far away from the minimum, the gradient will be large and the reduction in the function value in
each iteration will be large. Because the gradient of the function decreases to a small value near the
optimum, the function reduction is uneven and the method becomes sluggish (slow convergence) near the
minimum. The method can therefore be utilized as a starter for other gradient-based algorithms.
Advantage:
The advantage of the steepest descent method is that it reaches close to the minimum of the function in a
few iterations even when the starting guess is far away from the optimum.
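A minimal sketch of steepest descent with central-difference gradients. A fixed step size α stands in for the golden-section line search purely to keep the example short; the test function is hypothetical.

```python
def grad_central(f, x, h=1e-6):
    """Gradient by the central difference formula."""
    g = []
    for i in range(len(x)):
        xp = list(x); xm = list(x)
        xp[i] += h; xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

def steepest_descent(f, x, iters=100, alpha=0.1):
    """Update x_{k+1} = x_k - alpha * grad f(x_k)."""
    for _ in range(iters):
        g = grad_central(f, x)
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
    return x

f = lambda x: (x[0] - 3)**2 + (x[1] + 1)**2
xmin = steepest_descent(f, [0.0, 0.0])    # converges toward (3, -1)
```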
Newton’s Method
The search direction in this method is based on the first and second derivative information and is given by
Si = −[H]⁻¹∇f(xi)
where [H] is the Hessian matrix. If this matrix is positive definite, then Si will be a descent direction. Though
Newton’s method is known for converging in a single iteration for a quadratic function, we seldom
find functions in practical problems that are quadratic. However, Newton’s method is often used as a hybrid
method in conjunction with other methods.
Advantage:
Newton’s method shows a faster convergence if the starting guess is close to the minimum point.
Disadvantage:
Newton’s method may not converge if the starting point is far away from the optimum point.
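Newton's method can be sketched as follows. The quadratic test function is hypothetical; it illustrates the single-iteration convergence on quadratics mentioned above.

```python
import numpy as np

def newton(grad, hess, x, iters=20):
    """Newton search direction S = -H^{-1} grad f, taken with a full step."""
    for _ in range(iters):
        s = np.linalg.solve(hess(x), -grad(x))  # descent direction if H is positive definite
        x = x + s
    return x

# Quadratic test function f = (x1-1)^2 + 10*(x2-2)^2: Newton converges in one step.
grad = lambda x: np.array([2 * (x[0] - 1), 20 * (x[1] - 2)])
hess = lambda x: np.array([[2.0, 0.0], [0.0, 20.0]])
x_star = newton(grad, hess, np.array([5.0, -3.0]), iters=1)   # -> [1., 2.]
```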
Levenberg–Marquardt Method
The Levenberg–Marquardt method is a kind of hybrid method that combines the strength of both the
steepest descent and Newton’s methods. The search direction in this method is given by
Si = −[H + λI]⁻¹∇f(xi)
where I is an identity matrix and λ is a scalar that is set to a high value at the start of the algorithm. The
value of λ is altered during every iteration depending on whether the function value is decreasing or not. If
the function value decreases in an iteration, λ is decreased by a factor (less weightage on the steepest descent
direction). On the other hand, if the function value increases in an iteration, λ is increased by a factor (more
weightage on the steepest descent direction).
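A sketch of the Levenberg–Marquardt update described above, assuming the gradient and Hessian are supplied analytically. The factor of 10 for adjusting λ and the test function are illustrative choices, not prescribed by the text.

```python
import numpy as np

def levenberg_marquardt(f, grad, hess, x, lam=1000.0, iters=50):
    """S = -[H + lam*I]^{-1} grad f; lam shrinks when f decreases (more Newton-like)
    and grows when f increases (more steepest-descent-like)."""
    n = len(x)
    for _ in range(iters):
        s = np.linalg.solve(hess(x) + lam * np.eye(n), -grad(x))
        if f(x + s) < f(x):
            x = x + s
            lam /= 10.0        # decreasing: weight toward Newton
        else:
            lam *= 10.0        # increasing: weight toward steepest descent
    return x

f = lambda x: (x[0] - 1)**2 + 5 * (x[1] + 2)**2
grad = lambda x: np.array([2 * (x[0] - 1), 10 * (x[1] + 2)])
hess = lambda x: np.array([[2.0, 0.0], [0.0, 10.0]])
x_star = levenberg_marquardt(f, grad, hess, np.array([10.0, 10.0]))   # -> near (1, -2)
```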
DFP Method
In the DFP method, the inverse of the Hessian is approximated by a matrix [A] that is updated in every
iteration. The information stored in the matrix [A] is called the metric, and because it changes with every
iteration, the DFP method is known as a variable metric method. Because this method uses first-order
derivatives and has the property of quadratic convergence, it is referred to as a quasi-Newton method.
BFGS Method
In the BFGS method, the Hessian is approximated using the variable metric matrix [A] given by the equation
It is important to note that whereas the matrix [A] converges to the inverse of the Hessian in the DFP
method, the matrix [A] converges to the Hessian itself in the BFGS method. As the BFGS method needs fewer
restarts as compared to the DFP method, it is more popular than the DFP method.
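A sketch of the quasi-Newton (BFGS) idea. Note that this uses the common implementation form in which the metric matrix approximates the *inverse* Hessian directly (the text's [A] converges to the Hessian itself; the underlying idea is the same). The backtracking line search and test function are illustrative assumptions.

```python
import numpy as np

def bfgs(f, grad, x, iters=50):
    n = len(x)
    A = np.eye(n)                      # approximation of the inverse Hessian
    g = grad(x)
    for _ in range(iters):
        s_dir = -A @ g                 # quasi-Newton search direction
        alpha = 1.0                    # crude backtracking line search
        while f(x + alpha * s_dir) > f(x) and alpha > 1e-10:
            alpha /= 2.0
        x_new = x + alpha * s_dir
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        if y @ s > 1e-12:              # curvature condition keeps A positive definite
            rho = 1.0 / (y @ s)
            I = np.eye(n)
            A = (I - rho * np.outer(s, y)) @ A @ (I - rho * np.outer(y, s)) \
                + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x

f = lambda x: (x[0] - 2)**2 + (x[0] - x[1])**2
grad = lambda x: np.array([2 * (x[0] - 2) + 2 * (x[0] - x[1]), -2 * (x[0] - x[1])])
x_star = bfgs(f, grad, np.array([0.0, 0.0]))   # minimum at (2, 2)
```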
Powell Method
The Powell method is a direct search method (no gradient computation is required) with the property of
quadratic convergence. Previous search directions are stored in this method and they form a basis for the
new search direction. The method makes a series of unidirectional searches along these search directions.
The last search direction replaces the first one in the new iteration and the process is continued until the
function value shows no improvement.
Nelder–Mead Algorithm
The Nelder–Mead algorithm is a direct search method and uses function information alone (no gradient
computation is required) to move from one iteration to another. The objective function is computed at each
vertex of the simplex. Using this information, the simplex is moved in the search space. Again, the objective
function is computed at each vertex of the simplex. The process of moving the simplex is continued until
the optimum value of the function is reached. Three basic operations are required to move the simplex in
the search space: reflection, contraction, and expansion The centroid point xc is computed using all the
points but with the exclusion of xworst. That is
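A minimal Nelder–Mead sketch using reflection, expansion, and contraction (plus a shrink step) about the centroid excluding x_worst, applied here to the Rosenbrock function from the test problems. The coefficients (1, 2, 0.5) are the conventional choices, not taken from the text.

```python
import numpy as np

def nelder_mead(f, x0, step=1.0, iters=600):
    """Minimal Nelder–Mead simplex: reflection, expansion, contraction, shrink."""
    n = len(x0)
    simplex = [np.array(x0, float)]
    for i in range(n):                       # initial simplex around x0
        v = np.array(x0, float); v[i] += step
        simplex.append(v)
    for _ in range(iters):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        xc = np.mean(simplex[:-1], axis=0)   # centroid excluding x_worst
        xr = xc + (xc - worst)               # reflection
        if f(xr) < f(best):
            xe = xc + 2.0 * (xc - worst)     # expansion
            simplex[-1] = xe if f(xe) < f(xr) else xr
        elif f(xr) < f(simplex[-2]):
            simplex[-1] = xr
        else:
            xcon = xc + 0.5 * (worst - xc)   # contraction
            if f(xcon) < f(worst):
                simplex[-1] = xcon
            else:                            # shrink toward the best vertex
                simplex = [best + 0.5 * (v - best) for v in simplex]
    simplex.sort(key=f)
    return simplex[0]

rosen = lambda x: 100.0 * (x[1] - x[0]**2)**2 + (1.0 - x[0])**2
x_star = nelder_mead(rosen, [-1.2, 1.0])    # minimum of the banana valley is at (1, 1)
```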
Additional test problems such as Rosenbrock’s function, Wood’s function, quadratic function, and so forth
are taken, on which different solution methods will be tested. The performance of each method is compared
in terms of the computational time.
Rosenbrock Function
The two-variable Rosenbrock function is f(x1, x2) = 100(x2 − x1²)² + (1 − x1)². The minimum of this
“banana valley” function is zero and occurs at (1, 1).
Quadratic Function
Nonlinear Function
Wood’s Function
The two-variable function is given by
Constrained optimization
Almost all optimization problems carry constraints. The supply of a product is constrained by the capacity of a
machine. The trajectory of a rocket is constrained by the final target as well as the maximum aerodynamic
load it can carry. The range of an aircraft is constrained by its payload, fuel capacity, and its aerodynamic
characteristics.
In constrained optimization problems, the feasible region gets restricted because of the presence of
constraints. This is more challenging because for a multivariable problem with several nonlinear
constraints, arriving at any feasible point itself is a daunting task. The constrained optimization problem
can be mathematically stated as:
minimize f(x)
subject to gi(x) ≤ 0, i = 1, 2, …, m
hj(x) = 0, j = 1, 2, …, p
xl ≤ x ≤ xu
The functions f, gi, and hj are all differentiable. The design variables are bounded by xl and xu. The
constraints gi are called inequality constraints and hj are called equality constraints.
Optimality Conditions
Let us define the Lagrange function for the constrained optimization problem with the equality and
inequality constraints:
L(x, λ, β) = f(x) + Σj λj hj(x) + Σi βi gi(x)
where λj and βi are the Lagrange multipliers.
Solution Techniques
For a simple optimization problem (say, with two variables) with one equality constraint, the simplest
approach would be to use a variable substitution method. In this method, one variable is written in the form
of another variable using the equality constraint. Then it is substituted in the objective function to make it
an unconstrained optimization problem that is easier to solve. For instance, consider the optimization
problem,
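The specific example is elided above; as an illustration of variable substitution with hypothetical numbers, consider:

```latex
\begin{aligned}
&\text{Minimize } f(x_1, x_2) = x_1^2 + x_2^2
 \quad \text{subject to } h(x_1, x_2) = x_1 + x_2 - 2 = 0.\\
&\text{From the equality constraint, } x_2 = 2 - x_1. \text{ Substituting,}\\
&f(x_1) = x_1^2 + (2 - x_1)^2, \qquad
 \frac{df}{dx_1} = 2x_1 - 2(2 - x_1) = 4x_1 - 4 = 0\\
&\Rightarrow\ x_1^* = 1,\quad x_2^* = 1,\quad f^* = 2,
\end{aligned}
```

so the constrained problem reduces to an unconstrained one-variable problem that is easy to solve.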
Penalty Function Method
The motivation of the penalty function method is to solve the constrained optimization problem using
algorithms for unconstrained problems. As the name suggests, the algorithm penalizes the objective
function in case constraints are violated. The modified objective function with penalty terms is written as
F(x) = f(x) + rk Σi ⟨gi(x)⟩², where ⟨gi(x)⟩ = max(0, gi(x))
In case constraints are satisfied (gi(x) ≤ 0), ⟨gi(x)⟩ will be zero and there will be no penalty on the objective
function. In case constraints are violated (gi(x) > 0), ⟨gi(x)⟩ will be a positive value, resulting in a penalty
on the objective function. The penalty will be higher for the higher infeasibility of the constraints. The
function F(x) can be optimized using the algorithms for unconstrained problems. The penalty function
method of this form is called the exterior penalty function method.
• The function becomes ill-conditioned as the value of the penalty terms is increased. Owing to abrupt
changes in the function value, the gradient value may become large and the algorithm may show
divergence.
• As this method does not satisfy the constraints exactly, it is not suitable for optimization problems where
feasibility must be ensured in all iterations.
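A sketch of the exterior penalty method: F(x) = f(x) + r·Σ⟨gi(x)⟩² is minimized by an unconstrained method (plain gradient descent here), with r increased each outer iteration. The test problem, learning-rate schedule, and factor of 10 are illustrative assumptions.

```python
import numpy as np

def exterior_penalty(f, gs, x, r=1.0, outer=6, inner=300):
    """Minimize F(x) = f(x) + r * sum(<g_i(x)>^2), where <g> = max(0, g),
    raising the penalty parameter r after each outer iteration."""
    def F(z, r):
        return f(z) + r * sum(max(0.0, g(z))**2 for g in gs)
    h = 1e-6
    for _ in range(outer):
        lr = 0.1 / (1.0 + r)              # smaller steps as the penalty steepens
        for _ in range(inner):            # inner loop: plain gradient descent
            grad = np.zeros(len(x))
            for i in range(len(x)):
                e = np.zeros(len(x)); e[i] = h
                grad[i] = (F(x + e, r) - F(x - e, r)) / (2 * h)
            x = x - lr * grad
        r *= 10.0                         # heavier penalty on infeasibility
    return x

# Minimize f = (x1-3)^2 + (x2-3)^2 subject to g: x1 + x2 - 4 <= 0
# (true constrained minimum at (2, 2), approached from the infeasible side).
f = lambda x: (x[0] - 3)**2 + (x[1] - 3)**2
g = lambda x: x[0] + x[1] - 4.0
x_star = exterior_penalty(f, [g], np.array([0.0, 0.0]))
```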
As the name suggests, the augmented Lagrange multipliers (ALM) method combines both Lagrange
multipliers and penalty function methods. For an optimization problem with both equality and inequality
constraints, the augmented Lagrangian function is given by
where λj and βi are the Lagrange multipliers, and rk is a penalty parameter fixed at the start of the iteration.
If the objective function also has to be reduced, then the following inequality must also be satisfied:
S^T ∇f(x) < 0, i.e., S must be a descent (usable) direction.
Zoutendijk’s method of feasible directions and Rosen’s gradient projection method are two popular
methods of feasible directions.
2. Explain a brief note about Gradient-based methods and Direct Search methods.
A discrete optimization problem seeks to determine the best possible solution from a finite set of
possibilities.
From a computer science perspective, combinatorial optimization seeks to improve an algorithm by using
mathematical methods either to reduce the size of the set of possible solutions or to make the search itself
faster.
Example:
Combinatorial optimization refers primarily to the methods used to approach such problems and, for the
most part, does not provide guidelines on how to turn real-world problems into abstract mathematical
questions, or vice versa.
It is a subfield of mathematical optimization that consists of finding an optimal object from a finite set of
objects, where the set of feasible solutions is discrete or can be reduced to a discrete set.
Typical combinatorial optimization problems are the travelling salesman problem ("TSP"), the minimum
spanning tree problem ("MST"), and the knapsack problem.
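The combinatorial flavor can be seen in a brute-force TSP solver: the feasible set is a finite set of tours, and the whole point of combinatorial optimization methods is to avoid enumerating it as done here. The 4-city distance matrix is made up for illustration.

```python
from itertools import permutations

# Symmetric distances between 4 cities (hypothetical data).
dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]

def tour_length(tour):
    legs = zip(tour, tour[1:] + tour[:1])          # close the loop back to the start
    return sum(dist[a][b] for a, b in legs)

# Fix city 0 as the start and enumerate the remaining (n-1)! orderings.
best = min(permutations(range(1, 4)), key=lambda p: tour_length((0,) + p))
best_tour = (0,) + best                            # shortest closed tour
```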
Applications of Combinatorial Optimization
Logistics
Deciding which taxis in a fleet to route to pick up fares
Determining the optimal way to deliver packages
Allocating jobs to people optimally
Designing water distribution networks
Earth science problems (e.g., reservoir flow-rates)
4. With any example, explain the significance and steps involved in Combinatorial Optimization.
Summarize the general procedure of branch-and-bound for integer-programming maximization
with a flow chart.
With any example explain the significance and steps involved in Combinatorial Optimization (refer to the
answer for Question 3).
General procedure of branch-and-bound for integer-programming maximization with a flow chart.
5. Illustrate the road map for MOOP in detail.
Absolute maximum/absolute minimum (also called global max/min): Specify a region R contained in the
domain of the function f. If the value at (a, b) is bigger than or equal to the value at any other point in R,
then f(a, b) is called the global maximum.
In the weighted sum approach, we scale our set of objectives into a single objective by multiplying each of
our objectives by a user-supplied weight. This method is one of the most widely used approaches. A question
that arises when using the weighted sum approach is what weight to assign to each objective.
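A sketch of the weighted-sum approach on a hypothetical two-objective problem, showing how the user-supplied weights steer the single-objective solution along the trade-off:

```python
# Two conflicting objectives over one design variable x (illustrative choices):
# f1 = x^2 (best at x = 0) and f2 = (x - 2)^2 (best at x = 2).
def weighted_sum(w1, w2, xs):
    f1 = lambda x: x**2
    f2 = lambda x: (x - 2)**2
    combined = lambda x: w1 * f1(x) + w2 * f2(x)   # single scalarized objective
    return min(xs, key=combined)

xs = [i / 100.0 for i in range(0, 201)]     # grid over [0, 2]
a = weighted_sum(1.0, 1.0, xs)    # equal weights -> x = 1.0, the midpoint trade-off
b = weighted_sum(1.0, 9.0, xs)    # heavier weight on f2 pulls the solution toward 2
```

Analytically the scalarized minimum is x* = 2·w2/(w1 + w2), so different weight choices sweep out different trade-off solutions.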
ε-Constraint Method:
The ε-constraint method is one of the classical methods that is used to handle multi-objective
optimization problems (MOPs) by converting a MOP into single objective optimization problems (SOPs).
This method depends on the epsilon value, which represents a boundary of the objective.
Goal Programming:
In this method a utility function U is defined that combines all the objective functions of the multi
objective optimization problem. The utility function then becomes the objective function of the
optimization problem that can be solved along with the constraints.
UNIT 2 – Approximations
Stochastic or probabilistic programming deals with situations where some or all of the parameters of the
optimization problem are described by stochastic (or random or probabilistic) variables rather than by
deterministic quantities. The sources of random variables may be several, depending on the nature and the
type of problem. For instance, in the design of concrete structures, the strength of concrete is a random
variable since the compressive strength of concrete varies considerably from sample to sample.
Depending on the nature of equations involved (in terms of random variables) in the problem, a stochastic
optimization problem is called a stochastic linear, geometric, dynamic, or nonlinear programming problem.
The basic idea used in stochastic programming is to convert the stochastic problem into an equivalent
deterministic problem. The resulting deterministic problem is then solved by using familiar techniques
such as linear, geometric, dynamic, and nonlinear programming.
If X1, X2, . . . , Xn are n mutually independent random variables with finite mean and variance (they may follow
different distributions), the sum
Sn = X1 + X2 + · · · + Xn
tends to a normal variable if no single variable contributes significantly to the sum as n tends to infinity.
Because of this theorem, we can approximate most of the physical phenomena as normal random variables.
Physically, Sn may represent, for example, the tensile strength of a fiber-reinforced material, in which case
the total tensile strength is given by the sum of the tensile strengths of individual fibers. In this case the
tensile strength of the material may be represented as a normally distributed random variable.
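The central limit theorem statement above can be checked numerically: summing independent uniform variables (mean 1/2, variance 1/12 each) gives sums whose distribution looks normal. The sample sizes here are arbitrary.

```python
import random

random.seed(0)
n, trials = 50, 20000
# Each sum S_n of 50 uniforms has mean n*0.5 = 25 and variance n/12 ~ 4.17.
sums = [sum(random.random() for _ in range(n)) for _ in range(trials)]
mean = sum(sums) / trials
var = sum((s - mean)**2 for s in sums) / trials
sd = var**0.5
# For a normal distribution, about 68% of samples fall within one sd of the mean:
within = sum(abs(s - mean) <= sd for s in sums) / trials
```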
In tree-based GP, the computer programs are represented in tree structures that are evaluated recursively
to produce the resulting multivariate expressions. Traditional nomenclature states that a tree node (or just
node) is an operator [+,-,*,/] and a terminal node (or leaf) is a variable [a,b,c,d].
Tree-based GP was the first application of Genetic Programming. There are several other types, such as
linear, Cartesian, and stack-based GP, which are typically more efficient in their execution of the genetic
operators. However, tree-based GP provides a visual means to engage new users of Genetic Programming,
and remains viable when built upon a fast programming language or underlying suite of libraries.
Genetic algorithms and similar EAs are powerful optimization techniques, but they have an inherent
limitation: they incorporate the assumed solution structure in the representation of their candidate
solutions. However, we may not know which parameters need to be optimized in a given problem. Also, we
may not know the structure of the parameters that need to be optimized. Are the parameters real numbers,
or state space machines, or computer programs, or complex arrays, or time schedules, or something else?
Genetic programming (GP) is an attempt to generalize EAs to an algorithm that can learn not only the best
solution to a problem given a specific structure, but that can also learn the optimal structure. GP evolves
computer programs to solve optimization problems. This is the distinctive feature of GP compared with
other EAs; other EAs evolve solutions, while GP evolves programs that can compute solutions. In fact, this
was one of the original goals of the artificial intelligence community.
                 Player B
  Player A    3    8    4    4
             -7    2   10    2
2. Solve the optimization problem using geometric programming:
Minimize f(x) = 3x1^-1 x2^-3 + 4x1^2 x2 x3^-2 + 5x1 x2^4 x3^-1 + 6x3
3. Formulate the objective function in the form of a posynomial form in detail.
4. Explain in detail about integer nonlinear programming with relevant examples.
5. Explain DUALITY in non-linear programming. Explain the concept of local and global minima.
6. Explain CONVEXITY in non-linear programming and explain the impact of minima and convexity
in NLP.
7. List out the GP steps with an example and point out the common operators used for GP.
Termination Criteria
What is the termination criterion? This question needs to be answered for all EAs, but it may be especially
important for GP. This is because the fitness measure is usually more computationally demanding in GP
than in other EAs. The choice of the termination criterion could determine whether or not the GP is
successful. As with other EAs, the termination criterion for GP could include factors such as number of
iterations, number of fitness evaluations, run time, best fitness value, change in best fitness over several
generations, or standard deviation of fitness values over the entire population.
Terminal Set
What is the terminal set for the evolving computer programs? This set describes the symbols that can
appear at the leaves of the syntax trees. The terminal set is the set of all possible inputs to the evolving
computer programs. This set includes variables that are input to the computer program, along with
constants that we think might be important. The constants could include basic integers like 0 and 1, and
also constants that may be important for the particular optimization problem
(π, e, and so on).
The syntax trees have three terminals: x, y, and z. Some constants can be obtained implicitly; for example,
x − x = 0, and x/x = 1. So as long as we have a subtraction and division function, we do not really need the
0 and 1 constants. However, most GP implementations should include constants in their terminal sets.
We can also use random numbers in the terminal set, but usually we do not want a random number to
change after it is generated. These types of random numbers are called ephemeral random constants.
Ephemeral random constants are obtained by specifying a quantity denoted as R in the terminal set. If R is
chosen as a terminal during population initialization, we generate a random number r1 between given
limits, and insert r1 into the GP individual. From that point on, that particular value r1 does not change.
However, if R is chosen again for initialization of another individual, or for mutation, then we generate a
new random constant r2 for that realization. The choice of the limits within which to generate ephemeral
random constants is another GP design decision.
Defining the terminal set for a GP application is a balancing act. If we use a set that is too small, then the
GP will not be able to effectively solve our problem. However, if we use a terminal set that is too large, then
it may be too difficult for the GP to find a good solution in a reasonable time.
Function Set
What is the function set for the evolving computer programs? This set describes the functions that can
appear at the non-terminal nodes of the syntax trees, such as the following.
• Standard mathematical operators can be included in the function set (for example, addition, subtraction,
multiplication, division, absolute value).
• Problem-specific functions that we think are important for our particular optimization problem can be
included in the function set (for example, exponential functions, logarithmic functions, trigonometric
functions, filters, integrators, differentiators).
• Conditional tests can be included in the function set (for example, greater than, less than, equal to).
• Logic functions can be included in the function set, if we think that they could be applicable to the solution
of our particular optimization problem (for example, and, nand, or, xor, nor, not).
• Variable assignment functions can be included in the function set.
• Loop statements can be included in the function set (for example, while loops, for loops).
• Subroutine calls can be included in the function set, if we have a set of predefined functions that we have
created for our problem.
The syntax trees in Figure include five functions: addition, subtraction, multiplication, division, and
absolute value. We need to find the right balance in our definition of the function set and the terminal set.
The sets need to be large enough to be able to represent a solution to our problem, but if they are too large,
then the search space will be so large that the GP will have a hard time finding a good
solution.
Some functions need to be modified for GP because the syntax trees we evolve might not have legal function
arguments. For example, GP could evolve the s-expression (/ x 0), which is division by zero. This would
result in a Lisp error, which would cause the GP to terminate. Therefore, instead of using the standard
division operator in Lisp, we can define a division operator DIV that protects against division by zero, and
that also protects against overflow due to division by a very small number:
DIV(a, b) = a/b if |b| ≥ ϵ, and 1 otherwise
where ϵ is a very small positive constant, like 10^-20. The DIV function returns 1 if the divisor has a very
small magnitude. We may need to
redefine other functions in a similar way (logarithm functions, inverse trigonometric functions, and so on)
to make sure that the functions in our function set can handle all possible inputs.
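A protected division operator in the spirit described above, written here in Python rather than Lisp, with ϵ = 10⁻²⁰ as in the text:

```python
def DIV(a, b, eps=1e-20):
    """Protected division: returns 1 when the divisor's magnitude is below eps,
    guarding against division by zero and overflow from tiny divisors."""
    return 1.0 if abs(b) < eps else a / b

print(DIV(6.0, 3.0))   # 2.0
print(DIV(5.0, 0.0))   # 1.0 (division by zero is protected)
```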
Initialization
How should we generate the initial population of computer programs? We have two basic options for
initialization, which are referred to as the full method and the grow method. We can also combine these
options to get a third option, which is referred to as the ramped half-and-half method.
The full method creates programs such that the number of nodes from each terminal node to the top-level
node is Dc, a user-specified constant. Dc is called the depth of the syntax tree. As an example, Parent 1 in
Figure 7.3 has a depth of three, while Parent 2 has a depth of four. Parent 1 in Figure 7.3 is a full syntax tree
because there are three nodes from each terminal node to the top-level addition node. However, Parent 2
is not a full syntax tree because some of the program branches have a depth of four while others only have
a depth of three.
We can use recursion to generate random syntax trees. For example, if we want to generate a syntax tree
with a structure like Parent 2, we first generate the subtraction node at the top level and note that it
requires two arguments. For the first argument, we generate the multiplication node and note that it
requires two arguments. This process continues for each node and each argument until we have generated
enough levels to reach the desired depth. When we reach the desired depth, we generate a random terminal
node to complete that branch of the syntax tree. The figure below illustrates the concept for a recursive
algorithm that generates random computer programs. We can generate a random syntax tree by calling
routine GrowProgramFull(Dc, 1), where Dc is our desired syntax tree depth. GrowProgramFull calls itself
each time it needs to add another layer in its growing
syntax tree.
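Full-method initialization can be sketched as a recursive routine in the spirit of GrowProgramFull(Dc, 1). The function/terminal sets and the nested-list tree representation are illustrative assumptions.

```python
import random

FUNCTIONS = ['+', '-', '*', 'DIV', 'ABS']    # function set (hypothetical choice)
TERMINALS = ['x', 'y', 'z']                  # terminal set
ARITY = {'+': 2, '-': 2, '*': 2, 'DIV': 2, 'ABS': 1}

def grow_program_full(Dc, depth=1):
    """Full-method initialization: every branch reaches exactly depth Dc.
    A tree is a nested list [function, arg1, ...]; leaves are terminal symbols."""
    if depth == Dc:                          # reached desired depth: emit a terminal
        return random.choice(TERMINALS)
    func = random.choice(FUNCTIONS)          # otherwise emit a function node and recurse
    return [func] + [grow_program_full(Dc, depth + 1) for _ in range(ARITY[func])]

def tree_depth(t):
    """Number of nodes on the longest path from the top-level node to a leaf."""
    if not isinstance(t, list):
        return 1
    return 1 + max(tree_depth(arg) for arg in t[1:])

random.seed(1)
tree = grow_program_full(3)    # every branch of this tree has depth exactly 3
```

The grow method would differ only in allowing a terminal to be chosen at depths less than Dc, so branches may stop early.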
The grow method of initialization creates programs such that the number of nodes from each terminal node
to the top-level node is less than or equal to Dc. If the parents in Figure 7.3 were created by random
initialization, then Parent 1 might have been generated with either the full method or the grow method,
while Parent 2 was definitely generated with the grow method since it is not a full syntax tree. The grow
method can be implemented the same way as the full method, except that when we generate a random node
at depths less than Dc , either a function or terminal node can be generated. If a function node is generated,
the syntax tree continues to grow. As with the full method, when we reach the maximum depth Dc, we
generate a random terminal to complete that branch of the syntax tree. The figure below illustrates the
concept for a recursive algorithm that generates random computer programs with the grow method.
The ramped half-and-half method generates half of the initial population with the full method, and half with
the grow method. Also, it generates an equal number of syntax trees for each value of depth between 2 and
Dc , which is the maximum allowable depth specified by the user. Figure 7.8 illustrates the concept of
ramped half-and-half syntax tree initialization.
Koza experimented with the three different types of initializations described above for some simple GP
problems. He found a difference in the probability of GP success depending on which initialization method
was used, as shown in Table 7.2. The table shows that the ramped half-and-half initialization method is
generally much better than the other two initialization methods.
The large population size that is used in GP, along with the large number of possible nodes at which
crossover can occur, usually means that good GP results do not depend on mutation. Often we can get good
results with pm = 0. However, mutation may still be desirable just in case an important terminal or function
is lost from the population. If that occurs, mutation is the only way that it could re-enter the population.
5. We need to specify the crossover probability pc. This is similar to GAs. After selecting two parents in
Figure 7.5, we can either use crossover to combine them, or we can instead clone them for the next
generation. The line “Mate p1 and p2 to create children c1 and c2” in Figure 7.5 would then be replaced with
something like the following:
Most experience suggests that crossover is an important aspect of GP and should be used with a probability
pc ≥ 0.9.
6. We need to decide whether or not to use elitism. As with any other EA, we can save the best m computer
programs in GP from one generation to the next to make sure they are not lost in the following generation.
The parameter m is called the elitism parameter. Elitism can be implemented in several different ways. For
example, we could archive the best m individuals at the end of a generation, create the children for the next
generation as usual, and then replace the worst m children with the elites from the previous generation.
Alternatively, we could copy the m elites to the first m children each generation, and then create only (N −
m) additional children each generation (where N is the population size).
7. We need to specify Di, the maximum program size of the initial population. A program's size can be
quantified by its depth, which measures the maximum number of nodes between the highest level and the
lowest level (inclusive). For example, Parent 1 in Figure 7.3 has a depth of three, while Parent 2 has a depth
of four.
8. We also need to specify Dc, the maximum depth of child programs. During GP operation, child programs
can grow larger and larger with each succeeding generation. If a maximum depth is not enforced, then child
programs can become unreasonably long, wasting space and execution time; this is called GP bloat. The
maximum depth Dc can be enforced in several ways. One way is to replace a child with one of its parents if
the child's depth exceeds Dc. Another way is to redo the crossover operation if the child's depth exceeds Dc.
Yet another way is to examine the parent syntax trees before choosing their crossover points, and constrain
the randomly selected crossover points so that Dc will not be exceeded by the children's depths.
9. We need to decide whether or not we want to allow a terminal node in a syntax tree to be replaced with
a subtree during crossover. Figure 7.4 shows that the z terminal in Parent 1 is selected for crossover, and
is replaced with a subtree in Child 1. We use pi to denote the probability of crossover at an internal node.
When selecting a crossover point, we generate a random number r uniformly distributed on [0, 1]. If r is
less than pi, then we select an s-expression for crossover; that is, we select a subtree that is surrounded by
matching left and right parentheses. However, if r is greater than pi, then we select a terminal node for
crossover; that is, we select a symbol in the syntax tree that is not immediately preceded by a left
parenthesis.
10. We need to decide whether or not to worry about duplicate individuals in the population. Duplicate
individuals are a waste of computer resources. In EAs with relatively small search spaces or small
populations, duplicates can arise quite often, and dealing with duplicates can be an important aspect of the
EA. However, in GP, the search space is so large that duplicates rarely occur. Therefore, we usually do not
need to worry about duplicate individuals in GP.
• Knowledge: The functioning of the colony cannot be understood from knowledge of the functioning of
a single agent.
• Sensitivity: Even a small change in the simple rules results in different group-level behavior.
Benefits of swarming
• More transparency. Swarming makes for a nicer experience for all parties involved.
• Developing new skills. Swarming opens new ways to collaborate: it thrives on the diverse skill sets
in your team.
• Employee empowerment.
• The goal is to efficiently explore the search space in order to find (near-)optimal solutions.
We call the parameter w the inertia weight constant. It lies between 0 and 1 and determines how much
the particle keeps of its previous velocity (i.e., the speed and direction of the search). The parameters c1
and c2 are called the cognitive and the social coefficients, respectively. They control how much weight is
given to refining the search result of the particle itself versus recognizing the search result of the swarm.
These parameters can be viewed as controlling the trade-off between exploration and exploitation.
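The role of w, c1, and c2 can be shown in a minimal one-dimensional velocity-update sketch (an illustration of the inertia-weight form only, not a full PSO implementation; the default parameter values are illustrative):

```python
import random

def update_velocity(v, x, pbest, gbest, w=0.7, c1=2.0, c2=2.0):
    """One-dimensional inertia-weight velocity update:
    v <- w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x),
    with r1, r2 drawn uniformly from [0, 1]."""
    r1, r2 = random.random(), random.random()
    return w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
```

With w near 1 the particle largely keeps exploring along its previous direction; with w near 0 it is pulled almost entirely toward pbest and gbest.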
The positions pbesti and gbest are updated in each iteration to reflect the best positions found thus far.
One interesting property of this algorithm that distinguishes it from other optimization algorithms is that it
does not depend on the gradient of the objective function. In gradient descent, for example, we look for the
minimum of a function f(X) by moving X in the direction of −∇f(X), since that is where the function
decreases fastest. For a particle at position X, how it moves does not depend on which direction is downhill
but only on where pbest and gbest are. This makes PSO particularly suitable when differentiating f(X) is
difficult.
Another property of PSO is that it can be parallelized easily. Since we are manipulating multiple particles to
find the optimal solution, each particle can be updated in parallel, and we only need to collect the updated
value of gbest once per iteration. This makes the map-reduce architecture a natural candidate for
implementing PSO.
Computational Implementation of PSO
Consider an unconstrained maximization problem: Maximize f(X) subject to X(l) ≤ X ≤ X(u), where X(l)
and X(u) denote the lower and upper bounds on X, respectively. The PSO procedure can be implemented
through the following steps.
1. Assume the size of the swarm (number of particles) is N. To reduce the total number of function
evaluations needed to find a solution, we should assume a small swarm size. But with too small a swarm it
is likely to take longer to find a solution or, in some cases, we may not be able to find a solution at all.
Usually a size of 20 to 30 particles is assumed for the swarm as a compromise.
2. Generate the initial population of X in the range X(l) to X(u) randomly as X1, X2, . . . , XN. Hereafter,
for convenience, the position of particle j and its velocity in iteration i are denoted Xj(i) and Vj(i),
respectively. Thus the particles generated initially are denoted X1(0), X2(0), . . . , XN(0). The vectors Xj(0)
(j = 1, 2, . . . , N) are called particles or vectors of coordinates of particles (similar to chromosomes in
genetic algorithms). Evaluate the objective function values corresponding to the particles as
f[X1(0)], f[X2(0)], . . . , f[XN(0)].
3. Find the velocities of particles. All particles will be moving to the optimal point with a velocity. Initially,
all particle velocities are assumed to be zero. Set the iteration number as i = 1.
4. In the ith iteration, find the following two important parameters used by a typical particle j: (a) The
historical best position of particle j, Pbest,j, that is, the coordinates of particle j with the highest value of
the objective function f[Xj(i)] encountered by particle j in all the previous iterations; and the historical
best position over the whole swarm, Gbest, that is, the coordinates with the highest value of the objective
function encountered in all the previous iterations by any of the N particles. (b) Find the velocity of particle
j in the ith iteration as follows:
Vj (i) = Vj (i − 1) + c1r1[Pbest,j − Xj (i − 1)] + c2r2[Gbest − Xj (i − 1)]; j = 1, 2, . . . , N where c1 and c2 are the
cognitive (individual) and social (group) learning rates, respectively, and r1 and r2 are uniformly
distributed random numbers in the range 0 and 1. The parameters c1 and c2 denote the relative importance
of the memory (position) of the particle itself to the memory (position) of the swarm. The values of c1 and
c2 are usually assumed to be 2 so that c1r1 and c2r2 ensure that the particles would overfly the target
about half the time. (c) Find the position or coordinate of the j th particle in ith iteration as
Xj (i) = Xj (i − 1) + Vj (i); j = 1, 2, . . . , N
where a time step of unity is assumed in the velocity. Evaluate the objective function values corresponding
to the particles as
f[X1(i)], f[X2(i)], . . . , f[XN(i)].
5. Check the convergence of the current solution. If the positions of all particles converge to the same set
of values, the method is assumed to have converged. If the convergence criterion is not satisfied, step 4 is
repeated by updating the iteration number as i = i + 1, and by computing the new values of Pbest,j and
Gbest. The iterative process is continued until all particles converge to the same optimum solution.
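Steps 1–5 can be sketched for a one-dimensional problem as follows (a minimal Python sketch using the text's update rule without an inertia term; a fixed iteration budget stands in for the convergence check, and the bound clipping is an added assumption):

```python
import random

def pso_maximize(f, lower, upper, n_particles=20, n_iters=100, c1=2.0, c2=2.0):
    """Steps 1-5 for a one-dimensional maximization on [lower, upper]."""
    # Steps 1-2: choose the swarm size and random initial positions.
    x = [random.uniform(lower, upper) for _ in range(n_particles)]
    # Step 3: all initial velocities are zero.
    v = [0.0] * n_particles
    pbest = x[:]                        # Pbest,j: best position of each particle
    gbest = max(x, key=f)               # Gbest: best position of the whole swarm
    for _ in range(n_iters):            # Step 5 simplified to a fixed budget
        for j in range(n_particles):
            # Step 4(b): velocity update with c1 = c2 = 2.
            r1, r2 = random.random(), random.random()
            v[j] += c1 * r1 * (pbest[j] - x[j]) + c2 * r2 * (gbest - x[j])
            # Step 4(c): position update with a unit time step, clipped to bounds.
            x[j] = min(max(x[j] + v[j], lower), upper)
            if f(x[j]) > f(pbest[j]):   # Step 4(a): refresh Pbest,j
                pbest[j] = x[j]
        gbest = max(pbest, key=f)       # refresh Gbest
    return gbest
```

For example, maximizing f(x) = −(x − 3)² on [0, 10] returns a point near x = 3.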
The main advantages of the PSO algorithm can be summarized as: simple concept, easy
implementation, robustness of control parameters, and computational efficiency when compared with
mathematical algorithms and other heuristic optimization techniques.
PSO can be applied to various optimization problems, for example, energy-storage optimization.
PSO can also simulate the movement of a particle swarm, and has been applied to visual effects such as
the special effects in Hollywood films.
Ant Searching Behavior An ant k, when located at node i, uses the pheromone trail τij to compute the
probability of choosing j as the next node:
p(k)ij = (τij)^α / Σl∈N(k)i (τil)^α if j ∈ N(k)i, and p(k)ij = 0 otherwise,
where α denotes the degree of importance of the pheromones and N(k)i indicates the set of neighborhood
nodes of ant k when located at node i. The neighborhood of node i contains all the nodes directly connected
to node i except the predecessor node (i.e., the last node visited before i). This will prevent the ant from
returning to the same node visited immediately before node i. An ant travels from node to node until it
reaches the destination (food) node.
Path Retracing and Pheromone Updating
Before returning to the home node (on the backward path), the kth ant deposits an amount Δτ(k) of
pheromone on the arcs it has visited. The pheromone value τij on each traversed arc (i, j) is updated as:
τij ← τij + Δτ(k)
Because of the increase in the pheromone, the probability of this arc being selected by the forthcoming ants
will increase.
Pheromone Trail Evaporation
When an ant k moves to the next node, the pheromone evaporates from all the arcs ij according to the
relation
τij ← (1 − p)τij, ∀(i, j) ∈ A
where p ∈ (0, 1] is a parameter and A denotes the segments or arcs traveled by ant k in its path from home
to destination. The decrease in pheromone intensity favors the exploration of different paths during the
search process. This favors the elimination of poor choices made in the path selection. This also helps in
bounding the maximum value attained by the pheromone trails. An iteration is a complete cycle involving
ant’s movement, pheromone evaporation and pheromone deposit.
Algorithm
ACO was used to solve graph problems by investigating possible paths on the graph. ACO is
inspired by the behavior of ants, which are able to find the shortest path between their nest and a food
source by means of pheromone. As time progresses, the ants increasingly choose the shortest way while
searching for food.
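The behavior described above can be sketched as follows (a simplified Python illustration of pheromone-biased node choice, deposit, and evaporation on a small directed graph; the graph, parameter values, and function names are illustrative):

```python
import random

def next_node(current, prev, tau, alpha=1.0):
    """Pick the next node with probability proportional to tau[(i, j)]**alpha,
    excluding the predecessor node (the node visited immediately before)."""
    neighbors = [j for (i, j) in tau if i == current and j != prev]
    weights = [tau[(current, j)] ** alpha for j in neighbors]
    return random.choices(neighbors, weights=weights)[0]

def ant_colony(tau, home, food, n_ants=100, p=0.1, deposit=1.0):
    """Send ants from home to food one at a time: each traveled arc
    evaporates as tau <- (1 - p)*tau and then receives a deposit of
    deposit/len(path), so shorter paths get more pheromone per arc."""
    for _ in range(n_ants):
        path, node, prev = [], home, None
        while node != food:
            nxt = next_node(node, prev, tau)
            path.append((node, nxt))
            prev, node = node, nxt
        for arc in set(path):
            tau[arc] *= (1 - p)              # evaporation on traveled arcs
            tau[arc] += deposit / len(path)  # deposit, rewarding short paths
    return tau
```

On a graph with a two-arc and a three-arc route from home to food, the shorter route tends to accumulate more pheromone over the iterations, which is the positive feedback the text describes.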
5. Explain the idea behind Harmony Search Algorithms
Geem formalized these three options into a quantitative optimization process with three corresponding
components:
1. Harmony memory (HM)
2. Pitch adjusting
3. Randomization
In order to use the harmony memory effectively, it is typically assigned a parameter called the harmony
memory considering rate (HMCR ∈ [0, 1]), which plays a role similar to the crossover rate in GAs.
If this rate is low (near 0), only a few best harmonies are utilized, and convergence of the algorithm is slow.
If this rate is very high (near 1), the harmonies in the HM are heavily exploited and the solution space
is not explored properly, potentially leading to inefficient solutions.
The second component is pitch adjustment, determined by a pitch bandwidth (BW) (also referred to as fret
width [15]) and a pitch adjusting rate (PAR); it corresponds to generating a slightly different solution in the
HS algorithm.
Pitch can be adjusted linearly or nonlinearly; however, most often linear adjustment is used:
Hinew = Hiold + BW × ri, where ri ∈ [−1, 1] and 1 ≤ i ≤ D
where Hiold is the ith component of the existing harmony or solution, Hinew is the ith component of the
new harmony after the pitch-adjusting action, and BW is the bandwidth.
Algorithm:
1. Initialize the optimization problem and algorithm parameters.
2. Initialize the harmony memory (HM).
3. Improvise a new harmony.
4. Update the HM.
5. Termination.
HS creates one child (a single new harmony) in each generation.
Applications include games such as Sudoku.
The contribution of HS lies in two areas. First, the way that HS combines these ideas is novel. Second, the
musical motivation of HS is novel.
Furthermore, the HS algorithm is a population-based metaheuristic; this means that multiple harmony
groups can be used in parallel.
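The three components can be sketched together as follows (a minimal Python sketch for a minimization problem; the parameter values and test function are illustrative):

```python
import random

def harmony_search(f, lower, upper, dim=2, hms=10, hmcr=0.9,
                   par=0.3, bw=0.1, n_iters=1000):
    """Minimize f over [lower, upper]^dim using harmony memory (HM),
    pitch adjusting, and randomization."""
    # Initialize the harmony memory with random solutions.
    hm = [[random.uniform(lower, upper) for _ in range(dim)]
          for _ in range(hms)]
    for _ in range(n_iters):
        new = []
        for i in range(dim):
            if random.random() < hmcr:          # harmony memory considering
                value = random.choice(hm)[i]
                if random.random() < par:       # pitch adjusting with bandwidth BW
                    value += bw * random.uniform(-1, 1)
            else:                               # randomization
                value = random.uniform(lower, upper)
            new.append(min(max(value, lower), upper))
        worst = max(hm, key=f)                  # update the HM: the new harmony
        if f(new) < f(worst):                   # replaces the worst one if better
            hm[hm.index(worst)] = new
    return min(hm, key=f)
```

HMCR controls how often components come from memory, PAR how often a remembered component is nudged by up to BW, and the remaining cases supply fresh random values, matching the slow-convergence versus over-exploitation trade-off noted above.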
➔ Network optimization refers to the tools, techniques, and best practices used to monitor and
enhance network performance.
➔ The first step of the optimization process is to measure a series of network performance metrics
and identify any issues.
➔ Network performance monitoring includes measuring traffic, bandwidth, jitter, and latency caused
by the likes of insufficient infrastructure or inadequate network security.
➔ Dynamic network optimization deals with problems whose parameters change over time (i.e., are
time-varying rather than static).
2. Define Ranking
➔ The simplest form of job evaluation method.
➔ The method involves ranking each job relative to all other jobs, usually based on some overall factor
like 'job difficulty'.
➔ Each job as a whole is compared with the others, and this comparison continues until all the jobs have
been evaluated and ranked.
➔ Example: AssetRank was proposed to rank any dependency attack graph using a random walk model.
AssetRank is a generalization of PageRank extending it to handle both conjunctive and disjunctive nodes.
AssetRank is supported by an underlying probabilistic interpretation based on a random walk.
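AssetRank itself is not reproduced here, but the random-walk idea it builds on can be illustrated with a minimal power-iteration PageRank sketch (the graph and parameter values are illustrative, and dangling nodes are not redistributed in this simplified version):

```python
def pagerank(links, d=0.85, n_iters=50):
    """Power-iteration PageRank on a directed graph given as (src, dst) pairs:
    rank(j) = (1 - d)/N + d * sum over in-links (i, j) of rank(i)/outdeg(i)."""
    nodes = sorted({n for edge in links for n in edge})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    outdeg = {v: sum(1 for (i, _) in links if i == v) for v in nodes}
    for _ in range(n_iters):
        new = {v: (1 - d) / n for v in nodes}   # teleportation share
        for (i, j) in links:                    # random-walk share along links
            new[j] += d * rank[i] / outdeg[i]
        rank = new
    return rank
```

For example, in a graph with links b→a, c→a, and a→b, node a receives two in-links and ends up ranked highest.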