
A Project Report on

A Modified HNN Approach for Constraint Satisfaction Problem

Submitted in partial fulfilment of the requirement for the award of
Bachelor of Engineering in Computer Science and Engineering

University Institute of Technology
The University of Burdwan

Submitted by

Suman Kumar Mahato (2015-1035)
Navneet Prashant (2015-1006)
Saurav Kumar (2015-1025)
Subhajit Chakraborty (2014-1074)
(Computer Science & Engineering)

Under the guidance of

Mr. Soumik Ghosh


Assistant Professor
Department of Computer Science & Engineering
University Institute of Technology
The University of Burdwan
Acknowledgements

We would like to express our deepest gratitude to our advisor Mr. Soumik Ghosh,
Assistant Professor, CSE Department, University Institute of Technology, The
University of Burdwan, for his guidance and support. His energy, creativity
and depth of knowledge have been a constant source of motivation for us.

We would also like to thank Dr. Souvik Bhattacharya, In-charge, CSE Department,
University Institute of Technology, The University of Burdwan, for devoting his time
to inspiring and helping us in many aspects.

We are also grateful to Dr. Abhijit Mitra, Principal, University Institute of Technology,
The University of Burdwan, for giving us the opportunity to continue our studies.

------------------------------------------------------

Suman Kumar Mahato (2015-1035)

-------------------------------------------------------

Navneet Prashant (2015-1006)

--------------------------------------------------------

Saurav Kumar (2015-1025)

--------------------------------------------------------

Subhajit Chakraborty (2014-1074)


Department of Computer Science & Engineering
University Institute of Technology
The University of Burdwan

Certificate of Approval
This is to certify that the project entitled “A Modified HNN Approach for
Constraint Satisfaction Problem” is hereby approved as a creditable
engineering study carried out, presented and submitted by Suman Kumar
Mahato (2015-1035), Navneet Prashant (2015-1006), Saurav Kumar (2015-1025)
and Subhajit Chakraborty (2014-1074) in a satisfactory manner, and is
accepted as a prerequisite in partial fulfilment of the academic requirements
for the award of the degree of Bachelor of Engineering in Computer Science &
Engineering. It is a bona fide work carried out at University Institute of
Technology, The University of Burdwan. The thesis has not been submitted
for the award of any other degree.
Date:

Mr. Soumik Ghosh
Assistant Professor & project mentor
Department of Computer Science & Engineering
University Institute of Technology
The University of Burdwan

Dr. Souvik Bhattacharyya
In-Charge, Department of Computer Science & Engineering
University Institute of Technology
The University of Burdwan

Dr. Abhijit Mitra
Principal
University Institute of Technology
The University of Burdwan
Abstract
A constraint satisfaction problem involves finding values for problem variables that are
subject to given constraints specifying the acceptable combinations of values. For our
project we chose the k-coloring problem of graphs, which is NP-complete for k ≥ 3. Our
proposed method modifies an existing Hopfield-neural-network method for k-coloring:
it converges faster than the previously proposed method, reaches a valid solution with
a 100% convergence rate, and requires no parameter tuning.

Keywords:
Constraint Satisfaction Problem (CSP), Graph Coloring Problem (GCP), Hopfield
Network, Maximum Neuron Model

Contents
Abstract

List of Figures 3

List of Tables 5

1. Chapter 1 : Introduction 6
1.1 Constraint Satisfaction Problem (CSP) 7
1.2 Graph Coloring 8
1.2.1 What is a Graph? 8
1.2.2 Graph Coloring Problem 8
1.2.3 Graph & its Adjacency Matrix 10
1.2.4 Why Coloring? 10
1.2.5 Practical Applications 11
1.3 Artificial Neural Network 18
1.3.1 ANN Model 19
1.3.2 Components of ANN 19
1.3.3 Learning Paradigms 21
References 24

2. Chapter 2 : Literature Survey 25


References 30

3. Chapter 3 : Graph Coloring Techniques 32


3.1 Previous Graph Colouring Techniques 33
3.1.1 Greedy Algorithm 33
3.1.2 Recursive Largest First 34
3.1.3 AntCol Algorithm 35
3.1.4 Hill Climbing Algorithm 37
3.1.5 Hybrid Evolutionary Algorithm 38
3.1.6 Dsatur Algorithm 39

References 41

4. Chapter 4 : Hopfield Network Approach for k-coloring 42


4.1 Hopfield Nets 43
4.2 Why is the decay term harmful? 44
4.3 Maximum Neuron Model 47
4.4 Convergence of Maximum Neural Net to Valid Solutions 50
4.5 Computational power of Hopfield Neural Nets 52
4.6 Neural Network representation for Graph coloring 53
4.7 Pseudo Code 56

References 57

5. Chapter 5 : Proposed Approach 58


5.1 Proposed Modified HNN for Graph Coloring 59
5.2 Pseudo Code 59

References 61

6. Chapter 6 : Comparative Study 63

References 66

7. Chapter 7 : Conclusion 67

List of Figures

Serial no. Figure no. and name Page no.

1. Fig. 1.1 A small graph (a), and corresponding 5-colouring (b). 8
2. Fig. 1.2 If we extract the vertices in the dotted circle, we are left with a subgraph that clearly needs more than four colours. 9
3. Fig. 1.3 Adjacency Matrix 10
4. Fig. 1.4 Illustration of how proper 5- and 4-colourings can be constructed from the same graph. 11
5. Fig. 1.5 Tasks allocated to processors; the diagram shows the tasks task1, task2, task3 and task4 allocated to the processors (P1, P5); (P1, P6); (P2, P4) and (P3, P7) respectively. 13
6. Fig. 1.6 A small timetabling problem (a), a feasible 4-colouring (b), and its corresponding timetable solution using four timeslots (c). 14
7. Fig. 1.7 A set of taxi journey requests over time (a), its corresponding interval graph and 3-colouring (b), and (c) the corresponding assignment of journeys to taxis. 15
8. Fig. 1.8 (a) An example computer program together with the live ranges of each variable; (b) an optimal colouring of the corresponding interference graph. 16
9. Fig. 1.9 Architecture of feedforward ANN. 18
10. Fig. 1.10 A Biological Neuron 20
11. Fig. 4.1 The artificial Neuron 43
12. Fig. 4.2 Mapping the graph coloring problem onto neural net. 54
13. Fig. 6.1 Comparison Chart 1 63
14. Fig. 6.2 Comparison Chart 2 64
15. Fig. 6.3 Comparison Chart 3 64
16. Fig. 6.4 Comparison Chart 4 65

List of Tables

Serial no. Tables Page no.

1. Table 6.1 63

2. Table 6.2 64

3. Table 6.3 65

4. Table 6.4 65

Chapter 1
Introduction

1.1 Constraint Satisfaction Problem:


Constraint satisfaction problems (CSPs) are mathematical questions defined as a set of
objects whose state must satisfy a number of constraints or limitations. CSPs represent the
entities in a problem as a homogeneous collection of finite constraints over variables,
which is solved by constraint satisfaction methods. CSPs are the subject of intense research
in both artificial intelligence and operations research, since the regularity in their
formulation provides a common basis to analyse and solve problems of many seemingly
unrelated families. CSPs often exhibit high complexity, requiring a combination of
heuristics and combinatorial search methods to be solved in a reasonable time. The
Boolean satisfiability problem (SAT), the satisfiability modulo theories (SMT) and answer
set programming (ASP) can be roughly thought of as certain forms of the constraint
satisfaction problem.
Examples of simple problems that can be modelled as a constraint satisfaction problem
include:
• Eight queens puzzle.
• Map coloring problem.
• Graph Coloring Problem.
• Sudoku, Crosswords, Futoshiki, Kakuro (CrossSums), Numbrix, Hidato and
many other logic puzzles.

Formal definition:
Formally, a constraint satisfaction problem is defined as a triple (X, D, C), where
X = {X1, …, Xn} is a set of variables,
D = {D1, …, Dn} is a set of the respective domains of values, and
C = {C1, …, Cm} is a set of constraints.

Each variable Xi can take on the values in the nonempty domain Di. Every constraint Cj
∈ C is in turn a pair (tj, Rj), where tj ⊆ X is a subset of k variables and Rj is a k-ary
relation on the corresponding subset of domains. An evaluation of the variables is a
function from a subset of variables to a particular set of values in the corresponding subset
of domains. An evaluation v satisfies a constraint (tj, Rj) if the values assigned to the
variables in tj satisfy the relation Rj.
An evaluation is consistent if it does not violate any of the constraints. An evaluation
is complete if it includes all variables. An evaluation is a solution if it is consistent and
complete; such an evaluation is said to solve the constraint satisfaction problem.[1]

For our Project we consider the Graph Coloring Problem as the constraint satisfaction
problem.

1.2 Graph Coloring

1.2.1 What is a Graph?


Mathematically, a graph can be thought of as a set of objects in which some pairs of objects
are connected by links. The interconnected objects are usually called vertices, with the links
connecting pairs of vertices termed edges. Graphs can be used to model a surprisingly large
number of problem areas, including social networking, chemistry, scheduling, parcel
delivery, satellite navigation, electrical engineering, and computer networking.

Formal Definition:
A graph is an ordered pair G = (V, E) comprising:
• V a set of vertices (also called nodes or points);
• E ⊆ {{x, y} | x, y ∈ V ∧ x ≠ y} a set of edges (also called links or lines), which
are unordered pairs of vertices (i.e., an edge is associated with two distinct vertices).

1.2.2 Graph Coloring Problem:


The graph colouring problem is one of the most famous problems in the
field of graph theory and has a long and illustrious history. In a nutshell it asks, given any
graph, how might we go about assigning “colours” to all of its vertices so that
(a) no vertices joined by an edge are given the same colour, and
(b) the number of different colours used is minimised?

Fig. 1.1 A small graph (a), and corresponding 5-colouring (b).

Figure 1.1 shows a picture of a graph with ten vertices (the circles), and 21 edges (the
lines connecting the circles). It also shows an example colouring of this graph that uses
five different colours. We can call this solution a “proper” colouring because all pairs
of vertices joined by edges have been assigned to different colours, as required by the
problem. Specifically, two vertices have been assigned to colour 1, three vertices to
colour 2, two vertices to colour 3, two vertices to colour 4, and one vertex to colour 5.

Fig. 1.2 If we extract the vertices in the dotted circle, we are left with a subgraph that
clearly needs more than four colours

Actually, this solution is not the only possible 5-colouring for this example graph. For
example, swapping the colours of the bottom two vertices in the figure would give us
a different proper 5-colouring. It is also possible to colour the graph with anything
between six and ten colours (where ten is the number of vertices in the graph), because
assigning a vertex to an additional, newly created, colour still ensures that the
colouring remains proper. But what if we wanted to colour this graph using fewer than
five colours? Is this possible? To answer this question, consider Figure 1.2, where the
dotted line indicates a selected portion of the graph. When we remove everything from
outside this selection, we are left with a subgraph containing just five vertices.
Importantly, we can see that every pair of vertices in this subgraph has an edge between
them. If we were to have only four colours available to us, as indicated in the figure
we would be unable to properly colour this subgraph, since its five vertices all need to
be assigned to a different colour in this instance. This allows us to conclude that the
solution in Figure 1.1 is actually optimal, since there is no solution available that uses
fewer than five colours.
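Condition (a), that no edge joins two same-coloured vertices, is easy to test mechanically, and the five-vertex clique argument above can be reproduced on a hypothetical 5-clique:

```python
# A small helper testing condition (a) of the colouring problem: no
# edge joins two vertices of the same colour. The graph is a
# hypothetical 5-clique, like the subgraph extracted in Fig. 1.2.
def is_proper(edges, colouring):
    return all(colouring[u] != colouring[v] for u, v in edges)

# Every pair of the 5 vertices is adjacent:
clique5 = [(u, v) for u in range(5) for v in range(u + 1, 5)]

print(is_proper(clique5, {v: v for v in range(5)}))      # 5 colours: True
print(is_proper(clique5, {v: v % 4 for v in range(5)}))  # 4 colours: False
```

With only four colours, two of the five mutually adjacent vertices must share a colour, which is exactly why the 5-colouring of Figure 1.1 is optimal.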

1.2.3 Graph and its Adjacency Matrix:

Fig 1.3 Adjacency Matrix

In Fig 1.3 the graph is represented using a square matrix whose dimensions equal the
total number of vertices; that is, a graph with 4 vertices is represented using a matrix of
size 4×4. In this matrix, both rows and columns represent vertices, and each entry is
either 1 or 0: a 1 indicates that there is an edge from the row vertex to the column
vertex, and a 0 indicates that there is no such edge.
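The construction above can be sketched for a hypothetical 4-vertex graph given as an edge list (the edges chosen here are illustrative, not the graph of Fig 1.3):

```python
# Building the adjacency matrix described above for a hypothetical
# 4-vertex graph given as an edge list.
def adjacency_matrix(n, edges):
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = 1   # edge from row vertex u to column vertex v
        A[v][u] = 1   # the graph is undirected, so A is symmetric
    return A

A = adjacency_matrix(4, [(0, 1), (0, 2), (1, 2), (2, 3)])
for row in A:
    print(row)
# [0, 1, 1, 0]
# [1, 0, 1, 0]
# [1, 1, 0, 1]
# [0, 0, 1, 0]
```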

1.2.4 Why Colouring?


Graph coloring is NP-hard and k-coloring is NP-complete for any integer k ≥ 3 (but 2-
coloring is polynomial). Therefore, no algorithm can solve graph coloring in polynomial
time in the general case (assuming that P ≠ NP). In addition, note that even the problem
of finding a coloring with no more than twice the optimal number of colors is still NP-hard.
Also, finding an optimal coloring turns out to be particularly difficult in practice. As a
matter of fact, there are graphs with as few as 125 vertices that cannot be solved optimally
even by using the best performing exact algorithms. For larger graphs, it is therefore
necessary to resort to heuristics, i.e., algorithmic techniques that provide sub-optimal
solutions within an acceptable amount of time.

1.2.5 Graph Coloring Practical Applications

1.2.5.1 A Team Building Exercise


An instructive way to visualise the graph colouring problem is to imagine the vertices of a
graph as a set of “items” that need to be divided into “groups”. As an example,
imagine we have a set of university students that we want to split into groups for a team
building exercise. In addition, imagine we are interested in dividing the students so that
no student is put in a group containing one or more of his friends, and so that the number
of groups used is minimal. How might this be done? Consider the example given in the
table in Figure 1.4(a), where we have a list of eight students with names A through to H,
together with information on who their friends are. From this information we can see that
student A is friends with three students (B, C and G), student B is friends with four students
(A, C, E, and F), and so on. Note that the information in this table is “symmetric” in
that if student x lists student y as one of his friends, then student y also does the same with
student x. This sort of relationship occurs in social networks such as Facebook, where two
people are only considered friends if both parties agree to be friends in advance. An
illustration of this example in graph form is also given in the figure.

Fig. 1.4 Illustration of how proper 5- and 4-colourings can be constructed from the same
graph.

Let us now attempt to split the eight students of this problem into groups so that each
student is put into a different group to that of his friends’. A simple method to do this
might be to take the students one by one in alphabetical order and assign them to the first
group where none of their friends are currently placed. Walking through the process, we
start by taking student A and assigning him to the first group. Next, we take student B and
see that he is friends with someone in the first group (student A), and so we put him into
the second group. Taking student C next, we notice that he is friends with someone in the
first group (student A) and also the second group (student B), meaning that he must now
be assigned to a third group. At this point we have only considered three students, yet we
have created three separate groups. What about the next student? Looking at the
information we can see that student D is only friends with E and F, allowing us to place
him into the first group alongside student A. Following this, student E cannot be assigned
to the first group because he is friends with D, but can be assigned to the second.
Continuing this process for all eight students gives us the solution shown in Figure 1.4(b).
This solution uses four groups, and also involves student F being assigned to a group by
himself.
Can we do any better than this? By inspecting the graph in Figure 1.4(a), we can
see that there are three separate cases where three students are all friends with one another.
Specifically, these are students A, B, and C; students B, E, and F; and students D, E, and
F. The edges between these triplets of students form triangles in the graph. Because of
these mutual friendships, in each case these collections of three students will need to be
assigned to different groups, implying that at least three groups will be needed in any valid
solution. However, by visually inspecting the graph we can see that there is no occurrence
of four students all being friends with one another. This hints that we may not necessarily
need to use four groups in a solution.
In fact, a solution using three groups is actually possible in this case as Figure 1.4(c)
demonstrates. This solution has been achieved using the same assignment process as
before but using a different ordering of the students, as indicated. Since we have already
deduced that at least three groups are required for this particular problem, we can conclude
that this solution is optimal.
The process we have used to form the solutions shown in Figures 1.4(b) and (c) is
generally known as the GREEDY algorithm for graph colouring, and we have seen that
the ordering of the vertices (students in this case) can influence the number of colours
(groups) that are ultimately used in the solution it produces. The GREEDY algorithm and
its extensions are a fundamental part of the field of graph colouring and will be considered
further in later chapters. Among other things, we will demonstrate that there will always
be at least one ordering of the vertices that, when used with the GREEDY algorithm, will
result in an optimal solution.
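The walkthrough above can be sketched in code. The edge list below is reconstructed from the friendships the text describes (A–B, A–C, A–G, B–C, B–E, B–F, D–E, D–F, E–F); H's friends are not listed, so H is assumed to have none, and the second vertex ordering is one hypothetical ordering that happens to reach an optimal 3-group solution:

```python
# The GREEDY algorithm described above: take the vertices in the
# given order and assign each the smallest group number not already
# used by a coloured neighbour.
def greedy_colouring(vertices, edges):
    neighbours = {v: set() for v in vertices}
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    colour = {}
    for v in vertices:
        used = {colour[n] for n in neighbours[v] if n in colour}
        c = 1
        while c in used:     # smallest free group number
            c += 1
        colour[v] = c
    return colour

# Friendships reconstructed from the text (H assumed isolated):
edges = [("A", "B"), ("A", "C"), ("A", "G"), ("B", "C"), ("B", "E"),
         ("B", "F"), ("D", "E"), ("D", "F"), ("E", "F")]

# Alphabetical order uses four groups, with F alone in group 4 ...
print(greedy_colouring("ABCDEFGH", edges))
# ... while this other ordering finds an optimal 3-group solution.
print(greedy_colouring("EABDGCFH", edges))
```

Running both orderings confirms the text's point: the vertex ordering alone decides whether GREEDY uses four groups or the optimal three.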

1.2.5.2 Biprocessor Tasks


Assume that we have a set of processors (machines) and a set of tasks, each task has to be
executed on two preassigned processors simultaneously. A processor cannot work on two
jobs at the same time. For example, such biprocessor tasks arise when we want to schedule
file transfers between processors [2] or in the case of mutual diagnostic testing of processors
[3]. Consider the graph whose vertices correspond to the processors, and if there is a task
that has to be executed on processors i and j, then we add an edge between the two
corresponding vertices. Now the scheduling problem can be modeled as an edge coloring
of this graph: we have to assign colors to the edges in such a way that every color appears
at most once at a vertex. Edge coloring is NP-hard [4], but there are good approximation
algorithms. The maximum degree Δ of the graph is an obvious lower bound on the
number of colors needed to color the edges of the graph. On the other hand, if there are
no multiple edges in the graph (there are no two tasks that require the same two
processors), then Vizing’s Theorem gives an efficient method for obtaining a (Δ + 1)-edge
coloring. If multiple edges are allowed, then the algorithm of [8] gives a 1.1-approximate
solution.

Fig. 1.5 Tasks allocated to processors, the diagram shows the tasks namely task1, task2,
task3 and task4 are allocated to the processors (P1, P5); (P1, P6); (P2, P4) and (P3, P7)
respectively.
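The edge-colouring view can be sketched with a simple greedy pass over the tasks of Fig. 1.5: each colour is a time step, and no processor (vertex) may carry two tasks of the same colour. Note this sketch is not Vizing's algorithm and may use more than Δ + 1 colours on other inputs:

```python
# Greedy edge colouring for biprocessor tasks: assign each task the
# smallest colour (time step) unused at both of its processors.
# A sketch only; unlike Vizing's algorithm it is not guaranteed to
# stay within Δ + 1 colours in general.
def greedy_edge_colouring(tasks):
    busy = {}        # processor -> colours already used at that vertex
    schedule = {}
    for task, (i, j) in tasks.items():
        used = busy.setdefault(i, set()) | busy.setdefault(j, set())
        c = 1
        while c in used:
            c += 1
        schedule[task] = c
        busy[i].add(c)
        busy[j].add(c)
    return schedule

# The four tasks of Fig. 1.5:
tasks = {"task1": ("P1", "P5"), "task2": ("P1", "P6"),
         "task3": ("P2", "P4"), "task4": ("P3", "P7")}
print(greedy_edge_colouring(tasks))
# {'task1': 1, 'task2': 2, 'task3': 1, 'task4': 1}
```

Only task1 and task2 share a processor (P1), so they get different time steps; the other tasks can all run in step 1.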

1.2.5.3 Constructing Timetables


A second important application of graph colouring arises in the production of timetables
at colleges and universities. In these problems we are given a set of “events”, such as
lectures, exams, classroom sessions, together with a set of “timeslots” (e.g., Monday
09:00–10:00, Monday 10:00–11:00 and so on). Our task is to then assign the events to the
timeslots in accordance with a set of constraints. One of the most important of these
constraints is what is often known as the “event-clash” constraint. This specifies that if a
person (or some other resource of which there is only one) is required to be present in a
pair of events, then these events must not be assigned to the same timeslot since such an
assignment will result in this person/resource having to be in two places at once.

Timetabling problems can be easily converted into an equivalent graph colouring
problem by considering each event as a vertex, and then adding edges between any vertex
pairs that are subject to an event clash constraint. Each timeslot available in the timetable
then corresponds to a colour, and the task is to find a colouring such that the number of
colours is no larger than the number of available timeslots.

Fig. 1.6 A small timetabling problem (a), a feasible 4-colouring (b), and its corresponding
timetable solution using four timeslots (c).

Figure 1.6 shows an example timetabling problem expressed as a graph colouring problem.
Here we have nine events which we have managed to timetable into four timeslots. In this
case, three events have been scheduled into timeslot 1, and two events have been scheduled
into each of the remaining three. In practice, assuming that only one event can take place
in a room at any one time, we would also need to ensure that three rooms are available
during timeslot 1. If only two rooms are available in each timeslot, then an extra timeslot
might need to be added to the timetable. It should be noted that timetabling problems can
often vary a great deal between educational institutions, and can also be subject to a wide
range of additional constraints beyond the event-clash constraint mentioned above.
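The room-availability check described above amounts to counting how many events share each timeslot (colour class). The assignment below is hypothetical, mirroring the counts described for Fig. 1.6:

```python
# Room check for a timetable: events sharing a timeslot (colour) must
# not exceed the rooms available. The assignment is hypothetical,
# matching the counts described for Fig. 1.6.
from collections import Counter

timeslot = {"e1": 1, "e2": 1, "e3": 1, "e4": 2, "e5": 2,
            "e6": 3, "e7": 3, "e8": 4, "e9": 4}

def rooms_needed(assignment):
    """Events per timeslot, i.e. the size of each colour class."""
    return Counter(assignment.values())

demand = rooms_needed(timeslot)
print(demand)                         # three events in slot 1, two elsewhere
print(max(demand.values()) <= 2)      # False: two rooms are not enough
```

With only two rooms per timeslot, slot 1 overflows, so (as the text notes) an extra timeslot would have to be added.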

1.2.5.4 Task Scheduling


A third example of how graph colouring can be used to solve real-world problems arises
in the scheduling of tasks that each have a start and finish time. Imagine that a taxi firm
has received n journey bookings, each of which has a start time, signifying when the taxi
will leave the depot, and a finish time telling us when the taxi is expected to return. How
might we assign all of these bookings to vehicles so that the minimum number of vehicles
is needed?

Figure 1.7(a) shows an example problem where we have ten taxi bookings. For
illustrative purposes these have been ordered from top to bottom according to their start
times. It can be seen, for example, that booking 1 overlaps with bookings 2, 3 and 4; hence
any taxi carrying out booking 1 will not be able to serve bookings 2, 3 and 4. We can
construct a graph from this information by using one vertex for each booking and then
adding edges between any vertex pair corresponding to overlapping bookings. A 3-
colouring of this example graph is shown in Figure 1.7(b), and the corresponding
assignment of the bookings to three taxis (the minimum number possible) is shown in
Figure 1.7(c).

Fig. 1.7 A set of taxi journey requests over time (a), its corresponding interval graph and
3- colouring (b), and (c) the corresponding assignment of journeys to taxis.

In this particular case we see that our example problem has resulted in a graph made of
three smaller graphs (components), comprising vertices v1 to v4, v5 to v7 and v8 to v10
respectively. However, this will not always be the case and will depend on the nature of
the bookings received.
A graph constructed from time-dependent tasks such as this is usually referred to
as an interval graph.
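For interval graphs like this, colouring the bookings greedily in order of start time is known to use the minimum number of colours (taxis). A sketch with hypothetical booking times:

```python
# Greedy first-fit by start time: reuse the first taxi that is back
# at the depot before the booking starts, else add a new taxi.
# Booking times are hypothetical.
def assign_taxis(bookings):
    taxi_free_at = []             # finish time of each taxi's last job
    assignment = {}
    for name, (start, finish) in sorted(bookings.items(),
                                        key=lambda kv: kv[1][0]):
        for t, free in enumerate(taxi_free_at):
            if free <= start:     # taxi t is back before we depart
                taxi_free_at[t] = finish
                assignment[name] = t + 1
                break
        else:                     # every taxi is out: add a new one
            taxi_free_at.append(finish)
            assignment[name] = len(taxi_free_at)
    return assignment

bookings = {"b1": (0, 4), "b2": (1, 3), "b3": (2, 5), "b4": (6, 8)}
print(assign_taxis(bookings))
# {'b1': 1, 'b2': 2, 'b3': 3, 'b4': 1}
```

Bookings b1–b3 pairwise overlap (a 3-clique in the interval graph), forcing three taxis, while b4 starts after taxi 1 returns and can reuse it.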

1.2.5.5 Compiler Register Allocation

Our fourth and final example in this section concerns the allocation of computer code
variables to registers on a computer processor. When writing code in a particular
programming language, whether it be C++, Pascal, FORTRAN or some other option, the
programmer is free to make use of as many variables as he or she sees fit. When it comes
to compiling this code, however, it is advantageous for the compiler to assign these
variables to registers on the processor since accessing and updating values in these
locations is far faster than carrying out the same operations using the computer’s RAM
or cache.
Computer processors only have a limited number of registers. For example, most
RISC processors feature 64 registers: 32 for integer values and 32 for floating point values.
However, not all variables in a computer program will be in use (or “live”) at a particular
time. We might therefore choose to assign multiple variables to the same register if they
are seen not to interfere with one another.
Figure 1.8(a) shows an example piece of computer code making use of five
variables, v1, . . . ,v5. It also shows the live ranges for each variable. So, for example, variable
v2 is live only in lines (2) and (3), whereas v3 is live from lines (4) to (9). It can also be
seen, for example, that the live ranges of v1 and v4 do not overlap. Hence we might use
the same register for storing both of these variables at different periods during execution.

Fig. 1.8 (a) An example computer program together with the live ranges of each variable.
Here, the statement “vi ←. . .” denotes the assignment of some value to variable vi,
whereas “. . . vi . . .” is just some arbitrary operation using vi.
(b) shows an optimal colouring of the corresponding interference graph

The problem of deciding how to assign the variables to registers can be modelled as a graph
colouring problem by using one vertex for each live range and then adding edges between
any pairs of vertices corresponding to overlapping live ranges. Such a graph is known as
an interference graph, and the task is to now colour the graph using equal or fewer colours
than the number of available registers. Figure 1.8(b) shows that in this particular case only
three registers are needed: variables v1 and v4 can be assigned to register 1, v2 and v5 to
register 2, and v3 to register 3.

Note that in the example of Figure 1.8, the resultant interference graph actually
corresponds to an interval graph, rather like the taxi example from the previous subsection.
Such graphs will arise in this setting when using straight-line code sequences or when using
software pipelining. In most situations however, the flow of a program is likely to be far
more complex, involving if-else statements, loops, goto commands, and so on. In these
cases the more complicated process of liveness analysis will be needed for determining the
live ranges of each variable, which could result in an interference graph of arbitrary
topology.

1.3 Artificial Neural Network:


An Artificial Neural Network (ANN) is an information processing paradigm that is
inspired by the way biological nervous systems, such as the brain, process information.
The key element of this paradigm is the novel structure of the information processing
system. It is composed of a large number of highly interconnected processing elements
(neurones) working in unison to solve specific problems. ANNs, like people, learn by
example. An ANN is configured for a specific application, such as pattern recognition or
data classification, through a learning process. Learning in biological systems involves
adjustments to the synaptic connections that exist between the neurones.

Fig. 1.9 Architecture of feedforward ANN.

Artificial neural networks (ANN) or connectionist systems are computing systems that
are inspired by, but not necessarily identical to, the biological neural networks that
constitute animal brains. Such systems "learn" to perform tasks by considering examples,
generally without being programmed with any task-specific rules. For example, in image
recognition, they might learn to identify images that contain cats by analyzing example
images that have been manually labeled as "cat" or "no cat" and using the results to
identify cats in other images. They do this without any prior knowledge about cats, for
example, that they have fur, tails, whiskers and cat-like faces. Instead, they automatically
generate identifying characteristics from the learning material that they process.
An ANN is based on a collection of connected units or nodes called artificial neurons,
which loosely model the neurons in a biological brain. Each connection, like
the synapses in a biological brain, can transmit a signal from one artificial neuron to
another. An artificial neuron that receives a signal can process it and then signal additional
artificial neurons connected to it.
In common ANN implementations, the signal at a connection between artificial neurons
is a real number, and the output of each artificial neuron is computed by some non-linear
function of the sum of its inputs. The connections between artificial neurons are called
'edges'. Artificial neurons and edges typically have a weight that adjusts as learning
proceeds. The weight increases or decreases the strength of the signal at a connection.
Artificial neurons may have a threshold such that the signal is only sent if the aggregate
signal crosses that threshold. Typically, artificial neurons are aggregated into layers.
Different layers may perform different kinds of transformations on their inputs. Signals
travel from the first layer (the input layer), to the last layer (the output layer), possibly after
traversing the layers multiple times.
The original goal of the ANN approach was to solve problems in the same way that
a human brain would. However, over time, attention moved to performing specific tasks,
leading to deviations from biology. Artificial neural networks have been used on a variety
of tasks, including computer vision, speech recognition, machine translation, social
network filtering, playing board and video games and medical diagnosis.

1.3.1 ANN Model


An artificial neural network is a network of simple elements called artificial neurons,
which receive input, change their internal state (activation) according to that input, and
produce output depending on the input and activation.
An artificial neuron mimics the working of a biophysical neuron with inputs and outputs,
but is not a biological neuron model.
The network forms by connecting the output of certain neurons to the input of other
neurons forming a directed, weighted graph. The weights as well as the functions that
compute the activation can be modified by a process called learning which is governed by
a learning rule.[5]

1.3.2 Components of an artificial neural network


Neuron
A neuron with label j receiving an input pj(t) from predecessor neurons consists of the
following components: [6]

• an activation aj(t), the neuron's state, depending on a discrete time parameter,
• possibly a threshold θj, which stays fixed unless changed by a learning function,
• an activation function f that computes the new activation at a given time t + 1 from
aj(t), θj and the net input pj(t), giving rise to the relation
aj(t + 1) = f(aj(t), pj(t), θj),

• and an output function fout computing the output from the activation:
oj(t) = fout(aj(t)).

Often the output function is simply the Identity function.

An input neuron has no predecessor but serves as input interface for the whole network.
Similarly an output neuron has no successor and thus serves as output interface of the whole
network.

Connections, weights and biases


The network consists of connections, each connection transferring the output of a neuron i
to the input of a neuron j. In this sense i is the predecessor of j and j is the successor of
i. Each connection is assigned a weight wij.[6] Sometimes a bias term is added to the total
weighted sum of inputs to serve as a threshold to shift the activation function.[5]

Propagation function
The propagation function computes the input pj(t) to the neuron j from the outputs oi(t)
of the predecessor neurons and typically has the form[6]

pj(t) = Σi oi(t) wij.

When a bias value is added, the above form changes to the following [7]

pj(t) = Σi oi(t) wij + w0j, where w0j is a bias.
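The propagation function with bias, followed by a simple threshold activation, can be sketched as below; all the numbers are illustrative:

```python
# Sketch of the neuron components above: weighted-sum propagation
# with a bias w0j, then a step activation against a threshold θ.
def propagation(outputs, weights, bias=0.0):
    """pj(t) = sum_i oi(t) * wij + w0j"""
    return sum(o * w for o, w in zip(outputs, weights)) + bias

def activation(p, theta=0.0):
    """Fire (output 1) iff the net input reaches the threshold."""
    return 1 if p >= theta else 0

o = [1.0, 0.5, 0.0]      # outputs oi(t) of the predecessor neurons
w = [0.4, -0.2, 0.9]     # connection weights wij
p = propagation(o, w, bias=0.1)
print(p)                 # net input, ≈ 0.4
print(activation(p, theta=0.3))
```

Note that the third predecessor contributes nothing here because its output is zero, regardless of its weight.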

Learning rule
The learning rule is a rule or an algorithm which modifies the parameters of the neural
network, in order for a given input to the network to produce a favored output. This
learning process typically amounts to modifying the weights and thresholds of the
variables within the network.[6]

Fig. 1.10: Biological Neuron



1.3.3 Learning paradigms


The three major learning paradigms each correspond to a particular learning task. These
are supervised learning, unsupervised learning and reinforcement learning.
Supervised learning
Supervised learning uses a set of example pairs (x, y), x ∈ X, y ∈ Y, and the aim is to find a function f : X → Y in the allowed class of functions that matches the examples. In other words, we wish to infer the mapping implied by the data; the cost function is related to the mismatch between our mapping and the data, and it implicitly contains prior knowledge about the problem domain. [8]
A commonly used cost is the mean-squared error, which tries to minimize the average squared error between the network's output f(x) and the target value y over all the example pairs. Minimizing this cost using gradient descent for the class of neural networks called multilayer perceptrons (MLPs) produces the backpropagation algorithm for training neural networks.
Tasks that fall within the paradigm of supervised learning are pattern recognition
(also known as classification) and regression (also known as function approximation). The
supervised learning paradigm is also applicable to sequential data (e.g., for handwriting,
speech and gesture recognition). This can be thought of as learning with a "teacher", in the
form of a function that provides continuous feedback on the quality of solutions obtained
thus far.
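As a concrete sketch of this paradigm, the following minimal example fits a one-weight model f(x) = w·x to example pairs by gradient descent on the mean-squared error. The data (drawn from y = 2x) and the learning rate are illustrative assumptions.

```python
# Supervised learning sketch: minimise the mean-squared error
# C(w) = (1/N) * sum (w*x - y)^2 over the example pairs (x, y).

pairs = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # examples with y = 2x

w = 0.0
lr = 0.05
for _ in range(200):
    # dC/dw = (2/N) * sum (w*x - y) * x
    grad = sum(2 * (w * x - y) * x for x, y in pairs) / len(pairs)
    w -= lr * grad

# w converges towards 2, the mapping implied by the data
```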

Unsupervised learning
In unsupervised learning, some data x is given together with a cost function to be minimized, which can be any function of the data x and the network's output f. The cost function is dependent on the task (the model domain) and any a priori assumptions (the implicit properties of the model, its parameters and the observed variables).
As a trivial example, consider the model f(x) = a, where a is a constant, and the cost C = E[(x − f(x))²].
Minimizing this cost produces a value of a that is equal to the mean of the data. The cost
function can be much more complicated. Its form depends on the application: for example,
in compression it could be related to the mutual information between x and f(x), whereas
in statistical modeling, it could be related to the posterior probability of the model given
the data (note that in both of those examples those quantities would be maximized rather
than minimized).
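The trivial example above can be checked numerically: over a finite sample, the empirical cost is minimised at the sample mean. The data values here are arbitrary.

```python
# Constant model f(x) = a with empirical cost C(a) = mean of (x - a)^2.
# The minimiser of C(a) is the sample mean of the data.

data = [1.0, 2.0, 3.0, 6.0]

def cost(a):
    return sum((x - a) ** 2 for x in data) / len(data)

mean = sum(data) / len(data)   # 3.0
# cost(mean) is lower than the cost at nearby candidate values of a,
# e.g. both cost(2.0) and cost(4.0) are larger than cost(3.0)
```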

Tasks that fall within the paradigm of unsupervised learning are in general estimation
problems; the applications include clustering, the estimation of statistical distributions,
compression and filtering.

Hebbian learning
In the late 1940s, D. O. Hebb [9] created a learning hypothesis based on the mechanism
of neural plasticity that became known as Hebbian learning. Hebbian learning is
unsupervised learning. This evolved into models for long-term potentiation. Researchers
started applying these ideas to computational models in 1948 with Turing's B-type
machines. Farley and Clark [10] (1954) first used computational machines, then called
"calculators", to simulate a Hebbian network. Other neural network computational
machines were created by Rochester, Holland, Habit and Duda (1956). [11] Rosenblatt [6]
(1958) created the perceptron, an algorithm for pattern recognition. With mathematical
notation, Rosenblatt described circuitry not in the basic perceptron, such as the exclusive-
or circuit that could not be processed by neural networks at the time. [12] In 1959, a
biological model proposed by Nobel laureates Hubel and Wiesel was based on their
discovery of two types of cells in the primary visual cortex: simple cells and complex
cells.[13] The first functional networks with many layers were published by Ivakhnenko
and Lapa in 1965, becoming the Group Method of Data Handling. [14] [15]

Neural network research stagnated after machine learning research by Minsky and Papert
(1969),[16] who discovered two key issues with the computational machines that
processed neural networks. The first was that basic perceptrons were incapable of
processing the exclusive-or circuit. The second was that computers didn't have enough
processing power to effectively handle the work required by large neural networks. Neural
network research slowed until computers achieved far greater processing power. Much of
artificial intelligence had focused on high-level (symbolic) models that are processed by
using algorithms, characterized for example by expert systems with knowledge embodied
in if-then rules, until in the late 1980s research expanded to low-level (sub-symbolic)
machine learning, characterized by knowledge embodied in the parameters of a cognitive
model.
Reinforcement learning
In reinforcement learning, data x are usually not given, but generated by an agent's interactions with the environment. At each point in time t, the agent performs an action yt and the environment generates an observation xt and an instantaneous cost ct, according to some (usually unknown) dynamics. The aim is to discover a policy for selecting actions that minimizes some measure of a long-term cost, e.g., the expected cumulative cost. The environment's dynamics and the long-term cost for each policy are usually unknown, but can be estimated.
More formally, the environment is modeled as a Markov decision process (MDP) with states s1, …, sn ∈ S and actions a1, …, am ∈ A, with the following probability distributions: the instantaneous cost distribution P(ct | st), the observation distribution P(xt | st) and the transition distribution P(st+1 | st, at), while a policy is defined as the conditional distribution over actions given the observations. Taken together, the two define a Markov chain (MC). The aim is to discover the policy (i.e., the MC) that minimizes the cost.
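This cost-minimisation view can be made concrete on a toy MDP. The two states, actions, costs and transitions below are invented purely for illustration; value iteration then computes the minimal discounted long-term cost and the corresponding policy.

```python
# Toy MDP sketch: two states, two actions, deterministic transitions.
# cost[s][a] is the instantaneous cost c_t; trans[s][a] is the next state.
# Value iteration finds the policy minimising the discounted long-term cost.

cost = [[1.0, 4.0],   # state 0: action 0 is cheap, action 1 is expensive
        [2.0, 0.5]]   # state 1: action 1 is cheap
trans = [[1, 0],      # state 0: action 0 -> state 1, action 1 -> state 0
         [0, 1]]      # state 1: action 0 -> state 0, action 1 -> state 1
gamma = 0.9           # discount factor

V = [0.0, 0.0]
for _ in range(500):
    V = [min(cost[s][a] + gamma * V[trans[s][a]] for a in (0, 1))
         for s in (0, 1)]

policy = [min((0, 1), key=lambda a: cost[s][a] + gamma * V[trans[s][a]])
          for s in (0, 1)]
# The cheapest policy moves to state 1 and then repeats its low-cost action
```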

Optimization
The optimization algorithm repeats a two-phase cycle: propagation and weight update.
When an input vector is presented to the network, it is propagated forward through the
network, layer by layer, until it reaches the output layer. The output of the network is then
compared to the desired output, using a loss function. The resulting error value is
calculated for each of the neurons in the output layer. The error values are then propagated
from the output back through the network, until each neuron has an associated error value
that reflects its contribution to the original output.
Backpropagation uses these error values to calculate the gradient of the loss function. In
the second phase, this gradient is fed to the optimization method, which in turn uses it to
update the weights, in an attempt to minimize the loss function.
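The two-phase cycle can be sketched for a tiny 1-1-1 network with sigmoid activations; the network shape, squared loss, training pair and learning rate here are assumptions chosen for brevity.

```python
import math

# Phase 1: propagate the input forward, layer by layer.
# Phase 2: propagate error values backward and update each weight
# along the negative gradient of the squared loss.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w1, w2 = 0.5, 0.5        # one hidden and one output connection
x, target = 1.0, 0.0     # a single training example
lr = 0.5

for _ in range(2000):
    h = sigmoid(w1 * x)                       # forward pass
    y = sigmoid(w2 * h)
    delta_out = (y - target) * y * (1 - y)    # error at the output neuron
    delta_hid = delta_out * w2 * h * (1 - h)  # error propagated to the hidden neuron
    w2 -= lr * delta_out * h                  # gradient-descent weight updates
    w1 -= lr * delta_hid * x

# repeating the cycle drives the output y towards the target
```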

References
1. T. Schiex, H. Fargier, and G. Verfaillie, "Valued constraint satisfaction problems: Hard and easy problems," IJCAI (1), 1995.
2. E. G. Coffman, Jr., M. R. Garey, D. S. Johnson, and A. S. LaPaugh, “Scheduling
file transfers,” SIAM J. Comput., 14(3):744–780, 1985.
3. J. A. Hoogeveen, S. L. van de Velde, and B. Veltman, “Complexity of scheduling
multiprocessor tasks with prespecified processor allocations,” Discrete Appl. Math.,
55(3):259–272, 1994.
4. I. Holyer, “The NP-completeness of edge-coloring,” SIAM J. Comput., 10(4):718–
720, Nov. 1981.
5. Abbod, Maysam F (2007). "Application of Artificial Intelligence to the
Management of Urological Cancer". The Journal of Urology. 178 (4): 1150–1156.
doi:10.1016/j.juro.2007.05.122. PMID 17698099.
6. Zell, Andreas (1994). "chapter 5.2". Simulation Neuronaler Netze [Simulation of
Neural Networks] (in German) (1st ed.). Addison-Wesley. ISBN 978-3-89319-554-
1.
7. Dawson, Christan W (1998). "An artificial neural network approach to rainfall-
runoff modelling". Hydrological Sciences Journal. 43 (1): 47–66.
8. Ojha, Varun Kumar; Abraham, Ajith; Snášel, Václav (1 April 2017). "Metaheuristic
design of feedforward neural networks: A review of two decades of research".
Engineering Applications of Artificial Intelligence. 60: 97–116. arXiv:1705.05584
9. Hebb, Donald (1949). The Organization of Behavior. New York: Wiley. ISBN 978-
1-135-63190-1
10. Farley, B.G.; W.A. Clark (1954). "Simulation of Self-Organizing Systems by Digital
Computer". IRE Transactions on Information Theory. 4 (4): 76–84.
doi:10.1109/TIT.1954.1057468.
11. Rochester, N.; J.H. Holland; L.H. Habit; W.L. Duda (1956). "Tests on a cell
assembly theory of the action of the brain, using a large digital computer". IRE
Transactions on Information Theory.
12. Werbos, P.J. (1975). Beyond Regression: New Tools for Prediction and Analysis in
the Behavioral Sciences.
13. David H. Hubel and Torsten N. Wiesel (2005). Brain and visual perception: the
story of a 25-year collaboration. Oxford University Press US. p. 106. ISBN 978-0-
19-517618-6.
14. Schmidhuber, J. (2015). "Deep Learning in Neural Networks: An Overview".
Neural Networks. 61: 85–117. arXiv:1404.7828
15. Ivakhnenko, A. G.; Grigorʹevich Lapa, Valentin (1967). Cybernetics and
forecasting techniques. American Elsevier Pub. Co.
16. Minsky, Marvin; Papert, Seymour (1969). Perceptrons: An Introduction to
Computational Geometry. MIT Press. ISBN 978-0-262-63022-1.

Chapter 2
Literature Survey

1. T. Schiex, H. Fargier, and G. Verfaillie, "Valued constraint satisfaction problems: Hard and easy problems," IJCAI (1), 1995.

In order to deal with over-constrained Constraint Satisfaction Problems, various extensions of the CSP framework have been considered by taking into account costs, uncertainties, preferences, priorities, etc. Each extension uses a specific mathematical operator (+, max, ...) to aggregate constraint violations.
In this paper, we consider a simple algebraic framework, related to Partial
Constraint Satisfaction, which subsumes most of these proposals and use it to
characterize existing proposals in terms of rationality and computational
complexity. We exhibit simple relationships between these proposals, try to extend
some traditional CSP algorithms and prove that some of these extensions may be
computationally expensive.

2. E. G. Coffman, Jr., M. R. Garey, D. S. Johnson, and A. S. LaPaugh, "Scheduling file transfers," SIAM J. Comput., 14(3):744–780, 1985.

We consider a problem of scheduling file transfers in a network so as to minimize
overall finishing time. Although the general problem is NP-complete, we identify
polynomial time solvable special cases and derive good performance bounds for
several natural approximation algorithms, assuming the existence of a central
controller. We also show how these bounds can be maintained in a distributed
regime.

3. J. A. Hoogeveen, S. L. van de Velde, and B. Veltman, "Complexity of scheduling multiprocessor tasks with prespecified processor allocations," Discrete Appl. Math., 55(3):259–272, 1994.

We investigate the computational complexity of scheduling multiprocessor tasks
with prespecified processor allocations. We consider two criteria: minimizing
schedule length and minimizing the sum of the task completion times. In addition,
we investigate the complexity of problems when precedence constraints or release
dates are involved.

4. Abbod, Maysam F (2007). "Application of Artificial Intelligence to the Management of Urological Cancer". The Journal of Urology. 178 (4): 1150–1156. doi:10.1016/j.juro.2007.05.122. PMID 17698099.

Artificial intelligence techniques, such as artificial neural networks, Bayesian belief
networks and neuro-fuzzy modeling systems, are complex mathematical models
based on the human neuronal structure and thinking. Such tools are capable of
generating data driven models of biological systems without making assumptions
based on statistical distributions. A large amount of study has been reported of the
use of artificial intelligence in urology. We reviewed the basic concepts behind
artificial intelligence techniques and explored the applications of this new dynamic
technology in various aspects of urological cancer management. A detailed and
systematic review of the literature was performed using the MEDLINE and Inspec
databases to discover reports using artificial intelligence in urological cancer. The
characteristics of machine learning and their implementation were described and
reports of artificial intelligence use in urological cancer were reviewed. While most
researchers in this field were found to focus on artificial neural networks to improve
the diagnosis, staging and prognostic prediction of urological cancers, some groups
are exploring other techniques, such as expert systems and neuro-fuzzy modeling
systems. Compared to traditional regression statistics artificial intelligence methods
appear to be accurate and more explorative for analyzing large data cohorts.
Furthermore, they allow individualized prediction of disease behavior. Each
artificial intelligence method has characteristics that make it suitable for different
tasks. The lack of transparency of artificial neural networks hinders global scientific
community acceptance of this method but this can be overcome by neuro-fuzzy
modeling systems.

5. Dawson, Christian W (1998). "An artificial neural network approach to rainfall-runoff modelling". Hydrological Sciences Journal. 43 (1): 47–66.

This paper provides a discussion of the development and application of Artificial
Neural Networks (ANNs) to flow forecasting in two flood-prone UK catchments
using real hydrometric data. Given relatively brief calibration data sets it was
possible to construct robust models of 15-min flows with six hour lead times for the
Rivers Amber and Mole. Comparisons were made between the performance of the
ANN and those of conventional flood forecasting systems. The results obtained for
validation forecasts were of comparable quality to those obtained from operational
systems for the River Amber. The ability of the ANN to cope with missing data
and to “learn” from the event currently being forecast in real time makes it an
appealing alternative to conventional lumped or semi-distributed flood forecasting
models. However, further research is required to determine the optimum ANN
training period for a given catchment, season and hydrological contexts.
6. Ojha, Varun Kumar; Abraham, Ajith; Snášel, Václav (1 April 2017).
"Metaheuristic design of feedforward neural networks: A review of two decades
of research". Engineering Applications of Artificial Intelligence. 60: 97–116.
arXiv:1705.05584

Over the past two decades, the feedforward neural network (FNN) optimization
has been a key interest among the researchers and practitioners of multiple
disciplines. The FNN optimization is often viewed from the various perspectives:
the optimization of weights, network architecture, activation nodes, learning
parameters, learning environment, etc. Researchers adopted such different
viewpoints mainly to improve the FNN's generalization ability. The gradient-
descent algorithm such as backpropagation has been widely applied to optimize the
FNNs. Its success is evident from the FNN's application to numerous real-world
problems. However, due to the limitations of the gradient-based optimization
methods, the metaheuristic algorithms including the evolutionary algorithms,
swarm intelligence, etc., are still being widely explored by the researchers aiming
to obtain generalized FNN for a given problem. This article attempts to summarize
a broad spectrum of FNN optimization methodologies including conventional and
metaheuristic approaches. This article also tries to connect various research
directions emerged out of the FNN optimization practices, such as evolving neural
network (NN), cooperative coevolution NN, complex-valued NN, deep learning,
extreme learning machine, quantum NN, etc. Additionally, it provides interesting
research challenges for future research to cope-up with the present information
processing era.

7. Farley, B.G.; W.A. Clark (1954). "Simulation of Self-Organizing Systems by Digital Computer". IRE Transactions on Information Theory. 4 (4): 76–84.

A general discussion of ideas and definitions relating to self-organizing systems and
their synthesis is given, together with remarks concerning their simulation by
digital computer. Synthesis and simulation of an actual system is then described.
This system, initially randomly organized within wide limits, organizes itself to
perform a simple prescribed task.

8. Rochester, N.; J.H. Holland; L.H. Habit; W.L. Duda (1956). "Tests on a cell
assembly theory of the action of the brain, using a large digital computer". IRE
Transactions on Information Theory.

Theories by D.O. Hebb and P.M. Milner on how the brain works were tested by
simulating neuron nets on the IBM Type 704 Electronic Calculator. The formation
of cell assemblies from an unorganized net of neurons was demonstrated, as well
as a plausible mechanism for short-term memory and the phenomena of growth
and fractionation of cell assemblies. The cell assemblies do not yet act just as the
theory requires, but changes in the theory and the simulation offer promise for
further experimentation.

9. Schmidhuber, J. (2015). "Deep Learning in Neural Networks: An Overview". Neural Networks. 61: 85–117. arXiv:1404.7828

In recent years, deep artificial neural networks (including recurrent ones) have won
numerous contests in pattern recognition and machine learning. This historical
survey compactly summarizes relevant work, much of it from the previous
millennium. Shallow and Deep Learners are distinguished by the depth of their
credit assignment paths, which are chains of possibly learnable, causal links
between actions and effects. I review deep supervised learning (also recapitulating
the history of backpropagation), unsupervised learning, reinforcement learning &
evolutionary computation, and indirect search for short programs encoding deep
and large networks.

10. M.O. Berger, K-Colouring vertices using a neural network with convergence to
valid solutions, Proc. International Conf. on Neural Networks, 1994

This paper proposes a new algorithm using a maximum neural network model to
k -color vertices of a simple undirected graph. Unlike traditional neural nets, the
proposed network is guaranteed to converge to valid solutions with no parameter
tuning needed. The power of the new method to solve this NP- complete problem
will be shown in a number of simulations.

11. Takefuji, Y., and Lee, K.C., "Artificial Neural Network for Four-Coloring Map Problems and K-Colorability Problems", IEEE Trans. Circuits Systems, vol. 38, no. 3, pp. 325-333, Mar. 1991.

The computational energy required for solving a four-coloring map problem is
determined. A parallel algorithm for solving the problem, based on the McCulloch-Pitts binary neuron model and the Hopfield neural network, is presented. It is shown
that the computational energy is always guaranteed to monotonically decrease with
the Newton equation. A 4*n neural array is used to color a map of n regions, where
each neuron is a processing element that performs according to the proposed
Newton equation. The capability of this system is demonstrated for a large number
of simulation runs. The parallel algorithm is extended for solving the K-colorability
problem.

12. Bruck, J., and Goodman, J., "On the Power of Neural Networks for Solving Hard Problems", Journal of Complexity, vol. 6, pp. 129-135, 1990.

This paper deals with a neural network model in which each neuron performs a
threshold logic function. An important property of the model is that it always
converges to a stable state when operating in a serial mode. This property is the
basis of the potential applications of the model such as associative memory devices
and combinatorial optimization. One of the motivations for use of the model for
solving hard combinatorial problems is the fact that it can be implemented by
optical devices and thus operate at a higher speed than conventional electronics.
The main theme in this work is to investigate the power of the model for solving
NP-hard problems and to understand the relation between speed of operation and
the size of a neural network. In particular, it will be shown that for any NP-hard
problem the existence of a polynomial size network that solves it implies that NP
= co-NP. Also, for the Traveling Salesman Problem (TSP), even a polynomial size
network that gets an ε-approximate solution does not exist unless P = NP. The
above results are of great practical interest, because right now it is possible to build
neural networks which will operate fast but are limited in the number of neurons
they contain.

References

1. T. Schiex, H. Fargier, and G. Verfaillie, "Valued constraint satisfaction problems: Hard and easy problems," IJCAI (1), 1995.

2. E. G. Coffman, Jr., M. R. Garey, D. S. Johnson, and A. S. LaPaugh, "Scheduling file transfers," SIAM J. Comput., 14(3):744–780, 1985.

3. J. A. Hoogeveen, S. L. van de Velde, and B. Veltman, "Complexity of scheduling multiprocessor tasks with prespecified processor allocations," Discrete Appl. Math., 55(3):259–272, 1994.

4. Abbod, Maysam F (2007). "Application of Artificial Intelligence to the Management of Urological Cancer". The Journal of Urology. 178 (4): 1150–1156. doi:10.1016/j.juro.2007.05.122. PMID 17698099.

5. Dawson, Christian W (1998). "An artificial neural network approach to rainfall-runoff modelling". Hydrological Sciences Journal. 43 (1): 47–66.

6. Ojha, Varun Kumar; Abraham, Ajith; Snášel, Václav (1 April 2017). "Metaheuristic design of feedforward neural networks: A review of two decades of research". Engineering Applications of Artificial Intelligence. 60: 97–116. arXiv:1705.05584

7. Farley, B.G.; W.A. Clark (1954). "Simulation of Self-Organizing Systems by Digital Computer". IRE Transactions on Information Theory. 4 (4): 76–84.

8. Rochester, N.; J.H. Holland; L.H. Habit; W.L. Duda (1956). "Tests on a cell assembly theory of the action of the brain, using a large digital computer". IRE Transactions on Information Theory.

9. Schmidhuber, J. (2015). "Deep Learning in Neural Networks: An Overview". Neural Networks. 61: 85–117. arXiv:1404.7828

10. M.O. Berger, "K-Colouring vertices using a neural network with convergence to valid solutions," Proc. International Conf. on Neural Networks, 1994.

11. Takefuji, Y., and Lee, K.C., "Artificial Neural Network for Four-Coloring Map Problems and K-Colorability Problems", IEEE Trans. Circuits Systems, vol. 38, no. 3, pp. 325-333, Mar. 1991.

12. Bruck, J., and Goodman, J., "On the Power of Neural Networks for Solving Hard Problems", Journal of Complexity, vol. 6, pp. 129-135, 1990.

Chapter 3
Graph Coloring Techniques

3.1 Previous Techniques for Graph Coloring:

3.1.1 Greedy Algorithm:


The GREEDY algorithm is one of the simplest but most fundamental heuristic algorithms for graph colouring. It operates by taking vertices one by one according to some (possibly arbitrary) ordering and assigning each vertex its first available colour. Because this is a heuristic algorithm, the solutions it produces may well be suboptimal; however, it can also be shown that GREEDY can produce an optimal solution for any graph given the correct sequence of vertices. As a result, various algorithms for graph colouring have been proposed that seek to find such orderings of the vertices.[1]
Pseudo Code:
GREEDY (S ← ϕ, π)
(1) for i ← 1 to |π| do
(2)   for j ← 1 to |S|
(3)     if (Sj ∪ {πi}) is an independent set then
(4)       Sj ← Sj ∪ {πi}
(5)       break
(6)     else j ← j + 1
(7)   if j > |S| then
(8)     Sj ← {πi}
(9)     S ← S ∪ {Sj}

To start, the algorithm takes an empty solution S = ϕ and an arbitrary permutation of the
vertices π. In each outer loop the algorithm takes the ith vertex in the permutation, πi, and
attempts to find a colour class Sj ∈ S into which it can be inserted. If such a colour class
currently exists in S, then the vertex is added to it and the process moves on to consider
the next vertex πi+1. If not, lines (8–9) of the algorithm are used to create a new colour class for the vertex.
Let us now estimate the computational complexity of the GREEDY algorithm with regard
to the number of constraint checks that are performed. We see that one vertex is coloured
at each iteration, meaning n = |π| iterations of the algorithm are required in total. At the
ith iteration (1 ≤ i ≤ n), we are concerned with finding a feasible colour for the vertex πi.
In the worst case this vertex will clash with all vertices that have preceded it in π, meaning
that (i−1) constraint checks will be performed before a suitable colour is determined.
Indeed, if the graph we are colouring is the complete graph Kn, the worst case will occur
for all vertices; hence a total of 0+1+2+. . .+(n−1) constraint checks will be performed.
This gives GREEDY an overall worst-case complexity of O(n²).
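The pseudocode above translates almost line for line into Python. This is an illustrative sketch; the adjacency-set encoding of the graph is an assumption, and any other adjacency test would do.

```python
# GREEDY colouring: take the vertices in the order given by the permutation pi
# and place each into the first colour class that remains an independent set.

def greedy_colouring(adj, pi):
    S = []                                         # colour classes, each a set
    for v in pi:
        for cls in S:
            if not any(u in adj[v] for u in cls):  # feasibility (clash) check
                cls.add(v)
                break
        else:                                      # no feasible class found:
            S.append({v})                          # open a new colour class
    return S

# A 4-cycle 0-1-2-3-0; with this ordering GREEDY finds an optimal 2-colouring
adj = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
colouring = greedy_colouring(adj, [0, 2, 1, 3])
```

A different vertex ordering, e.g. [0, 1, 2, 3], can force more colour classes, which is exactly why ordering heuristics matter.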

3.1.2 Recursive largest First:


This method works by colouring a graph one colour at a time, as opposed to one vertex at a time. In each step the algorithm uses heuristics to identify an independent set of vertices, which are then assigned the same colour. This independent set is then removed, resulting in a smaller subgraph, and the process is repeated until the whole graph is covered.[2]
Pseudo code :
RLF (S ← ϕ, X ← V, Y ← ϕ, i ← 0)
(1) while X != ϕ do
(2)   i ← i + 1
(3)   Si ← ϕ
(4)   while X != ϕ do
(5)     choose v ∈ X
(6)     Si ← Si ∪ {v}
(7)     Y ← Y ∪ ΓX(v)
(8)     X ← X − (Y ∪ {v})
(9)   S ← S ∪ {Si}
(10) X ← Y
(11) Y ← ϕ
Pseudocode for the RLF algorithm is given above. In each outer loop of the process, the ith colour class Si is built. The algorithm also makes use of two sets: X, which contains uncoloured vertices that can currently be added to Si without causing a clash; and Y, which holds the uncoloured vertices that cannot be feasibly added to Si. At the start of execution X = V and Y = ϕ. Lines (4) to (8) give the steps responsible for constructing the ith colour
class Si. To start, a vertex v from X is selected and added to Si (i.e., v is coloured with
colour i). Next, all vertices neighbouring v in the subgraph induced by X are transferred to
Y, to signify that they cannot now be feasibly assigned to Si. Finally, v and its neighbours
are also removed from X, since they are not now considered candidates for inclusion in
colour class Si. Once X = ϕ, no further vertices can be added to the current colour class
Si. In lines (9) to (11) of the algorithm Si is therefore added to the solution S and, if
necessary, the algorithm moves on to constructing colour class Si+1. To do this, all vertices
in the set of uncoloured vertices Y are moved into X, and Y is emptied. Obviously, once
both X and Y are empty, all vertices have been coloured.
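The same steps can be sketched compactly in Python. The choice of v here is simply the smallest-labelled vertex, a deterministic placeholder for the degree-based heuristics the method normally uses.

```python
# RLF sketch: build one colour class at a time. X holds uncoloured vertices
# still feasible for the current class Si; Y holds uncoloured vertices that
# clash with it.

def rlf_colouring(adj):
    S = []
    X, Y = set(adj), set()
    while X:
        Si = set()
        while X:
            v = min(X)                 # choose v in X (placeholder heuristic)
            Si.add(v)
            Y |= adj[v] & X            # neighbours of v can no longer join Si
            X -= adj[v] | {v}          # remove v and its neighbours from X
        S.append(Si)
        X, Y = Y, set()                # uncoloured vertices seed the next class
    return S

adj = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
colouring = rlf_colouring(adj)         # two classes for this 4-cycle
```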

3.1.3 Ant Col algorithm:


ACO is an algorithmic framework that was originally inspired by the way in which real ants determine efficient paths between food sources and their colonies. In their natural habitat, when no food has been identified, ants tend to wander about randomly. However, when a food source is found, the discovering ants will take some of it back to the colony, leaving a pheromone trail behind them. Other ants are then less likely to continue wandering at random and may instead follow the trail. If they go on to discover the same food source, they will then follow the pheromone trail back to the nest, adding their own pheromone in the process.
This encourages further ants to follow the trail. In addition to this, pheromones on a trail
also tend to evaporate over time, reducing the chances of an ant following it. The longer
it takes for an ant to traverse a path, the more time the pheromones have to evaporate;
hence shorter paths tend to see a more rapid build-up of pheromone, making other ants
more likely to follow it and deposit their own pheromone. This positive feedback
eventually leads to all ants following a single, efficient path between the colony and food
source.
As might be expected, initial applications of ACO were aimed towards problems
such as the travelling salesman problem and vehicle routing problems, where we seek to
identify efficient paths for visiting the vertices of a graph.[3]
Pseudo code :
ANTCOL (G = (V,E))
(1) tuv ←1 ∀ u, v ∈ V : u != v
(2) k = n
(3) while (not stopping condition) do
(4) δuv ←0 ∀u,v ∈ V : u != v
(5) best ←k
(6) foundFeasible← false
(7) for (ant ←1 to nants) do
(8) S ←BUILDSOLUTION(k)
(9) if (S is a partial solution) then
(10) Randomly assign uncoloured vertices to colour classes in S
(11) Run TABUCOL
(12) if (S is feasible) then
(13) foundFeasible ← true
(14) if (|S| ≤ best) then
(15) best ← |S|
(16) δuv ← δuv + F(S) ∀ u, v : c(u) = c(v) ∧ u != v
(17) tuv ← ρ × tuv + δuv ∀ u, v ∈ V : u != v
(18) if (foundFeasible=true) then
(19) k←best−1

As shown in the pseudocode, in each cycle of the algorithm (lines (3) to (19)), a number
of ants each produce a complete, though not necessarily feasible, solution. In line (16) the
details of each of these solutions are then added to a trail update matrix δ and, at the end
of a cycle, the contents of δ are used together with an evaporation rate ρ to update the
global trail matrix t. At the start of each cycle, each individual ant attempts to construct a
solution using the procedure BUILDSOLUTION. This is based on the RLF method
which, we recall, operates by building up each colour class in a solution one at a time.
Also recall that during the construction of each class Si ∈ S, RLF makes use of two sets:
X, which contains uncoloured vertices that can currently be added to Si without causing a
clash; and Y, which holds the uncoloured vertices that cannot be feasibly added to Si. The
modifications to RLF that BUILDSOLUTION employs are as follows:
• In the procedure a maximum of k colour classes is permitted. Once these have been
constructed, any remaining vertices are left uncoloured.
• The first vertex to be assigned to each colour class Si (1 ≤ i ≤ k) is chosen randomly
from the set X.
• In remaining cases, each vertex v is then assigned to colour Si with probability

Pvi = (τvi^α × ηvi^β) / Σu∈X (τui^α × ηui^β)   if v ∈ X, and Pvi = 0 otherwise,

where τvi is calculated as

τvi = (Σu∈Si tuv) / |Si|.

Note that the calculation of τvi makes use of the global trail matrix t, meaning that higher
values are associated with combinations of vertices that have been assigned the same
colour in previous solutions. The value ηvi, meanwhile, is associated with a heuristic rule
which, in this case, is the degree of vertex v in the graph induced by the set of currently
uncoloured vertices X ∪Y. Larger values for τvi and ηvi thus contribute to larger values for
Pvi, encouraging vertex v to be assigned to colour class Si. The parameters α and β are used
to control the relative strengths of τ and η in the equation.
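The probability rule can be illustrated numerically; the τ and η values below are invented purely to show how the normalisation over X works.

```python
# Assignment probability for colour class i:
# Pvi = (tau_vi**alpha * eta_vi**beta) / sum over u in X of (tau_ui**alpha * eta_ui**beta),
# and Pvi = 0 for vertices not in X. All values here are illustrative.

alpha, beta = 1.0, 2.0
X = {"a", "b", "c"}                     # uncoloured, feasible vertices
tau = {"a": 2.0, "b": 1.0, "c": 1.0}    # trail values for this class
eta = {"a": 3.0, "b": 2.0, "c": 1.0}    # heuristic values (induced degrees)

def score(u):
    return tau[u] ** alpha * eta[u] ** beta

def p(v):
    if v not in X:
        return 0.0
    return score(v) / sum(score(u) for u in X)

# p("a") = (2*9) / (2*9 + 1*4 + 1*1) = 18/23, and the values over X sum to 1
```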

3.1.4 Hill Climbing Algorithm:


In contrast to the preceding algorithms, the Hill-Climbing (HC) algorithm of Lewis
(2009) operates in the space of feasible solutions, with the initial solution being formed
using the DSATUR heuristic. During a run, the algorithm operates on a single feasible
solution S ={S1, . . .S|S|} with the aim of minimising |S|. To begin, a small number of
colour classes are removed from S and are placed into a second set T , giving two partial
proper solutions. A specialised local search procedure is then run for I iterations. This
attempts to feasibly transfer vertices from colour classes in T into colour classes in S such
that both S and T remain proper. If successful, this has the effect of increasing the
cardinality of the colour classes in S and may also empty some of the colour classes in T ,
reducing the total number of colours being used. At the end of the local search procedure,
all colour classes in T are copied back into S to form a feasible solution. The first iteration
of the local search procedure operates by considering each v in T and checking whether it
can be feasibly transferred into any of the colour classes in S. If this is the case, such
transfers are performed. The remaining iterations of the procedure then operate as follows.
First, an alteration is made to a randomly selected pair of colour classes Si, Sj ∈ S using
either a Kempe chain interchange or a pair swap. Since this will usually alter the
make-up of the two colour classes, this raises the possibility that other vertices in T can
now also be moved to Si or Sj. Again, these transfers are made if they are seen to retain
feasibility. The local search procedure continues in this fashion for I iterations.
On completion of the local search procedure, the independent sets in T are copied
back into S to form a feasible solution. The independent sets in S are then ordered
according to some (possibly random) heuristic, and a new solution S′ is formed by
constructing a permutation of the vertices in the same manner as that of the Iterated
Greedy algorithm and then applying the GREEDY algorithm. This latter operation is
intended to generate large alterations to the incumbent solution, which is then passed back
to the local search procedure for further optimisation. Note that none of the stages of this
algorithm allow the number of colour classes being used to increase, thus providing its hill-climbing characteristics. As with the previous algorithms, a number of parameters have to
be set with this algorithm, each of which can influence its performance. The values used in our
experiments here were determined in preliminary tests and according to those reported by
Lewis (2009). For the local search procedure, independent sets are moved into T by
considering each Si ∈ S in turn and transferring it with probability 1/|S|. The local search
procedure is then run for I = 1,000 iterations, and in each iteration the Kempe chain and
swap neighbourhoods are called with probabilities 0.99 and 0.01 respectively. Finally,
when constructing the permutation of the vertices for passing to the GREEDY algorithm,
the independent sets are ordered using the same 5:5:3 ratio.[4]
Pseudo Code:
1. i ← initial solution
2. while ∃ s ∈ Neighbours(i) with fitness(s) > fitness(i) do
3.   generate an s ∈ Neighbours(i)
4.   if fitness(s) > fitness(i) then
5.     replace i with s
6.   end if
7. end while
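A runnable version of this hill-climbing loop might look as follows. The neighbourhood and fitness functions are problem-specific placeholders, not part of Lewis's algorithm; the toy usage maximises a simple function over integer neighbours.

```python
import random

def hill_climb(initial, neighbours, fitness, max_iters=1000):
    """First-improvement hill climber matching the pseudo-code above:
    keep replacing the incumbent i with a fitter neighbour s until no
    neighbour improves on i (or an iteration cap is reached)."""
    i = initial
    for _ in range(max_iters):
        better = [s for s in neighbours(i) if fitness(s) > fitness(i)]
        if not better:               # local optimum reached
            break
        i = random.choice(better)    # accept an improving move
    return i

# Toy usage: maximise f(x) = -x^2 over the integer neighbours x-1, x+1
best = hill_climb(7, lambda x: [x - 1, x + 1], lambda x: -x * x)  # -> 0
```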

3.1.5 The Hybrid Evolutionary Algorithm (HEA):


The HEA operates by maintaining a population of candidate solutions that are evolved
via a problem-specific recombination operator and a local search method. Like
TABUCOL, the HEA operates in the space of complete improper k-colourings using cost
function f2.
The algorithm begins by creating an initial population of candidate solutions. Each
member of this population is formed using a modified version of the DSATUR algorithm
for which the number of colours k is fixed at the outset. To provide diversity between
members, the first vertex is selected at random and assigned to the first colour. The
remaining vertices are then taken in sequence according to the maximum saturation
degree (with ties being broken randomly) and assigned to the lowest indexed colour class
Si seen to be feasible (where 1 ≤ i ≤ k). When vertices are encountered for which no feasible
colour class exists, these are kept to one side and are assigned to random colour classes at
the end of this process. Upon construction of this initial population, an attempt is then
made to improve each member by applying the local search routine.
As is typical for an evolutionary algorithm, for the remainder of the
run the algorithm evolves the population using recombination, mutation, and evolutionary
pressure. In each iteration two parent solutions S1 and S2 are selected from the population
at random, and copies of these are used in conjunction with the recombination operator
to produce one child solution S′. This child is then improved via the local search operator,
and is inserted into the population by replacing the weaker of its two parents. Note that
there is no bias towards selecting fitter parents for recombination; rather evolutionary
pressure only exists due to the offspring replacing their weaker parent (regardless of
whether the parent has a better cost than its child).

The most popular evolutionary algorithms for graph coloring are the classical
steady-state genetic algorithms. These coloring algorithms often use local search,
and so the approach can also be regarded as an instance of memetic
computing. The population (referred to as Pop) is defined as a set of “individuals”
Indiv1, Indiv2, . . . Indiv|Pop|. Each individual represents a k-coloring (with k fixed or


not) that is evaluated according to a fitness function – in the evolutionary terminology, the
objective function from optimization is referred to as the fitness function. At each
“generation”, two or more parent individuals are selected and recombined to generate
offspring solutions. This offspring can be improved via local search and then replace some
individuals selected for elimination. The generic schema of this genetic process is described
below.[5]

Pseudo Code :

1: Initialize parent population Pop←{Indiv1, Indiv2, . . . , Indiv|Pop|}


2: while a stopping condition is not met do
3: parents←matingSelection(Pop)
4: Offspr← recombination(parents)
5: Offspr← localSearch(Offspr, maxIter)
6: R←eliminationSelection(Pop)
7: Pop ← Pop ∪{Offspr}−R
8: end while
9: Return the fittest individual ever visited
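The generic schema above can be sketched as a short runnable loop. The recombine, local_search and fitness functions are placeholders to be supplied by the problem at hand, not the HEA's actual operators; here lower fitness values are taken to be better, as with a cost function.

```python
import random

def steady_state_ga(init_pop, recombine, local_search, fitness, max_gens=100):
    """Steady-state schema from the pseudo-code above: two random parents,
    one offspring improved by local search, offspring replaces the weaker
    parent (lower fitness value = better)."""
    pop = [local_search(ind) for ind in init_pop]
    best = min(pop, key=fitness)
    for _ in range(max_gens):
        p1, p2 = random.sample(pop, 2)            # no bias towards fitter parents
        child = local_search(recombine(p1, p2))
        weaker = p1 if fitness(p1) >= fitness(p2) else p2
        pop[pop.index(weaker)] = child            # eliminationSelection + insertion
        best = min(best, child, key=fitness)
    return best                                   # fittest individual ever visited
```

On a toy bit-string problem (minimise the number of 1-bits, uniform crossover, local search that flips one 1 to 0), the loop steadily drives the population towards the all-zero string.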

The last decade has seen a surge of interest in integrating local search in genetic coloring
algorithms (see step 5 in the above schema), making the memetic approach [51] more and
more popular. Besides classical genetic algorithms, even more particular evolutionary
paradigms (crossover-free distributed search [8, 49], scatter search [29] or adaptive
memory algorithms [22]) do incorporate a local search coloring routine (often based on
TABUCOL [33] or on the k-fixed partial proper strategy [49]). In this section, we discuss
some key issues of evolutionary coloring algorithms, such as crossover design, population
dynamics and diversity, and hybridizations with other search methods.

3.1.6 The DSATUR Algorithm


The DSATUR algorithm (abbreviated from “degree of saturation”) was originally
proposed by Brélaz (1979). In essence it is very similar in behaviour to the GREEDY
algorithm in that it takes each vertex in turn according to some ordering and then assigns
it to the first suitable colour class, creating new colour classes when necessary. The
difference between the two algorithms lies in the way that these vertex orderings are
generated. With GREEDY the ordering is decided before any colouring takes place; on
the other hand, for the DSATUR algorithm the choice of which vertex to colour next is
decided heuristically based on the characteristics of the current partial colouring of the
graph. This choice is based primarily on the saturation degree of the vertices, defined as
follows.[6]

Let c(v) = NULL for any vertex v ∈ V not currently assigned to a colour class. Given such
a vertex v, the saturation degree of v, denoted by sat(v), is the number of different colours
assigned to adjacent vertices. That is, sat(v) = |{c(u) : u ∈ Γ(v) ∧ c(u) ≠ NULL}|.

Pseudo code:

DSATUR (S ← ∅, X ← V)
(1) while X ≠ ∅ do
(2)   choose v ∈ X with maximal saturation degree sat(v) (ties broken by degree)
(3)   for j ← 1 to |S|
(4)     if (Sj ∪ {v}) is an independent set then
(5)       Sj ← Sj ∪ {v}
(6)       break
(7)     else j ← j + 1
(8)   if j > |S| then
(9)     Sj ← {v}
(10)    S ← S ∪ {Sj}
(11)   X ← X − {v}
It can be seen that the majority of the algorithm is the same as the GREEDY algorithm in
that, once a vertex has been selected, a colour is found by simply going through each colour
class in turn and stopping when a suitable one has been found. Consequently, the worst-
case complexity of DSATUR is the same as GREEDY at O(n²), although in practice some
extra bookkeeping is required to keep track of the saturation degrees of the uncoloured
vertices.
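A compact runnable sketch of DSATUR as described above follows; the adjacency-list input format is an assumption of this sketch, and Lewis's own implementation differs in its bookkeeping of saturation degrees.

```python
def dsatur(adj):
    """DSATUR colouring of a graph given as an adjacency list
    {vertex: set_of_neighbours}.  Returns {vertex: colour_index}."""
    colour = {}
    uncoloured = set(adj)
    while uncoloured:
        # saturation degree: number of distinct colours on coloured neighbours
        def sat(v):
            return len({colour[u] for u in adj[v] if u in colour})
        # pick the vertex with maximal saturation, ties broken by degree
        v = max(uncoloured, key=lambda w: (sat(w), len(adj[w])))
        used = {colour[u] for u in adj[v] if u in colour}
        c = 0
        while c in used:        # first feasible colour class, as in GREEDY
            c += 1
        colour[v] = c
        uncoloured.remove(v)
    return colour
```

On a triangle this uses three colours; on a path it uses two, since the saturation-driven order colours the middle vertex first.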

References
1. R.M.R. Lewis, “A Guide to Graph Colouring: Algorithms and Applications”, ISBN 978-3-319-25730-3, pp. 29-31.

2. R.M.R. Lewis, “A Guide to Graph Colouring: Algorithms and Applications”, ISBN 978-3-319-25730-3, pp. 42-44.

3. R.M.R. Lewis, “A Guide to Graph Colouring: Algorithms and Applications”, ISBN 978-3-319-25730-3, pp. 84-86.

4. R.M.R. Lewis, “A Guide to Graph Colouring: Algorithms and Applications”, ISBN 978-3-319-25730-3, p. 87.

5. P. Galinier, J.-P. Hamiez, J.-K. Hao, and D. Porumbel, “Recent Advances in Graph Vertex Coloring”.

6. R.M.R. Lewis, “A Guide to Graph Colouring: Algorithms and Applications”, ISBN 978-3-319-25730-3, pp. 39-41.

Chapter 4
Hopfield Network
Approach for k-Coloring

4.1 Hopfield Nets


The Hopfield network (model) consists of a set of neurons and a corresponding set of unit
delays, forming a multiple-loop feedback system. The number of feedback loops is equal
to the number of neurons. Basically, the output of each neuron is fed back, via a unit delay
element, to each of the other neurons in the network. In other words, there is no self-
feedback in the network. [1]

A Hopfield neural network of order n comprises n computational units, called neurons.


Generally, Hopfield’s neural net is considered to be able to solve combinatorial
optimization problems in a very short time if it is realised using analog elements. The
algorithm minimizes an energy or cost function representing the optimization criteria. The
model for each neuron i consists of an internal state u_i (in biological terms: soma
potential) and an output V_i (in biological terms: spike frequency), where V_i is a fixed
function of the internal state, i.e. V_i = f(u_i), and I_i is the external input of the i-th neuron
(see fig 4.1). In general, any continuous, bounded, monotonic function may be taken as
f. [2]

Figure 4.1: The artificial neuron, with external input I_i, internal state u_i and output V_i.

The gradient descent method seeks the local minimum of a predefined Lyapunov
energy function E, which follows the quadratic form [3]

$$E = -\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} T_{ij}\, V_i V_j - \sum_{i=1}^{N} V_i I_i \qquad (4.1)$$

where N is the number of neurons in the system and T_ij is the symmetrical (T_ij = T_ji) synapse
weight between the i-th and the j-th neurons. The output V_i follows the nondecreasing
function

$$h(u_i) = \frac{1}{2}\left(\tanh(\lambda u_i) + 1\right) \qquad (4.2)$$

where λ is the gain of the sigmoid function. The rate of change of the internal state, or
motion equation, of the i-th neuron given by [3] is

$$\frac{du_i}{dt} = -\frac{u_i}{\tau} - \frac{\partial E}{\partial V_i} \qquad (4.3)$$

But the effectiveness of this sigmoid model is not guaranteed [4][5].
Takefuji [6] proved that the decay term −u_i/τ in eq.(4.3) is harmful, as it increases the
energy function E under certain circumstances. Removing it from eq.(4.3) solves the
problem: the energy function E is forced to decrease once the decay term is removed.
In his book, Takefuji presents a number of different neural network models without the
decay term that have been successfully used for several optimization problems.
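To make the energy function of eq.(4.1) concrete, here is a small numeric check for a three-neuron network; the weight, output and input values below are chosen arbitrarily for illustration and are not taken from any experiment in this report.

```python
def energy(T, V, I):
    """Computational energy of eq.(4.1) for a network of N neurons:
    E = -1/2 * sum_ij T_ij V_i V_j  -  sum_i V_i I_i."""
    N = len(V)
    quad = sum(T[i][j] * V[i] * V[j] for i in range(N) for j in range(N))
    return -0.5 * quad - sum(V[i] * I[i] for i in range(N))

# Tiny 3-neuron example with symmetric weights and zero diagonal
T = [[0.0, 1.0, -2.0],
     [1.0, 0.0, 0.5],
     [-2.0, 0.5, 0.0]]
V = [1.0, 0.0, 1.0]
I = [0.2, 0.1, -0.3]
E = energy(T, V, I)   # -> 2.1
```

Only the quadratic term couples neurons; with all outputs at zero the energy is exactly zero, which is a quick sanity check on the sign conventions.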

4.2 WHY IS THE DECAY TERM HARMFUL?


The goal of using the artificial neural network is to minimize a fabricated computational
energy function E, which is constructed by considering the necessary and sufficient
constraints and the cost function. The computational energy function E is given by
E = E(V_1, V_2, …, V_n). Before showing why the artificial neural network can minimize
the fabricated function E(V_1, V_2, …, V_n), we first show why the decay term is harmful,
although it has been widely believed that the decay term is absolutely essential. The
motion equation of the i-th neuron with the decay term is generally given by:

$$\frac{du_i}{dt} = -\frac{u_i}{\tau} - \frac{\partial E}{\partial V_i} \qquad (4.3)$$

where u_i/τ in eq.(4.3) is the controversial decay term. Note that τ is a constant parameter.
In order to mathematically prove that the use of the decay term in the motion equation is
harmful, the conditions under which it increases the fabricated energy function E instead of
decreasing it are given in Theorem 4.1.

Theorem 4.1 The use of the decay term in eq.(4.3) increases the computational energy E when

$$\left| \sum_i \frac{dV_i}{dt}\frac{u_i}{\tau} \right| > \left| \sum_i \left(\frac{dV_i}{du_i}\right)\left(\frac{du_i}{dt}\right)^2 \right|$$

and if either (u_i > 0 and dV_i/dt < 0) or (u_i < 0 and dV_i/dt > 0) is satisfied.

Proof:
Consider the derivative of the computational energy E with respect to time t:

$$\frac{dE}{dt} = \sum_i \frac{dV_i}{dt}\frac{\partial E}{\partial V_i} = \sum_i \frac{dV_i}{dt}\left(-\frac{u_i}{\tau} - \frac{du_i}{dt}\right)$$

where ∂E/∂V_i is replaced by (−u_i/τ − du_i/dt) from eq.(4.3), so that

$$\frac{dE}{dt} = -\sum_i \frac{dV_i}{dt}\frac{u_i}{\tau} - \sum_i \frac{dV_i}{dt}\frac{du_i}{dt} = -\sum_i \frac{dV_i}{dt}\frac{u_i}{\tau} - \sum_i \left(\frac{dV_i}{du_i}\right)\left(\frac{du_i}{dt}\right)^2$$

The first term −Σ_i (dV_i/dt)(u_i/τ) can be positive, negative, or zero. The second term
−Σ_i (dV_i/du_i)(du_i/dt)² is always negative or zero, because the output V_i = f(u_i) is a
nondecreasing function. The following condition can therefore be true:

$$-\sum_i \frac{dV_i}{dt}\frac{u_i}{\tau} - \sum_i \left(\frac{dV_i}{du_i}\right)\left(\frac{du_i}{dt}\right)^2 > 0$$

when |Σ_i (dV_i/dt)(u_i/τ)| > |Σ_i (dV_i/du_i)(du_i/dt)²| and one of the following conditions
is satisfied: (u_i > 0 and dV_i/dt < 0) or (u_i < 0 and dV_i/dt > 0). Under such a
condition the derivative of E with respect to time t must be positive: dE/dt > 0.

Therefore, the decay term increases the energy function under such conditions, which
contradicts the conventional convergence theorem.
The harmfulness of using the decay term in the motion equation can be easily tested
by empirical simulation, varying the value of τ. In order to guarantee convergence to
the local minimum, we must eliminate the decay term from the motion equation. Theorem
4.2 states that the computational energy function E monotonically decreases regardless of
the condition of the symmetry and diagonal constraints in the conductance matrix, as long
as the neurons obey a nondecreasing function and the motion equation of the i-th neuron
is given by:

$$\frac{du_i}{dt} = -\frac{\partial E}{\partial V_i} \qquad (4.4)$$

Theorem 4.2 dE/dt ≤ 0 is satisfied under two conditions: (1) du_i/dt = −∂E/∂V_i and
(2) V_i = f(u_i), where f(u_i) is a nondecreasing function.

Proof:

$$\frac{dE}{dt} = \sum_i \frac{du_i}{dt}\frac{dV_i}{du_i}\frac{\partial E}{\partial V_i} = -\sum_i \left(\frac{du_i}{dt}\right)^2 \frac{dV_i}{du_i} \leq 0$$

where ∂E/∂V_i is replaced by −du_i/dt (condition 1) and dV_i/du_i ≥ 0 because f is
nondecreasing (condition 2).

Theorem 4.2 guarantees the convergence of the continuous system. However, the system
is usually simulated on a digital computer, and there always exist errors between the real
values and the quantized ones. For example, a discrete sigmoid function must be used on
the digital computer instead of the continuous one.

4.3 MAXIMUM NEURON MODEL


The neural network model proposed by M.O. Berger [2] uses a two-dimensional neural
network model called the “maximum neural model”, consisting of M clusters each with N
neurons, resulting in M × N processing elements. The energy function extended to
the two-dimensional case is written as:

$$E = -\frac{1}{2}\sum_{m=1}^{M}\sum_{i=1}^{N}\sum_{m'=1}^{M}\sum_{j=1}^{N} T_{mim'j}\, V_{mi} V_{m'j} - \sum_{m=1}^{M}\sum_{i=1}^{N} V_{mi} I_{mi} \qquad (4.5)$$

Only the one neuron in each cluster with the maximum state will have nonzero output; if
there is more than one neuron with the same maximum input in any cluster, the neuron
with the smallest subscript has nonzero output. The outputs of the other neurons in the
same cluster become zero, so that always one and only one neuron in each cluster has
nonzero output. The input/output function of the i-th maximum neuron in the m-th cluster
is defined as

$$V_{mi} = \begin{cases} 1 & \text{if } u_{mi} = \max\{u_{m1}, u_{m2}, \ldots, u_{mN}\} \text{ and } u_{mi} \geq u_{mj} \text{ for } i > j \\ 0 & \text{otherwise} \end{cases} \qquad (4.6)$$
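The winner-take-all rule of eq.(4.6) can be sketched directly for a single cluster; the list-based layout below is an assumption of this sketch, not Berger's representation.

```python
def max_neuron_outputs(u_cluster):
    """Outputs of one cluster under eq.(4.6): exactly one neuron fires,
    namely the one with maximum internal state.  list.index() returns the
    first (smallest-subscript) position, which resolves ties as required."""
    winner = u_cluster.index(max(u_cluster))
    return [1 if i == winner else 0 for i in range(len(u_cluster))]
```

For example, max_neuron_outputs([0.2, 0.9, 0.9]) yields [0, 1, 0]: the tie between neurons 1 and 2 is broken in favour of the smaller subscript, and exactly one neuron in the cluster fires.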

The convergence conditions for the maximum neural model to a local minimum of the
energy function Ε are given by the following theorem.

Theorem 4.3 dE/dt ≤ 0 is satisfied under two conditions: (1) du_i/dt = −∂E/∂V_i and
(2) V_mi = 1 if u_mi = max{u_m1, u_m2, …, u_mN} and u_mi ≥ u_mj for i > j, and V_mi = 0
otherwise. [6]
Proof:
Consider the derivative of the computational energy E with respect to time t:

$$\frac{dE}{dt} = \sum_m \sum_i \frac{du_{m,i}}{dt}\frac{dV_{m,i}}{du_{m,i}}\frac{\partial E}{\partial V_{m,i}} = -\sum_m \sum_i \left(\frac{du_{m,i}}{dt}\right)^2 \frac{dV_{m,i}}{du_{m,i}}$$

where ∂E/∂V_{m,i} is replaced by −du_{m,i}/dt (condition 1).

Let du_{m,i}/dt be (u_{m,i}(t+dt) − u_{m,i}(t))/dt, and let dV_{m,i}/du_{m,i} be
(V_{m,i}(t+dt) − V_{m,i}(t))/(u_{m,i}(t+dt) − u_{m,i}(t)).

Let us consider the term Σ_i (du_{m,i}/dt)² (dV_{m,i}/du_{m,i}) for each module separately.
Let u_{m,a}(t+dt) be the maximum at time t+dt and u_{m,b}(t) be the maximum at
time t for the module m:

$$u_{m,a}(t+dt) = \max\{u_{m,1}(t+dt), u_{m,2}(t+dt), \ldots, u_{m,N}(t+dt)\}$$
$$u_{m,b}(t) = \max\{u_{m,1}(t), u_{m,2}(t), \ldots, u_{m,N}(t)\}$$

It is necessary and sufficient to consider the following two cases:

1) a = b
2) a ≠ b

If condition 1) is satisfied, then there is no state change for the module m.
Consequently, Σ_i (du_{m,i}/dt)² (dV_{m,i}/du_{m,i}) must be zero.

If 2) is satisfied, then only neurons a and b change their outputs (V_{m,a} goes from 0
to 1 and V_{m,b} from 1 to 0), so

$$\sum_i \left(\frac{du_{m,i}}{dt}\right)^2 \frac{dV_{m,i}}{du_{m,i}} = \left(\frac{u_{m,a}(t+dt) - u_{m,a}(t)}{dt}\right)^2 \frac{V_{m,a}(t+dt) - V_{m,a}(t)}{u_{m,a}(t+dt) - u_{m,a}(t)} + \left(\frac{u_{m,b}(t+dt) - u_{m,b}(t)}{dt}\right)^2 \frac{V_{m,b}(t+dt) - V_{m,b}(t)}{u_{m,b}(t+dt) - u_{m,b}(t)}$$

$$= \left(\frac{u_{m,a}(t+dt) - u_{m,a}(t)}{dt}\right)^2 \frac{1}{u_{m,a}(t+dt) - u_{m,a}(t)} + \left(\frac{u_{m,b}(t+dt) - u_{m,b}(t)}{dt}\right)^2 \frac{-1}{u_{m,b}(t+dt) - u_{m,b}(t)}$$

$$= \frac{u_{m,a}(t+dt) - u_{m,a}(t)}{(dt)^2} - \frac{u_{m,b}(t+dt) - u_{m,b}(t)}{(dt)^2}$$

$$= \frac{1}{(dt)^2}\left\{ u_{m,a}(t+dt) - u_{m,a}(t) - u_{m,b}(t+dt) + u_{m,b}(t) \right\}$$

$$= \frac{1}{(dt)^2}\left\{ u_{m,a}(t+dt) - u_{m,b}(t+dt) + u_{m,b}(t) - u_{m,a}(t) \right\} > 0$$

because u_{m,a}(t+dt) is the maximum at time t+dt and u_{m,b}(t) is the maximum at
time t for the module m. The contribution from each module is either 0 or positive;
therefore

$$\sum_i \left(\frac{du_{m,i}}{dt}\right)^2 \frac{dV_{m,i}}{du_{m,i}} \geq 0 \quad \text{and} \quad -\sum_m \sum_i \left(\frac{du_{m,i}}{dt}\right)^2 \frac{dV_{m,i}}{du_{m,i}} \leq 0 \;\Rightarrow\; \frac{dE}{dt} \leq 0$$

The termination condition of the net is given as follows: as soon as the system
reaches a stable state, or equilibrium state, the procedure terminates. The equilibrium
state of the maximum neural model is defined as all firing neurons having the smallest rate
of change of the input per cluster. In contrast to existing Hopfield neural networks, where
the condition of system convergence has never been clearly defined, the condition of the
equilibrium state for the maximum neural model is given by:

$$V_{mi} = f(u_{mi}) = 1 \ \text{ and } \ \frac{du_{mi}}{dt} = \max\left\{\frac{du_{m1}}{dt}, \frac{du_{m2}}{dt}, \cdots, \frac{du_{mN}}{dt}\right\} \qquad (4.7)$$

4.4 Convergence of Maximum Neural Nets to Valid Solutions

The convergence of a Hopfield net to a stable state when simulated on a computer
is proven by Rojas [7].
Theorem 4.4 A Hopfield net with n neurons reaches equilibrium when simulated using
asynchronous update starting from arbitrary input states.
Proof:

For a vector x = (x_1, x_2, …, x_n), a vector y = (y_1, y_2, …, y_k) and an n × k weight
matrix W = (w_ij), the energy function is the bilinear form

$$E(x, y) = -\frac{1}{2}(x_1, x_2, \ldots, x_n)\begin{pmatrix} w_{11} & w_{12} & \cdots & w_{1k} \\ w_{21} & w_{22} & \cdots & w_{2k} \\ \vdots & & \ddots & \vdots \\ w_{n1} & w_{n2} & \cdots & w_{nk} \end{pmatrix}\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_k \end{pmatrix}$$

The value of E(x, y) can be computed by multiplying first W by y^T and the result with
−x/2. The product of the i-th row of W and y^T represents the excitation of the i-th unit
in the left layer. If we denote these excitations by g_1, g_2, …, g_n, the above expression
transforms to

$$E(x, y) = -\frac{1}{2}(x_1, x_2, \ldots, x_n)\begin{pmatrix} g_1 \\ g_2 \\ \vdots \\ g_n \end{pmatrix}$$

We can also compute E(x, y) by multiplying first x by W. The product of the i-th column
of W with x corresponds to the excitation of unit i in the right layer. If we denote these
excitations by e_1, e_2, …, e_k, the expression for E(x, y) can be written as

$$E(x, y) = -\frac{1}{2}(e_1, e_2, \ldots, e_k)\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_k \end{pmatrix}.$$

Therefore, the energy function can be written in the two equivalent forms

$$E(x, y) = -\frac{1}{2}\sum_{i=1}^{k} e_i y_i \quad \text{and} \quad E(x, y) = -\frac{1}{2}\sum_{i=1}^{n} g_i x_i.$$

In asynchronous networks, at each time t we randomly select a unit from the left or right
layer. The excitation is computed and its sign is the new activation of the unit. If the
previous activation of the unit remains the same after this operation, then the energy of
the network has not changed. The state of unit i on the left layer will change only when
the excitation g_i has a different sign than x_i, the present state. The state is updated from
x_i to x_i′, where x_i′ now has the same sign as g_i. Since the other units do not change their
state, the difference between the previous energy E(x, y) and the new energy E(x′, y) is

$$E(x, y) - E(x', y) = -\frac{1}{2} g_i (x_i - x_i').$$

Since both x_i and −x_i′ have a different sign than g_i, it follows that

$$E(x, y) - E(x', y) > 0.$$

The new state (x′, y) has a lower energy than the original state (x, y). The same
argument can be made if a unit on the right layer has been selected, so that for the new
state (x, y′) it holds that

$$E(x, y) - E(x, y') > 0,$$

whenever the state of a unit in the right layer has been flipped.

Any update of the network state reduces the total energy. Since there are only a
finite number of possible combinations of bipolar states, the process must stop at some
point, that is, a state (a, b) is found whose energy cannot be further reduced. The network
has fallen into a local minimum of the energy function and the state (a, b) is an attractor
of the system.
If a Hopfield net is simulated using synchronous update (as in [8]), the network may
show oscillatory behaviour and fail to converge to a minimum; the net can fall
into the so-called “cycle trap”. Theorem 4.4 guarantees convergence of the net and thus
termination of the algorithm.
Applying the previous theorem to the maximum neural net leads to Lemma 4.1. If
a problem is mapped onto a maximum neural net so that a solution requires one and only
one neuron to fire in each cluster, a stable state will always represent a valid solution. The
maximum net is therefore guaranteed to solve such a problem when Theorem 4.3 and
Theorem 4.4 are applied.
Lemma 4.1 A maximum neural net is guaranteed to converge to a valid solution of
problem L, if the solution requires one and only one neuron per cluster to fire.

4.5 Computational Power of Hopfield Neural Nets


Hopfield networks use the same elementary calculations as traditional algorithms. This means
that they belong to a subset of all possible algorithms and will not solve NP-complete
problems in polynomial time. Bruck and Goodman [9] proved the following theorem.

Lemma 4.2 If the complement of an NP-complete problem is in NP, then NP = co-NP. [10]

Theorem 4.5 Let L be an NP-complete problem, and let the number of processing elements
be polynomially bounded in the size of problem L. The existence of a Hopfield net that
solves L implies NP = co-NP.
Proof:
Let L be an NP-hard problem. Suppose there exists a neural network that solves L.
Let L̄ be an NP-complete problem. By definition, L̄ can be polynomially reduced to L.

Thus, for every instance X ∈ L̄, we have a neural network such that from any of its global
maxima we can efficiently recognize whether X is a 'yes' or a 'no' instance of L̄.

We claim that we have a nondeterministic polynomial time algorithm to decide
that a given instance X ∈ L̄ is a 'no' instance. Here is how we do it: for X ∈ L̄ we
construct the neural network that solves it by using the reduction to L. We then
nondeterministically examine every state of the network to see if it is a local maximum
(that is done in polynomial time). In case it is a local maximum, we check if the instance
is a 'yes' or a 'no' instance (this is also done in polynomial time).

Thus, we have a nondeterministic polynomial time algorithm to recognize any
'no' instance of L̄. Thus, the complement of the problem L̄ is in NP. But L̄ is an NP-
complete problem; hence, from Lemma 4.2 it follows that NP = co-NP.

Even if the conditions of the previous theorem are relaxed, there is no way to perform better.

4.6 Neural Network Representation for Graph Coloring


These steps are generally necessary when using a neural net to solve optimization
problems:

1. Define a set of variables that can take only the values 0 or 1, representing
possible solutions for the problem.
2. Create one neuron for each variable.
3. Translate the optimization criteria into cost functions using the variables as
defined in step 1.
4. Translate the cost functions into an energy function E.
5. Derive from E the coupling weights T_ij and the external input I_i.
6. Apply the motion equation, starting from an arbitrary initial state.
7. Stop the computation when the equilibrium state is reached; otherwise go to
the previous step.
8. Interpret the result according to the model.

The graph coloring problem is defined as follows:

Instance: Given the adjacency matrix D of a simple, undirected graph G(V, E)
and k ≥ χ(G).

Question: Find a valid k-coloring of G.

This leads to the following neural representation:

1. The problem is mapped on k|V| variables V_mi with

$$V_{mi} = \begin{cases} 1 & \text{if vertex } m \text{ is coloured with colour } i \\ 0 & \text{otherwise} \end{cases} \qquad (4.8)$$

2. Each variable V_mi is associated with the output f(u_mi) of a |V| × k
maximum neural network (eq. 4.6) with |V| clusters and k neurons per cluster.
The output of the mi-th neuron represents the i-th of the possible k colors for
the m-th vertex (see figure 4.2).

Figure 4.2: Mapping the graph coloring problem onto the neural net (rows: vertices, columns: colors).
3. The optimization criteria are:

(a) One and only one color is to be assigned to each vertex.

(b) No two adjacent vertices should have the same color.

The first criterion can be expressed by the cost function

$$E_1 = \sum_{m=1}^{|V|}\left(\sum_{i=1}^{k} V_{mi} - 1\right)^2 \qquad (4.9)$$

which is positive if the first constraint is violated and zero if not. The second
criterion can be expressed as the cost function

$$E_2 = \sum_{m=1}^{|V|}\sum_{\substack{m'=1 \\ m' \neq m}}^{|V|}\sum_{i=1}^{k} d_{mm'}\, V_{mi} V_{m'i} \qquad (4.10)$$

where d_{mm'} is the mm'-th entry in the adjacency matrix D of G. E_2 is positive
if two adjacent vertices are colored with the same color and zero if G has a
valid coloring. Note that the formulation of the criteria as cost functions is not
unique.
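The two cost functions can be checked on a small instance. The following sketch uses the matrix layout V[m][i] of eq.(4.8); the triangle example and variable names are illustrative, not from Berger's implementation.

```python
def e1(V):
    """E1 of eq.(4.9): positive when some vertex does not have exactly
    one colour, zero otherwise.  V[m][i] = 1 iff vertex m has colour i."""
    return sum((sum(row) - 1) ** 2 for row in V)

def e2(V, D):
    """E2 of eq.(4.10): sums d_mm' * V_mi * V_m'i over ordered pairs, so
    each clashing edge is counted twice.  D is the adjacency matrix."""
    n, k = len(V), len(V[0])
    return sum(D[m][mp] * V[m][i] * V[mp][i]
               for m in range(n) for mp in range(n) if mp != m
               for i in range(k))

# Triangle K3, colouring (0, 1, 1): vertices 1 and 2 clash on colour 1
D = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
V = [[1, 0, 0], [0, 1, 0], [0, 1, 0]]
# e1(V) == 0 (each vertex has exactly one colour), e2(V, D) == 2
```

A proper colouring of the triangle (all three vertices distinct) drives E2 to zero, as the text requires.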

4. When using a maximum neural model, the first constraint E_1 can be eliminated,
as the model itself already has the behaviour that one and only one neuron is
firing in each cluster of the network. Therefore, the computational energy E is
given by

$$E = \frac{1}{2} E_2 = \frac{1}{2}\sum_{m=1}^{|V|}\sum_{\substack{m'=1 \\ m' \neq m}}^{|V|}\sum_{i=1}^{k} d_{mm'}\, V_{mi} V_{m'i} \qquad (4.11)$$

It is not required to tune any coefficient parameters in E.

5. The coupling weights and the external inputs are then defined as follows:

$$T_{mim'j} = d_{mm'}\,\delta_{ij}\,(1 - \delta_{mm'}) \qquad (4.12)$$

$$I_{mi} = 0 \qquad (4.13)$$

where δ denotes Kronecker's delta. T_{mim'j} inhibits connections within each
row of the neuron matrix.

6. Starting with random u_mi, the motion equation is computed until the equilibrium
state is reached. The motion equation is written as (decay term removed)

$$\frac{du_{mi}}{dt} = -\frac{\partial E}{\partial V_{mi}} = -\sum_{\substack{m'=1 \\ m' \neq m}}^{|V|} d_{mm'}\, V_{m'i} \qquad (4.14)$$

7. The equilibrium state is defined as in eq.(4.7).

8. The result is a valid k-coloring C = {W_1, W_2, …, W_k} of G, where

$$W_i = \{\, m \in V \mid V_{mi} = 1 \,\}$$

and V_mi indicates the coloring as defined in eq.(4.8). Note that W_i ≠ ∅ does not
necessarily hold.

4.7 Pseudo Code

The discrete network model described in the previous section was simulated on a
sequential computer using parallel asynchronous update. The internal states, neuron
outputs and motions were represented as matrices. The pseudo-code for the algorithm is
as follows (m ∈ {1, 2, …, |V|}, i ∈ {1, 2, …, k}).

1. Fill the internal state matrix U := (u_mi) ∀ m, i with random integers.

2. Compute the output matrix V := (V_mi) ∀ m, i using eq.(4.6).

3. Compute the rate-of-change matrix ΔU := (du_mi/dt) ∀ m, i using eq.(4.14).

4. Terminate if equilibrium (as given in eq.(4.7)) is reached.

5. Using the first-order Euler method, assign U := U + ΔU using asynchronous
update.

6. Go to step 2.

If the graph is k-colorable, the algorithm terminates with a valid solution, which is
guaranteed by the model (Lemma 4.1). Thus a 100% convergence rate to the global
minimum is given. The coloring can be found in matrix V according to step 8 of the
Neural Network Representation section.
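The six steps above can be put together into a small pure-Python simulation. This is a sketch under the assumptions of this chapter (integer initial states, one cluster updated per step, the equilibrium test of eq.(4.7)), not Berger's original code; the function name and step cap are inventions of the sketch.

```python
import random

def max_net_colouring(adj, k, max_steps=100000):
    """Simulate the |V| x k maximum neural network: outputs from eq.(4.6),
    motion equation (4.14), first-order Euler updates of one cluster at a
    time, terminating at the equilibrium of eq.(4.7).  adj is the adjacency
    matrix; returns a colour per vertex, or None if the step cap is hit."""
    n = len(adj)
    U = [[random.randint(-5, 5) for _ in range(k)] for _ in range(n)]

    def outputs():
        # eq.(4.6): the neuron with maximum state fires (smallest index on ties)
        return [[1 if i == row.index(max(row)) else 0 for i in range(k)]
                for row in U]

    def rates():
        V = outputs()
        # eq.(4.14): du_mi/dt = -sum_{m' != m} d_mm' * V_m'i
        return [[-sum(adj[m][mp] * V[mp][i] for mp in range(n) if mp != m)
                 for i in range(k)] for m in range(n)]

    for _ in range(max_steps):
        V, dU = outputs(), rates()
        # clusters violating eq.(4.7): firing neuron lacks the maximal rate
        bad = [m for m in range(n) if dU[m][V[m].index(1)] != max(dU[m])]
        if not bad:
            return [V[m].index(1) for m in range(n)]   # equilibrium reached
        m = random.choice(bad)                          # asynchronous update
        for i in range(k):
            U[m][i] += dU[m][i]
    return None
```

For a triangle with k = 3 this converges in a handful of steps: at equilibrium every firing neuron's rate of change is maximal in its cluster, which for this instance forces zero conflicts.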

References
1. Simon Haykin, "Neural Networks: A Comprehensive Foundation", Second Edition, Pearson Education, p. 701. ISBN 81-7808-300-0, 1999.

2. M.O. Berger, "k-Colouring Vertices Using a Neural Network with Convergence to Valid Solutions", Proc. International Conf. on Neural Networks, 1994.

3. J.J. Hopfield and D.W. Tank, "Neural computation of decisions in optimization problems", Biol. Cybern., vol. 52, pp. 141-152, 1985.

4. D. Kunz, "Suboptimum solutions obtained by the Hopfield-Tank neural network algorithm", Biol. Cybern., vol. 65, pp. 129-133, 1991.

5. G.V. Wilson and G.S. Pawley, "On the stability of the Travelling Salesman algorithm of Hopfield and Tank", Biol. Cybern., vol. 58, pp. 63-70, 1988.

6. Y. Takefuji, Neural Network Parallel Computing, Kluwer Academic Publishers, Dordrecht, Netherlands, 1992.

7. R. Rojas, Neural Networks: A Systematic Introduction, Springer-Verlag, Berlin, pp. 342-343, 1996.

8. Y. Takefuji and K.C. Lee, "Artificial Neural Network for Four-Coloring Map Problems and K-Colorability Problems", IEEE Trans. Circuits and Systems, vol. 38, no. 3, pp. 325-333, Mar. 1991.

9. J. Bruck and J. Goodman, "On the Power of Neural Networks for Solving Hard Problems", Journal of Complexity, vol. 6, pp. 129-135, 1990.

Chapter 5
Proposed Approach

5.1 Proposed Modified HNN for Graph Coloring

The method proposed by M.O. Berger [1] gives 100% convergence of the network to a valid
solution. Whenever their proposed system converges, the corresponding configuration is
always forced to be a valid solution, while none of the existing neural networks can
guarantee this. However, Berger's method takes a huge amount of time to converge
to a valid solution, which is one of its drawbacks. In their method they process each cluster
one by one sequentially, without any order or priority given to any cluster, and check
whether the cluster has reached equilibrium or not.

In our approach, we propose a new modified version of M.O. Berger's method [1]
and try to make the network converge faster to a valid solution. We take the priority of
each cluster into consideration using the degree of each vertex. The vertex with the
highest degree is processed first and checked for equilibrium. In this way, we process the
clusters in decreasing order of their vertex degree. The vertex with the lowest degree is
processed last.

5.2 Pseudo-Code
The internal states, neuron outputs and motions were represented as matrices. The degrees
of the vertices are stored in an array. The pseudo-code for the algorithm is as follows
(m ∈ {1, 2, …, |V|}, i ∈ {1, 2, …, k}).
1. Compute the adjacency matrix of graph G.
2. Compute the degree array D such that D_m = degree(V_m).
3. Sort D such that D_i ≥ D_j ∀ i < j, where i, j ∈ {1, 2, …, |V|} and i ≠ j.
4. Set k = Δ(G) + 1 and index = 1.
5. Fill the internal state matrix U := (u_mi) ∀ m, i with random integers.
6. Compute the output matrix V := (V_mi) ∀ m, i using eq.(4.6).
7. Compute the rate-of-change matrix ΔU := (du_mi/dt) ∀ m, i using eq.(4.14).
8. While index ≤ |V|:
   if equilibrium is reached for V_index then
     increment index := index + 1
   else
     (a) Using the first-order Euler method, assign U := U + ΔU using
         asynchronous update.
     (b) Repeat Steps 6 and 7.
     (c) Go to Step 8.
In the pseudo-code, Step 1 generates the adjacency matrix of the given graph, which is later used to compute the rate-of-change matrix ΔU. In Step 2 we compute the degree of each vertex and store it in an array D. In Step 3 we sort the elements of D in non-increasing order. In Step 4 we set k = Δ(G) + 1, since the number of colors required to color a graph is at most the maximum degree of the graph plus one [2], and set index = 1; index is a pointer into the degree array that selects the vertex with the highest remaining degree. In Step 5 we create an internal state matrix U of size |V| × k and fill it with random integers. In Step 6 we compute the output matrix V from the internal state matrix U using eq. (4.6), i.e., in each row of V, V_mi = 1 if u_mi is the maximum element of the corresponding row of U, and V_mi = 0 otherwise. In Step 7 we compute the rate-of-change matrix ΔU using eq. (4.14), i.e., the negative of the product of the adjacency matrix and the output matrix V. In Step 8 we select a vertex from the degree array and check whether the cluster corresponding to this vertex is in equilibrium. If it is, we increment the index pointer and select the next vertex from the degree array. If the cluster in the output matrix is not in equilibrium, we randomly select an element of the rate-of-change matrix (from the same cluster position as in the output matrix) and update the internal state matrix; we then repeat Steps 6 and 7, continuing until the cluster reaches equilibrium. This process is carried out for all the clusters (i.e., all the vertices).
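As a concrete illustration, the core of Steps 5-8 can be sketched in C++ as below. The winner-take-all output follows eq. (4.6) and the motion follows eq. (4.14) (ΔU = -A·V); the function names, the random-integer range, and the use of `rand()` are our own simplifications, not part of [1]:

```cpp
#include <cstdlib>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Winner-take-all output (eq. 4.6): V[m][i] = 1 iff u[m][i] is the
// maximum of row m of U (ties broken towards the lowest colour index).
Matrix computeOutput(const Matrix& U, int k) {
    Matrix V(U.size(), std::vector<int>(k, 0));
    for (std::size_t m = 0; m < U.size(); ++m) {
        int best = 0;
        for (int i = 1; i < k; ++i)
            if (U[m][i] > U[m][best]) best = i;
        V[m][best] = 1;
    }
    return V;
}

// Motion (eq. 4.14): dU/dt = -(A * V), so du[m][i]/dt is minus the
// number of neighbours of vertex m whose active colour is i.
Matrix computeMotion(const Matrix& A, const Matrix& V, int k) {
    const int n = static_cast<int>(A.size());
    Matrix dU(n, std::vector<int>(k, 0));
    for (int m = 0; m < n; ++m)
        for (int i = 0; i < k; ++i)
            for (int j = 0; j < n; ++j)
                dU[m][i] -= A[m][j] * V[j][i];
    return dU;
}

// Cluster m is in equilibrium when its active colour conflicts with
// no neighbour, i.e. the active neuron's motion is zero.
bool inEquilibrium(const Matrix& dU, const Matrix& V, int m, int k) {
    for (int i = 0; i < k; ++i)
        if (V[m][i] == 1 && dU[m][i] != 0) return false;
    return true;
}

// Steps 5-8: visit the clusters in the given priority order, nudging one
// randomly chosen neuron of the current cluster (first-order Euler step)
// until that cluster reaches equilibrium.
std::vector<int> colorGraph(const Matrix& A, int k, const std::vector<int>& order) {
    const int n = static_cast<int>(A.size());
    Matrix U(n, std::vector<int>(k));
    for (auto& row : U)                        // Step 5: random internal states
        for (auto& u : row) u = std::rand() % 100;

    for (int m : order) {                      // Step 8: one cluster at a time
        Matrix V = computeOutput(U, k);
        Matrix dU = computeMotion(A, V, k);
        while (!inEquilibrium(dU, V, m, k)) {
            int i = std::rand() % k;           // asynchronous single-element update
            U[m][i] += dU[m][i];
            V = computeOutput(U, k);           // repeat Steps 6 and 7
            dU = computeMotion(A, V, k);
        }
    }
    Matrix V = computeOutput(U, k);
    std::vector<int> colour(n);                // colour of V_m = active neuron index
    for (int m = 0; m < n; ++m)
        for (int i = 0; i < k; ++i)
            if (V[m][i] == 1) colour[m] = i;
    return colour;
}
```

With k = Δ(G) + 1, each cluster always has at least one colour unused by its neighbours, whose internal state stays constant while the conflicting colours are driven down, so each inner loop terminates (with probability 1) on a conflict-free colour.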

Our method, which uses priority (the degree of the vertices) to decide which cluster to process first, tends to converge faster than that of M.O. Berger [1], since processing the clusters in order of the degree of their corresponding vertices reduces the number of constraint checks. In a graph where a number of vertices are connected to a centre vertex, if we process all the other vertices before the centre one, the number of constraint checks increases, because the centre vertex must check against all the other vertices before reaching equilibrium. If, on the other hand, we process the centre vertex (which has the highest degree) first and only then the other vertices, the number of constraint checks decreases, because each of the remaining vertices then needs fewer constraint checks than in the previous case.

References

1. M.O. Berger, "K-colouring vertices using a neural network with convergence to valid solutions", Proc. International Conf. on Neural Networks, 1994.
2. R.L. Brooks, "On Colouring the Nodes of a Network", Proc. Cambridge Philos. Soc., vol. 37, pp. 194-197, 1941.

Chapter 6
Comparative Study

The algorithm was implemented in C++ and tested on an Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz running Ubuntu. Various DIMACS graphs were used to test and compare M.O. Berger's approach [1] with our modified approach.

Figure 6.1 Comparison Chart 1: run times in seconds of the Hopfield and Modified Hopfield approaches on myciel5.col, queen6_6.col and queen7_7.col (bar chart of the values in Table 6.1).

Dataset        |V|   |E|   Hopfield           Modified Hopfield
                           Colors   Time (s)  Colors   Time (s)
myciel5.col    47    236   23       22.872    23       19.832
queen6_6.col   36    290   20       31.492    20       28.646
queen7_7.col   49    476   23       81.261    24       44.856

Table 6.1

Our method converges 13.29% faster on myciel5.col, 9.05% faster on queen6_6.col and 44.80% faster on queen7_7.col.
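The speed-up percentages quoted with each table are computed relative to the run time of the original Hopfield approach; for myciel5.col, for example:

```latex
\text{speed-up} \;=\; \frac{t_{\text{Hopfield}} - t_{\text{Modified}}}{t_{\text{Hopfield}}} \times 100\%
\;=\; \frac{22.872 - 19.832}{22.872} \times 100\% \;\approx\; 13.29\%
```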

Figure 6.2 Comparison Chart 2: run times in seconds of the Hopfield and Modified Hopfield approaches on david.col, huck.col and queen5_5.col (bar chart of the values in Table 6.2).

Dataset        |V|   |E|   Hopfield           Modified Hopfield
                           Colors   Time (s)  Colors   Time (s)
david.col      87    406   57       212.576   56       155.580
huck.col       74    301   42       14.341    41       14.026
queen5_5.col   25    160   17       9.403     15       4.391

Table 6.2
Our method converges 26.81% faster on david.col, 2.19% faster on huck.col and 53.30% faster on queen5_5.col.

Figure 6.3 Comparison Chart 3: run times in seconds of the Hopfield and Modified Hopfield approaches on K4.col and myciel3.col (bar chart of the values in Table 6.3).



Dataset        |V|   |E|   Hopfield           Modified Hopfield
                           Colors   Time (s)  Colors   Time (s)
K4.col         4     6     4        0.007     4        0.006
myciel3.col    11    20    5        0.202     5        0.168

Table 6.3

On K4.col, which is the complete graph on 4 vertices, our method converges 14.28% faster; on myciel3.col it converges 16.83% faster.

Figure 6.4 Comparison Chart 4: run times in seconds of the Hopfield and Modified Hopfield approaches on games120.col, anna.col and queen8_8.col (bar chart of the values in Table 6.4).

Dataset        |V|   |E|   Hopfield           Modified Hopfield
                           Colors   Time (s)  Colors   Time (s)
games120.col   120   638   14       672.938   14       694.598
anna.col       138   493   64       339.182   66       388.923
queen8_8.col   64    728   27       215.749   26       238.362

Table 6.4
Our method converges 3.11% slower on games120.col, 12.78% slower on anna.col and 9.48% slower on queen8_8.col.

References
1. M.O. Berger, "K-colouring vertices using a neural network with convergence to valid solutions", Proc. International Conf. on Neural Networks, 1994.

Chapter 7
Conclusion
The vertex coloring problem is one of the difficult problems in graph theory. The problem is modelled as a Hopfield network divided into |V| clusters of at most k neurons each. The general principle of mapping optimization problems, as well as the particular neural representation, is presented in Chapter 4.

Our proposed method always terminates with a valid solution and has a guaranteed 100% convergence rate to the global minimum of the energy function. Moreover, for most graphs it converges faster than the pre-existing method discussed in Chapter 4; in some cases it was slower, but not by a large margin. To verify this, our proposed method was applied to a number of example graphs taken from the DIMACS graph coloring dataset.
