A Project Report on
Submitted by
We would like to express our deepest gratitude to our advisor Mr. Soumik Ghosh,
Assistant Professor, CSE Department, University Institute of Technology, The
University of Burdwan, for his guidance and support. His energy, creativity and
depth of knowledge have been a constant source of motivation for us.
We would also like to thank Dr. Souvik Bhattacharya, In-charge, CSE Department,
University Institute of Technology, The University of Burdwan, for devoting his time
to inspiring and helping us in many respects.
We are also grateful to Dr. Abhijit Mitra, Principal, University Institute of Technology,
The University of Burdwan, for giving us the opportunity to continue our studies.
Certificate of Approval
This is to certify that the project entitled “A Modified HNN Approach for
Constraint Satisfaction Problem” is hereby approved as a creditable
engineering study carried out, presented and submitted by Suman Kumar
Mahato (2015-1035), Navneet Prashant (201501006), Saurav Kumar (2015-
1025), Subhajit Chakraborty (2014-1074) in a satisfactory manner warranting its
acceptance as a prerequisite, in partial fulfilment of the academic requirements
for the award of the degree of Bachelor of Engineering in Computer Science &
Engineering. It is a bona fide work carried out at University
Institute of Technology, The University of Burdwan. The thesis has not been
submitted for the award of any other degree.
Date:
Keywords:
Constraint Satisfaction Problem (CSP), Graph Coloring Problem (GCP), Hopfield
Network, Maximum Neuron model
Contents
Abstract
List of Figures 3
List of Tables 5
1. Chapter 1: Introduction 6
1.1 Constraint Satisfaction Problem (CSP) 7
1.2 Graph Coloring 8
1.2.1 What is a Graph? 8
1.2.2 Graph Coloring Problem 8
1.2.3 Graph & its Adjacency Matrix 10
1.2.4 Why Coloring? 10
1.2.5 Practical Applications 11
1.3 Artificial Neural Network 18
1.3.1 ANN Model 19
1.3.2 Components of ANN 19
1.3.3 Learning Paradigms 21
References 24
References 41
References 57
References 61
References 66
7. Chapter 7: Conclusion 67
List of Figures
2. Fig. 1.2 If we extract the vertices in the dotted circle, we are left 9
with a subgraph that clearly needs more than four colours
5. Fig. 1.5 Tasks allocated to processors, the diagram shows the tasks 13
namely task1, task2, task3 and task4 are allocated to the
processors (P1, P5); (P1, P6); (P2, P4) and (P3, P7) respectively.
7. Fig. 1.7 A set of taxi journey requests over time (a), its 15
corresponding interval graph and 3-colouring (b), and (c) the
corresponding assignment of journeys to taxis.
8. Fig. 1.8 (a) An example computer program together with the live 16
ranges of each variable. Here, the statement “vi ←. . .” denotes
the assignment of some value to variable vi, whereas “. . . vi . . .
” is just some arbitrary operation using vi.
(b) shows an optimal colouring of the corresponding interference
graph
12. Fig. 4.2 Mapping the graph coloring problem onto neural net. 54
List of Tables
1. Table 6.1 63
2. Table 6.2 64
3. Table 6.3 65
4. Table 6.4 65
Chapter 1
Introduction
Formal definition:
Formally, a constraint satisfaction problem is defined as a triple (X, D, C), where
X = {X1, …, Xn} is a set of variables,
D = {D1, …, Dn} is the set of their respective domains of values, and
C = {C1, …, Cm} is a set of constraints.
Each variable Xi can take on the values in the nonempty domain Di. Every constraint Cj
∈ C is in turn a pair (tj, Rj), where tj ⊆ X is a subset of k variables and Rj is a k-
ary relation on the corresponding subset of domains. An evaluation of the variables is a
function from a subset of variables to a particular set of values in the corresponding subset
of domains. An evaluation v satisfies a constraint (tj, Rj) if the values assigned to the
variables in tj satisfy the relation Rj.
An evaluation is consistent if it does not violate any of the constraints. An evaluation
is complete if it includes all variables. An evaluation is a solution if it is consistent and
complete; such an evaluation is said to solve the constraint satisfaction problem.[1]
For our Project we consider the Graph Coloring Problem as the constraint satisfaction
problem.
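To make the definition concrete, the triple (X, D, C) can be sketched in Python for a tiny 3-colouring instance. This is only an illustrative sketch: the triangle graph, the variable names and the colour set are assumptions chosen for the example, not part of the project's formulation.

```python
# Illustrative CSP triple (X, D, C) for 3-colouring a triangle graph.
X = ["x1", "x2", "x3"]                         # variables: one per vertex
D = {x: {"red", "green", "blue"} for x in X}   # domains: allowed colours
# Constraints: each is a pair (scope, relation); here a binary
# "not equal" relation for every edge of the triangle.
C = [(("x1", "x2"), lambda a, b: a != b),
     (("x1", "x3"), lambda a, b: a != b),
     (("x2", "x3"), lambda a, b: a != b)]

def satisfies(evaluation, constraints):
    """An evaluation is consistent if it violates no constraint
    whose scope it fully assigns."""
    return all(rel(*(evaluation[v] for v in scope))
               for scope, rel in constraints
               if all(v in evaluation for v in scope))

# A complete, consistent evaluation is a solution of the CSP.
assert satisfies({"x1": "red", "x2": "green", "x3": "blue"}, C)
assert not satisfies({"x1": "red", "x2": "red"}, C)
```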
Formal Definition:
A graph is an ordered pair G = (V, E) comprising:
• V a set of vertices (also called nodes or points);
• E ⊆ {{x, y} | x, y ∈ V ∧ x ≠ y} a set of edges (also called links or lines), which
are unordered pairs of vertices (i.e., an edge is associated with two distinct vertices).
Figure 1.1 shows a picture of a graph with ten vertices (the circles), and 21 edges (the
lines connecting the circles). It also shows an example colouring of this graph that uses
five different colours. We can call this solution a “proper” colouring because all pairs
of vertices joined by edges have been assigned to different colours, as required by the
problem. Specifically, two vertices have been assigned to colour 1, three vertices to
colour 2, two vertices to colour 3, two vertices to colour 4, and one vertex to colour 5.
Fig. 1.2 If we extract the vertices in the dotted circle, we are left with a subgraph that
clearly needs more than four colours
Actually, this solution is not the only possible 5-colouring for this example graph. For
example, swapping the colours of the bottom two vertices in the figure would give us
a different proper 5-colouring. It is also possible to colour the graph with anything
between six and ten colours (where ten is the number of vertices in the graph), because
assigning a vertex to an additional, newly created, colour still ensures that the
colouring remains proper. But what if we wanted to colour this graph using fewer than
five colours? Is this possible? To answer this question, consider Figure 1.2, where the
dotted line indicates a selected portion of the graph. When we remove everything from
outside this selection, we are left with a subgraph containing just five vertices.
Importantly, we can see that every pair of vertices in this subgraph has an edge between
them. If we were to have only four colours available to us, as indicated in the figure
we would be unable to properly colour this subgraph, since its five vertices all need to
be assigned to a different colour in this instance. This allows us to conclude that the
solution in Figure 1.1 is actually optimal, since there is no solution available that uses
fewer than five colours.
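The notion of a "proper" colouring used above can be sketched as a short check: every pair of vertices joined by an edge must receive different colours. The edge list and colourings below are invented for illustration, not the graph of Figure 1.1.

```python
# Sketch of checking whether a colouring is proper: adjacent vertices
# must always receive different colours.
def is_proper(edges, colouring):
    return all(colouring[u] != colouring[v] for u, v in edges)

edges = [(0, 1), (1, 2), (0, 2), (2, 3)]   # a triangle plus one extra edge
assert is_proper(edges, {0: 1, 1: 2, 2: 3, 3: 1})       # a proper 3-colouring
assert not is_proper(edges, {0: 1, 1: 2, 2: 1, 3: 2})   # edge (0, 2) clashes
```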
In Fig. 1.3 the graph is represented using a matrix whose dimensions equal the total
number of vertices; that is, a graph with 4 vertices is represented using a matrix of size
4×4. In this matrix, both rows and columns represent vertices, and each entry is either
1 or 0: a 1 indicates that there is an edge from the row vertex to the column vertex, and
a 0 indicates that there is no such edge.
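This representation can be sketched in a few lines. The 4-cycle used below is an assumed example graph, not the graph of Fig. 1.3.

```python
# Sketch of the adjacency-matrix representation: entry m[i][j] is 1 when
# an edge joins vertex i to vertex j, and 0 otherwise.
def adjacency_matrix(n, edges):
    m = [[0] * n for _ in range(n)]
    for u, v in edges:
        m[u][v] = m[v][u] = 1          # undirected graph: symmetric entries
    return m

m = adjacency_matrix(4, [(0, 1), (1, 2), (2, 3), (3, 0)])  # a 4-cycle
assert m == [[0, 1, 0, 1],
             [1, 0, 1, 0],
             [0, 1, 0, 1],
             [1, 0, 1, 0]]
```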
Fig. 1.4 Illustration of how proper 5- and 4-colourings can be constructed from the same
graph.
Let us now attempt to split the eight students of this problem into groups so that each
student is put into a different group from those of his friends. A simple method to do this
might be to take the students one by one in alphabetical order and assign them to the first
group where none of their friends are currently placed. Walking through the process, we
start by taking student A and assigning him to the first group. Next, we take student B and
see that he is friends with someone in the first group (student A), and so we put him into
the second group. Taking student C next, we notice that he is friends with someone in the
first group (student A) and also the second group (student B), meaning that he must now
be assigned to a third group. At this point we have only considered three students, yet we
have created three separate groups. What about the next student? Looking at the
information we can see that student D is only friends with E and F, allowing us to place
him into the first group alongside student A. Following this, student E cannot be assigned
to the first group because he is friends with D, but can be assigned to the second.
Continuing this process for all eight students gives us the solution shown in Figure 1.4(b).
This solution uses four groups, and also involves student F being assigned to a group by
himself.
Can we do any better than this? By inspecting the graph in Figure 1.4(a), we can
see that there are three separate cases where three students are all friends with one another.
Specifically, these are students A, B, and C; students B, E, and F; and students D, E, and
F. The edges between these triplets of students form triangles in the graph. Because of
these mutual friendships, in each case these collections of three students will need to be
assigned to different groups, implying that at least three groups will be needed in any valid
solution. However, by visually inspecting the graph we can see that there is no occurrence
of four students all being friends with one another. This hints that we may not necessarily
need to use four groups in a solution.
In fact, a solution using three groups is actually possible in this case as Figure 1.4(c)
demonstrates. This solution has been achieved using the same assignment process as
before but using a different ordering of the students, as indicated. Since we have already
deduced that at least three groups are required for this particular problem, we can conclude
that this solution is optimal.
The process we have used to form the solutions shown in Figures 1.4(b) and (c) is
generally known as the GREEDY algorithm for graph colouring, and we have seen that
the ordering of the vertices (students in this case) can influence the number of colours
(groups) that are ultimately used in the solution it produces. The GREEDY algorithm and
its extensions are a fundamental part of the field of graph colouring and will be considered
further in later chapters. Among other things, we will demonstrate that there will always
be at least one ordering of the vertices that, when used with the GREEDY algorithm, will
result in an optimal solution.
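The assignment process walked through above can be sketched directly in code. The friendship list below is a stand-in invented for illustration (Figure 1.4(a) is not reproduced here), so the exact groups differ from the text's example, but the method is the same: take the students in order and place each into the first clash-free group.

```python
# Sketch of the GREEDY assignment process: students in order, each put
# into the first group containing none of their friends.
def greedy_groups(order, friends):
    groups = []                                   # each group is a set
    for s in order:
        for g in groups:
            if not g & friends[s]:                # no friend already here
                g.add(s)
                break
        else:
            groups.append({s})                    # open a new group
    return groups

friends = {"A": {"B", "C"}, "B": {"A", "C", "E", "F"},
           "C": {"A", "B"}, "D": {"E", "F"},
           "E": {"B", "D", "F"}, "F": {"B", "D", "E"}}
groups = greedy_groups("ABCDEF", friends)
# Whatever the ordering, every group must be clash-free.
assert all(not friends[s] & (g - {s}) for g in groups for s in g)
```

As in the text, a different ordering of the students can change how many groups this sketch produces.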
Fig. 1.5 Tasks allocated to processors; the diagram shows that the tasks task1, task2,
task3 and task4 are allocated to the processors (P1, P5); (P1, P6); (P2, P4) and (P3, P7)
respectively.
Fig. 1.6 A small timetabling problem (a), a feasible 4-colouring (b), and its corresponding
timetable solution using four timeslots (c).
Figure 1.6 shows an example timetabling problem expressed as a graph colouring problem.
Here we have nine events which we have managed to timetable into four timeslots. In this
case, three events have been scheduled into timeslot 1, and two events have been scheduled
into each of the remaining three. In practice, assuming that only one event can take place
in a room at any one time, we would also need to ensure that three rooms are available
during timeslot 1. If only two rooms are available in each timeslot, then an extra timeslot
might need to be added to the timetable. It should be noted that timetabling problems can
often vary a great deal between educational institutions, and can also be subject to a wide
range of additional constraints beyond the event-clash constraint mentioned above.
Figure 1.7(a) shows an example problem where we have ten taxi bookings. For
illustrative purposes these have been ordered from top to bottom according to their start
times. It can be seen, for example, that booking 1 overlaps with bookings 2, 3 and 4; hence
any taxi carrying out booking 1 will not be able to serve bookings 2, 3 and 4. We can
construct a graph from this information by using one vertex for each booking and then
adding edges between any vertex pair corresponding to overlapping bookings. A 3-
colouring of this example graph is shown in Figure 1.7(b), and the corresponding
assignment of the bookings to three taxis (the minimum number possible) is shown in
Figure 1.7(c).
Fig. 1.7 A set of taxi journey requests over time (a), its corresponding interval graph and
3-colouring (b), and (c) the corresponding assignment of journeys to taxis.
In this particular case we see that our example problem has resulted in a graph made of
three smaller graphs (components), comprising vertices v1 to v4, v5 to v7 and v8 to v10
respectively. However, this will not always be the case and will depend on the nature of
the bookings received.
A graph constructed from time-dependent tasks such as this is usually referred to
as an interval graph.
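The construction just described can be sketched as follows: one vertex per booking, with an edge whenever two bookings overlap in time. The (start, end) pairs below are illustrative, not the bookings of Figure 1.7.

```python
# Sketch of building an interval graph from time-dependent tasks:
# vertices are bookings, edges join overlapping bookings.
def interval_graph(bookings):
    edges = []
    for i, (s1, e1) in enumerate(bookings):
        for j, (s2, e2) in enumerate(bookings):
            if i < j and s1 < e2 and s2 < e1:   # the intervals overlap
                edges.append((i, j))
    return edges

bookings = [(0, 3), (1, 4), (2, 5), (6, 8)]     # booking 3 overlaps nothing
assert interval_graph(bookings) == [(0, 1), (0, 2), (1, 2)]
```

Colouring this graph then assigns each booking to a taxi, with one colour per taxi, exactly as in Figure 1.7(c).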
Our fourth and final example in this section concerns the allocation of computer code
variables to registers on a computer processor. When writing code in a particular
programming language, whether it be C++, Pascal, FORTRAN or some other option, the
programmer is free to make use of as many variables as he or she sees fit. When it comes
to compiling this code, however, it is advantageous for the compiler to assign these
variables to registers on the processor since accessing and updating values in these
locations is far faster than carrying out the same operations using the computer’s RAM
or cache.
Computer processors only have a limited number of registers. For example, most
RISC processors feature 64 registers: 32 for integer values and 32 for floating point values.
However, not all variables in a computer program will be in use (or “live”) at a particular
time. We might therefore choose to assign multiple variables to the same register if they
are seen not to interfere with one another.
Figure 1.8(a) shows an example piece of computer code making use of five
variables, v1, . . . ,v5. It also shows the live ranges for each variable. So, for example, variable
v2 is live only in lines (2) and (3), whereas v3 is live from lines (4) to (9). It can also be
seen, for example, that the live ranges of v1 and v4 do not overlap. Hence we might use
the same register for storing both of these variables at different periods during execution.
Fig. 1.8 (a) An example computer program together with the live ranges of each variable.
Here, the statement “vi ←. . .” denotes the assignment of some value to variable vi,
whereas “. . . vi . . .” is just some arbitrary operation using vi.
(b) shows an optimal colouring of the corresponding interference graph
The problem of deciding how to assign the variables to registers can be modelled as a graph
colouring problem by using one vertex for each live range and then adding edges between
any pairs of vertices corresponding to overlapping live ranges. Such a graph is known as
an interference graph, and the task is to now colour the graph using equal or fewer colours
than the number of available registers. Figure 1.8(b) shows that in this particular case only
three registers are needed: variables v1 and v4 can be assigned to register 1, v2 and v5 to
register 2, and v3 to register 3.
Note that in the example of Figure 1.8, the resultant interference graph actually
corresponds to an interval graph, rather like the taxi example from the previous subsection.
Such graphs will arise in this setting when using straight-line code sequences or when using
software pipelining. In most situations however, the flow of a program is likely to be far
more complex, involving if-else statements, loops, goto commands, and so on. In these
cases the more complicated process of liveness analysis will be needed for determining the
live ranges of each variable, which could result in an interference graph of arbitrary
topology.
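The whole register-allocation modelling can be sketched end to end: live ranges become vertices, overlapping ranges become interference edges, and a greedy colouring of the interference graph assigns registers. The five live ranges below are invented for illustration (they only loosely follow the flavour of Figure 1.8), so the resulting register pairing differs from the figure.

```python
# Sketch of register allocation via graph colouring. Live ranges are
# illustrative (start_line, end_line) pairs, not Figure 1.8's.
live = {"v1": (1, 3), "v2": (2, 3), "v3": (4, 9),
        "v4": (5, 7), "v5": (6, 7)}

# Interference edges: pairs of variables whose live ranges overlap.
names = sorted(live)
edges = {(a, b) for i, a in enumerate(names) for b in names[i + 1:]
         if live[a][0] <= live[b][1] and live[b][0] <= live[a][1]}

# Greedy colouring of the interference graph = register assignment.
reg = {}
for v in names:
    used = {reg[u] for u in reg if (min(u, v), max(u, v)) in edges}
    reg[v] = min(r for r in range(len(names)) if r not in used)

assert all(reg[a] != reg[b] for a, b in edges)  # a proper colouring
assert reg["v1"] == reg["v3"]       # non-overlapping ranges share a register
assert len(set(reg.values())) == 3  # three registers suffice here
```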
Artificial neural networks (ANN) or connectionist systems are computing systems that
are inspired by, but not necessarily identical to, the biological neural networks that
constitute animal brains. Such systems "learn" to perform tasks by considering examples,
generally without being programmed with any task-specific rules. For example, in image
recognition, they might learn to identify images that contain cats by analyzing example
images that have been manually labeled as "cat" or "no cat" and using the results to
identify cats in other images. They do this without any prior knowledge about cats, for
example, that they have fur, tails, whiskers and cat-like faces. Instead, they automatically
generate identifying characteristics from the learning material that they process.
An ANN is based on a collection of connected units or nodes called artificial neurons,
which loosely model the neurons in a biological brain. Each connection, like
the synapses in a biological brain, can transmit a signal from one artificial neuron to
another. An artificial neuron that receives a signal can process it and then signal additional
artificial neurons connected to it.
In common ANN implementations, the signal at a connection between artificial neurons
is a real number, and the output of each artificial neuron is computed by some non-linear
function of the sum of its inputs. The connections between artificial neurons are called
'edges'. Artificial neurons and edges typically have a weight that adjusts as learning
proceeds. The weight increases or decreases the strength of the signal at a connection.
Artificial neurons may have a threshold such that the signal is only sent if the aggregate
signal crosses that threshold. Typically, artificial neurons are aggregated into layers.
Different layers may perform different kinds of transformations on their inputs. Signals
travel from the first layer (the input layer), to the last layer (the output layer), possibly after
traversing the layers multiple times.
The original goal of the ANN approach was to solve problems in the same way that
a human brain would. However, over time, attention moved to performing specific tasks,
leading to deviations from biology. Artificial neural networks have been used on a variety
of tasks, including computer vision, speech recognition, machine translation, social
network filtering, playing board and video games and medical diagnosis.
• an activation a_j(t), the neuron's state, depending on a discrete time parameter,
• possibly a threshold θ_j, which stays fixed unless changed by a learning function,
• an activation function f that computes the new activation at a given time t + 1 from
a_j(t), θ_j and the net input p_j(t), giving rise to the relation
a_j(t + 1) = f(a_j(t), p_j(t), θ_j),
• and an output function f_out computing the output from the activation:
o_j(t) = f_out(a_j(t)).
An input neuron has no predecessor but serves as the input interface for the whole network.
Similarly, an output neuron has no successor and thus serves as the output interface of the
whole network.
Propagation function
The propagation function computes the input p_j(t) to the neuron j from the outputs o_i(t)
of its predecessor neurons and typically has the form [6]
p_j(t) = Σ_i o_i(t) w_ij.
When a bias value is added, the above form changes to the following [7]:
p_j(t) = Σ_i o_i(t) w_ij + w_0j, where w_0j is a bias.
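These two rules, net input followed by activation, can be sketched directly. The weight, bias and threshold values below are illustrative assumptions; a step activation is used only as one simple choice of f.

```python
# Sketch of a single neuron: p_j(t) = sum_i o_i(t) * w_ij + w_0j,
# followed by a simple threshold activation.
def propagate(outputs, weights, bias):
    """Net input p_j(t) from the predecessor outputs o_i(t)."""
    return sum(o * w for o, w in zip(outputs, weights)) + bias

def activate(p, theta):
    """Step activation: fire only if the net input crosses theta."""
    return 1 if p >= theta else 0

p = propagate([1, 0, 1], [0.5, -0.2, 0.7], bias=-0.4)  # p = 0.8
assert abs(p - 0.8) < 1e-9
assert activate(p, theta=0.5) == 1
```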
Learning rule
The learning rule is a rule or an algorithm which modifies the parameters of the neural
network, in order for a given input to the network to produce a favored output. This
learning process typically amounts to modifying the weights and thresholds of the
variables within the network.[6]
Unsupervised learning
In unsupervised learning, some data x is given together with a cost function to be
minimized, which can be any function of the data x and the network's output f. The cost
function is
dependent on the task (the model domain) and any a priori assumptions (the implicit
properties of the model, its parameters and the observed variables).
As a trivial example, consider the model f(x) = a, where a is a constant, and the cost
C = E[(x − f(x))²].
Minimizing this cost produces a value of a that is equal to the mean of the data. The cost
function can be much more complicated. Its form depends on the application: for example,
in compression it could be related to the mutual information between x and f(x), whereas
in statistical modeling, it could be related to the posterior probability of the model given
the data (note that in both of those examples those quantities would be maximized rather
than minimized).
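The trivial example above can be checked numerically: with the model f(x) = a and cost C = E[(x − f(x))²], the data mean is the minimizing constant. The data values below are made up for illustration.

```python
# Sketch: for f(x) = a and C = E[(x - a)^2], the minimizer is the mean.
data = [2.0, 4.0, 9.0]

def cost(a):
    return sum((x - a) ** 2 for x in data) / len(data)

mean = sum(data) / len(data)            # 5.0
# The mean achieves a lower cost than nearby perturbations of a.
assert cost(mean) < cost(mean + 0.1)
assert cost(mean) < cost(mean - 0.1)
```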
Tasks that fall within the paradigm of unsupervised learning are in general estimation
problems; the applications include clustering, the estimation of statistical distributions,
compression and filtering.
Hebbian learning
In the late 1940s, D. O. Hebb [9] created a learning hypothesis based on the mechanism
of neural plasticity that became known as Hebbian learning. Hebbian learning is
unsupervised learning. This evolved into models for long-term potentiation. Researchers
started applying these ideas to computational models in 1948 with Turing's B-type
machines. Farley and Clark [10] (1954) first used computational machines, then called
"calculators", to simulate a Hebbian network. Other neural network computational
machines were created by Rochester, Holland, Habit and Duda (1956). [11] Rosenblatt [6]
(1958) created the perceptron, an algorithm for pattern recognition. With mathematical
notation, Rosenblatt described circuitry not in the basic perceptron, such as the exclusive-
or circuit that could not be processed by neural networks at the time. [12] In 1959, a
biological model proposed by Nobel laureates Hubel and Wiesel was based on their
discovery of two types of cells in the primary visual cortex: simple cells and complex
cells.[13] The first functional networks with many layers were published by Ivakhnenko
and Lapa in 1965, becoming the Group Method of Data Handling. [14] [15]
Neural network research stagnated after machine learning research by Minsky and Papert
(1969),[16] who discovered two key issues with the computational machines that
processed neural networks. The first was that basic perceptrons were incapable of
processing the exclusive-or circuit. The second was that computers didn't have enough
processing power to effectively handle the work required by large neural networks. Neural
network research slowed until computers achieved far greater processing power. Much of
artificial intelligence had focused on high-level (symbolic) models that are processed by
using algorithms, characterized for example by expert systems with knowledge embodied
in if-then rules, until in the late 1980s research expanded to low-level (sub-symbolic)
machine learning, characterized by knowledge embodied in the parameters of a cognitive
model.
Reinforcement learning
In reinforcement learning, data x are usually not given, but generated by an agent's
interactions with the environment. At each point in time t, the agent performs an action yt
and the environment generates an observation xt and an instantaneous cost ct, according
to some (usually unknown) dynamics. The aim is to discover a policy for selecting actions
that minimizes some measure of a long-term cost, e.g., the expected cumulative cost. The
environment's dynamics and the long-term cost for each policy are usually unknown, but
can be estimated.
More formally, the environment is modeled as a Markov decision process (MDP)
with states s1, …, sn ∈ S and actions a1, …, am ∈ A, with the following probability
distributions: the instantaneous cost distribution P(ct | st), the observation distribution
P(xt | st) and the transition P(st+1 | st, at), while a policy is defined as the conditional
distribution over actions given the observations. Taken together, the two define a Markov
chain (MC). The aim is to discover the policy (i.e., the MC) that minimizes the cost.
Optimization
The optimization algorithm repeats a two phase cycle, propagation and weight update.
When an input vector is presented to the network, it is propagated forward through the
network, layer by layer, until it reaches the output layer. The output of the network is then
compared to the desired output, using a loss function. The resulting error value is
calculated for each of the neurons in the output layer. The error values are then propagated
from the output back through the network, until each neuron has an associated error value
that reflects its contribution to the original output.
Backpropagation uses these error values to calculate the gradient of the loss function. In
the second phase, this gradient is fed to the optimization method, which in turn uses it to
update the weights, in an attempt to minimize the loss function.
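The two-phase cycle described above can be sketched on the smallest possible "network": a single linear neuron trained by gradient descent. The one-neuron model, its data point and the learning rate are all illustrative assumptions, not part of the report.

```python
# Sketch of the two-phase cycle: forward propagation, then a
# gradient-based weight update on a single linear neuron.
def train_step(w, b, x, target, lr=0.1):
    # Phase 1: propagate the input forward and measure the error.
    y = w * x + b                     # the network's output
    error = y - target                # derivative of 0.5*(y-t)^2 w.r.t. y
    # Phase 2: feed the gradient to the optimizer (gradient descent).
    w -= lr * error * x               # dLoss/dw = error * x
    b -= lr * error                   # dLoss/db = error
    return w, b

w, b = 0.0, 0.0
for _ in range(200):
    w, b = train_step(w, b, x=2.0, target=6.0)
# After training, the output w*2 + b should be close to the target 6.0.
assert abs(w * 2.0 + b - 6.0) < 1e-3
```

In a multi-layer network the only extra machinery is backpropagation itself, which carries these error values from the output layer back through the hidden layers.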
References
1. Schiex, T., Fargier, H., and Verfaillie, G., “Valued constraint satisfaction
problems: Hard and easy problems,” IJCAI, 1995.
2. E. G. Coffman, Jr., M. R. Garey, D. S. Johnson, and A. S. LaPaugh, “Scheduling
file transfers,” SIAM J. Comput., 14(3):744–780, 1985.
3. J. A. Hoogeveen, S. L. van de Velde, and B. Veltman, “Complexity of scheduling
multiprocessor tasks with prespecified processor allocations,” Discrete Appl. Math.,
55(3):259–272, 1994.
4. I. Holyer, “The NP-completeness of edge-coloring,” SIAM J. Comput., 10(4):718–
720, Nov. 1981.
5. Abbod, Maysam F (2007). "Application of Artificial Intelligence to the
Management of Urological Cancer". The Journal of Urology. 178 (4): 1150–1156.
doi:10.1016/j.juro.2007.05.122. PMID 17698099.
6. Zell, Andreas (1994). "chapter 5.2". Simulation Neuronaler Netze [Simulation of
Neural Networks] (in German) (1st ed.). Addison-Wesley. ISBN 978-3-89319-554-
1.
7. Dawson, Christan W (1998). "An artificial neural network approach to rainfall-
runoff modelling". Hydrological Sciences Journal. 43 (1): 47–66.
8. Ojha, Varun Kumar; Abraham, Ajith; Snášel, Václav (1 April 2017). "Metaheuristic
design of feedforward neural networks: A review of two decades of research".
Engineering Applications of Artificial Intelligence. 60: 97–116. arXiv:1705.05584
9. Hebb, Donald (1949). The Organization of Behavior. New York: Wiley. ISBN 978-
1-135-63190-1
10. Farley, B.G.; W.A. Clark (1954). "Simulation of Self-Organizing Systems by Digital
Computer". IRE Transactions on Information Theory. 4 (4): 76–84.
doi:10.1109/TIT.1954.1057468.
11. Rochester, N.; J.H. Holland; L.H. Habit; W.L. Duda (1956). "Tests on a cell
assembly theory of the action of the brain, using a large digital computer". IRE
Transactions on Information Theory.
12. Werbos, P.J. (1975). Beyond Regression: New Tools for Prediction and Analysis in
the Behavioral Sciences.
13. David H. Hubel and Torsten N. Wiesel (2005). Brain and visual perception: the
story of a 25-year collaboration. Oxford University Press US. p. 106. ISBN 978-0-
19-517618-6.
14. Schmidhuber, J. (2015). "Deep Learning in Neural Networks: An Overview".
Neural Networks. 61: 85–117. ArXiv:1404.7828
15. Ivakhnenko, A. G.; Grigorʹevich Lapa, Valentin (1967). Cybernetics and
forecasting techniques. American Elsevier Pub. Co.
16. Minsky, Marvin; Papert, Seymour (1969). Perceptrons: An Introduction to
Computational Geometry. MIT Press. ISBN 978-0-262-63022-1.
Chapter 2
Literature Survey
Over the past two decades, optimization of the feedforward neural network (FNN)
has been a key interest among researchers and practitioners of multiple
disciplines. FNN optimization is often viewed from various perspectives:
the optimization of weights, network architecture, activation nodes, learning
parameters, learning environment, etc. Researchers adopted such different
viewpoints mainly to improve the FNN's generalization ability. Gradient-
descent algorithms such as backpropagation have been widely applied to optimize
FNNs. Their success is evident from the FNN's application to numerous real-world
8. Rochester, N.; J.H. Holland; L.H. Habit; W.L. Duda (1956). "Tests on a cell
assembly theory of the action of the brain, using a large digital computer". IRE
Transactions on Information Theory.
Theories by D.O. Hebb and P.M. Milner on how the brain works were tested by
simulating neuron nets on the IBM Type 704 Electronic Calculator. The formation
of cell assemblies from an unorganized net of neurons was demonstrated, as well
as a plausible mechanism for short-term memory and the phenomena of growth
and fractionation of cell assemblies. The cell assemblies do not yet act just as the
theory requires, but changes in the theory and the simulation offer promise for
further experimentation.
In recent years, deep artificial neural networks (including recurrent ones) have won
numerous contests in pattern recognition and machine learning. This historical
survey compactly summarizes relevant work, much of it from the previous
millennium. Shallow and Deep Learners are distinguished by the depth of their
credit assignment paths, which are chains of possibly learnable, causal links
between actions and effects. I review deep supervised learning (also recapitulating
the history of backpropagation), unsupervised learning, reinforcement learning &
evolutionary computation, and indirect search for short programs encoding deep
and large networks.
10. M.O. Berger, K-Colouring vertices using a neural network with convergence to
valid solutions, Proc. International Conf. on Neural Networks, 1994
This paper proposes a new algorithm using a maximum neural network model to
k -color vertices of a simple undirected graph. Unlike traditional neural nets, the
proposed network is guaranteed to converge to valid solutions with no parameter
tuning needed. The power of the new method to solve this NP-complete problem
will be shown in a number of simulations.
11. Takefuji, Y., and Lee, K.C., “Artificial Neural Network for Four-Coloring
Map Problems and K-Colorability Problems”, IEEE Trans. Circuits and
Systems, vol. 38, no. 3, pp. 325-333, Mar. 1991
12. Bruck, J., and Goodman, J., “On the Power of Neural Networks for Solving
Hard Problems”, Journal of Complexity, vol. 6, pp. 129-135, 1990
This paper deals with a neural network model in which each neuron performs a
threshold logic function. An important property of the model is that it always
converges to a stable state when operating in a serial mode. This property is the
basis of the potential applications of the model such as associative memory devices
and combinatorial optimization. One of the motivations for use of the model for
solving hard combinatorial problems is the fact that it can be implemented by
optical devices and thus operate at a higher speed than conventional electronics.
The main theme in this work is to investigate the power of the model for solving
NP-hard problems and to understand the relation between speed of operation and
the size of a neural network. In particular, it will be shown that for any NP-hard
problem the existence of a polynomial size network that solves it implies that NP
= co-NP. Also, for the Traveling Salesman Problem (TSP), even a polynomial size
network that obtains an ε-approximate solution does not exist unless P = NP. The
above results are of great practical interest, because right now it is possible to build
neural networks which will operate fast but are limited in the number of neurons
they contain.
References
8. Rochester, N.; J.H. Holland; L.H. Habit; W.L. Duda (1956). "Tests on a cell
assembly theory of the action of the brain, using a large digital computer". IRE
Transactions on Information Theory.
10. M.O. Berger, K-Colouring vertices using a neural network with convergence to
valid solutions, Proc. International Conf. on Neural Networks, 1994
11. Takefuji, Y., and Lee, K.C., “Artificial Neural Network for Four-Coloring Map
Problems and K-Colorability Problems”, IEEE Trans. Circuits and
Systems, vol. 38, no. 3, pp. 325-333, Mar. 1991
12. Bruck, J., and Goodman, J., “On the Power of Neural Networks for Solving Hard
Problems”, Journal of Complexity, vol. 6, pp. 129-135, 1990
Chapter 3
Graph Coloring
Techniques
To start, the algorithm takes an empty solution S = ϕ and an arbitrary permutation of the
vertices π. In each outer loop the algorithm takes the ith vertex in the permutation, πi, and
attempts to find a colour class Sj ∈ S into which it can be inserted. If such a colour class
currently exists in S, then the vertex is added to it and the process moves on to consider
the next vertex πi+1. If not, lines (8–9) of the algorithm are used to create a new colour
class for the vertex.
Let us now estimate the computational complexity of the GREEDY algorithm with regard
to the number of constraint checks that are performed. One vertex is coloured at each
iteration, meaning n = |π| iterations of the algorithm are required in total. At the
ith iteration (1 ≤ i ≤ n), we are concerned with finding a feasible colour for the vertex πi.
In the worst case this vertex will clash with all vertices that have preceded it in π, meaning
that (i−1) constraint checks will be performed before a suitable colour is determined.
Indeed, if the graph being coloured is the complete graph Kn, the worst case occurs
for every vertex; hence a total of 0+1+2+. . .+(n−1) constraint checks will be performed.
This gives GREEDY an overall worst-case complexity of O(n²).
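The loop structure and the constraint-check count above can be sketched as follows (an illustrative C++ sketch with assumed names such as `greedyColour`; this is not the book's code):

```cpp
#include <vector>
#include <cstddef>

// Sketch of the GREEDY algorithm described above: vertices are taken in the
// order given by pi, each vertex goes into the first colour class containing
// none of its neighbours, and a new class is opened when no existing class
// fits (lines (8)-(9)). `checks` counts the pairwise constraint checks
// discussed in the text.
struct GreedyResult {
    std::vector<std::vector<int>> classes; // colour classes S_1, ..., S_k
    long long checks = 0;                  // number of constraint checks
};

GreedyResult greedyColour(const std::vector<std::vector<int>>& adj,
                          const std::vector<int>& pi) {
    GreedyResult r;
    for (int v : pi) {
        bool placed = false;
        for (std::size_t j = 0; j < r.classes.size() && !placed; ++j) {
            bool clash = false;
            for (int u : r.classes[j]) {   // one constraint check per member
                ++r.checks;
                if (adj[v][u]) { clash = true; break; }
            }
            if (!clash) { r.classes[j].push_back(v); placed = true; }
        }
        if (!placed) r.classes.push_back(std::vector<int>{v});
    }
    return r;
}
```

On the complete graph Kn this performs exactly 0 + 1 + … + (n−1) checks, matching the worst-case bound above.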
As shown in the pseudocode, in each cycle of the algorithm (lines (3) to (19)), a number
of ants each produce a complete, though not necessarily feasible, solution. In line (16) the
details of each of these solutions are then added to a trail update matrix δ and, at the end
of a cycle, the contents of δ are used together with an evaporation rate ρ to update the
global trail matrix t. At the start of each cycle, each individual ant attempts to construct a
solution using the procedure BUILDSOLUTION. This is based on the RLF method
which, we recall, operates by building up each colour class in a solution one at a time.
Also recall that during the construction of each class Si ∈ S, RLF makes use of two sets:
X, which contains uncoloured vertices that can currently be added to Si without causing a
clash; and Y, which holds the uncoloured vertices that cannot be feasibly added to Si. The
modifications to RLF that BUILDSOLUTION employs are as follows:
• In the procedure a maximum of k colour classes is permitted. Once these have been
constructed, any remaining vertices are left uncoloured.
• The first vertex to be assigned to each colour class Si (1 ≤ i ≤ k) is chosen randomly
from the set X.
• In remaining cases, each vertex v is then assigned to colour Si with probability
𝛼𝛼 𝛽𝛽
𝜏𝜏𝑣𝑣𝑣𝑣 × 𝜂𝜂𝑣𝑣𝑣𝑣
𝑃𝑃𝑣𝑣𝑣𝑣 = �∑ 𝛼𝛼 𝛽𝛽 𝑖𝑖𝑖𝑖 𝑣𝑣 𝜖𝜖 𝑋𝑋
𝑢𝑢 𝜖𝜖 𝑋𝑋(𝜏𝜏𝑢𝑢𝑢𝑢 × 𝜂𝜂𝑢𝑢𝑢𝑢 )
0 𝑜𝑜𝑜𝑜ℎ𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒
Note that the calculation of τvi makes use of the global trail matrix t, meaning that higher
values are associated with combinations of vertices that have been assigned the same
colour in previous solutions. The value ηvi, meanwhile, is associated with a heuristic rule
which, in this case, is the degree of vertex v in the graph induced by the set of currently
uncoloured vertices X ∪Y. Larger values for τvi and ηvi thus contribute to larger values for
Pvi, encouraging vertex v to be assigned to colour class Si. The parameters α and β are used
to control the relative strengths of τ and η in the equation.
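The assignment rule above can be sketched as a small routine (illustrative only; `assignmentProbabilities` and the flat `tau`/`eta` vectors indexed over the candidate set X are assumptions, not part of any published ACO code):

```cpp
#include <vector>
#include <cmath>
#include <cstddef>

// Given trail values tau[v] and heuristic values eta[v] for the candidate
// vertices v in X, return P_v proportional to tau^alpha * eta^beta,
// normalised over X. Vertices outside X implicitly get probability 0.
std::vector<double> assignmentProbabilities(const std::vector<double>& tau,
                                            const std::vector<double>& eta,
                                            double alpha, double beta) {
    std::vector<double> p(tau.size(), 0.0);
    double denom = 0.0;
    for (std::size_t u = 0; u < tau.size(); ++u)
        denom += std::pow(tau[u], alpha) * std::pow(eta[u], beta);
    if (denom <= 0.0) return p;            // degenerate case: all weights zero
    for (std::size_t v = 0; v < tau.size(); ++v)
        p[v] = std::pow(tau[v], alpha) * std::pow(eta[v], beta) / denom;
    return p;
}
```

Larger τ and η values yield larger probabilities, and α, β control their relative strengths, exactly as described above.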
swap neighbourhoods are called with probabilities 0.99 and 0.01 respectively. Finally,
when constructing the permutation of the vertices for passing to the GREEDY algorithm,
the independent sets are ordered using the same 5:5:3 ratio.[4]
Pseudo Code:
1. i := initial solution
2. While i is not a local optimum, i.e. some s ∈ Neighbours(i) has fitness(s) > fitness(i), do
3. Generate an s ∈ Neighbours(i);
4. If fitness(s) > fitness(i) then
5. Replace i with s;
6. End if
7. End while
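A runnable reading of this hill-climbing pseudo-code, under the assumption (not specified above) that the search stops after a fixed number of consecutive non-improving neighbour samples:

```cpp
#include <functional>

// Generic hill climber: starting from i, repeatedly sample a neighbour s and
// replace i with s whenever s strictly improves fitness; give up after
// maxTries consecutive non-improving samples.
template <typename S>
S hillClimb(S i,
            const std::function<double(const S&)>& fitness,
            const std::function<S(const S&)>& randomNeighbour,
            int maxTries = 1000) {
    int failures = 0;
    while (failures < maxTries) {
        S s = randomNeighbour(i);
        if (fitness(s) > fitness(i)) { i = s; failures = 0; } // replace i with s
        else ++failures;
    }
    return i;
}
```

The neighbour generator and fitness function are problem-specific; for graph coloring, fitness is typically the negated number of clashing edges.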
The most popular evolutionary algorithms for graph coloring are classical
steady-state genetic algorithms. These coloring algorithms often use local search,
and so the approach can also be regarded as an instance of memetic computing.
The population (referred to as Pop) is defined as a set of "individuals".
The last decade has seen a surge of interest in integrating local search into genetic coloring
algorithms (see step 5 in the above schema), making the memetic approach [51] more and
more popular. Besides classical genetic algorithms, more particular evolutionary
paradigms (crossover-free distributed search [8, 49], scatter search [29] and adaptive
memory algorithms [22]) also incorporate a local search coloring routine (often based on
TABUCOL [33] or on the k-fixed partial proper strategy [49]). In this section, we discuss
some key issues of evolutionary coloring algorithms, such as crossover design, population
dynamics and diversity, and hybridization with other search methods.
Pseudo code:
Let c(v) = NULL for any vertex v ∈ V not currently assigned to a colour class. Given such
a vertex v, the saturation degree of v, denoted sat(v), is the number of different colours
assigned to adjacent vertices. That is, sat(v) = |{ c(u) : u ∈ Γ(v) ∧ c(u) ≠ NULL }|.
DSATUR (S ← ϕ, X ←V)
(1) while X ≠ ϕ do
(2) choose v ∈ X with maximal sat(v) (ties broken by largest degree)
(3) for j ← 1 to |S|
(4) if (Sj ∪ {v}) is an independent set then
(5) Sj ← Sj ∪ {v}
(6) break
(7) else j ← j + 1
(8) if j > |S| then
(9) Sj ← {v}
(10) S ← S ∪ Sj
(11) X ← X − {v}
It can be seen that the majority of the algorithm is the same as the GREEDY algorithm, in
that once a vertex has been selected, a colour is found by simply going through each colour
class in turn and stopping when a suitable one is found. Consequently, the worst-case
complexity of DSATUR is the same as that of GREEDY, O(n²), although in practice some
extra bookkeeping is required to keep track of the saturation degrees of the uncoloured
vertices.
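The selection rule and the extra bookkeeping mentioned above can be sketched as follows (assumed names; for brevity the saturation degrees are kept in per-vertex sets rather than a priority queue):

```cpp
#include <vector>
#include <set>

// Sketch of DSATUR: at each step the uncoloured vertex with maximal
// saturation degree (ties: highest degree) is selected and given the lowest
// feasible colour, as in the pseudo-code above.
std::vector<int> dsatur(const std::vector<std::vector<int>>& adj) {
    int n = (int)adj.size();
    std::vector<int> colour(n, -1), degree(n, 0);
    std::vector<std::set<int>> neighColours(n); // sat(v) = neighColours[v].size()
    for (int v = 0; v < n; ++v)
        for (int u = 0; u < n; ++u) degree[v] += adj[v][u];
    for (int step = 0; step < n; ++step) {
        int v = -1;
        for (int w = 0; w < n; ++w) {           // pick max saturation degree
            if (colour[w] != -1) continue;
            if (v == -1 ||
                neighColours[w].size() > neighColours[v].size() ||
                (neighColours[w].size() == neighColours[v].size() &&
                 degree[w] > degree[v]))
                v = w;
        }
        int c = 0;                              // lowest feasible colour
        while (neighColours[v].count(c)) ++c;
        colour[v] = c;
        for (int u = 0; u < n; ++u)             // update saturation degrees
            if (adj[v][u]) neighColours[u].insert(c);
    }
    return colour;
}
```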
References
1. R.M.R. Lewis, A Guide to Graph Colouring: Algorithms and Applications, Springer, ISBN 978-3-319-25730-3, pp. 29-31.
Chapter 4
Hopfield Network Approach for k-Coloring
The gradient descent method seeks a local minimum of a predefined Lyapunov
energy function E, which follows the quadratic form [3]

E = −(1/2) Σ_{i=1}^{N} Σ_{j=1}^{N} T_ij V_i V_j − Σ_{i=1}^{N} V_i I_i        (4.1)

where N is the number of neurons in the system and T_ij is the symmetrical (T_ij = T_ji)
synapse weight between the i-th and the j-th neurons. The output V_i follows the
nondecreasing function

h(u_i) = (1/2)( tanh(λ u_i) + 1 )        (4.2)

where λ is the gain of the sigmoid function. The rate of change of the internal state, or
motion equation, of the i-th neuron is given by [3]

du_i/dt = −u_i/τ − ∂E/∂V_i        (4.3)
But the effectiveness of this sigmoid model is not guaranteed [4] [5].

Takefuji [6] proved that the decay term −u_i/τ in eq.(4.3) is harmful, as it increases the
energy function E under certain circumstances, and that removing it from eq.(4.3) solves the
problem. That is, the energy function E is forced to decrease if the decay term is removed.
In his book, Takefuji presents a number of different neural network models without the
decay term that have been successfully used for several optimization problems. The term
−u_i/τ in eq.(4.3) is the controversial decay term; note that τ is a constant parameter.

In order to prove mathematically that the use of the decay term in the motion equation is
harmful, the conditions under which the fabricated energy function E increases instead of
decreasing are given in Theorem 4.1.
Theorem 4.1 The use of the decay term in eq.(4.3) increases the computational energy E
when

| Σ_i (dV_i/dt)(u_i/τ) | > | Σ_i (dV_i/du_i)(du_i/dt)² |

and, for each i, either ( u_i > 0 and dV_i/dt < 0 ) or ( u_i < 0 and dV_i/dt > 0 ) is
satisfied.
Proof:
Consider the derivative of the computational energy E with respect to time t:

dE/dt = Σ_i (dV_i/dt)(∂E/∂V_i) = Σ_i (dV_i/dt)( −u_i/τ − du_i/dt )

where ∂E/∂V_i is replaced by ( −u_i/τ − du_i/dt ) from eq.(4.3). Hence

dE/dt = −Σ_i (dV_i/dt)(u_i/τ) − Σ_i (dV_i/du_i)(du_i/dt)²

The second term, −Σ_i (dV_i/du_i)(du_i/dt)², is always negative or zero, because the output
V_i = f(u_i) is a nondecreasing function. The condition

−Σ_i (dV_i/dt)(u_i/τ) − Σ_i (dV_i/du_i)(du_i/dt)² > 0

can be true when the inequality of Theorem 4.1 holds and, for each i, either
( u_i > 0 and dV_i/dt < 0 ) or ( u_i < 0 and dV_i/dt > 0 ) is satisfied. Under such a
condition the derivative of E with respect to time t must be positive: dE/dt > 0.
Therefore, the decay term increases the energy function under such conditions, which
contradicts the conventional convergence theorem.
The harmfulness of using the decay term in the motion equation can easily be tested
by empirical simulation with varying values of τ. In order to guarantee convergence to
a local minimum, we must eliminate the decay term from the motion equation. Theorem
4.2 states that the computational energy function E monotonically decreases, regardless of
the symmetry and diagonal constraints on the conductance matrix, as long as the neurons
obey a nondecreasing function and the motion equation of the i-th neuron is given by

du_i/dt = −∂E/∂V_i        (4.4)

Theorem 4.2 dE/dt ≤ 0 is satisfied under two conditions: (1) du_i/dt = −∂E/∂V_i, and
(2) V_i = f(u_i) is a nondecreasing function.

Proof:

dE/dt = Σ_i (∂E/∂V_i)(dV_i/du_i)(du_i/dt) = −Σ_i (dV_i/du_i)(du_i/dt)² ≤ 0

where ∂E/∂V_i is replaced by −du_i/dt (condition 1), and dV_i/du_i ≥ 0 (condition 2).
Theorem 4.2 guarantees the convergence of the continuous system. However, the system
is usually simulated on a digital computer, and there always exist errors between the real
values and the quantized ones. For example, the discrete sigmoid function must be used
on a digital computer instead of the continuous one.
In the maximum neuron model the network is divided into M clusters of N neurons each,
and the computational energy is

E = −(1/2) Σ_{m=1}^{M} Σ_{i=1}^{N} Σ_{m'=1}^{M} Σ_{j=1}^{N} T_mim'j V_mi V_m'j − Σ_{m=1}^{M} Σ_{i=1}^{N} V_mi I_mi        (4.5)
Only the one neuron in each cluster with the maximum internal state has nonzero output;
if there is more than one neuron with the same maximum input in a cluster, the neuron
with the smallest subscript has nonzero output. The outputs of the other neurons in the
same cluster become zero, so that always one and only one neuron in each cluster has
nonzero output. The input/output function of the i-th maximum neuron in the m-th cluster
is defined as

V_mi = 1 if u_mi = max{ u_m1, u_m2, …, u_mN } and u_mj < u_mi for all j < i;  V_mi = 0 otherwise        (4.6)
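The rule of eq.(4.6) for a single cluster can be written directly (an illustrative sketch; `maximumNeuronOutput` is an assumed name, and the strict comparison implements the smallest-subscript tie-break):

```cpp
#include <vector>
#include <cstddef>

// Maximum-neuron output for one cluster: exactly one neuron fires, namely
// the one with the largest internal state u; on ties, the smallest index
// wins because only a strictly larger state displaces the current maximum.
std::vector<int> maximumNeuronOutput(const std::vector<double>& u) {
    std::vector<int> V(u.size(), 0);
    int best = 0;
    for (std::size_t i = 1; i < u.size(); ++i)
        if (u[i] > u[best]) best = (int)i;   // strict '>' keeps smallest index
    V[best] = 1;
    return V;
}
```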
The convergence conditions for the maximum neuron model to a local minimum of the
energy function E are given by the following theorem.

Theorem 4.3 dE/dt ≤ 0 is satisfied under two conditions: (1) du_mi/dt = −∂E/∂V_mi, and
(2) V_mi = 1 if u_mi = max{ u_m1, u_m2, …, u_mN } and u_mj < u_mi for all j < i, 0
otherwise. [6]
Proof:
Consider the derivative of the computational energy E with respect to time t:

dE/dt = Σ_m Σ_i (∂E/∂V_m,i)(dV_m,i/dt) = −Σ_m Σ_i (du_m,i/dt)² (dV_m,i/du_m,i)

where ∂E/∂V_m,i is replaced by −du_m,i/dt (condition 1).

Let du_m,i/dt be ( u_m,i(t+dt) − u_m,i(t) ) / dt, and consider the term
Σ_i (du_m,i/dt)² (dV_m,i/du_m,i) for each module separately. Let u_m,a(t+dt) be the
maximum at time t+dt and u_m,b(t) be the maximum at time t for the module m:

u_m,a(t+dt) = max{ u_m,1(t+dt), u_m,2(t+dt), …, u_m,N(t+dt) }
u_m,b(t) = max{ u_m,1(t), u_m,2(t), …, u_m,N(t) }

It is necessary and sufficient to consider the following two cases:
1) a = b
2) a ≠ b

If case 1) holds, then there is no state change for the module m; consequently
Σ_i (du_m,i/dt)² (dV_m,i/du_m,i) must be zero.

If case 2) holds, only the outputs V_m,a (from 0 to 1) and V_m,b (from 1 to 0) change, so

Σ_i (du_m,i/dt)² (dV_m,i/du_m,i)
= ( (u_m,a(t+dt) − u_m,a(t)) / dt )² ( V_m,a(t+dt) − V_m,a(t) ) / ( u_m,a(t+dt) − u_m,a(t) )
+ ( (u_m,b(t+dt) − u_m,b(t)) / dt )² ( V_m,b(t+dt) − V_m,b(t) ) / ( u_m,b(t+dt) − u_m,b(t) )
= (1/(dt)²) { u_m,a(t+dt) − u_m,a(t) − u_m,b(t+dt) + u_m,b(t) }
= (1/(dt)²) { u_m,a(t+dt) − u_m,b(t+dt) + u_m,b(t) − u_m,a(t) }
> 0

because u_m,a(t+dt) is the maximum at time t+dt and u_m,b(t) is the maximum at
time t for the module m. The contribution from each module is either 0 or positive;
therefore dE/dt = −Σ_m Σ_i (du_m,i/dt)² (dV_m,i/du_m,i) ≤ 0.
The termination condition of the net is given as follows: as soon as the system reaches a
stable state, or equilibrium state, the procedure terminates. The equilibrium state of the
maximum neuron model is defined as the state in which every firing neuron has the
smallest rate of change of the input in its cluster. In contrast to existing Hopfield neural
networks, where the condition of system convergence has never been clearly defined, the
equilibrium condition of the maximum neuron model is thus given explicitly.
The convergence of a Hopfield net to a stable state when simulated on a computer is
proven by Rojas [7].
Theorem 4.4 A Hopfield net with n neurons reaches equilibrium when simulated using
asynchronous update starting from arbitrary input states.
Proof:
For a vector x = (x_1, x_2, …, x_n), a vector y = (y_1, y_2, …, y_k) and an n × k weight
matrix W = (w_ij), the energy function is the bilinear form

E(x, y) = −(1/2) x W y^T

The value of E(x, y) can be computed by multiplying first W by y^T and then the result
by −x/2. The product of the i-th row of W and y^T represents the excitation of the i-th unit
in the left layer. If we denote these excitations by g_1, g_2, …, g_n, the above expression
transforms to

E(x, y) = −(1/2) (x_1, x_2, …, x_n) (g_1, g_2, …, g_n)^T

We can also compute E(x, y) by multiplying x by W first. The product of the i-th column
of W with x corresponds to the excitation of unit i in the right layer. If we denote these
excitations by e_1, e_2, …, e_k, the expression for E(x, y) can be written as

E(x, y) = −(1/2) (e_1, e_2, …, e_k) (y_1, y_2, …, y_k)^T

Therefore, the energy function can be written in the two equivalent forms

E(x, y) = −(1/2) Σ_{i=1}^{k} e_i y_i   and   E(x, y) = −(1/2) Σ_{i=1}^{n} g_i x_i
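The equivalence of the two forms can be checked numerically (a small illustrative sketch; `viaLeft` and `viaRight` are assumed names):

```cpp
#include <vector>
#include <cmath>
#include <cstddef>

// E(x,y) = -1/2 * x W y^T computed via the left-layer excitations g_i
// (rows of W times y) and via the right-layer excitations e_j (columns of
// W times x); both forms must agree.
double viaLeft(const std::vector<std::vector<double>>& W,
               const std::vector<double>& x, const std::vector<double>& y) {
    double E = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        double g = 0.0;                      // excitation of left unit i
        for (std::size_t j = 0; j < y.size(); ++j) g += W[i][j] * y[j];
        E += g * x[i];
    }
    return -0.5 * E;
}

double viaRight(const std::vector<std::vector<double>>& W,
                const std::vector<double>& x, const std::vector<double>& y) {
    double E = 0.0;
    for (std::size_t j = 0; j < y.size(); ++j) {
        double e = 0.0;                      // excitation of right unit j
        for (std::size_t i = 0; i < x.size(); ++i) e += W[i][j] * x[i];
        E += e * y[j];
    }
    return -0.5 * E;
}
```

For example, with W = ((1,2),(3,4),(5,6)), x = (1, −1, 1) and y = (1, 1), both forms give E = −3.5.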
In asynchronous networks at each time t we randomly select a unit from the left or right
layer. The excitation is computed and its sign is the new activation of the unit. If the
previous activation of the unit remains the same after this operation, then the energy of
the network has not changed. The state of unit 𝑖𝑖 on the left layer will change only when
the excitation 𝑔𝑔𝑖𝑖 has a different sign than 𝑥𝑥𝑖𝑖 , the present state. The state is updated from
𝑥𝑥𝑖𝑖 to 𝑥𝑥𝑖𝑖′ , where 𝑥𝑥𝑖𝑖′ now has the same sign as 𝑔𝑔𝑖𝑖 . Since the other units do not change their
state, the difference between the previous energy E(x, y) and the new energy E(x′, y) is

E(x, y) − E(x′, y) = −(1/2) g_i (x_i − x_i′)

Since x_i has a different sign from g_i while x_i′ = −x_i has the same sign as g_i, it
follows that E(x, y) − E(x′, y) > 0, i.e., the energy is reduced. The same argument applies
whenever the state of a unit in the right layer is flipped.
Any update of the network state reduces the total energy. Since there are only a
finite number of possible combinations of bipolar states, the process must stop at some
point, that is, a state (𝑎𝑎, 𝑏𝑏) is found whose energy cannot be further reduced. The network
has fallen into a local minimum of the energy function and the state (𝑎𝑎, 𝑏𝑏)is an attractor
of the system.
If a Hopfield net is simulated using synchronous update (as in [8]), the network may show
oscillatory behaviour and fail to converge to a minimum: the net falls into the so-called
"cycle trap". Theorem 4.4 guarantees convergence of the net under asynchronous update
and thus termination of the algorithm.
Applying the previous theorem to the maximum neural net leads to lemma 4.1. If
a problem is mapped on a maximum neural net, so that a solution requires one and only
one neuron to fire in each cluster, a stable state will always represent a valid solution. The
maximum net is guaranteed to solve a problem, when Theorem 4.3 and Theorem 4.4 are
applied.
Lemma 4.1 A maximum neural net is guaranteed to converge to a valid solution of
problem L, if the solution requires one and only one neuron per cluster to fire.
(that is done in polynomial time). In case it is a local maximum, we check if the instance
is a 'yes' or a 'no' instance (this is also done in polynomial time).
Thus, we have a nondeterministic polynomial-time algorithm to recognize any
'no' instance of L. Hence the complement of the problem L is in NP. But L is an NP-
complete problem, so from Lemma 4.2 it follows that NP = co-NP.
Even if the conditions of the previous theorem are relaxed, there is no way to perform better.

The general procedure for mapping an optimization problem onto a Hopfield-style network is as follows:
1. Define a set of variables that can take only the values 0 or 1, representing
possible solutions of the problem.
2. Create one neuron for each variable.
3. Translate the optimization criteria into cost functions using the variables
defined in step 1.
4. Translate the cost functions into an energy function E.
5. Derive from E the coupling weights T_ij and the external inputs I_i.
6. Apply the motion equation, starting from an arbitrary initial state.
7. Stop the computation when the equilibrium state is reached; otherwise go to the
previous step.
8. Interpret the result according to the model.
Figure 4.2 Mapping the graph coloring problem onto a neural net: a |V| × k matrix of
neurons whose rows correspond to vertices and whose columns correspond to colors.
3. The optimization criteria are that (i) each vertex is assigned exactly one color
and (ii) no two adjacent vertices are assigned the same color. The first criterion
can be expressed as the cost function

E_1 = Σ_{m=1}^{|V|} ( Σ_{i=1}^{k} V_mi − 1 )²

which is positive if the first constraint is violated and zero if not. The second
criterion can be expressed as the cost function

E_2 = (1/2) Σ_{m=1}^{|V|} Σ_{m'=1, m'≠m}^{|V|} Σ_{i=1}^{k} d_mm' V_mi V_m'i

where d_mm' is the mm'-th entry in the adjacency matrix D of G. E_2 is positive
if two adjacent vertices are colored with the same color and zero if G has a
valid coloring. Note that the formulation of the criteria as cost functions is not
unique.
4. When using a maximum neural model, the first constraint E_1 can be eliminated,
as the model itself already has the behaviour that one and only one neuron is
firing in each cluster of the network. Therefore, the computational energy E is
given by

E = E_2 = (1/2) Σ_{m=1}^{|V|} Σ_{m'=1, m'≠m}^{|V|} Σ_{i=1}^{k} d_mm' V_mi V_m'i        (4.11)
5. The coupling weights and the external inputs are then defined as follows:

T_mim'j = −d_mm' δ_ij        (4.12)
I_mi = 0        (4.13)

where δ denotes Kronecker's delta. T_mim'j inhibits connections within each
row of the neuron matrix.
6. Starting with random u_mi, the motion equation is computed until the equilibrium
state is reached. With the decay term removed, the motion equation is written as

du_mi/dt = −∂E/∂V_mi = −Σ_{m'=1, m'≠m}^{|V|} d_mm' V_m'i        (4.14)

The coloring is read off the output matrix: the i-th color class is

W_i = { m ∈ V | V_mi = 1 }

and V_mi indicates the coloring as defined in eq.(4.8). Note that W_i ≠ ∅ does not
necessarily hold.
56
The discrete network model described in the previous section was simulated on a
sequential computer using parallel asynchronous update. The internal states, neuron
outputs and motions were represented as matrices. The pseudo-code for the algorithm is
as follows ( m ∈ { 1, 2, …, |V| }, i ∈ { 1, 2, …, k } ).
1. Fill internal state matrix U := ((u_mi)) ∀ m, i with random integers.
2. Compute output matrix V := ((V_mi)) ∀ m, i using eq.(4.6).
3. Compute rate of change matrix ΔU := ((du_mi/dt)) ∀ m, i using eq.(4.14).
4. If the equilibrium state is reached, stop.
5. Using the first-order Euler method, assign U := U + ΔU using asynchronous
update.
6. Go to step 2.
If the graph is k-colorable, the algorithm terminates with a valid solution, which is
guaranteed by the model (Lemma 4.1). Thus a 100% convergence rate to the global
minimum is given. The coloring can be found in matrix V according to step 8 given in the
Neural Network Representation section.
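The steps above admit a much-simplified discrete sketch (assumed names; this is not Berger's implementation). The firing rule follows eq.(4.6), the inhibition step follows eq.(4.14), clusters are visited one by one, and equilibrium is read as "the firing neuron of each cluster has no clashing neighbour"; k ≥ Δ(G)+1 is assumed so that a clash-free colour always exists and the loop terminates:

```cpp
#include <vector>

// adj is the adjacency matrix d_mm', k the number of colours, U the |V| x k
// internal-state matrix. For each cluster, Euler-style steps subtract the
// inhibition (number of neighbours using each colour, eq.(4.14)) until the
// firing neuron (eq.(4.6), smallest index wins ties) is clash-free.
// Assumes k >= Delta(G)+1; a clash-free colour then always exists.
std::vector<int> maxNetColour(const std::vector<std::vector<int>>& adj, int k,
                              std::vector<std::vector<int>> U) {
    int n = (int)adj.size();
    auto firing = [&](int m) {               // eq.(4.6) for cluster m
        int best = 0;
        for (int i = 1; i < k; ++i) if (U[m][i] > U[m][best]) best = i;
        return best;
    };
    auto clashes = [&](int m, int i, const std::vector<int>& col) {
        int c = 0;                           // neighbours of m coloured i
        for (int mp = 0; mp < n; ++mp)
            if (mp != m && adj[m][mp] && col[mp] == i) ++c;
        return c;
    };
    std::vector<int> col(n);
    for (int m = 0; m < n; ++m) col[m] = firing(m);
    for (int m = 0; m < n; ++m) {
        while (clashes(m, col[m], col) > 0) {      // not yet in equilibrium
            for (int i = 0; i < k; ++i)            // Euler step, eq.(4.14)
                U[m][i] -= clashes(m, i, col);
            col[m] = firing(m);
        }
    }
    return col;                                    // valid colouring
}
```

Because each cluster ends its visit on a clash-free colour and later visits never recreate a clash on earlier edges, one pass over the clusters suffices in this simplified reading.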
57
References
1. Simon Haykin, Neural Networks: A Comprehensive Foundation, 2nd ed., Pearson Education, 1999, p. 701. ISBN 81-7808-300-0.
8. Takefuji, Y., and Lee, K.C., "Artificial Neural Network for Four-Coloring Map Problems and K-Colorability Problems", IEEE Trans. Circuits and Systems, vol. 38, no. 3, pp. 325-333, Mar. 1991.
9. Bruck, J., and Goodman, J., "On the Power of Neural Networks for Solving Hard Problems", Journal of Complexity, vol. 6, pp. 129-135, 1990.
Chapter 5
Proposed Approach
The method proposed by M.O. Berger [1] gives 100% convergence of the network to a
valid solution: whenever the system converges, the corresponding configuration is always
forced to be a valid solution, which no other existing neural network can guarantee.
However, Berger's method takes a huge amount of time to converge to a valid solution,
which is one of its drawbacks. In this method each cluster is processed one by one,
sequentially, without any order or priority given to any cluster, and checked for whether it
has reached equilibrium.

In our approach, we propose a new, modified version of Berger's method [1] that
tries to make the network converge faster to a valid solution. We take the priority of
each cluster into consideration using the degree of the corresponding vertex. The vertex
with the highest degree is processed first and checked for equilibrium; in this way, we
process the clusters in decreasing order of vertex degree, and the vertex with the lowest
degree is processed last.
5.2 Pseudo-Code
The internal states, neuron outputs and motions were represented as matrices. The degrees
of the vertices are stored in an array. The pseudo-code for the algorithm is as follows
( m ∈ { 1, 2, …, |V| }, i ∈ { 1, 2, …, k } ).
1. Compute the adjacency matrix of graph G.
2. Compute degree array D such that D_m = degree(V_m).
3. Sort D such that D_i ≥ D_j ∀ i < j, where i, j ∈ { 1, 2, …, |V| } and i ≠ j.
4. Set k = Δ(G) + 1 and index = 1.
5. Fill internal state matrix U := ((u_mi)) ∀ m, i with random integers.
6. Compute output matrix V := ((V_mi)) ∀ m, i using eq.(4.6).
7. Compute rate of change matrix ΔU := ((du_mi/dt)) ∀ m, i using eq.(4.14).
8. While index ≤ |V|:
if equilibrium is reached for cluster V_index, increment index := index + 1;
else
(a) using the first-order Euler method, assign U := U + ΔU using
asynchronous update;
(b) repeat Steps 6 and 7;
(c) go to Step 8.
In the pseudo-code, Step 1 generates the adjacency matrix of the given graph, which is
later used for computing the rate of change matrix ΔU. In Step 2 we compute the degree of
each vertex and store it in an array D. In Step 3 we sort the elements of array D in
non-increasing order. In Step 4 we set k = Δ(G) + 1, since the number of colors required
to color a graph is at most the maximum degree of the graph plus one [2], and set
index = 1; index is used as a pointer into the degree array, selecting the vertex with the
highest remaining degree. In Step 5 we create an internal state matrix U of size |V| × k
and fill it with random integers. In Step 6 we compute the output matrix V from the internal
state matrix U using eq.(4.6), i.e., for each row of V, V_mi = 1 if the element at the same
position in U is the maximum element of its row, otherwise V_mi = 0. In Step 7 we
compute the rate of change matrix ΔU using eq.(4.14), i.e., the negative of the matrix
product of the adjacency matrix D and the output matrix V. In Step 8 we select a vertex
from the degree array and check whether the cluster corresponding to this vertex is in
equilibrium. If it is, we increment the index pointer and select the next vertex from the
degree array. If the cluster is not in equilibrium, we randomly select an element from the
rate of change matrix (from the same cluster position as in the output matrix) and update
the internal state matrix; we then repeat Steps 6 and 7 and continue this process until the
cluster reaches equilibrium. This is done for all clusters (i.e., all vertices).
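The degree-priority step (Steps 2–3 of the pseudo-code) can be sketched as follows (assumed names):

```cpp
#include <vector>
#include <algorithm>
#include <numeric>

// Return the cluster processing order of the proposed method: vertices sorted
// in non-increasing order of degree, so high-degree clusters reach
// equilibrium first and later vertices face fewer constraint checks.
std::vector<int> processingOrder(const std::vector<std::vector<int>>& adj) {
    int n = (int)adj.size();
    std::vector<int> degree(n, 0), order(n);
    for (int v = 0; v < n; ++v)
        for (int u = 0; u < n; ++u) degree[v] += adj[v][u];
    std::iota(order.begin(), order.end(), 0);
    std::stable_sort(order.begin(), order.end(),
                     [&](int a, int b) { return degree[a] > degree[b]; });
    return order;               // order[0] = a vertex of maximum degree
}
```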
Our method, which uses priority (vertex degree) to decide which cluster to process first,
tends to converge faster than the method of M.O. Berger [1], since the number of
constraint checks is reduced when clusters are processed in order of the degree of their
corresponding vertices. In a graph where a number of vertices are connected to a centre
vertex, if we process all the other vertices before the centre one, the number of
constraint checks increases, because the centre vertex has to check against all the other
vertices before reaching the equilibrium state. If, on the other hand, we process the centre
vertex (which has the highest degree) first and then all the other vertices, the number of
constraint checks decreases, because each of the other vertices then requires fewer
constraint checks than in the previous case.
References
1. M.O. Berger, "K-Colouring vertices using a neural network with convergence to valid solutions", Proc. International Conf. on Neural Networks, 1994.
Chapter 6
Comparative Study
The algorithms were implemented and tested in C++ on an Intel(R) Core(TM) i7-
7700HQ CPU @ 2.80 GHz running Ubuntu. Various DIMACS graphs were used to test
and compare M.O. Berger's approach [1] with our modified approach.
Graph           M.O. Berger [1]   Our method
myciel5.col          22.872          19.832
queen6_6.col         31.492          28.646
queen7_7.col         81.261          44.856
Table 6.1 Convergence time in seconds
For myciel5.col our method converges 13.29% faster, for queen6_6.col 9.05% faster, and
for queen7_7.col 44.80% faster.
Graph           M.O. Berger [1]   Our method
david.col           212.576         155.580
Table 6.2 Convergence time in seconds
For david.col our method converges 26.81% faster, for huck.col 2.19% faster, and for
queen5_5.col 53.30% faster.
Graph           M.O. Berger [1]   Our method
K4.col                0.007           0.006
myciel3.col           0.202           0.168
Table 6.3 Convergence time in seconds
For K4.col, which is a complete graph on 4 vertices, our method converges 14.28% faster;
for myciel3.col it converges 16.83% faster.
Graph           M.O. Berger [1]   Our method
games120.col        672.938         694.598
anna.col            339.182         388.923
queen8_8.col        215.749         238.362
Table 6.4 Convergence time in seconds
For games120.col our method converges 3.11% slower, for anna.col 12.78% slower, and
for queen8_8.col 9.48% slower.
References
1. M.O. Berger, K-Colouring vertices using a neural network with convergence to
valid solutions, Proc. International Conf. on Neural Networks, 1994.
Chapter 7
Conclusion
The vertex coloring problem is one of the difficult problems in graph theory. The problem
is modelled as a Hopfield net divided into |V| clusters, each with k maximum neurons.
The general principle of mapping optimization problems, as well as the particular neural
representation, is presented in Chapter 4.

Our proposed method always terminates with a valid solution and has a guaranteed
100% convergence rate to the global minimum of the energy function. Our proposed
method also converges faster than the pre-existing method discussed in Chapter 4 for most
graphs; in some cases it was slower, but not by a large margin. To verify this, our proposed
method was applied to a number of example graphs taken from the DIMACS graph
coloring dataset.