
Peterson’s algorithm in a multi-agent database system

Vikram Goenka
vikramg@students.cs.mu.oz.au

Research Paper for 433-481: Knowledge Representation and Reasoning
July 2, 2003

Abstract
Distributed database systems are commonplace in today’s networked world, but they are typically limited by their reliance on a centralized knowledge base. This paper suggests a model for distributing that knowledge over a multi-agent setting, where knowledge is distributed and embodied among multiple agents.
Data concurrency and consistency are critical in such multi-user database systems. I therefore propose a simple yet efficient model for implementing concurrency in such a system while maintaining mutual exclusion to ensure data consistency.
Keywords
Peterson’s algorithm, database, BDI agents

1. Introduction
“Distributed systems are everywhere.” [1] But what makes a system distributed? Such systems allow their applications to run over several processors to benefit from the speed-ups of parallel computing. Distributed systems are also characterized by their ability to provide increased scalability, failure handling, concurrency and security.
A limitation of such systems, however, is the need for knowledge to be centralized. Take distributed database systems: most leading vendors, IBM and Oracle included, provide distributed versions of their systems. A common feature of all such systems, however, is a centralized arbitrator or knowledge-management layer known as the Distributed Database Management System (D-DBMS).
Figure 1, adapted from [3], gives a clearer picture of this setup.

[Figure 1: Distributed database environment. A global schema and the distributed DBMS (D-DBMS) sit above the local DBMS and database at each of nodes A, B and C.]


Such a system uses its centralized command structure to maintain concurrency and consistency in a
multi-user environment. Each node can function individually, but in order to communicate with another
node it must route its communication through the D-DBMS.
This paper discusses distributing not only the computation and functionality of such a system, but also its knowledge. A model for a multi-agent DBMS based on the BDI model of agency is proposed, in which knowledge is distributed and embodied among multiple agents. Figure 2 illustrates this idea.

[Figure 2: Multi-agent DBMS (MAD). Agents A, B and C, each with its own DBMS and database, share a communication network.]


The diagram shows the system being modelled, which is a collection of agents, each possessing its own
database (DB) and database management system (DBMS). A communication medium is shared by the
agents to facilitate communication and data sharing.
2. Background
“Autonomous execution is central to agency.”[2] Thus, it is important for each agent in our system to
possess its own DBMS which is autonomous in all respects. Dependency on a distributed DBMS in
order to gain access to data on another agent’s database would reduce the autonomy of these agents.
The system being modelled is essentially a multi-agent environment in which each agent holds local data that is unique to it and cannot be found on any other agent. This also means that there is no replication of data, unlike in traditional synchronous or asynchronous distributed databases [3].
Data will be partitioned vertically in our model. This means columns (parts of a table) will be placed on
agents based on relevance. For example, a relation for a machine can be vertically partitioned so that
columns used primarily by the manufacturing department can be distributed to its computer, while the
rest of the columns can be distributed to the engineering department’s computer.
In a multi-user database, multiple statements inside different transactions could attempt to update the
same data. This could lead to the data becoming inconsistent. This is undesirable, and commercial
systems use rather complex techniques to handle concurrent transactions. Another issue is determining
the location of remote data.
In a traditional setup these issues are handled by the different subsystems of a distributed DBMS, which sits on top of all the autonomous DBMSs to manage remote or distributed transactions. In the absence of such a piece of software, other means and algorithms must be found to handle the issues mentioned above.
The problem this paper will look into is that of concurrency control, a function which is handled by the
concurrency-control manager within the transaction management subsystem in a traditional DBMS [5].
It is assumed that the agents already know where to find the data they need.
A naïve concurrency-control manager could use a simple mutual exclusion algorithm such as Peterson’s [7, 9]. In this paper I adapt this algorithm to a multi-agent setting, which involves generalising the standard two-process algorithm to an n-process algorithm. In doing so, I also improve throughput by addressing the busy-waiting characteristic of Peterson’s algorithm.

2.1 Mutual exclusion and Peterson’s algorithm
Mutual exclusion is a way of making sure that while one process is using a piece of shared data, no
other process does the same. Several solutions have been put forward to resolve this problem. They all
revolve around the concept that while one process is using a shared resource in its critical region, “no
other process will enter its critical region and cause trouble” [6]. The critical section refers to the part
of the program where the resource in contention is accessed. In our case, this would be when an agent
attempts to query a table, since all tables are shared data.
T. Dekker, a Dutch mathematician, was one of the first to provide a software solution to this problem in
1965 [6]. He achieved this using the idea of flags and turn variables. Then, in a remarkable 1981 paper
of less than two pages, G. L. Peterson developed and proved several versions of his algorithm for
mutual exclusion [4].
Peterson’s algorithm has three phases. In the initialisation phase the shared variables are declared and initialised. Before using the shared resource (entering the critical region) a process must go through an entry protocol; this ensures that it waits, if required, until the resource is available. On finishing, the process must go through the exit protocol to allow other processes to access the resource.
The pseudo-code for Peterson’s algorithm with two processes A and B is as follows.
Initialisation Phase:
shared flags[2];   // one flag for each of A and B
shared turn;
turn = A;
flags[A] = FREE;
flags[B] = FREE;
Entry Protocol (for Process A):
// claim the resource
flags[A] = BUSY;
// give away the turn
turn = B;
// wait while the other process is using the resource *and* has the turn
while ((flags[B] == BUSY) && (turn == B)) {;}
// use resource
Exit Protocol (for Process A):
// release the resource
flags[A] = FREE;
As we can see, both the turn variable and the status flags are used. After setting its flag, process A immediately gives the turn away to B. Process A then waits only while B’s flag is set and B holds the turn. By waiting on the conjunction of the two conditions, the process avoids entering the critical section when it should not.
The mutual exclusion requirement is assured. Suppose both processes were in their critical sections, so both flags are BUSY. Each process wrote to turn just before entering; suppose, without loss of generality, that A’s write (turn = B) was the later of the two. Then turn still equals B when A passes its while test, so A must have seen flags[B] == FREE at that test. But B set flags[B] = BUSY before it wrote turn = A, which happened before A wrote turn = B and tested. So flags[B] was already BUSY at A’s test, contradicting our assumption.
Also, the turn variable is only considered when both processes are using, or trying to use, the resource.
This ensures that a process does not wait unnecessarily. Deadlocks are not possible because in a
scenario where both processes are testing the while condition, one of them must have the turn. That
process will proceed.

Finally, bounded waiting is assured. When a process that has exited the critical region re-enters, it will
give away the turn. If the other process is already waiting, it will be the next to proceed.
So we see that Peterson’s algorithm provides mutual exclusion along with other benefits such as bounded waiting and freedom from deadlock. One shortcoming of this method is that it makes a waiting process poll the shared variables continuously. This busy-waiting behaviour can be extremely damaging to a system’s throughput. Later in this paper I suggest a means of improving on this characteristic.
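To make the two-process protocol concrete, the following is a minimal runnable sketch in Python (my own illustration of the pseudo-code above, not part of the original model; the names worker, FREE and BUSY are arbitrary). It relies on CPython executing one bytecode at a time; a C or Java version would additionally need memory fences or atomic variables.

import threading

FREE, BUSY = 0, 1
flags = [FREE, FREE]    # one status flag per process
turn = [0]              # shared cell recording whose turn it is
counter = [0]           # shared resource protected by the protocol

def worker(me, other, iterations=100000):
    for _ in range(iterations):
        # entry protocol: claim the resource, then give the turn away
        flags[me] = BUSY
        turn[0] = other
        # busy-wait while the other process wants the resource *and* has the turn
        while flags[other] == BUSY and turn[0] == other:
            pass
        counter[0] += 1     # critical section: update the shared resource
        flags[me] = FREE    # exit protocol: release the resource

a = threading.Thread(target=worker, args=(0, 1))
b = threading.Thread(target=worker, args=(1, 0))
a.start(); b.start(); a.join(); b.join()
print(counter[0])           # expected 200000 if mutual exclusion held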
2.2 Why BDI agents?
The BDI (Belief-Desire-Intention) model is a simple yet efficient way of modelling such scenarios.
In our case, each autonomous agent has a database and a DBMS. Along with this each agent has:
• Beliefs – of the time other agents take to process their queries on a particular table in their
database.
• Desires – of providing mutual exclusion and achieving maximum throughput through smart
scheduling, or providing FIFO behaviour.
• Intentions – or plans of predetermined sequences of actions that can guide it in accomplishing
its desires.
Modelling our system this way means that each agent senses the environment around it by receiving messages from other agents. On doing so, it updates its beliefs if there are new queries from other agents. The agent then deliberates about which intention to achieve next. Details of how it reasons are discussed in depth in later sections.
By modifying a set of data structures that determine an agent’s desires, and by choosing the right plans for accomplishing each such desire, our system is modelled completely and elegantly in the BDI framework.
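As a rough illustration of how these three components might be represented in code (a sketch only; the field and type names are my own, not part of the model as specified), consider:

from dataclasses import dataclass
from enum import Enum

@dataclass
class Belief:
    agent: str         # e.g. "A"
    table: str         # e.g. "tblBlue"
    mean_time: float   # mean processing time observed for this agent on this table
    accesses: int      # number of queries the mean is based on

class Desire(Enum):
    FIFO = 0             # plain first-in-first-out behaviour
    MAX_THROUGHPUT = 1   # belief-based smart scheduling

class Intention(Enum):
    DO_NOTHING = 0
    SCHEDULE = 1
    SMART_SCHEDULE = 2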
3. Understanding the model
To access data on a remote table (one on another agent’s DB) the agent must send the holding agent a
request query. The requesting agent must also specify a Maximum Wait Time (MWT) with each
request; the reasons for passing this parameter will become clear as our discussion progresses.
Since the focus of this paper is on the algorithm, and because we will not actually look at the processing of queries, queries have been simplified to requests for locks on a particular table. The system is modelled using this notion of locking to highlight the impact of smart scheduling on the system’s throughput.
The following assumptions are made in order to restrict the focus of this discussion:

• Agents possess a data dictionary which allows them to know the schemas of other agents. Thus, a table can be found in the multi-agent environment without needing a global schema.
• The communication network is reliable, message passing is error-free and messages take no time to propagate through the network to the destination.
• The discussion uses an environment with 3 autonomous agents at a given time; a generalised solution for n agents is left for later work.
• The procedure the concurrency-control manager uses for managing concurrency is simplified
in the following ways:
o Locks will be of one type only – exclusive, which means that if a table is being
queried by an agent, no other agent can access that table.
o Communication over the network is deemed expensive and a remote request will pre-
empt a local process’s lock, unless both can be satisfied by sequential scheduling.

3.1 Scenarios
Let us now consider a few scenarios to understand the model better. (All scenarios are explained with throughput maximisation as the desired behaviour.)
3.1.1 Simple case

[Diagram: agent A sends Query(tblBlue, 6) and agent C sends Query(tblGreen, 4) to agent B; agent B returns Response(data) to each. Every agent has its own DBMS and database.]

• There are 3 agents, A, B and C.


• Agent A requests a query on agent B’s tblBlue with MWT as 6 seconds.
• Agent C requests a query on agent B’s tblGreen with MWT as 4 seconds a few seconds later.
• No other concurrent requests.
• This is a simple case where the queries are concurrent but request locks on different tables.
The Concurrency Manager (CM) sends responses to both agents with the data they requested.
3.1.2 Belief based scheduling

[Diagram: agent A sends Query(tblBlue, 6) and agent C sends Query(tblBlue, 14) to agent B; depending on its beliefs, agent B answers each request with either Retry(tblBlue, ?) or Response(data).]

• There are 3 agents, A, B and C.
• Agent A requests a query on agent B’s tblBlue with MWT as 6 seconds.
• 2 seconds later, Agent C requests a query on agent B’s tblBlue with MWT as 14 seconds.
• There are concurrent queries on agent B’s tblBlue.
• The reasoning that agent B’s CM performs in this scenario is:
o Can it schedule and process both sequentially? – Maybe; it depends on how soon it believes agent A’s query on tblBlue will end and how long agent C’s query will take. Agent B’s beliefs, formed by learning from previous queries on tblBlue by agents A and C, play a key role here. If it believes that agent A’s query on tblBlue takes <= 4 seconds and agent C’s query takes <= 8 seconds, it can wait until agent A finishes and then process agent C’s query (a numerical sketch of this check is given below).
o Should it refuse agent C’s request straight away? – Maybe; this again depends on its beliefs. If agent B believes that agent C’s query takes 8 seconds and that agent A’s query takes > 4 seconds, it can confidently tell agent C to retry instead of waiting, saving agent C the time it would otherwise have spent waiting unnecessarily.
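As a numerical sketch of the first check above (the execution times are the hypothetical figures quoted in the scenario, not measured values), agent B’s decision reduces to a comparison like the following:

# Scenario 3.1.2, seen from agent B when C's request arrives (2 s after A's).
believed_time_A = 4    # B believes A's queries on tblBlue take <= 4 s
believed_time_C = 8    # B believes C's queries on tblBlue take <= 8 s
elapsed_A = 2          # A has already been executing for 2 s
mwt_C = 14             # C is willing to wait at most 14 s in total

remaining_A = believed_time_A - elapsed_A   # at most 2 s left for A's query
slack_C = mwt_C - believed_time_C           # C can afford to wait 6 s

if slack_C >= remaining_A:
    decision = "schedule C sequentially after A"
else:
    decision = "send C a retry message immediately"
print(decision)    # with these figures: schedule sequentially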
3.1.3 Scheduling with rollback

[Diagram: agent A sends Query(tblBlue, 14) and agent C sends Query(tblBlue, 6) to agent B; agent B answers each request with either Retry(tblBlue, ?) or Response(data).]

• There are 3 agents, A, B and C.


• Agent A requests a query on agent B’s tblBlue with MWT as 14 seconds.
• 2 seconds later, Agent C requests a query on agent B’s tblBlue with MWT as 6 seconds.
• There are concurrent queries on agent B’s tblBlue.
• The reasoning that agent B’s CM performs in this scenario is:
o Can it schedule and process both sequentially? – Similar reasoning as in the previous example.
o Should it refuse agent C’s request straight away? – Similar reasoning as in the previous example.
o Should it temporarily roll back agent A’s query? – Maybe; this again depends on its beliefs. If agent B believes from past experience that agent C’s query on tblBlue takes <= 4 seconds and that agent A’s query takes <= 8 seconds, it can then:

1. Rollback agent A’s query.
2. Execute agent C’s query.
3. Reschedule agent A’s query.

Thus, both agents’ requests are satisfied by scheduling them efficiently.

4. Basis for reasoning

As we have seen, scheduling of concurrent queries depends on the CM’s:


1. Knowledge about the source of request (Remote/Local), because remote requests are given a
higher priority.
2. Knowledge about the MWT of the concurrent requests, because this allows for rescheduling
and immediate retry messages being sent.
3. Belief of the processing time of an agent’s query on a particular table. Initially when the CM
has no beliefs associating an agent with a table (first instance of a query being requested on a
table by the agent), it does not pre-empt locks.
Thus, knowledge and belief form one part of the basis for reasoning. The concurrency-manager subsystem uses these, together with the desired behaviour, to filter plans for execution.
5. Implementation
The model has two facets: smart scheduling, and ensuring data consistency via Peterson’s algorithm.

5.1 Adapting Peterson’s algorithm for a multi agent environment

For use in our model, Peterson’s algorithm must be generalised to an environment with an arbitrary number of agents (n processes). The pseudo-code for Peterson’s algorithm with n processes is:
Initialisation Phase (for process [i]):
shared in_stage[n] = 0;
shared last_process[n] = 0;

Entry protocol (for process [i]):
for j = 1 to n-1 {
    // process i enters stage j and records itself as the last process to do so
    in_stage[i] = j;
    last_process[j] = i;
    // for each process k other than i
    for k = 1 to n except k == i
        // wait while process k is in the same or a higher-numbered stage and
        // process i was the last to enter stage j
        while ( in_stage[k] >= in_stage[i] and last_process[j] == i ) { };
}

Exit protocol (for process [i]):
in_stage[i] = 0;                                            // leave all stages
updateBeliefs(B, string name, string table_name, int time); // record the query's execution time
setResponse(string data);                                   // hand the result back to the requestor

This adaptation of Peterson’s algorithm to n processes uses, for each stage, a variable recording the last process to enter that stage; that process is held back while other processes are at the same or a higher stage.
For each process, the entry protocol loop that iterates through n-1 stages allows at most one process at a time to get through all stages and into its critical section. The stage of each process is stored in in_stage[i], and the last process to begin stage j is stored in last_process[j].
This adaptation behaves like the two-process algorithm: it ensures mutual exclusion, freedom from deadlock and bounded waiting. However, the busy-waiting characteristic remains; this is worked around by using the BDI loop.
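The following is a minimal runnable Python sketch of this n-process (filter) version, again my own illustration rather than part of the model; the per-process wait of the pseudo-code is expressed here as a single any(...) test over all other processes, which plays the same role. As before it leans on CPython’s interpreter for sequential consistency.

import threading

N = 3
in_stage = [0] * N        # stage each process is currently in (0 = not competing)
last_process = [0] * N    # last process to enter each stage (index 0 unused)
counter = [0]             # shared resource

def enter(i):
    for j in range(1, N):                       # stages 1 .. N-1
        in_stage[i] = j
        last_process[j] = i
        # wait while i was the last to enter stage j and some other
        # process is at the same or a higher-numbered stage
        while last_process[j] == i and any(
                in_stage[k] >= j for k in range(N) if k != i):
            pass

def leave(i):
    in_stage[i] = 0                             # leave all stages

def worker(i, iterations=20000):
    for _ in range(iterations):
        enter(i)
        counter[0] += 1                         # critical section
        leave(i)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(N)]
for t in threads: t.start()
for t in threads: t.join()
print(counter[0])                               # expected 60000 if mutual exclusion held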

5.2 BDI loop

The BDI aspect of the model works like a wrapper around the concurrency manager (scheduler). It sits in a loop listening for events and reacting to them. The pseudo-code for this is as follows (adapted from [7]):
while true do
    Q = get-next-query();
    if Q not null
        updateKnowledge(K, Q);
    I = deliberate(B, D, K);
    A = plan(B, K, I);
    execute(A);
end while

The purpose of this loop is to tell the concurrency manager what to do, and when, by executing an action; in this way the manager schedules efficiently and avoids the busy-waiting characteristic.

5.3 Implementing the model


A data structure called query is used, containing the name of the table queried, the MWT, the name of the requesting agent and the time of arrival of the request. Each agent maintains a public requestQueue, a retryQueue and a response data structure. This, along with the interface defined below, allows the agents to query and respond as required (a code sketch of these structures follows the interface list).
The agents have the following interface:
1. void query(string tblName, integer MWTime) – allowing a request to be made to an agent by
pushing a query on the requestQueue of the agent being requested.
2. void setRetry(string tblName, integer newTime) – the agent being requested informs the
calling agent to retry by pushing a retry entry on to the retryQueue of the agent who requested
the query.
3. void setResponse(string data) - the agent being requested returns the calling agent the
requested data, by setting the response data structure with the result of the query.
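A sketch of the query structure, the public queues and this interface in Python follows (illustrative only; the extra requestor argument to query() and the exact field names are my assumptions, since the query structure must record who asked):

from collections import deque
from dataclasses import dataclass, field
import time

@dataclass
class Query:
    table: str       # name of the table being requested
    mwt: int         # Maximum Wait Time, in seconds
    requestor: str   # name of the requesting agent
    arrived: float = field(default_factory=time.time)   # time of arrival of the request

class Agent:
    def __init__(self, name):
        self.name = name
        self.requestQueue = deque()   # incoming queries from other agents
        self.retryQueue = deque()     # retry notices received from other agents
        self.response = None          # result of the last satisfied query

    def query(self, requestor, tblName, MWTime):
        # a remote agent pushes its request onto this agent's requestQueue
        self.requestQueue.append(Query(tblName, MWTime, requestor))

    def setRetry(self, tblName, newTime):
        # the requested agent asks this (calling) agent to retry later
        self.retryQueue.append((tblName, newTime))

    def setResponse(self, data):
        # the requested agent hands back the result of the query
        self.response = data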
Moving on to the implementation of the BDI loop, the get-next-query() method senses the environment for new queries from other agents. This is done by checking the requestQueue on each pass of the loop. The queue is popped and the query Q, if any, is stored. The pseudo-code for this method is as follows:
query get-next-query()
    query popped = null;
    if requestQueue not empty
        popped = pop(requestQueue);
    return popped;
The next step is updating the knowledge structure of the agent. Here K0, the initially empty knowledge structure, is updated by the updateKnowledge() method. The knowledge structure K is essentially a list of the query structures currently executing or ready for execution, each paired with the time the query started executing (start_time). Once a request has been processed, its corresponding entry is popped from the knowledge list. The pseudo-code for the updateKnowledge() method is as follows:
void updateKnowledge(knowledge K, query Q)
    add Q to the end of the list K;
    set the corresponding start_time to -1;
The deliberate() method then uses the agent’s beliefs, desires and knowledge to determine which intention to achieve next. Desire in our model can be represented by a single bit or boolean variable: the two possible desires are FIFO behaviour (no smart scheduling) or smartly scheduled behaviour.
The beliefs are a more complex structure. We can maintain them in an array of belief structures, each associating an agent with the processing time it takes when accessing/querying a certain table. Thus, the belief structure contains:
• The agent’s name, e.g. A.
• The table being accessed, e.g. tblBlue.
• The mean processing time on that table by the agent.
• The number of previous accesses (queries on the table).
The beliefs array, B, is sorted by agent and table name to allow for efficient look-up. Using the above-mentioned structures, the deliberate() method determines the agent’s intention I. The possible intentions are smart_schedule, schedule and do_nothing. The pseudo-code for this method is:
intention deliberate(beliefs B, desires D, knowledge K)
    if D = FIFO
        return the intention as schedule;
    else
        // choose an intention based on B, D and K:
        // consider the first entry in K that is not yet executing
        get query from K where start_time = -1;
        if a query that has not started executing exists
            if ( the table requested by the query is locked AND
                 B has entries for the agent-table combination for both holder and requestor )
                return the intention as smart_schedule;
            else
                return the intention as schedule;
        else
            return the intention as do_nothing;

Following this, the plan() method reasons and chooses the appropriate plan to be executed from the plan library. It returns an action for execution. An action is either null, sequential_schedule, rollback or send_retry. The pseudo-code for this method is:
action plan (beliefs B, knowledge K, intention I)
    if I = do_nothing
        return action as null;
    else if I = schedule
        return action as sequential_schedule;
    else if I = smart_schedule
        // Can it schedule and process both sequentially?
        // the time the holder will take in executing
        curr_etime = mean execution time of holder;
        // compute the remaining wait time for the holder
        curr_wtime_remaining = MWT - (time elapsed waiting + time elapsed executing);
        // compute the remaining execution time of the holder
        curr_etime_remaining = curr_etime - time elapsed executing;
        // the time the requestor will take in executing
        next_etime = mean execution time of requestor;
        // compute the time the current requestor can wait
        next_wtime_remaining = MWT - (time elapsed waiting + next_etime);
        if next_wtime_remaining > curr_etime_remaining
            return action as sequential_schedule;
        else if curr_wtime_remaining > (next_etime + curr_etime)
            return action as rollback;
        else
            // Refuse the next request by sending a retry message
            return action as send_retry;
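To make the arithmetic in the smart_schedule branch concrete, here is a small Python sketch, exercised with the figures of scenario 3.1.3 (the attribute names arrived, start_time, mwt and mean_time are my assumptions about the query and belief records; the believed times are illustrative values within the bounds quoted in the scenario):

from types import SimpleNamespace

def plan_smart(holder, requestor, now):
    # time the holder has spent executing, and waiting before that
    exec_elapsed = now - holder.start_time
    wait_elapsed = holder.start_time - holder.arrived

    curr_etime = holder.mean_time                  # believed total execution time of the holder
    curr_wtime_remaining = holder.mwt - (wait_elapsed + exec_elapsed)
    curr_etime_remaining = curr_etime - exec_elapsed

    next_etime = requestor.mean_time               # believed execution time of the requestor
    next_wtime_remaining = requestor.mwt - ((now - requestor.arrived) + next_etime)

    if next_wtime_remaining > curr_etime_remaining:
        return "sequential_schedule"   # the requestor can afford to wait for the holder
    elif curr_wtime_remaining > (next_etime + curr_etime):
        return "rollback"              # roll the holder back, run the requestor, re-run the holder
    else:
        return "send_retry"            # neither fits: ask the requestor to retry

# Scenario 3.1.3: A (holder) arrived at t=0, MWT 14, believed time 8 s;
# C (requestor) arrived at t=2, MWT 6, believed time 3 s (within the <= 4 s bound).
holder = SimpleNamespace(arrived=0, start_time=0, mwt=14, mean_time=8)
requestor = SimpleNamespace(arrived=2, mwt=6, mean_time=3)
print(plan_smart(holder, requestor, now=2))        # with these figures: rollback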
The chosen action is then executed using the execute() method. This method basically controls the
scheduling of the queries, and adds or removes them from the queues of the n-process Peterson’s
algorithm. The pseudo-code for this method is as follows:
void execute(action A)
    if A = send_retry
        remove the corresponding entry from the knowledge list K;
        push an entry onto the requestor's retryQueue by calling the setRetry() method;
    else if A = sequential_schedule
        add a process for the query;
    else if A = rollback
        kill the current holder;
        add a process for the next query;
        add a process for the killed query;

Each time a query passes the exit protocol of Peterson’s algorithm it calls the updateBeliefs() method to update the agent’s beliefs. The updateBeliefs() method uses the current query’s execution statistics to update the agent’s beliefs B. It also pops the corresponding entry from the knowledge list K. The pseudo-code for this method is as follows:
void updateBeliefs(beliefs B, string agentName, string table_name, int time)
    if a belief exists for agentName's access to table_name
        compute the new mean as ((old mean * no. of accesses + time) / (no. of accesses + 1));
        increment the no. of accesses by 1;
    else
        add a new belief entry to array B;
    pop the entry from K;
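In code, the running-mean update reduces to a couple of lines; a sketch follows (the belief record is a stand-in using the field names introduced earlier, which are my own):

from types import SimpleNamespace

def update_belief(belief, observed_time):
    # running mean: new_mean = (old_mean * n + t) / (n + 1)
    belief.mean_time = (belief.mean_time * belief.accesses + observed_time) / (belief.accesses + 1)
    belief.accesses += 1

b = SimpleNamespace(agent="A", table="tblBlue", mean_time=4.0, accesses=3)
update_belief(b, observed_time=6.0)
print(b.mean_time)   # (4*3 + 6) / 4 = 4.5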
This framework, suggested in terms of pseudo-code, could be used to develop a simulator for examining the effects of the different scheduling methods.
6. Future work
The purpose of this paper is to suggest and model an idea in a multi-agent setting. As a result, many
assumptions and simplifications were made in the modelling of the system. This means that further
work would be needed before any commercial application can be implemented using this model.
As we can see, the design of the model allows a choice of an agent’s desire. An agent can either desire to function like a FIFO scheduler, or opt to use its beliefs and knowledge to try to optimise query processing and increase throughput. Using this as a platform, a simulator would be able to highlight the benefits of smart scheduling by providing numerical evidence.
In commercial database systems, exclusive table-level locking alone would be unacceptable because of its impact on concurrency and throughput. So the first improvement that needs to be made is that locking should be made more granular. At the same time, there should be locks of different levels of restrictiveness.
Also, resource discovery was assumed to be a service that is already available. Future work could develop a means by which agents learn the location of desired data using one of the network searching algorithms.
Communicating over a network is a messy affair, so error handling and allowance for communication-related problems should also be taken into account and worked around.
Last but not least, the number of agents in the environment was limited to three for the purposes of this paper. This limitation should be removed to allow for wider use. Most of the ideas proposed can easily be adapted to an environment with n agents, so doing this should not pose a real problem.
7. Conclusions
I have described a model for agent-based database systems. This different approach to database systems spread over wide geographic regions could be used in several scenarios. In places where lock contention is low and remote access to data is required infrequently, such a model could be deployed rather cheaply, without dependence on a distributed DBMS.
A simulator built on this model could also be a useful statistical tool, for example to test the effect of changes in learning tolerance thresholds on a system’s throughput.
References
[1] Coulouris, G. et al. Distributed Systems: Concepts and Design. Pearson Education Ltd., 2001.

[2] Franklin, S. and Graesser, A. Is it an Agent, or just a Program?: A Taxonomy for Autonomous
Agents. In Proceedings of the Third International Workshop on Agent Theories, Architectures, and
Languages, Springer-Verlag, 1996.

[3] McFadden, F.R. et al. Modern Database Management. Addison Wesley Longman Publishers, 2001.

[4] Peterson, G. L. Myths about the mutual exclusion problem. Information Processing Letters, 12(3), June 1981.

[5] Silberschatz, A. et al. Database System Concepts. McGraw Hill Publishers, 2000.

[6] Tanenbaum, A.S. Modern Operating Systems. Addison Wesley Longman Publishers, 2001.

[7] Wooldridge, M. Reasoning about Rational Agents. The MIT Press, June 2000.

Appendix A - Definitions:
1. Database (DB)
A DB is a logically coherent collection of data with some inherent meaning. [1]
Each database in the system is controlled by its local server but cooperates to maintain the
consistency of the global distributed database.
2. Distributed database (DDB)
A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed
over a computer network.
A distributed database is a set of databases stored on multiple computers that typically appears to
applications as a single database. Consequently, an application can simultaneously access and
modify the data in several databases in a network.
In a pure distributed database, the system manages a single copy of all data and supporting database
objects. Distributed database applications typically use distributed transactions to access both local
and remote data and modify the global database in real-time.
3. Decentralized database
A database that is stored on computers at multiple locations; however, the computers are not interconnected by a network, so users at the various sites cannot share data. [3]
4. Homogeneous
The same DBMS is used at all nodes. [3]
5. Autonomous
Each DBMS works independently, passing messages back and forth to share data updates. [3]
6. Database management system (DBMS)
A software application that is used to create, maintain and provide controlled access to user
databases.
7. Distributed database management system (DDBMS)
It is a piece of software that coordinates access to data at the various nodes of a DDB and provides functions such as a distributed data dictionary, location transparency, security, concurrency, deadlock control and failure recovery.
8. Node
Each computer in a system is a node. A node in a distributed database acts as a client, a server, or
both, depending on the situation.
9. Transaction
A transaction defines a sequence of operations that perform a single logical function in a database
application. [1]
A transaction can span multiple databases, while still guaranteeing that changes are either all
committed or all rolled back.
10. Remote transaction
A remote transaction is a transaction that contains one or more remote statements, all of which
reference the same remote node.
11. Distributed transaction
A distributed transaction is a transaction that includes one or more statements that, individually or
as a group, update data on two or more distinct nodes of a distributed database.
