Term Paper - Fts (05CS3003)
By
Debobrata Podder
05CS3003
Indian Institute Of Technology –Kharagpur
Introduction
Indian Railways is the world's fourth largest railway network after those of
the United States, Russia and China, and transports 20 million passengers
across the country. Safe and reliable railway signal communication is
therefore a top priority. Although distributed safety critical railway
signaling systems are based on fault tolerant and fail safe techniques to
provide high safety and reliability, railway communications sometimes
suffer from unique communication issues. In this article we discuss one
such issue and propose a possible solution to avoid it.
The various safety critical system functions distributed geographically in a
railway yard can be grouped and interconnected to form a local area network.
For a dual node topology of the local area network, each node provides the
following functions -
4. It ensures safe reaction in the safety system functions within its coverage
area in the event of multiple processor failures.
The reliability and safety of a node depend on the failure rates of the
components used in the node. Each node consists of four transputers. In this
article we try to improve the existing system by adding an extra transputer
to each node and using an extended Byzantine algorithm.
Work Already Done
Basic Structure of a fail safe node
Each node consists of four transputers configured as a square mesh using the serial
links of the transputers. Of the four serial links of each transputer, one
maintains a communication link with the neighboring node, one performs
input/output for driving system functions, and the remaining two connect to the
neighboring transputers of the same node.
2. Election of a leader from among the four transputers for a fixed tenure.
Majority voting requires an external reliable majority voter, while the election
of a leader from among the four transputers for a fixed amount of time requires an
election process, which involves time overhead. In addition, both these options
require fault detection mechanisms. Selecting the leader on a rotation basis
for a fixed amount of time avoids the above disadvantages.
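Rotation needs neither a voter nor an election round, because every transputer can work out the current leader locally from a shared clock. The following Python sketch illustrates the idea. It assumes a hypothetical tenure measured in clock ticks and transputers indexed 0 to 3; neither detail is specified in the text.

```python
NUM_TRANSPUTERS = 4  # transputers per fail safe node, as described above


def current_leader(tick: int, tenure: int) -> int:
    """Return the index (0..3) of the transputer that leads at clock `tick`.

    `tenure` is a hypothetical number of ticks for which each transputer
    leads before handing over to the next one.
    """
    if tenure <= 0:
        raise ValueError("tenure must be positive")
    return (tick // tenure) % NUM_TRANSPUTERS


# With a tenure of 5 ticks the leadership passes 0 -> 1 -> 2 -> 3 -> 0.
schedule = [current_leader(t, tenure=5) for t in range(0, 25, 5)]
```

Because the schedule is a pure function of the clock, a transputer that holds the leadership for longer or shorter than its tenure can be noticed by any observer that knows the tenure.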
Algorithm AByz(N,m)
Step A1. The original source signs its message ψ and sends it out to each of the
processors.
Step A2. Each processor i that receives a signed message ψ : A, where A is the
set of signatures appended to the message ψ, checks the number of signatures
in A. If this number is less than m + 1, it sends out ψ : A ∪ {i} (i.e., what it
received plus its own signature) to each of the processors not in set A. It also
adds this message, ψ, to its list of received messages.
Step A3. When a processor has seen the signatures of every other processor
(or has timed out), it applies some decision function to select from among the
messages it has received.
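Steps A1 to A3 can be sketched as a small Python simulation. This is an illustration under stated assumptions, not the implementation used in the node: signatures are modeled as an unforgeable tuple of processor numbers, a faulty non-source processor is modeled only as one that never forwards, and the decision function (pick the single value seen, otherwise a hypothetical safe default) is an assumption, since the text leaves it open.

```python
from collections import deque

DEFAULT = "SAFE"  # hypothetical safe default chosen when values conflict


def abyz(n, m, source_sends, silent=frozenset()):
    """Simulate AByz(N, m) with unforgeable signatures.

    n            -- number of processors, numbered 0..n-1; processor 0 is the source
    m            -- maximum number of faulty processors tolerated
    source_sends -- dict receiver -> value the source signs and sends (Step A1);
                    a faulty source may give different receivers different values
    silent       -- faulty processors that never forward (one simple fault model)
    Returns a dict processor -> decided value for every non-source processor.
    """
    received = {p: set() for p in range(1, n)}
    # Each queue entry is (receiver, value, signers); signers models the set A.
    queue = deque((p, v, (0,)) for p, v in source_sends.items())
    while queue:
        receiver, value, signers = queue.popleft()
        received[receiver].add(value)              # Step A2: record the message
        if receiver in silent or len(signers) >= m + 1:
            continue                               # faulty, or enough signatures
        chain = signers + (receiver,)              # message : A U {i}
        for p in range(1, n):
            if p not in chain:                     # only to processors not yet in A
                queue.append((p, value, chain))
    # Step A3: decision function (assumed; see the lead-in above).
    return {p: (next(iter(vals)) if len(vals) == 1 else DEFAULT)
            for p, vals in received.items()}


# A faulty source tells processors 1 and 2 "GO" but processor 3 "HOLD".
faulty_source = abyz(4, 1, {1: "GO", 2: "GO", 3: "HOLD"})
# A loyal source, with processor 3 faulty and silent.
loyal_source = abyz(4, 1, {1: "GO", 2: "GO", 3: "GO"}, silent={3})
```

In the first run every processor ends up seeing both signed values and falls back to the default, so all of them agree. In the second run the loyal processors 1 and 2 see only "GO" and decide on it.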
Extension of Byzantine generals on a network of
interconnected fail safe nodes with authenticated
messages
Let each fail safe node have n transputers (Generals), where n is even. Each fail
safe node is a partially connected network in which the smallest number of disjoint
paths between each pair of transputers (Generals) is two. There are N fail
safe nodes interconnected to form n/2 multiple rings, with one of the fail safe
nodes acting as a command unit (central controller).
Let each transputer of a fail safe node be designated Tij, where i is the
number of the transputer within the node (1<=i<=n) and j is the node number
(1<=j<=N). The command unit fail safe node guides each transputer of a fail
safe node when it is indecisive. In the n/2 multiple ring networks, consider a
fail safe node j with its adjacent nodes j-1 and j+1. For all j, the activities of the
transputers of fail safe node j are monitored by the neighboring transputers of
nodes j-1 and j+1. Each of these executes a process S in parallel to observe the
activities of the neighboring transputer Tuv (where 1<=u<=n and
1<=v<=N) and conveys them to the command unit. This process S is called the spy
process and differs from a General of a fail safe node in the following ways –
1. The spy process does not have any control over the outputs of the fail safe node
or over the transputer (General) on which it is spying.
2. The spy process does not require any additional resources; it runs in
parallel with the General's functions of the fail safe node on each
transputer.
Thus each transputer has the dual role of performing the functions of the fail safe
node to which it belongs and, at the same time, spying on the activities of the General
of the adjacent node connected to it, which it conveys to the command unit.
The spy does not take any decisions but conveys the following in a message
frame to the command unit –
1. Aberration in the incoming data rate from the General of the adjacent node
connected to it.
2. Deviation in the leadership time of the General of the adjacent node
connected to it.
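The two observations listed above could be packed into a message frame along the following lines. This is only a sketch: the field names, the nominal rate and tenure values, and the 10% tolerance are hypothetical and do not come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpyReport:
    """Frame a spy sends to the command unit (field names are hypothetical)."""
    spy: tuple               # (i, j) of the transputer running the spy process
    target: tuple            # (u, v) of the General being observed
    rate_aberration: bool    # incoming data rate outside tolerance
    tenure_deviation: bool   # leadership time outside tolerance


def make_report(spy, target, observed_rate, observed_tenure,
                nominal_rate, nominal_tenure, tolerance=0.1):
    """Compare observations with nominal values; the spy itself decides nothing."""
    def off(observed, nominal):
        # True when the observation differs from nominal by more than the tolerance.
        return abs(observed - nominal) > tolerance * nominal

    return SpyReport(spy, target,
                     rate_aberration=off(observed_rate, nominal_rate),
                     tenure_deviation=off(observed_tenure, nominal_tenure))


# Spy on T(1,2) watching T(1,3): the data rate has dropped, the tenure is nominal.
report = make_report((1, 2), (1, 3), observed_rate=70.0, observed_tenure=5.0,
                     nominal_rate=100.0, nominal_tenure=5.0)
```

The report only flags what was observed; whether the observed General is a traitor is left to the command unit.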
The command unit decides whether a given General of a fail safe node is a
traitor or not, based on the messages from the spy processes monitoring the
Generals.
2. If the number of traitor Generals of a fail safe node is more than the number
of loyal Generals, the node is not valid; the fail safe node is then shut down and
isolated from the network.
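The shutdown rule in item 2 reduces to a simple count. The Python sketch below covers only that rule; the function name and the list-of-flags input are hypothetical, and how the command unit arrives at each traitor flag is not modeled.

```python
def must_shut_down(traitor_flags):
    """Return True when traitor Generals outnumber loyal Generals in a node.

    `traitor_flags` holds one boolean per General of the node, True meaning
    the command unit has judged that General to be a traitor.
    """
    traitors = sum(1 for flag in traitor_flags if flag)
    loyal = len(traitor_flags) - traitors
    return traitors > loyal


# Four Generals: three traitors force a shutdown, an even split does not.
three_traitors = must_shut_down([True, True, True, False])
even_split = must_shut_down([True, True, False, False])
```

Note that an even split does not trigger this particular rule, since the text requires the traitors to be strictly more numerous than the loyal Generals.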
Extra transputer
Fig 1
As shown in Fig 1, each extra transputer has four links: one outgoing
link to the extra transputer of another fail safe node, one outgoing link to
the command unit, one incoming link from the existing four-transputer
structure of the same node, and one incoming link from the extra
transputer of another fail safe node.