
Neurocomputing 121 (2013) 25–31

Contents lists available at ScienceDirect

Neurocomputing
journal homepage: www.elsevier.com/locate/neucom

A self-adaptive hardware architecture with fault tolerance capabilities


Javier Soto *, Juan Manuel Moreno, Joan Cabestany
Technical University of Catalunya, Electronic Engineering, Campus Nord, Building C4, c/Jordi Girona 1-3 08034 Barcelona, Spain

ARTICLE INFO

Article history:
Received 25 January 2012
Received in revised form 12 October 2012
Accepted 29 October 2012
Available online 6 May 2013

Keywords:
Self-adaptive
Self-placement
Self-routing
Self-replication
MIMD
Dynamic fault tolerance

ABSTRACT

This paper describes a Fault Tolerance System (FTS) implemented in a new self-adaptive hardware architecture. This architecture is based on an array of cells that implements self-adaptive capabilities in a distributed way. Each cell includes a configurable multiprocessor, so it can have between one and four processors working in parallel, with a programmable configuration mode that allows selecting the size of the program and data memories. The self-elimination and self-replication of cell(s) are performed when the FTS detects a failure in any of the processors that include it, so that the affected cell(s) will be self-discarded for future implementations. Other adaptive capabilities of the system are self-routing, self-placement and runtime self-configuration. Additionally, an example application is described, together with a software tool that has been implemented to facilitate the development of applications to test the system.
© 2013 Elsevier B.V. All rights reserved.

1. Introduction

In coming years adaptive systems promise to radically change our experience by providing high performance computation power to serve the increasing need for large, complex and parallel applications. Self-adaptation is defined as the ability of a system to react to its environment in order to optimize its performance. The system could also change its behavior to adapt its functionality to a new mission, with respect to the environment and the user needs [1]. Adaptive computing systems constitute a promising technology with respect to classical computing architectures.

Self-healing is a special feature of an adaptive system, where hardware failures should be detected, handled and corrected by the system automatically.

In this paper we present a new hardware architecture whose principal characteristic is its set of self-adaptive capabilities, which are executed autonomously and in a distributed way by the system members (cells). Basically, this is a novel unconventional MIMD hardware architecture with self-adaptive capabilities like self-routing, self-placement, self-configuration, self-elimination and self-replication, which includes a fault tolerance system that permits a given subsystem to modify its structure autonomously in order to achieve fault detection and fault recovery.

The architecture proposed includes two mechanisms of fault tolerance (FT). One of these is the Dynamic Fault Tolerance Scaling Technique [2], which has the ability to create and eliminate redundant copies of the functional section of a specific application. The other mechanism of fault tolerance (presented in this paper) is a dedicated or static Fault Tolerance System (FTS). It provides redundant processing capabilities that are working continuously. When a failure in the execution of a program is detected, the processors of the cell are stopped and the self-elimination and self-replication processes start for the cell (or cells) involved in the failure. The affected cell(s) will be self-discarded for future self-placement processes.

Other projects that propose fault tolerance on adaptive architectures have been developed, like the ReCoNet platform [3], which presents a framework for increasing fault tolerance and flexibility by solving the problem of hardware/software codesign online. It is based on field-programmable gate arrays (FPGAs) in combination with CPUs that allow migrating tasks implemented in hardware or software from one node to another. The principal difference is that the platform presented in this paper is highly scalable and can provide many processing units, each with less processing capacity, where any processor can implement a fault tolerance system. Additionally, it provides self-adaptive capabilities like self-placement and self-routing in a large array of processing units.

Reference [4] proposes a framework under which different fault tolerant schemes can be incorporated in applications using an adaptive method. Under this framework, applications are able to choose near optimal fault tolerance schemes at run time according to the specific characteristics of the platform on which the application is executing. This framework is presented for high performance parallel and distributed computing. Compared with this framework, the architecture proposed in this paper provides adaptive capabilities at hardware level rather than at application level.

* Corresponding author. Tel.: +34 66 035 7384.
E-mail addresses: javier.soto.vargas@upc.edu, javiersotovargas@gmail.com (J. Soto), joan.manuel.moreno@upc.edu (J. Manuel Moreno), joan.cabestany@upc.edu (J. Cabestany).

0925-2312/$ - see front matter © 2013 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.neucom.2012.10.038

2. Description of the architecture

Any application scheduled to the system has to be organized in components, where each component is composed of one or more interconnected cells. The interconnection of cells inside a component is made at cell level, while the physical interconnections of components are made at switch matrix level.

In the initial state, all cells are free, i.e., they do not belong to any component. Then, the components have to be placed and connected for data processing and information exchange. This is a sequential process where each cell has to be placed and routed in the system. For this purpose, the cells execute the self-placement and self-routing algorithms in a distributed way. Every cell has a unique 32-bit identifier called address. This field is divided into two 16-bit words, called id_component and id_cell. The id_component is the unique identifier of the component, where the value FFFFh is reserved for special features and the value 0000h indicates that the cell is free and does not belong to any component. Therefore, it is possible to instantiate up to 65534 different components. The id_cell is the unique identifier of the cell within a component, so there may be up to 65536 cells in a component and a maximum close to 2^32 cells in the system.

The physical implementation of this architecture is depicted in Fig. 1, which shows the representation of a chip that includes an array of clusters, pin interconnection matrices and a Global Configuration Unit (GCU). Fig. 1(b) shows a cluster, which is composed of a 3 × 3 cell array and a switch matrix.

This two-layer implementation is composed of interconnected cells in the first level and interconnected switch and pin matrices in the second level, which are interconnected to the GCU by means of an Internal Network. Several chips can be interconnected by means of an External Network connected to the GCUs. These networks have been designed to support the system with the necessary functionality to carry out the self-adaptive capabilities.

In addition to the physical implementation of the architecture described previously, two additional layers are defined as a conceptual organization for the implementation of general purpose applications: the SANE and the SANE assembly. The SANE (Self-Adaptive Networked Entity) is composed of a group of components. This is the basic self-adaptive computing system. It has the ability to monitor its local environment and its internal computation process. The SANE assembly is composed of a group of interconnected SANEs.

2.1. Cell architecture

The cell is the basic element of the proposed architecture. Therefore, the cell has to include the necessary hardware to carry out the basic principles of self-adaptation: dynamic and distributed self-routing [5–7], dynamic and distributed self-placement, self-elimination, self-replication, scalability and distributed control.

The cell consists of the Functional Unit (FU), the Cell Configuration Unit (CCU) and multiplexers that allow the interconnection between FU ports of two cells. The cell is interconnected with its four direct neighbors by means of local, remote and expansion ports. The cell architecture and port distribution are depicted in Fig. 2(a). The local and remote ports are 9-bit wide buses (8 bits for data and 1 bit for read enable). The read enable (RE) is set to logic ‘1’ for a clock pulse when a processor performs a write operation over the corresponding output port.

2.2. Functional unit (FU)

The FU is in charge of executing the processes scheduled to the cell. The FU can be described as a configurable multicomputer with four cores (Fig. 2(b)). The FU has four 9-bit input ports and four 9-bit output ports. Additionally, the FU includes four 9-bit FT

Fig. 1. System architecture. Composed by a cluster array, pin interconnections matrices and GCU. Cluster: 3 × 3 cell array and switch matrix. (a) Architecture overview and (b) 3D view of a cluster.

Fig. 2. Cell architecture. (a) Cell block diagram and (b) block diagram of FU.
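The 32-bit cell address described in Section 2 (a 16-bit id_component and a 16-bit id_cell) can be illustrated with a minimal Python sketch. The function and constant names are hypothetical and not taken from the paper. Placing id_component in the upper half-word is an assumption inferred from addresses such as AAAA0001 and FFFF0001 used in the text.

```python
from typing import Tuple

FREE_COMPONENT = 0x0000      # id_component value that marks a free cell
RESERVED_COMPONENT = 0xFFFF  # id_component value reserved for special features

# 16-bit id_component space minus the two special values above.
USABLE_COMPONENTS = 0x10000 - 2


def pack_address(id_component: int, id_cell: int) -> int:
    """Build a 32-bit address (assumed layout: id_component high, id_cell low)."""
    if not (0 <= id_component <= 0xFFFF and 0 <= id_cell <= 0xFFFF):
        raise ValueError("both fields must fit in 16 bits")
    return (id_component << 16) | id_cell


def unpack_address(address: int) -> Tuple[int, int]:
    """Split a 32-bit address into (id_component, id_cell)."""
    return (address >> 16) & 0xFFFF, address & 0xFFFF


def is_free(address: int) -> bool:
    """A cell is free when its id_component is 0000h."""
    return unpack_address(address)[0] == FREE_COMPONENT


def is_reserved(address: int) -> bool:
    """Addresses with id_component FFFFh are reserved (e.g. discarded cells)."""
    return unpack_address(address)[0] == RESERVED_COMPONENT
```

For example, `pack_address(0xAAAA, 0x0001)` gives the address written as AAAA0001 in the text, and `USABLE_COMPONENTS` reproduces the figure of 65534 components.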

input ports exclusively for the FTS, if it is enabled. The FU includes an Outputs Multiplexing System that allows the cores to write data to the output ports, as well as to direct the data flow of the cores to the outputs when the FTS is enabled and any of its cores is working as a redundant processor. The FTS is explained in detail in Section 3.

Each core has program memory, data memory and the additional hardware necessary for its functionality. The instruction set is composed of 44 instructions, which include arithmetic, logic, shift, branch, conditional branch and special instructions for the execution of microthreads [8]. All instructions can be executed in a single clock cycle. The program memory can store up to 64 instructions. The data memory is 8 bits wide and is composed of eight general-purpose registers and 14 Configuration and Status Registers.

The configuration modes of the FU consist basically in grouping cores, where the expansion of data and program memory describes the specific configuration mode. The data memory can be combined in width and length, achieving combinations for data processing of 8, 16, 24 and 32 bits. The program memory can only be joined in length, making it possible to have programs of 64, 128, 192 or 256 instructions. There are 12 different configuration modes, ranging between one and four processors working in parallel.

2.3. Cell configuration unit (CCU)

Using a distributed working principle, the CCUs of the cells in the array are responsible for the execution of the algorithms required for the implementation of the self-adaptive capabilities of the system, specifically the self-placement and self-routing algorithms [9]. These algorithms are executed by the CCU using the Internal Network and the expansion ports.

The self-placement algorithm is responsible for finding the most suitable position in the cell array to insert the new cell of a component. For the placement of the first cell of a component, a particular procedure is used, different from that of the other cells. In this case a good candidate position is one where a free cell has low routing congestion and the largest number of free neighboring cells. After inserting the first cell of the component, the next cell to be inserted is placed as close as possible to the cells of the same component already present in the array.

The self-routing algorithm is executed from the insertion of the second cell of a component onwards, each time that the self-placement process ends. The algorithm allows interconnecting the ports of the functional units of two cells through the local and remote cell ports.

The self-elimination and self-replication of cells allow the replication and elimination of cells that are suspected of having a hardware failure. These self-adaptive capabilities are closely tied to the FTS (see Section 3).

3. Fault Tolerance System (FTS)

The FTS enables the system to continue operating properly in the event of the failure of some of its processors. The FTS consists of specific hardware that allows the comparison of two identical processors each time that an instruction is executed, that is, every clock cycle. This involves the comparison of 2, 4, 6 or 8 cores, depending on the configuration mode of the FTS (FT_mode).

The processors that are compared are called primary and redundant. They must share the same inputs, but only the primary processor writes the output ports, because two outputs cannot be routed to the same location.

The redundant processor may or may not be included in the same cell where the primary processor is located; this depends on the processor on which the FTS is to be implemented (configuration mode). If the redundant processor is located in the same cell that includes the primary processor (FT_modes 0, 1, 2, 3 or 4), the FT_input ports are not used, and the FTS only performs comparisons between cores of the cell.

Otherwise, when the redundant processor is located in another cell (FT_modes 5, 6, 7 or 8), the FTS of the primary cell must perform a comparison between cores of different cells. In this case the Output Multiplexing System of the cell that includes the redundant processor must drive the data flow of the core(s) to the output of the cell (RE is set to logic 1), which in turn should be connected to the FT_input of the FU of the cell that includes the primary processor.

When a hardware failure is detected by the FTS, the damaged cell(s) is self-eliminated and self-replicated to another location. The cell data registers and its processing capacity are lost (probably corrupted). However, the routing resources and some adaptive capabilities of the cell (included in the CCU) are still working. This means that the cell continues participating in the self-placement (as a busy cell) and self-routing algorithms. This allows the interconnection of two distant cells in a component using the routing resources of a cell with damage in its functional unit.

The next steps describe the procedure for self-elimination and self-replication of a single damaged cell; the procedure is repeated for a second damaged cell if the redundant processor is located there:

• The cores of the damaged cell are blocked; no more instructions will be executed. This cell requests the GCU to start the process.
• The GCU sends a command through the Internal Network requesting the cell array to perform a derouting process for all connections with the damaged cell.
• The damaged cell is self-eliminated: this cell is configured as a busy cell and its address is fixed to FFFF0001 to avoid future use.
• The GCU starts the replication process. The GCU sends the appropriate information to start the execution of the processes for self-placement and self-routing of the cell again. The program memories and the appropriate configuration registers of the new cell are configured again.
• The GCU asks the system to start the process of self-routing at component level to route the missing connections in the system.

When the elimination and replication processes end, the processors in the system are enabled again and continue working (start working for replicated cells). It is the user's responsibility to restart the software application scheduled to the SANE, or to recover a known starting point.

3.1. Fault tolerance configuration modes

The FTS could be implemented in any of the processors available in the system. Table 1 shows the nine FT_modes available and the comparison performed. This table indicates the cores that are compared in each FT_mode, which could be combined with any of the 12 configuration modes available for the system, obtaining 108 possible combinations. It is the responsibility of the developer to configure an appropriate combination of configuration mode of cells (always necessary) and FT_mode (if the FTS is enabled).

When a specific application needs to implement a processor with fault tolerance capabilities, the developer has to set the configuration mode and the FT_mode by means of the FTCSR register, which also includes the following fields. FT_enable enables the FTS; if it is not enabled, the other bits of the FTS are not taken into account. The FT_error_flag indicates when the FTS has found an error while performing a comparison between two processors; this bit stops the execution of the programs in the cores and alerts the CCU to start the self-elimination and self-replication processes

of damaged cells. The FT_redundant_cell bit indicates if the cell is a redundant cell; it must be set in the redundant cell for FT_modes 5, 6, 7 and 8 only.

Table 1
Fault tolerance modes (FT_modes).

FT_mode   Core comparison
0         C0⇔C1
1         C0⇔C1 & C2⇔C3
2         C2⇔C3
3         C0⇔C2
4         C0⇔C2 & C1⇔C3
5         C0⇔C0*
6         C0⇔C0* & C1⇔C1*
7         C0⇔C0* & C1⇔C1* & C2⇔C2*
8         C0⇔C0* & C1⇔C1* & C2⇔C2* & C3⇔C3*

⇔ denotes a comparison between cores. & denotes a logic AND. * denotes a core in the redundant cell (a connection between FUs of primary and redundant cell is assumed).

3.2. Example of an application that implements an FTS

Let us suppose an 8-bit sequence of data that has to be generated by a processor with a capacity for 250 instructions, which has to include an FTS to protect the reliability of the sequence. The sequence has to be generated at a low speed, much lower than the clock available, so it is necessary to implement a delay in the sequence. This could be implemented in the processor of another cell with a capacity for 50 instructions.

The problem can be solved in many ways; Fig. 3(a) shows a specific solution. This SANE includes three components (AAAA, BBBB and CCCC). The cell identified as AAAA0001 includes the primary processor, which generates the sequence, and the cell BBBB0002 includes the redundant processor that generates the same sequence. These cells are configured in configuration mode 4 and in FT_mode 5, which allows the FTS of the primary cell to compare 2 cores (core 0 in the primary and redundant cells). The cell CCCC0003 contains a processor that performs the delay. This cell will be in mode 0, and its FTS will be disabled (only one processor is used; the other three could be used for future implementations).

When the primary and redundant cells generate one data item of the sequence, the OUT0 port of the primary cell is written, producing an RE flag. This is conducted to the cell CCCC0003, which waits for this RE pulse from cell AAAA0001 to start the generation of a delay (this RE is received by means of the special instruction called BLMOV “Blocked Move”, which reads the port, saves the data in a memory location and follows with the next instruction when an RE pulse is produced). The data read is not important for the cell that produces the delay (CCCC0003). This cell executes the delay algorithm, and then writes any data to its OUT0 port, which produces an RE pulse that has to be conducted to the inputs of the primary and redundant cells. When these cells receive the RE pulse they generate the next data item of the sequence, and so on. While the delay is performed, the primary and redundant processors could calculate the next data item of the sequence in parallel.

If the primary or redundant processor has a hardware failure, the cells AAAA0001 and BBBB0002 are self-eliminated, the address FFFF0001 is fixed in both cells, and they will not be used in subsequent self-placement operations. Their routing resources continue to be available. Next, the cells AAAA0001 and BBBB0002 start the self-replication process, which involves the self-placement and self-routing of these cells in another location inside the cell array. Once this process ends, the generation of the sequence starts again.

At this point of the explanation, the reader could be thinking “and what happens if…?”, turning the example into a more elaborate and complex situation. Therefore, it is important to take into account that the system should provide enough spare cells, so that this application could be extended.

3.3. Prototype implementation

For demonstration purposes, the original architecture previously described has been modified for the construction of a prototype, due principally to the physical limitations of the FPGAs used for the implementation of the system. It is important to note that the cells in the prototype include and execute all the self-adaptive capabilities described in this paper. The prototype has been developed in two chips, each one a Xilinx Virtex4 FPGA (XC4VLX60), with a utilization rate close to 80% of their capacity.

Each chip consists of a cluster that was reduced to a 2 × 2 cell array (this includes the switch matrix), one pin interconnection matrix, one control microprocessor (μP) and the GCU. The control microprocessor (μP) is responsible for implementing the main program for the configuration and execution of system functionality, even during runtime. The μP, through the GCU, is responsible for controlling all the self-adaptive processes in the chips. Each chip is connected through its μP to a computer by means of a serial communication port, which allows continuous reading of messages that indicate the internal processes executed in the chips (see Section 5 for details). The system can have between 8 and 32 processors working in parallel, depending on the configuration mode implemented.

Fig. 3(b) shows a 3D view of the prototype. Additionally, the figure shows the location of cells after executing the self-placement and self-routing algorithms for the application example (Section 3.2). The cells were placed in the order AAAA0001, BBBB0002 and CCCC0003. The figure shows only the initial SANE configuration, and does not show details after failures, which imply the elimination and replication of cell(s).

4. SANE developer system

One of the main inconveniences in the design phase of the architecture was the creation of applications that allow testing the system. This process consists in the creation of SANEs, which include a specific number of interconnected components, and

Fig. 3. FT application example. (a) Components configuration and (b) prototype.
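The core comparisons of Table 1, and the FT_mode 5 comparison used by the example of Fig. 3, can be sketched in Python as follows. This is only an illustrative model: the paper does not specify which signals the FTS hardware compares, so each core is represented here by a single hypothetical per-cycle value, and the names `FT_MODES` and `ft_error` are assumptions.

```python
# Table 1 as data. Each entry is (primary_core, other_core, in_redundant_cell).
# in_redundant_cell=True corresponds to the cores marked * in Table 1.
FT_MODES = {
    0: [(0, 1, False)],
    1: [(0, 1, False), (2, 3, False)],
    2: [(2, 3, False)],
    3: [(0, 2, False)],
    4: [(0, 2, False), (1, 3, False)],
    5: [(0, 0, True)],
    6: [(0, 0, True), (1, 1, True)],
    7: [(0, 0, True), (1, 1, True), (2, 2, True)],
    8: [(0, 0, True), (1, 1, True), (2, 2, True), (3, 3, True)],
}


def ft_error(ft_mode, primary_cores, redundant_cores=None):
    """Return True if any compared pair of core values differs.

    primary_cores / redundant_cores: four per-cycle values, one per core
    (a stand-in for whatever the real comparator observes).
    """
    for core_a, core_b, in_redundant_cell in FT_MODES[ft_mode]:
        other = redundant_cores if in_redundant_cell else primary_cores
        if other is None:
            raise ValueError("FT_modes 5 to 8 need the redundant cell's values")
        if primary_cores[core_a] != other[core_b]:
            return True  # models FT_error_flag being raised
    return False


# FT_mode 5: core 0 of the primary cell against core 0 of the redundant cell.
ok = ft_error(5, [0x2A, 0, 0, 0], [0x2A, 7, 7, 7])      # values agree
fault = ft_error(5, [0x2A, 0, 0, 0], [0x2B, 7, 7, 7])   # values differ
```

In this model the number of cores compared per mode is twice the number of pairs, which gives the 2, 4, 6 or 8 cores mentioned in Section 3, and nine FT_modes times 12 configuration modes gives the 108 combinations quoted in Section 3.1.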



interconnected cells inside each component. This involved, for each cell in the application, the creation of input and output connection tables, the configuration of special registers, and the generation of hexadecimal instruction code for each processor (between one and four) from programs written in the native assembler language created for the functional unit. Any modification in the application implied a lot of time rewriting the data for its configuration. As with any commercial general purpose device, a software tool is fundamental for improving the capacity of a designer when developing applications.

The SANE Project Developer (SPD) is an Integrated Development Environment (IDE) that allows generating the memory initialization data for the control microprocessor inside the prototype. The SPD allows the creation of files that describe the configuration of a SANE assembly (SASM files). This file includes instructions that support the following functionalities in the system:

• Create, interconnect and delete components.
• Write the Program Memories and Configuration Registers of the cores in the Functional Unit of the cell.
• Commands for global manipulation of processors, like enable, disable and restart processors, with or without a special “wait” instruction that enables the dynamic and static fault tolerance systems.
• Special instructions related to the dynamic fault tolerance system.
• Special instructions related to the static fault tolerance system.

The SPD allows the insertion and modification of all information related to the SANE assembly. For each cell this involves: the identification number (address), the configuration registers, the connection tables for inputs and outputs, the description and alias of the cell, the description of the component, and the assignment of assembler (ASM) or C files to the processor(s) in the cell. To write the Program Memories of the processors, assembler (ASM) or C files are created, which define the functionality executed by a processor in the cell.

The SPD supports building the final hexadecimal file (SHEX) with the configuration of the SANE assembly that will be implemented in the FPGA. This process includes the compilation of the files involved according to the SASM file. The SPD generates a list of errors, warnings and information for all files involved in the building process and guides the user to make the appropriate corrections if required. The SPD also supports the compilation of individual ASM files. Compilation of C files is currently under development.

Fig. 4 shows a screen capture of the SPD, which shows the SASM file for the problem proposed in Section 3.2. It also shows the structure of the solution in the left tree and the output of the building process in the bottom section.

5. Experimental results

To test the architecture we have developed some applications, one of which is proposed in Section 3.2. In normal conditions the sequence is executed continuously without interruption. For test

Fig. 4. Screen capture of the SANE Project Developer (SPD).



purposes, one data item in the sequence was modified by software (in the ASM file). Therefore, when the sequence arrives at this execution point, the FTS detects (or interprets the FTS_error as) a hardware failure and the process for self-elimination and self-replication of the primary and redundant cells starts. The new cells start the execution again and, because of the software modification mentioned previously, the failure occurs repeatedly, triggering the self-organization processes until the system runs out of resources; at this moment the system returns the corresponding error.

Fig. 5 shows a screen capture of the messages sent by the chips during the execution of the application. For space reasons the entire record of the messages is not shown. The following is a brief description of the sequential processes running internally on the chips:

1. Initialization and synchronization of chips (chip0 and chip1).
2. Jump instruction ft_configuration. This instruction is only used when a fault occurs in the system.
3. Create the component 0xAAAA. The self-placement and self-routing processes for cell 0xAAAA0001 are executed. A contest process determines that the cell is placed in chip0 by a priority order previously configured by hardware.
4. Create the component 0xBBBB. The self-placement and self-routing processes for cell 0xBBBB0002 are executed in chip0.
5. Create the component 0xCCCC. The self-placement and self-routing processes for cell 0xCCCC0003 are executed in chip0.
6. All processors are restarted and disabled.
7. The program memories and configuration registers of cells 0xAAAA0001, 0xBBBB0002 and 0xCCCC0003 are written.
8. The components self-routing process is executed.
9. All processors in the configured cells are enabled. The wait mode is enabled (the FTS waits for a fault). The execution of the tasks scheduled to the SANE has started.

Note 1: Steps 1–9 correspond to the program executed by the configuration file (SASM) shown in Fig. 4.
Note 2: The next steps are executed when a hardware failure occurs. For test purposes the failure was intentionally induced by software.

10. The program jumps to the instruction ft_configuration related to the primary cell of the fault. The FTS will start the elimination and replication of the cell(s) involved in the failure.
11. All processors are restarted and disabled.
12. The cell 0xAAAA0001 is self-eliminated. Every connection with this cell is unrouted. The cell is self-discarded for future self-placement; its address is configured to 0xFFFF0001 (cell with a hardware failure). The routing resources of the cell remain active and can participate in future self-routing processes.
13. The cell 0xBBBB0002 is self-eliminated and self-discarded.
14. The reinsertion of cell 0xAAAA0001 is executed. The self-placement and self-routing processes are executed in chip0 (at this moment chip0 is full).
15. The reinsertion of cell 0xBBBB0002 is executed. The self-placement and self-routing processes are executed in chip1.
16. The program memories and configuration registers of cells 0xAAAA0001 and 0xBBBB0002 are written.
17. The self-elimination and self-replication processes end. The program goes back to the instruction enable_processors_wait. All processors in the configured cells are enabled. The wait mode is enabled (the FTS waits for a fault). The execution of the tasks scheduled to the SANE has started again.
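The sequence of steps above can be approximated with a small Python model of the prototype (two chips with 2 × 2 cells each). This is a deliberately simplified sketch: placement picks the first free cell, which stands in for the real self-placement algorithm and ignores routing congestion and neighbor counts, and all function names are hypothetical. The paper does not report how many recoveries the prototype completed, so the count produced here is a property of this model only.

```python
FREE = 0x00000000       # id_component 0000h: free cell
DISCARDED = 0xFFFF0001  # address fixed in a self-eliminated cell


class SystemFull(Exception):
    """Raised when no free cell is left for self-placement."""


def place(cells, address):
    """Place a cell in the first free position (stand-in for self-placement)."""
    for index, current in enumerate(cells):
        if current == FREE:
            cells[index] = address
            return index
    raise SystemFull("system is full, no more cells can be placed")


def recover(cells, damaged):
    """Self-eliminate the damaged cells, then re-place each of them."""
    for address in damaged:
        cells[cells.index(address)] = DISCARDED  # never reused afterwards
    return [place(cells, address) for address in damaged]


def run_until_full(n_chips=2, cells_per_chip=4):
    """Repeat the induced fault until the array runs out of free cells."""
    cells = [FREE] * (n_chips * cells_per_chip)  # chip0 cells first, then chip1
    for address in (0xAAAA0001, 0xBBBB0002, 0xCCCC0003):
        place(cells, address)
    recoveries = 0
    try:
        while True:
            recover(cells, [0xAAAA0001, 0xBBBB0002])
            recoveries += 1
    except SystemFull:
        return recoveries, cells
```

Under these assumptions the first recovery re-places 0xAAAA0001 in the last free cell of chip0 and 0xBBBB0002 in chip1, which is consistent with steps 14 and 15, and the loop stops on the third induced fault because only one free cell remains for two cells.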

Fig. 5. Screen capture of the messages read from the chips when an application is executed.

Table 2
Results of the synthesis process for the proposed prototype.

Part (Description)                                                 Slices          Slice flip flops   Four input LUTs   RAMB16
FPGA: chip resources, for comparison                               26,624 (100%)   53,248 (100%)      53,248 (100%)     160 (100%)
Cell: the routing resources of two sides were eliminated           3540 (13%)      992 (1%)           6760 (12%)        6 (3%)
Cluster: 2 × 2 cell array + switch matrix                          17,716 (66%)    7881 (14%)         33,801 (63%)      24 (15%)
Chip: cluster + GCU + μP of control + pin interconnection matrix   19,557 (73%)    9638 (18%)         36,424 (68%)      40 (25%)
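The percentages in Table 2 can be recomputed from the absolute resource counts with a short Python sketch. The dictionary keys and the function name are hypothetical. The use of integer (floor) division is an observation from the published figures, which all match truncated rather than rounded values; the paper itself does not state how the percentages were obtained.

```python
# Total resources of the FPGA, taken from the first row of Table 2.
FPGA_TOTALS = {"slices": 26624, "flip_flops": 53248, "luts": 53248, "ramb16": 160}

# Resources used by each part, taken from the remaining rows of Table 2.
USAGE = {
    "cell": {"slices": 3540, "flip_flops": 992, "luts": 6760, "ramb16": 6},
    "cluster": {"slices": 17716, "flip_flops": 7881, "luts": 33801, "ramb16": 24},
    "chip": {"slices": 19557, "flip_flops": 9638, "luts": 36424, "ramb16": 40},
}


def utilization(part):
    """Return the truncated percentage of each FPGA resource used by a part."""
    return {
        resource: used * 100 // FPGA_TOTALS[resource]
        for resource, used in USAGE[part].items()
    }
```

Calling `utilization("chip")` reproduces the last row of the table (73%, 18%, 68% and 25%), which gives a quick consistency check when the table is transcribed.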

Note 1: Steps 10–17 are executed each time the induced failure occurs. The process ends when the system generates an error indicating that the system is full and it is not possible to place more cells in the system.

The usage rates for the elements of the architecture are detailed in Table 2. The Xilinx Synthesis Technology (XST) was used for the system implementation in the device selected. This table shows the usage rate for a cell, a cluster and a complete chip, which gives an idea of the system granularity. The program memory of the FU and the connection table have been implemented by means of the RAM blocks available in the FPGA used.

6. Conclusion and future work

A Fault Tolerance System (FTS) for a novel self-adaptive hardware architecture has been reported. The FTS performs a continuous comparison between primary and redundant processors, which can be implemented in one or two cells, depending on the configured FT_mode. The cell (or cells) that includes the primary and redundant processors involved in the FTS is self-eliminated and self-replicated to other free cell(s) by the system when the FTS detects a failure. The affected cell(s) will be self-discarded for future implementations.

An example application that illustrates the functionality of the system has been described. Also, the principal characteristics of a software tool that allows the creation of general purpose applications have been described.

As future work for the SANE developer system, we consider the implementation of a C compiler (under development) to expand the possibilities and provide an alternative to programming the system processors in assembler. We also consider direct communication with the FPGAs in order to program and control the execution of a SANE assembly.

References

[1] AETHER Project Home, URL: 〈http://www.aether-ist.org〉.
[2] J. Soto, J. Moreno, J. Madrenas, J. Cabestany, Implementation of a dynamic fault-tolerance scaling technique on a self-adaptive hardware architecture, in: Proceedings of the International Conference on Reconfigurable Computing and FPGAs, 2009, pp. 445–450.
[3] T. Streichert, D. Koch, C. Haubelt, J. Teich, Modeling and Design of Fault-Tolerant and Self-Adaptive Reconfigurable Networked Embedded Systems, Hindawi Publishing Corp, New York, 2006.
[4] Z. Chen, M. Yang, G. Francia, J. Dongarra, Self adaptive application level fault tolerance for parallel and distributed computing, in: IEEE International Parallel and Distributed Processing Symposium, 2007. IPDPS 2007, pp. 1–8.
[5] N. Macias, L. Durbeck, Self-Assembling circuits with autonomous fault handling, in: Proceedings of the 2002 NASA/DoD Conference on Evolvable Hardware (EH'02), IEEE Computer Society Press, 2002, pp. 46–55.
[6] J. Moreno, Y. Thoma, E. Sanchez, POEtic: a prototyping platform for bio-inspired hardware, in: Proceedings of the 6th International Conference on Evolvable Systems (ICES), pp. 180–182.
[7] J. Moreno, E. Sanchez, J. Cabestany, An in-system routing strategy for evolvable hardware programmable platforms, in: Proceedings of the Third NASA/DoD Workshop on Evolvable Hardware, IEEE Computer Society Press, 2001, pp. 157–166.
[8] T. Vu, C. Jesshope, Formalizing SANE virtual processor in thread algebra, in: ICFEM'07: Proceedings of the Formal Engineering Methods, 9th International Conference on Formal methods and Software Engineering, Springer-Verlag, Boca Raton, FL, USA, 2007, pp. 345–365.
[9] J. Soto, J.M. Moreno, J. Madrenas, J. Cabestany, Communication infrastructure for a self-Adaptive hardware architecture, in: Proceedings of the Reconfigurable Communication-centric Systems-on-Chip workshop (ReCoSoC 08), Barcelona, Spain, July 9–11, 2008, pp. 175–180, ISBN: 978-84-691-3603-4.

Javier Soto Vargas is an Instructor Professor at the Electronic Engineering Department of the “Escuela Colombiana de Ingeniería Julio Garavito” (ECI), where he was the coordinator of the digital systems area as well as the robotic group ECIBOT. He received his degree in Electronic Engineering in 2000 from ECI and the M.Sc. in Teleinformatic from the “Universidad Distrital Francisco José de Caldas” in 2007. He is currently working toward the PhD degree in Electronic Engineering from the Technical University of Catalunya. His current research interests are in self-adaptive architectures based on bio-inspired systems.

Juan Manuel Moreno received the Master and PhD degrees in Telecommunication Engineering from the Technical University of Catalunya (UPC) in 1991 and 1994, respectively. He is an Associate Professor at the Electronics Engineering Department of the Technical University of Catalunya. He has led and participated in several research projects related to reconfigurable architectures, SoCs, artificial neural networks and bio-inspired systems. His research interests are in dynamically reconfigurable architectures, physical implementation of bio-inspired systems and artificial neural networks for signal processing.

Joan Cabestany is a professor at the Department of Electronic Engineering (UPC). He received his degree in Telecommunication Engineering in 1976, and the PhD in 1982, both from the Technical University of Catalunya. He is responsible for the AHA (Advanced Hardware Architectures) research group, with expertise in reconfigurable hardware, electronic system design, advanced hardware architectures, microelectronics and VLSI design. He has been a member of the CETpD staff structure since 2005. He has been the UPC lead on several EU-funded projects, and his main interests are artificial intelligence and electronic technology applied to different topics (aging and dependent people issues, in particular).
