_
(5)
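$$P^{T}_{4\times 4} =
\begin{bmatrix}
1 & 1 & 1 & 0 \\
1 & 1 & 0 & 1 \\
0 & 1 & 1 & 1 \\
1 & 0 & 1 & 1
\end{bmatrix}
=
\begin{bmatrix}
M \\
1\;0\;1\;1
\end{bmatrix}
\tag{6}$$

$$G_{2} = [\, I_{16\times 16} \mid P_{16\times 5} \,] \tag{7}$$

$$P^{T}_{16\times 5} =
\left[\begin{array}{c|c|c|c}
1\,1\,1\,0 & 1\,1\,1\,0 & 1\,1\,1\,0 & 1\,1\,1\,0 \\
1\,1\,0\,1 & 1\,1\,0\,1 & 1\,1\,0\,1 & 1\,1\,0\,1 \\
0\,1\,1\,1 & 0\,1\,1\,1 & 0\,1\,1\,1 & 0\,1\,1\,1 \\
0\,0\,0\,0 & 1\,1\,1\,1 & 0\,0\,0\,0 & 1\,1\,1\,1 \\
0\,0\,0\,0 & 0\,0\,0\,0 & 1\,1\,1\,1 & 1\,1\,1\,1
\end{array}\right]
=
\begin{bmatrix}
M & M & M & M \\
0\,0\,0\,0 & 1\,1\,1\,1 & 0\,0\,0\,0 & 1\,1\,1\,1 \\
0\,0\,0\,0 & 0\,0\,0\,0 & 1\,1\,1\,1 & 1\,1\,1\,1
\end{bmatrix}
\tag{8}$$

In (8), the four column blocks are labelled, from left to right, row a, row b, row c and row d. M is the 3 × 4 block formed by the first three rows of (6).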
Hamming code H(8, 4) with the generator matrix in (5). The
parity check bits of each group can be combined to generate
the parity check bits of an extended Hamming code H(21, 16)
with the generator matrix in (7), where $P^{T}_{16\times 5}$ is the
transpose of the parity matrix. The hardware implementation of the
H(8, 4) and H(22, 16) encoder is shown in Fig. 5. By using
the hardware sharing method, the configurable encoder1 is
implemented in two stages, parity calculation and merge
circuits, as shown in Fig. 4. The parity calculation outputs
can be used directly as the parity check bits of four
Hamming encoders with (K/4)-bit input width, or merged
together to generate the parity check bits of a Hamming
encoder with K-bit input width.

Figure 4 Implementation of the proposed transmitter design

IET Comput. Digit. Tech., 2010, Vol. 4, Iss. 3, pp. 251–261
© The Institution of Engineering and Technology 2010, doi: 10.1049/iet-cdt.2008.0130
www.ietdl.org
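The two-stage structure described above can be illustrated with a minimal Python sketch for K = 16. It uses the parity rows of (6) and (8). The function names are hypothetical, and the way the merge stage forms the last two parity bits (by reusing a per-group XOR of all four bits) is an assumption made for illustration, since the circuit of Fig. 5 is not reproduced here.

```python
# Rows of P^T for the 4-bit groups, taken from equation (6).
M = [[1, 1, 1, 0], [1, 1, 0, 1], [0, 1, 1, 1]]  # block shared with (8)
LAST_ROW = [1, 0, 1, 1]                          # fourth parity row of (6)


def dot2(row, bits):
    """Inner product over GF(2): XOR of the bits selected by row."""
    return sum(r & b for r, b in zip(row, bits)) % 2


def parity_stage(group):
    """Stage 1 for one 4-bit group: three shared parity bits, the
    fourth parity bit of (6), and the XOR of all four bits (assumed)."""
    shared = [dot2(row, group) for row in M]
    return shared, dot2(LAST_ROW, group), sum(group) % 2


def encode_four_groups(msg):
    """Four (K/4)-bit encoders: 4 parity bits per group, no merging."""
    out = []
    for i in range(0, 16, 4):
        shared, last, _ = parity_stage(msg[i:i + 4])
        out.append(shared + [last])
    return out


def encode_merged(msg):
    """Stage 2 (merge): combine group outputs into 5 parity bits."""
    stages = [parity_stage(msg[i:i + 4]) for i in range(0, 16, 4)]
    merged = [0, 0, 0]
    for shared, _, _ in stages:
        merged = [m ^ s for m, s in zip(merged, shared)]
    ones = [s[2] for s in stages]  # per-group XOR for groups a, b, c, d
    return merged + [ones[1] ^ ones[3], ones[2] ^ ones[3]]


def encode_direct(msg):
    """Reference: multiply the message by the 5 x 16 matrix of (8)."""
    rows = [M[0] * 4, M[1] * 4, M[2] * 4,
            [0] * 4 + [1] * 4 + [0] * 4 + [1] * 4,
            [0] * 8 + [1] * 8]
    return [dot2(row, msg) for row in rows]
```

Because the first three rows of (8) repeat M four times, the merged parity bits are simply the XOR of the per-group outputs, which is what lets the two encoder configurations share the stage-1 logic.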
3.2.2 Proposed receiver design
Fig. 6 shows an implementation of the proposed receiver
design. Decoder1 can be configured as a single Hamming
decoder, which uses the whole codeword as the input, or as a
component decoder (row decoder) in product codes. The
component decoder consists of four Hamming decoders,
each of which uses a part of the codeword as input. The
realisation of configurable decoder1 is divided into three
steps, as shown in Fig. 6. The hardware sharing method
used in the transmitter design is also applied to the parity
calculation circuits. The syndrome calculation circuit is an
XOR operation of the parity calculation outputs and the
parity check bits in the codeword. Syndrome calculation1
generates the syndrome vector of operating mode-(a) and
syndrome calculation2 generates the syndrome vector of
operating mode-(b). The syndrome vectors are fed into a
syndrome decoder and error correction circuit. The
syndrome decoder is implemented as an AND tree, whose
inputs are the syndrome value or its inverse. The error
correction is implemented as an XOR gate. A hardware
sharing method is proposed to realise the syndrome
decoder and error correction circuits in this paper. Fig. 7
shows an example of the proposed hardware sharing
method. The 16-bit message is encoded by a Hamming
code H(21, 16) [operating mode-(a)] or four extended
Hamming codes EH(8, 4) [operating mode-(b)]. Four-bit
syndrome vectors are used for each EH(8, 4) code and a
5-bit vector is used for the H(21, 16) code. By properly
selecting the syndrome value and its inverse from the
different operating modes, the syndrome decoder circuits
and error correction circuit can be shared. 1 is assigned as
the extra syndrome bit for each EH(8, 4) code. The
outputs of configurable decoder1 are saved into a receiver
buffer. In operating mode-(a), only the decoded message is
saved. In operating mode-(b), the decoded message and
row parity check bits are both saved. The saved message
and row parity check bits are used to perform an iterative
decoding procedure when the column parity check bits are
transmitted [17].

Figure 5 Hardware sharing method to calculate the parity check bits
Figure 6 Implementation of the proposed receiver design
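As an illustration of the syndrome calculation, AND-tree syndrome decoder and XOR error correction described above, the following Python sketch decodes one EH(8, 4) component code using the parity matrix of (6). It is a behavioural sketch under stated assumptions, not the shared circuit of Fig. 7: the function names are hypothetical, only single-error correction of the data bits is modelled, and the mode-dependent extra syndrome bit is omitted.

```python
# P^T_{4x4} from equation (6). An error in data bit j produces a
# syndrome equal to column j of this matrix.
P_T = [[1, 1, 1, 0], [1, 1, 0, 1], [0, 1, 1, 1], [1, 0, 1, 1]]


def parity(data):
    """Parity calculation: one XOR tree per row of P^T."""
    return [sum(r & d for r, d in zip(row, data)) % 2 for row in P_T]


def decode_eh84(data, rx_parity):
    """Return (corrected data bits, syndrome) for one 8-bit codeword."""
    # Syndrome calculation: recomputed parity XOR received parity.
    syndrome = [p ^ q for p, q in zip(parity(data), rx_parity)]
    corrected = []
    for j, bit in enumerate(data):
        # AND tree for position j: take s_i where column j holds a 1
        # and the inverse of s_i where it holds a 0.
        match = 1
        for i, s in enumerate(syndrome):
            match &= s if P_T[i][j] else (1 - s)
        corrected.append(bit ^ match)  # error correction: one XOR per bit
    return corrected, syndrome
```

A single error in a parity bit gives a weight-one syndrome, which matches none of the weight-three columns, so the data bits pass through unchanged.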
4 Evaluation of the proposed error control scheme
The proposed configurable error control scheme can be
employed in NoCs. In this context, the encoder and
decoder circuits are integrated into the NoC routers, as
shown in Fig. 8. Registers are inserted between encoder
and link, and also between link and decoder, to allow
pipelined operation. In this section, the performance of the
proposed configurable coding scheme is evaluated in terms
of codec delay, complexity, reliability and energy
consumption. We used a 64-bit input message. The
Hamming code H(71, 64) is used in operating mode-(a). In
operating mode-(b), the 64-bit input message is arranged
into a 4 × 16 matrix. Each row is encoded using an
extended Hamming code EH(22, 16) and each column is
encoded using an extended Hamming code EH(8, 4). The
total number of wires in the link of the proposed method is
88. In operating mode-(a), only 71 wires are used and the
remaining wires are connected to ground. The proposed
error correction scheme is developed and verified in Verilog
HDL. The encoder and decoder are synthesised using
TSMC 45 nm technology. The delay, area and power of the
encoder and decoder are reported using Synopsys Design
Compiler at 1 GHz clock frequency. The link power is
measured in Cadence Spectre using a 45 nm global link
interconnect model [21], with parameters shown in Table 1.
Simulation results of the proposed method are compared to
directly using Hamming code H(71, 64), a 3-bit error
correction BCH(85, 64), and a Reed-Solomon RS(85, 65)
code. Zero padding is applied to meet the length
requirement of the RS code. The number of wires in the link
for different coding schemes is shown in Table 2.
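The wire counts quoted above follow from the product-code layout, as the short Python sketch below shows. The row-major placement of message bits and the helper names are assumptions made for illustration. The column parity count covers only the 16 message columns.

```python
def arrange(msg_bits, rows=4, cols=16):
    """Arrange a 64-bit message into a 4 x 16 matrix (row-major, assumed)."""
    assert len(msg_bits) == rows * cols
    return [msg_bits[r * cols:(r + 1) * cols] for r in range(rows)]


def link_budget(rows=4, cols=16, row_code_len=22, col_parity_len=4):
    """Return (wires for the row codewords, column parity bits)."""
    row_wires = rows * row_code_len          # four EH(22, 16) codewords
    col_parity_bits = cols * col_parity_len  # EH(8, 4) parity per column
    return row_wires, col_parity_bits


MODE_A_WIRES = 71                            # one H(71, 64) codeword
GROUNDED_IN_MODE_A = link_budget()[0] - MODE_A_WIRES
```

With these numbers the four row codewords occupy all 88 wires in mode-(b), and 17 wires are idle (grounded) in mode-(a).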
Figure 7 Hardware sharing of syndrome decoder and error correction circuits
Figure 8 Implementation of proposed configurable error control scheme in NoC platform
4.1 Codec delay and area
Table 2 compares the synthesised codec delay of the proposed
scheme with that of directly using H(71, 64), BCH(85, 64) and
RS(85, 65). The decoder delay, typically much larger than the
encoder delay, is reported here. A three-stage pipelined
process is implemented in operating mode-(b) to decode
product codes [17]. The decoding process for operating
mode-(a), described in Fig. 6, is implemented within one
clock cycle. Compared to directly using the H(71, 64) code,
the decoder delay of the proposed configurable coding
method increases by about 10% because of the overhead of the
extra MUX for mode switching. The BCH(85, 64) and
RS(85, 65) decoders are implemented in a seven-stage
pipelined architecture. In order to improve the throughput
of the BCH(85, 64) and RS(85, 65) codes, the parallel method
in [22] is applied. Compared to RS(85, 65), the
proposed configurable coding method achieves a 15% delay
reduction.
Table 2 also shows the synthesised codec area for different
error control schemes. The area includes the encoder and
decoder area. The encoder area includes the retransmission
buffer. The decoder buffer storing the original message in
the receiver is not included, because it can always be shared
with the routing buffer in the router. The area of the error
counter and comparison circuits in the configurable control
logic is also included for the proposed method. The results
show that multiple error correction codes have a much larger
area than simple Hamming codes. The area
overhead of the product codes is mainly due to the
retransmission buffer and the pipelined decoder architecture.
Compared to BCH(85, 64) and RS(85, 65), the product
code has a smaller area, because each component code is still
a simple Hamming code. BCH(85, 64) has the largest area
because of the complexity of the field operation and the
decoding process. By using the proposed hardware sharing
method, the area overhead of the configuration circuit is
relatively small compared to the product code itself [17].
4.2 Reliability
On-chip communication errors can be attributed to voltage
perturbations induced by noise from many sources. In [23],
a Gaussian pulse function is used to model the error
probability $\varepsilon$ for a single wire when a transition occurs

$$\varepsilon = Q\!\left(\frac{V_{\text{swing}}}{2\sigma_{N}}\right)
= \int_{V_{\text{swing}}/2\sigma_{N}}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-y^{2}/2}\, dy \tag{9}$$

where $V_{\text{swing}}$ is the link swing voltage and $\sigma_{N}$ is the standard
deviation of the noise voltage, which is assumed to follow a
normal distribution. The model in (9) assumes that the
probability of error in each wire is independent. As
technology scales, the probability of a single noise source
causing errors in multiple neighbouring wires increases [5,
6]. A more realistic error model including spatial burst
errors should be considered. The error model in (9) can be
extended to include burst errors. In the extended error
model, we still assume that a noise source affects wire $d_{n}$
with a probability $\varepsilon$. Instead of only affecting one wire, this
noise source also affects its neighbouring wires with a
certain probability $P_{n}$. For simplicity, two
different $P_{n}$ values, $10^{-2}$ and 1, are used in the simulations.
$P_{n}$ equal to 1 corresponds to the case where a noise
source always causes a three-wire burst error.
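The error model in (9) and its burst extension can be sketched in Python as follows. The Q-function is evaluated with the standard identity Q(x) = erfc(x/√2)/2. The burst sampler is only one possible reading of the extended model: it assumes that the neighbouring wires are the two immediately adjacent ones and that each is affected independently with probability P_n. All function names are hypothetical.

```python
import math
import random


def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def wire_error_probability(v_swing, sigma_n):
    """Equation (9): eps = Q(V_swing / (2 * sigma_N))."""
    return q_function(v_swing / (2.0 * sigma_n))


def sample_error_pattern(n_wires, eps, p_n, rng):
    """Draw one error pattern (1 = wire in error) for an n-wire link."""
    errors = [0] * n_wires
    for d in range(n_wires):
        if rng.random() < eps:             # a noise source hits wire d
            errors[d] = 1
            for nb in (d - 1, d + 1):      # adjacent wires (assumed)
                if 0 <= nb < n_wires and rng.random() < p_n:
                    errors[nb] = 1
    return errors
```

With `p_n = 1.0`, every noise event on an interior wire marks that wire and both of its neighbours, which corresponds to the three-wire burst case described above.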
Fig. 9 shows the residual flit error rate of different error
control schemes as a function of noise voltage deviation at
$P_{n} = 1$ and $10^{-2}$. A link swing voltage of 1 V is assumed
in model (9). The simulation results show that the
H(71, 64) used in operating mode-(a) has the worst
residual flit error rate, because Hamming codes can only
correct one error at a time, and more than one simultaneous
error leads to uncorrected errors. Compared to the
BCH(85, 64) code, the product code used in operating
mode-(b) achieves a better residual flit error rate, because
the product code can effectively correct multiple random
and burst errors, whereas the BCH code is only good at
correcting multiple random errors. As $P_{n}$ increases, the
residual flit error rate of the H(71, 64) code and
BCH(85, 64) code worsens because of the higher burst
error probability at larger $P_{n}$. Compared to RS(85, 65), the
product code used in operating mode-(b) has a better error
correction capability, because RS(85, 65) can only correct
multiple errors within two symbols. In NoC links, burst
errors caused by noise and crosstalk can begin at any bit
position of the links. A more powerful RS code can be
constructed, but with a larger delay and area overhead.

Table 1 Parameters used for link model

width (μm)                 0.31
space (μm)                 0.31
thickness (μm)             0.83
height (μm)                0.14
dielectric constant (k)    2.1

Table 2 Number of wires in the link, delay and area comparison

Error control scheme                   Wires in the link    Decoder delay (ns)   Codec area (μm²)
                                       (active/total)
Hamming (71, 64)                       71/71                0.53                 1550
BCH(85, 64)                            85/85                0.63                 53547
RS(85, 65)                             85/85                0.68                 37482
product code                           88/88                0.50                 7671
proposed configurable coding method    (a) 71/88            0.58                 8906
                                       (b) 88/88
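The single-error limit of the Hamming code noted above can be made concrete with a textbook estimate. Assuming independent wire errors with probability ε from (9), that is, without the burst extension, a single-error-correcting code over n wires fails whenever two or more wires are in error. The Python sketch below evaluates that probability. It is an illustrative approximation with hypothetical function names, not the simulation used to produce Fig. 9.

```python
import math


def wire_error_probability(v_swing, sigma_n):
    """Equation (9), using Q(x) = 0.5 * erfc(x / sqrt(2))."""
    x = v_swing / (2.0 * sigma_n)
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def sec_residual_error(n, eps):
    """P(two or more of n independent wires are in error).

    Summing the binomial terms for k >= 2 avoids the cancellation
    that 1 - P(0 errors) - P(1 error) suffers for very small eps.
    """
    return sum(math.comb(n, k) * eps ** k * (1.0 - eps) ** (n - k)
               for k in range(2, n + 1))


def hamming_71_64_residual(v_swing, sigma_n):
    """Estimate for H(71, 64) on a 71-wire link (independent errors)."""
    return sec_residual_error(71, wire_error_probability(v_swing, sigma_n))
```

For small ε the sum is dominated by its first term, roughly C(n, 2)·ε², so the residual rate falls with the square of the wire error probability.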
4.3 Power and energy consumption
Fig. 10 shows the energy consumption of the proposed method
in its two operating modes. The energy includes encoder, decoder
and link energy consumption. The encoder and decoder energy
is reported by Design Compiler using TSMC 45 nm technology
at a clock frequency of 1 GHz. Clock gating is applied in operating
mode-(a) to reduce the unnecessary energy consumption of the
transmission buffer and component2 decoder of the product
code. The link energy is measured in Cadence Spectre using
Predictive Technology Model 45 nm technology [24] at a
supply voltage of 1 V. The driver of the link is sized to allow a
clock frequency of 1 GHz. With the 45 nm global link
parameters in Table 1, the corresponding resistance and
capacitance of the link are 86 Ω/mm and 218 fF/mm. In
NoC architectures, the link length is the distance between
two switches, which is determined by the tile dimensions. In
mesh- or torus-shaped NoC architectures, the distance
between two switches is generally a few millimetres [25, 26].
Two link lengths, 1 and 3 mm, are examined in the
simulation. The results show that operating mode-(a)
consumes less codec and link energy than operating
mode-(b), provided both operating modes meet the reliability
requirement. This is because clock gating is applied and
fewer link wires are used in operating mode-(a). The results also show
that the link energy dominates the total energy consumption
as the link length increases. For the 3 mm link length,
operating mode-(b) consumes about 28% more energy than
mode-(a).
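To give a feel for the per-millimetre link parameters quoted above, the Python sketch below scales them to the two link lengths and applies the first-order estimate of ½·C·V² dissipated per full-swing transition. This estimate covers the wire capacitance only. It ignores the driver, coupling between wires and switching activity, so it is an assumption for illustration and not a substitute for the Spectre measurements reported in this section.

```python
R_PER_MM = 86.0       # ohm per mm, from the text
C_PER_MM = 218e-15    # farad per mm, from the text


def wire_rc(length_mm):
    """Total wire resistance (ohm) and capacitance (F) for a link."""
    return R_PER_MM * length_mm, C_PER_MM * length_mm


def transition_energy(length_mm, v_swing):
    """First-order energy per transition on one wire: 0.5 * C * V^2."""
    _, c = wire_rc(length_mm)
    return 0.5 * c * v_swing ** 2


for length in (1.0, 3.0):
    r, c = wire_rc(length)
    e = transition_energy(length, 1.0)
    print(f"{length:.0f} mm: R = {r:.0f} ohm, C = {c * 1e15:.0f} fF, "
          f"E = {e * 1e15:.0f} fJ per transition at 1 V")
```

Under this estimate the per-wire energy grows linearly with link length, which is consistent with the observation that link energy dominates at 3 mm.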
The energy consumption of the proposed method is also
compared to the energy consumption of directly using the
H(71, 64) code, the BCH(85, 64) code and the RS(85, 65) code. First,
the comparison is performed under a fixed residual flit error
rate requirement of $10^{-10}$, shown in Fig. 11. Two noise
environments are considered. For the favourable
environment ($\sigma_{N} = 0.06$), the proposed method operates in
mode-(a). In the noisy environment ($\sigma_{N} = 0.11$), the
proposed method switches to operating mode-(b). As the
noise environment worsens, the direct implementation of the
H(71, 64) code requires a higher link swing voltage to meet
the reliability requirement [6, 7, 9, 27], while the proposed
method can switch to the more reliable operating mode-(b). In
the noisy environment ($\sigma_{N} = 0.11$), the conventional
Hamming implementation requires a 39% increase in the
link swing voltage compared to the proposed method to
achieve the required residual flit error rate. The increased
link swing voltage greatly increases the link energy of the
conventional Hamming implementation. Fig. 12 shows the
energy consumption of the four error control schemes for
link lengths of 1 and 3 mm. The results show that the
proposed method consumes the least energy in the
high-noise environment by switching to operating mode-(b). The
BCH(85, 64) code consumes the most energy for a link
length of 1 mm because its codec energy is larger than that of the
other error control schemes. For a 3 mm link in the noisy
environment ($\sigma_{N} = 0.11$), the proposed method achieves 30
and 25% improvements in energy consumption compared to
the direct implementation of the H(71, 64) code and the
BCH(85, 64) code, respectively. In the more favourable condition
($\sigma_{N} = 0.06$), direct implementation of the H(71, 64) code
consumes the least energy of the compared schemes. By
switching to operating mode-(a) in low-noise environments,
the proposed method consumes 10% more energy than the
H(71, 64) code because of the configurable system overhead.
Compared to BCH(85, 64), mode-(a) of the proposed
method achieves a 40% improvement in energy consumption.

Figure 9 Residual flit error rate of different error control schemes as a function of noise voltage deviation
a $P_{n} = 10^{-2}$
b $P_{n} = 1$
Figure 10 Energy comparison of the proposed method for different link lengths
a Link length 1 mm
b Link length 3 mm
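The sensitivity of link energy to swing voltage can be illustrated with a one-line calculation. Assuming that the per-transition link energy scales with the square of the swing voltage (a first-order assumption, not a result stated in the paper), a 39% higher swing corresponds to roughly 1.93 times the link energy per transition. The function name below is hypothetical.

```python
def link_energy_ratio(swing_increase):
    """Ratio of per-transition link energy after raising the swing by
    the given fraction, assuming energy proportional to V_swing^2."""
    return (1.0 + swing_increase) ** 2


print(f"{link_energy_ratio(0.39):.2f}x")  # 39% higher swing
```

This ratio concerns the link component alone. The overall improvements quoted above also include codec energy and the different number of active wires, so the two figures are not directly comparable.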
We also evaluate the effect of different residual flit
error rate requirements on the energy consumption. Fig. 13 shows the
energy consumption of different error control schemes at
residual flit error rates of $10^{-10}$ and $10^{-20}$. The energy
consumption is measured at noise condition ($\sigma_{N} = 0.06$)
with a link length of 3 mm. The results show that the high
reliability requirement (residual error rate $10^{-20}$) has an
effect similar to that of the noisy environment. A 20% higher link
voltage is required for H(71, 64) to meet a residual error rate of
$10^{-20}$, which increases the total energy consumption. By switching
to operating mode-(b), the proposed method can achieve a 9%
energy reduction compared to directly using H(71, 64).
5 Conclusion
In this paper, an error control scheme combining Hamming
codes with product codes is proposed to address multiple
error correction. The proposed method provides both
reliable and energy-efficient on-chip communication for
varied noise environments. To reduce system overhead, the
parity check calculation circuit, syndrome decoder and error
correction circuits are optimised using hardware sharing.
In the proposed scheme, the type of error correction code
can be dynamically selected according to the noise
environment. For a given system reliability requirement
in a noisy environment, the proposed error control scheme
achieves a 25% energy reduction compared to a multi-error
correcting BCH code. The proposed method uses a lower
swing voltage than a conventional Hamming implementation
to achieve the same reliability in noisy environments,
resulting in a 30% energy reduction. In a low-noise
environment, the proposed method can achieve a 40%
reduction in energy consumption compared to a BCH
code, but has a 10% energy overhead penalty compared to
directly using Hamming codes.

Figure 11 Example of mode switching for a given reliability requirement
Figure 12 Energy comparison for different noise environments and link lengths
Figure 13 Energy comparison for different residual flit error rate requirements
6 References
[1] CONSTANTINESCU C.: 'Trends and challenges in VLSI circuit reliability', IEEE Micro, 2003, 23, (4), pp. 14–19
[2] LAJOLO M., REORDA M., VIOLANTE M.: 'Early evaluation of bus interconnects dependability for system-on-chip designs'. Int. Conf. VLSI Design, 2001, pp. 371–376
[3] MAHESHWARI A., BURLESON W., TESSIER R.: 'Trading off transient fault tolerance and power consumption in deep submicron (DSM) VLSI circuits', IEEE Trans. Very Large Scale Integr. (VLSI) Syst., 2004, 12, (3), pp. 299–311
[4] CAIGNET F., BENDHIA S.D., SICARD E.: 'The challenge of signal integrity in deep-submicrometer CMOS technology', Proc. IEEE, 2001, 89, (4), pp. 556–573
[5] MICHELI G.DE., BENINI L.: 'Networks on chips: technology and tools' (Elsevier Inc., 2006)
[6] BERTOZZI D., BENINI L., MICHELI G.DE.: 'Error control schemes for on-chip communication links: the energy-reliability tradeoff', IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., 2005, 24, pp. 818–831
[7] SRIDHARA S., SHANBHAG N.R.: 'Coding for system-on-chip networks: a unified framework', IEEE Trans. Very Large Scale Integr. (VLSI) Syst., 2005, 13, pp. 655–667
[8] MURALI S., THEOCHARIDES T., VIJAYKRISHNAN N., IRWIN M.J., BENINI L., MICHELI G.DE: 'Analysis of error recovery schemes for networks-on-chips', IEEE Des. Test Comput., 2005, 22, pp. 434–442
[9] EJLALI A., AL-HASHIMI B.M., ROSINGER P., MIREMADI S.G.: 'Joint consideration of fault-tolerance, energy-efficiency and performance in on-chip networks'. Proc. Design, Automation and Test in Europe Conf. and Exhibition (DATE'07), April 2007, pp. 1–6
[10] LI L., VIJAYKRISHNAN N., KANDEMIR M., IRWIN M.J.: 'Adaptive error protection for energy efficiency'. Proc. IEEE/ACM Int. Conf. on Computer-Aided Design (ICCAD'03), November 2003, pp. 2–7
[11] ROSSI D., ANGELINI P., METRA C.: 'Configurable error control scheme for NoC signal integrity'. Proc. Int. on Line Testing Symp. (IOLTS), July 2007, pp. 43–48
[12] YU Q., AMPADU P.: 'Adaptive error control for NoC switch-to-switch links in a variable noise environment'. Proc. IEEE Int. Symp. on Defect and Fault Tolerance in VLSI System (DFT'08), October 2008, pp. 352–360
[13] LEHTONEN T., LILJEBERG P., PLOSILA J.: 'Online reconfigurable self-timed links for fault tolerant NoC'. VLSI Design, article ID 94676:13, April 2007
[14] ZIMMER H., JANTSCH A.: 'A fault model notation and error-control scheme for switch-to-switch buses in a network-on-chip'. Proc. CODES-ISSS Conf., October 2003, pp. 188–193
[15] GANGULY A., PANDE P.P., BELZER B., GRECU C.: 'Addressing signal integrity in networks on chip interconnects through crosstalk-aware double error correction coding'. Proc. IEEE Computer Society Annual Symp. on VLSI (ISVLSI), May 2007, pp. 317–324
[16] LEHTONEN T., LILJEBERG P., PLOSILA J.: 'Analysis of forward error correction methods for nanoscale networks-on-chip'. Proc. Second Int. Conf. on Nano-Networks (Nano-Net 2007), September 2007, pp. 1–5
[17] FU B., AMPADU P.: 'An energy-efficient multiwire error control scheme for reliable on-chip interconnects using Hamming product codes'. VLSI Design, doi:10.1155/2008/109490, article ID 109490, 2008, pp. 1–14
[18] PYNDIAH R.: 'Near-optimum decoding of product codes: Block turbo codes', IEEE Trans. Commun., 1998, 46, (8), pp. 1003–1010
[19] LIN S., COSTELLO D.J. JR., MILLER M.J.: 'Automatic-repeat-request error-control schemes', IEEE Commun. Mag., 1984, 22, (12), pp. 5–17
[20] FU B., AMPADU P.: 'A dual-mode hybrid ARQ scheme for energy efficient on-chip interconnects'. Proc. Third Int. Conf. on Nano-Networks (Nano-Net 2008), September 2008, pp. 1–5
[21] XU S., BENITO I., BURLESON W.: 'Thermal impacts on NoC interconnects'. Proc. IEEE Int. Symp. on Networks-on-Chip (NOCS'07), May 2007, pp. 220–220
[22] SUN F., DEVARAJAN S., ROSE K., ZHANG T.: 'Design of on-chip error correction systems for multilevel NOR and NAND flash memories', IET Circuits Devices Syst., 2007, 1, (3), pp. 241–249
[23] HEGDE R., SHANBHAG N.R.: 'Towards achieving energy-efficiency in presence of deep submicron noise', IEEE Trans. Very Large Scale Integr. (VLSI) Syst., 2000, 8, pp. 379–391
[24] Arizona State University: 'Predictive technology model' [Online], available at: http://www.eas.asu.edu/~ptm/
[25] VANGAL S., HOWARD J., RUHL G., ET AL.: 'An 80-tile sub-100-W teraFLOPS processor in 65-nm CMOS', IEEE J. Solid-State Circuits, 2008, 43, (1), pp. 29–41
[26] KIM J.S., TAYLOR M.B., MILLER J., WENTZLAFF D.: 'Energy characterization of a tiled architecture processor with on-chip networks'. Proc. Int. Symp. on Low Power Electronics and Design (ISLPED'03), August 2003, pp. 424–427
[27] WORM F., IENNE P., THIRAN P., MICHELI G.D.: 'A robust self-calibrating transmission scheme for on-chip networks', IEEE Trans. Very Large Scale Integr. (VLSI) Syst., 2005, 13, (1), pp. 126–139