
Implementation of Low Power Selective Scan Chain Architecture for VLSI testing
Usha A.; PG Student, Anna University, periuma69@yahoo.co.in Maheswara Venkatesh P., Assistant Professor, Department of Electronics & Communications, Anna University (Trichy), Mahesh_ece@tau.edu.in
a circuit in changing its states due to the charging and discharging of the effective capacitive loads. Dynamic power significantly contributes to total power dissipation. Switching power dissipation is given by $P_{Dynamic} = C V_{cc}^{2} f$, where $C$ is the capacitance of the switching nodes, $V_{cc}$ is the supply voltage, and $f$ is the effective operating frequency. As the activity of the test input signal is significantly higher than during normal operation, power dissipation can be substantially higher while testing takes place [1], [2]. However, power constraints are usually defined with respect to the normal operational mode. Currently, design techniques are employed to reduce power dissipation during the normal mode of operation [2]. The power constraints that are usually considered during design are much lower than the power consumed during testing [3], thus causing severe reliability problems. Furthermore, the current trend toward VLSI circuit miniaturization prevents the use of dissipating devices to remove excessive heat generated during test [2].

Test time and test data volume. Application time is one of the sources of complexity when testing IP cores as commonly found in SoCs.

Abstract: A Selective Trigger Scan architecture is introduced to address the issues pertaining to time, power, and data volume while testing a System-on-Chip (SoC). This architecture reduces switching activity in the circuit-under-test (CUT) and increases the clock frequency of the scanning process. An auxiliary chain is utilized in this architecture to avoid the large number of transitions to the CUT during the scan-in process, as well as enabling retention of the currently applied test vectors and applying only necessary changes to them. The auxiliary chain shifts in the difference between consecutive test vectors and only the required transitions (referred to as trigger data) are applied to the CUT. Power requirements are substantially reduced; moreover, DFT penalties are reduced because no additional multiplexer is utilized along the scan path. Data reformatting is applied in order to make the proposed architecture amenable to data compression, thus permitting a further reduction in test time. It also permits delay fault testing. Using ISCAS 85 and 89 benchmark circuits, the effectiveness of this architecture for improving SoC test measures (such as power, time, and data volume) is experimentally evaluated and confirmed.

Index Terms: Scan test, test data volume, test application time, test power, test compression, delay testing.

I. INTRODUCTION

INTELLECTUAL property (IP) cores are commonly used for

1.1 Previous Works


Recent years have seen the development of many techniques for overcoming the aforementioned difficulties in VLSI testing. In this section, some of these works are briefly reviewed. Power reduction. Circuits are often designed to operate in two modes: normal and test modes. The test mode usually dissipates more power than the normal mode, especially if a scan mechanism is employed. During the data scan-in process, the difference between two adjacent bits moves through the scan path due to the shift operation; many floating transitions are then applied to the CUT.
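The effect of the shift operation can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes a single conventional scan chain in which bits enter at cell 0 and move one cell per clock, and the two 8-bit vectors are chosen purely for illustration. It counts every scan-cell toggle during shift-in and compares that count with the number of positions in which the old and new vectors actually differ.

```python
def hamming(a, b):
    """Number of bit positions in which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def shift_in(chain, vector):
    """Shift `vector` into a conventional scan chain, one bit per clock.

    Assumption (illustrative model): bits enter at index 0 and move toward
    higher indices, so the last bit of `vector` is shifted in first.
    Returns the chain state after every clock and the total number of
    scan-cell toggles observed during the whole shift-in.
    """
    state = list(chain)
    history = [tuple(state)]
    toggles = 0
    for bit in reversed(vector):
        new_state = [bit] + state[:-1]          # every cell takes its left neighbour's value
        toggles += hamming(state, new_state)    # each changed cell drives a transition into the CUT
        state = new_state
        history.append(tuple(state))
    return history, toggles


old_vector = [0, 0, 1, 0, 1, 1, 1, 0]   # vector currently in the chain
new_vector = [0, 1, 1, 0, 0, 1, 1, 1]   # vector to be shifted in

history, toggles = shift_in(old_vector, new_vector)
print("necessary transitions:", hamming(old_vector, new_vector))
print("toggles during shift-in:", toggles)
```

In this model the chain ends up holding the new vector, but the number of toggles seen by the CUT during shifting is larger than the number of bit positions that needed to change.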
Table 1. Scan cell values during shift operation

designing a System-on-Chip (SoC). Although IP cores can help to reduce the design cycle time, they still pose many challenges when testing is considered. The precomputed test patterns that are provided by core vendors must be applied to each core within the power constraints of the whole SoC. As a system integrator may use a core in different platforms with diverse test mechanisms (whether for on-chip or off-chip implementation), the test mechanism of the core must take into account issues related to data volume, application time, and power consumption during test. Moreover, other models (such as for delay faults) must be considered to improve the overall test quality. A comprehensive solution is very difficult; such a solution requires major changes in different parts of the design as provided by the IP providers. Power and test data volume are especially challenging:

Test power. The increased use of portable computing and wireless communication (together with a growing density and a higher operational frequency) has made power dissipation an important issue in both the design and test of VLSI circuits. Power consumption in CMOS circuits can be static or dynamic. Dynamic power consists of switching power and short circuit power. Switching power results from the activity of

Fig. 1 - A modification implemented in the scan chain using XOR gates

Table 1 shows this process for shifting the $a_k$ vector ($a_{k,0}, a_{k,1}, a_{k,2}, a_{k,3}, a_{k,4}$) when the $a_{k-1}$ vector is in the scan chain; (1) can be used to determine the number of shift-in transitions:

Many techniques in the technical literature have been proposed to reduce the number of these transitions for power dissipation management. These techniques can be categorized as follows: transition techniques to reduce the difference between two consecutive test vectors, transition techniques to reduce the effect of the difference between two consecutive bits in the scan chain, transition techniques by partitioning to reduce the effective length of the scan chains, techniques to block transitions in a circuit, scan reordering techniques, and integrated techniques that use two or more of the aforementioned techniques.

If, for the majority of the test vectors, a transition occurs between two adjacent bit locations, inserting an inverter between the corresponding two scan cells reduces the transition frequency in these bit locations. A transition frequency of 80%, for instance, can be reduced to 20% through such a modification. Test data distribution within a block therefore determines the effectiveness of implementing a modification that provides a specific transformation; the modification that yields the applied stimuli with the minimum number of transitions is selected for a specific scan chain fragment. The test power reduction technique can be utilized in a similar manner to handle test responses instead, reducing power dissipation during the shift-out of test responses. An intertwined solution, wherein both test vectors and test responses are considered, imposes significant additional challenges as decisions in one domain impact the other in complex ways. In the rest of this work, we focus only on the reduction of test power dissipation during the shift-in of the test stimuli.

1.2 Our Contributions
This work proposes a novel scan architecture that is referred to as the Selective Trigger Scan Architecture (STSA). This scan architecture uses a triggering (enabling) chain in addition to the data registers. Furthermore, triggering chain hardware is designed to take advantage of similar adjacent data for test compression. Instead of shifting new serial data into the data registers, the triggering chain decides where a data flip-flop must toggle or retain its old value. Retaining data causes a small number of transitions at the data register outputs and low power dissipation. Along with test reformatting techniques, this architecture can reduce test time and power. It can also reduce data volume by enabling the application of compression algorithms on its reformatted data. This structure can be used as a core or chip-level design-for-test (DFT) technique. In addition, it is applicable to delay fault testing. When applied at the core level, substantial improvements in power and test time can be achieved by reformatting the precomputed vectors rather than starting with a new set of tests. The rest of this paper is organized as follows: Section 2 describes the proposed scan architecture. Test data reformatting (as required for generating the vectors for the proposed architecture) is

Utilization of the X(N)OR function in a transition-preserving manner necessitates masking the effect of an X(N)OR gate with another one in the subsequent bit location, as illustrated in Figure 1, enabling the examination of test data in small chunks of three bits. The overall impact of such a modification is thus limited to two consecutive transitions either side of the middle cell. As the applied stimulus bits that are to pass through two X(N)OR gates in consecutive locations either remain intact or are negated, such a modification is a transition-preserving one. The local impact of this modification enables the utilization of X(N)OR gates together with inverters for modifying scan chain fragments.

A scan architecture should not add extra inputs compared to a conventional scan approach. A DFT approach must add no delay to the normal operation of the circuit.

explained in Section 3, and Section 4 describes the test time reduction of the proposed architecture and the application of this architecture to delay fault testing. Sections 5 and 6 report experimental results and conclusions, respectively.

The proposed architecture, shown in Fig. 3, serves two purposes. The first is to reduce the activity at the data outputs and the second is to facilitate test data compression. As shown in Fig. 3, this architecture has data registers that contain the test data applied to the CUT, a triggering chain into which the test data is shifted, and a triggering logic circuit with an enabling AND that determines how the test data should be decoded to trigger the data registers. The triggering chain reduces the activity at the data register outputs. For this purpose, instead of shifting test vectors into the data registers, triggering data is obtained by formatting test vectors and is shifted into the triggering chain. For example, if the triggering logic has an identity function, the current data register is 00101110, and the new test vector is 01100111, then the triggering chain must contain 01001001, that is, the bitwise difference of the two vectors. This architecture also blocks changes in the test data from being directly applied to the CUT.
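The example above can be checked with a few lines of code. This is only a sketch of the identity-function case described in the text; the function names `trigger_data` and `apply_trigger` are hypothetical and are introduced here for illustration.

```python
def trigger_data(current, new):
    """Bitwise difference (XOR) of two equal-length bit strings."""
    if len(current) != len(new):
        raise ValueError("vectors must have the same length")
    return "".join("1" if a != b else "0" for a, b in zip(current, new))


def apply_trigger(current, trigger):
    """Invert every data-register bit whose trigger bit is 1; keep the rest."""
    flip = {"0": "1", "1": "0"}
    return "".join(flip[c] if t == "1" else c for c, t in zip(current, trigger))


current = "00101110"   # current contents of the data registers
new = "01100111"       # next test vector

trig = trigger_data(current, new)
print(trig)                          # 01001001
print(apply_trigger(current, trig))  # 01100111
```

Only the three positions marked by a 1 in the trigger data toggle; the other five data-register outputs hold their values.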

TABLE 2 Different Operational Modes of the New Cell

Fig. 2. (a) Necessary transitions between two consecutive test vectors (three transitions). (b) Total number of unnecessary transitions in conventional scan architecture (32 transitions).

Fig. 3. Proposed scan-chain architecture

2 THE PROPOSED ARCHITECTURE


We begin the explanation of the proposed architecture with a simple example. Let us assume that V1 in Fig. 2a is an existing test vector in a scan chain and V2 is the next test vector that must be shifted in. Comparing V1 and V2 (Fig. 2a), their bits differ in only three positions; these differences are called necessary transitions. If we were to use a standard scan chain and shift V2 into the scan chain in eight test clocks, the transitions shown in Fig. 2b would occur with each shift. For example, shifting the rightmost 1 of V2 into the scan chain causes five transitions in the eight scan flip-flops. Altogether, shifting V2 would cause 32 transitions, which are called unnecessary transitions. On the other hand, parallel loading V2 directly into our architecture eliminates unnecessary transitions on the input of a CUT. Hence, our scan architecture should eliminate the unnecessary transitions. In addition, the following features should be considered for the proposed scan architecture and DFT method:

As shown in Fig. 4, the DR flip-flop is the main storage cell and contains the vector that must be applied to the CUT. The TR chain provides the data required for selective triggering. Testing starts by resetting the DR chain. The TR chain cell has three modes of operation: Shift, Trigger, and Normal. The Shift and Trigger modes are used for testing, while the Normal mode is used for normal operation of the circuit. Table 2 shows the cell configuration in the different operational modes. In the Shift mode, the Enable signal is low (inactive) and the DR flip-flops remain unchanged. Therefore, the required data can be shifted in the TR chain with no effect on the contents of the DR flip-flops. In the Trigger mode, the Enable signal is high (active) and the multiplexer selects the input connected to the Q output of the DR flip-flops. If the XOR output is 0, the DR flip-flop value will not change. If the XOR output is 1, the value of the DR flip-flop is inverted. Therefore, in the Trigger mode, a 1 at the XOR output of a cell causes an inversion of the value stored in its DR flip-flop. This is accomplished by storing different values in

the TR flip-flops of this cell and its neighboring cell (to the left).

In the normal mode, the TR chain is loaded with a sequence of alternating 1s and 0s (1010...). This activates the outputs of all XORs; by selecting the normal input of the multiplexer and setting the Enable signal to the desired value, each cell performs its normal operation. The loading process of the TR chain with 1010... is performed only once, that is, when the test process is completed and the circuit starts its normal operation. During the test, each new vector is obtained through a vector update cycle.
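A vector update cycle (n clocks in the Shift mode followed by one clock in the Trigger mode) can be modelled behaviourally as follows. This is a minimal sketch, not the authors' implementation: the class name `STSAChain` is hypothetical, and it assumes that the XOR of the first cell sees a constant 0 as its left neighbour, a detail the text does not specify.

```python
class STSAChain:
    """Behavioural model of a chain of selective-trigger scan cells."""

    def __init__(self, n):
        self.dr = [0] * n   # DR flip-flops: testing starts by resetting the DR chain
        self.tr = [0] * n   # TR flip-flops: the triggering chain

    def shift(self, bit):
        """Shift mode: Enable is low, so only the TR chain moves."""
        self.tr = [bit] + self.tr[:-1]

    def xor_outputs(self):
        """XOR of each TR flip-flop with its left neighbour.

        Assumption: the first cell's left neighbour is a constant 0.
        """
        left = [0] + self.tr[:-1]
        return [a ^ b for a, b in zip(left, self.tr)]

    def trigger(self):
        """Trigger mode: a 1 at a cell's XOR output inverts its DR flip-flop."""
        self.dr = [d ^ x for d, x in zip(self.dr, self.xor_outputs())]

    def update(self, tr_vector):
        """One vector update cycle: n shift clocks, then one trigger clock."""
        for bit in reversed(tr_vector):   # last bit first, so tr ends up equal to tr_vector
            self.shift(bit)
        self.trigger()
        return list(self.dr)


chain = STSAChain(8)
# TR data whose adjacent-bit transitions mark the positions to toggle.
print(chain.update([0, 0, 1, 1, 0, 1, 0, 0]))   # DR becomes 00101110
print(chain.update([0, 1, 1, 1, 0, 0, 0, 1]))   # DR becomes 01100111
```

Under the same assumption, loading the TR chain with 1010... makes every XOR output 1, which matches the description of the Normal mode above.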

Fig. 6. Test session state diagram

Fig. 4. Scan cell structure of the proposed scan architecture.

The timing diagram for a test vector update cycle is shown in Fig. 5. In the Shift mode, the trigger data is shifted into the TR chain; this requires n scan clocks. After n clocks in the Shift mode, the cell enters the Trigger mode for a single clock. In this mode (based on the TR chain data), some of the DR flip-flops invert their values to obtain the new vector. A test vector is reconstructed in a single test update cycle (Fig. 6 shows this process).

Fig. 5. Timing diagram of a test vector update cycle

As the Enable signal can be generated internally, no additional pin is required. This signal can be easily generated through a pulse after receiving n clock cycles. Moreover, the Select signal of the MUX does not require an additional pin; the same signal in conventional scan chains can be used to determine the Test and Normal modes. In the Test mode, the Select signal selects the 1 input of the MUX (from the Q output), while, in the normal mode, it selects the other input (which directs the Normal Input to the input of the DR flip-flop). Therefore, the proposed STSA requires no additional test pin compared to conventional scan structures. The output response is not captured in the scan chain. Significant improvements to SoC testing are achieved using the proposed architecture; these improvements are described in more detail here.

Power reduction. One of the main features of the proposed architecture is to prevent unnecessary transitions from affecting the CUT by altering a conventional scan chain. As the required transitions are only a small portion of the total transitions made during scanning, a reduction in transitions will affect power consumption. This is accomplished using the so-called trigger data; in the proposed architecture, trigger data is transferred from the TR chain because its transitions have a less significant effect on power dissipation.

3 TEST REFORMATTING

Test vectors should be reformatted for use in the proposed architecture and to generate the original vectors at the inputs of the CUT (that is, the DR outputs of the scan cells). The process of reformatting test vectors consists of the following steps.

3.1 Test Vector Reordering
In the first step, test vectors are reordered to further reduce the number of transitions in the DR chain. In the reordering process, similar vectors are placed next to each other to reduce the number of transitions between consecutive test vectors, hence also reducing the number of total transitions resulting from the entire test set. This technique is usually used in data compression to reduce the number of 1s in the difference vectors. The Hamming distance is used as a measure of transition activity to estimate power consumption. A complete undirected graph is generated with test vectors as nodes. Then, the Hamming distance between each pair of vectors is assigned as the weight of the edge connecting them. The solution to this problem consists of finding a path that traverses all nodes with minimum overall weight.

3.2 Extracting the Difference Vectors
The second step extracts the difference vectors from the reordered vectors. The difference vectors show the positions in which two consecutive vectors differ. $D_k$ denotes the difference vector of two vectors $V_{k-1}$ and $V_k$ and can be easily calculated as their bitwise XOR:

$$D_k = V_{k-1} \oplus V_k$$

4.2 Delay Faults
Delay faults are those faults that affect the timing of the circuit without changing its logic operation. In a delay fault, the traversal of one or more paths (not necessarily the critical path) exceeds the clock period. Testing a delay fault requires placing the appropriate transition at the input of the path and appropriately setting the required off-path inputs of those gates located on the path under test. Moreover, a circuit should be clocked at-speed after the application of each vector. The transition on the circuit inputs requires the application of different vectors at two consecutive clocks. The proposed scan architecture not only allows a reduction of data volume, time, and power, it also does not increase the delay during normal operation when testing for delay faults.

$D_0$ denotes the difference between the initial state of the scan chain and the first test vector ($V_0$). Difference vectors show where transitions are required to produce the desired test vectors, that is, the positions of the scan chain in which the DR flip-flops should be enabled in the Trigger mode to invert their values. Figure 7 shows the process of converting the original test vectors into the new TR-chain vectors. Each vertical arrow indicates the conversion of a 1 in the difference vector to a transition in the TR-chain scan data.
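The reformatting steps of this section can be sketched end to end. The code below is illustrative only and rests on stated assumptions: a greedy nearest-neighbour ordering stands in for the minimum-weight path search (it is a heuristic and does not guarantee the minimum), the first cell's left neighbour in the TR chain is taken to be a constant 0, and all function names are hypothetical.

```python
def hamming(a, b):
    """Hamming distance between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))


def reorder(vectors, start):
    """Greedy nearest-neighbour ordering (heuristic, not guaranteed optimal)."""
    remaining = list(vectors)
    ordered = []
    current = start
    while remaining:
        nearest = min(remaining, key=lambda v: hamming(current, v))
        remaining.remove(nearest)
        ordered.append(nearest)
        current = nearest
    return ordered


def difference_vectors(ordered, initial):
    """D_k = V_{k-1} XOR V_k, with D_0 taken against the initial chain state."""
    diffs = []
    previous = initial
    for vector in ordered:
        diffs.append("".join("1" if a != b else "0" for a, b in zip(previous, vector)))
        previous = vector
    return diffs


def tr_chain_data(diff):
    """Turn each 1 of a difference vector into a transition in the TR data.

    Assumption: the first cell compares its TR bit against a constant 0,
    so the TR data is the running XOR (prefix parity) of the difference vector.
    """
    level = 0
    out = []
    for d in diff:
        level ^= int(d)
        out.append(str(level))
    return "".join(out)


tests = ["1111", "0001", "0011", "0111"]        # made-up test set
ordered = reorder(tests, start="0000")          # DR chain is reset before testing
diffs = difference_vectors(ordered, initial="0000")
for vector, diff in zip(ordered, diffs):
    print(vector, diff, tr_chain_data(diff))
```

For this made-up test set, the original order needs nine bit inversions in total, while the reordered sequence needs four, one per vector.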

Fig. 7 Generating TR chain vectors from reordered test vectors

4 TEST TIME REDUCTION AND DELAY FAULT


4.1 Low Test Application Time
The proposed architecture avoids the use of multiplexers (MUX) in the scan path of the TR chain. The absence of these MUXs from the critical scan path leads to a higher scan clock frequency in the Shift mode, hence reducing the total test time. During the shift cycle, the output of the DR flip-flop and the select input of the MUX do not change. The delay of the MUX has no impact on the clock frequency. Therefore, the proposed architecture adds no delay to the normal operation of the circuit compared with a conventional scan arrangement.
Fig. 8 Timing diagram of delay testing with the proposed scan architecture.

5 EXPERIMENTAL RESULTS
To validate the proposed technique, an experiment was performed on the ISCAS89 benchmark circuit S27. The S27 benchmark circuit is shown below. The total leakage and total power were calculated for the circuit shown in Fig. 9.

Leakage Power (mW)    Power (mW)    Frequency (MHz)
3.12                  121           426
2.35                  92            160

The experimental results show that power decreased from 121 mW to 92 mW and that frequency decreased from 426 MHz to 160 MHz.
REFERENCES
[1] Y. Zorian, "A Distributed BIST Control Scheme for Complex VLSI Devices," Proc. IEEE VLSI Test Symp., pp. 4-9, 1993.
[2] S. Wang and S.K. Gupta, "ATPG for Heat Dissipation Minimization during Scan Testing," Proc. IEEE Design Automation Conf., pp. 614-619, 1997.
[3] A. Wang and S. Gupta, "LT-RTPG: A New Test-Per-Scan BIST TPG for Low Heat Dissipation," Proc. IEEE Int'l Test Conf., pp. 85-94, 1999.
[4] S. Sharifi, J. Jaffari, M. Hosseinabady, Z. Navabi, and A. Afzali-Kusha, "Simultaneous Reduction of Dynamic and Static Power in Scan Structures," Proc. Design, Automation and Test in Europe, vol. 2, pp. 846-851, 2005.
[5] O. Sinanoglu, I. Bayraktaroglu, and A. Orailoglu, "Test Power Reduction through Minimization of Scan Chain Transitions," Proc. 20th IEEE VLSI Test Symp., pp. 166-171, 2002.
[6] M. Hosseinabady, S. Sharifi, F. Lombardi, and Z. Navabi, "A Selective Trigger Scan Architecture for VLSI Testing," IEEE Transactions on Computers, vol. 57, no. 3, March 2008.

Fig. 9. S27 benchmark circuit diagram

Power dissipation was evaluated for different circuits; the dynamic and static power improvements are shown in Table 4.

Table 4. Power dissipation: improvement compared with traditional scan (%)

Dynamic    Static
44.82      14.65
62.90      11.46
69.44      17.00
2.92       4.11
68.80      17.10
71.06      21.23
18.61      17.09
18.64      20.70
75.77      9.02
9.52       7.12
98.68      3.82
98.95      5.80

6 CONCLUSIONS
This paper has proposed a novel scan architecture that considerably improves SoC testing in terms of power and time. Power consumption is reduced during the test process by preventing transitions in the scan chain from spreading into the CUT.