
878 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 20, NO. 5, MAY 2012

Single Cycle Access Structure for Logic Test


Tobias Strauch

Abstract: This paper proposes a new single cycle access test structure for logic test. It eliminates the peak power consumption problem of conventional shift-based scan chains and reduces the activity during shift and capture cycles. This leads to more realistic circuit behavior during stuck-at and at-speed tests. It enables the complete test to run at much higher frequencies, equal or close to the one in functional mode. It will be shown that fewer test cycles are needed compared to other published solutions. The number of test cycles per net, based on a simple test pattern generator algorithm without test pattern compression, is below 1 for larger designs and is independent of the design size. Results are compared to other published solutions on ISCAS89 netlists. The structure allows additional on-chip debugging signal visibility for each register. The method is backward compatible with full scan designs, and existing test pattern generators and simulators can be used with a minor enhancement. It is shown how to combine the proposed solution with built-in self test (BIST) and massively parallel scan chains.

Index Terms: At-speed testing, low-power testing, on-chip signal visibility, switching activity during test, test-time reduction.

Manuscript received September 13, 2010; revised December 14, 2010 and February 10, 2011; accepted February 28, 2011. Date of publication April 21, 2011; date of current version April 06, 2012.
The author is with R&D, EDAptability e.K., Munich 80538, Germany (e-mail: tobias@edaptability.com).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TVLSI.2011.2134875
1063-8210/$26.00 © 2011 IEEE

I. INTRODUCTION

The production test costs of chips are becoming more and more dominant. The standard shift scan (SS) method has been the most popular test implementation of the last decades. Attempts have been made to improve this approach in terms of test time, test data volume, and test power by optimizing the scan pattern, using different scan chain structures, different scan support logic, or a combination of these modifications.

Automatic test pattern generation (ATPG) for sequential VLSI circuits is an NP-complete problem with an exponential complexity. The complexity of combinatorial logic varies. Less complex logic is tested within a few capture cycles, generating an immense number of don't cares during the rest of the test, even when test compression methods are used. Complex and hard-to-test logic needs to be stimulated and captured quite often, but the patterns need to be shifted through the complete scan chain. One approach to reduce test time is to use parallel scan chains. This leads to a massive increase in the number of parallel scan chains to reduce the length of the scan chains. In order to further reduce test data volume, a built-in self-test (BIST) mechanism is used. One example is embedded deterministic test (EDT) proposed by Rajski et al. [1], which combines parallel scan chains with BIST. Further proposals reduce power by using a controller to allow a given scan chain to be driven by a power-aware EDT-decompressor in [2] or reduce transitions in scan chains by using a ring generator in [3]. However, the number of test cycles can still be significantly improved, and parallel shift scan chains generate critical peak power so that each shift cycle must be slowed down.

Test time can be reduced by optimizing the test pattern. Pomeranz et al. [4] demonstrate this by using limited scan operations and transfer sequences. Chen et al. [5] combine different pattern optimization techniques and a clock disabling scheme to further reduce switching activity. Test pattern optimization to reduce power consumption during test is another aspect. A test pattern generation technique that concentrates solely on minimizing switching activity during scan test by assigning optimized values to don't care bits to limit transitions is shown by Wang et al. in [6]. Almukhaizim et al. [7] propose dynamic scan chain partitioning and inserting delays in the clock tree. Al-Yamini et al. [8] use segmented addressable scan and disable complete subtrees of the clock to solve the power consumption problem. Lin et al. [9] reduce power consumption during shift by a multilayer data copy scheme. However, most of these methods require a large computational effort and are therefore not applicable to multimillion gate designs, or they do not simultaneously reduce switching activity and test time.

Another critical aspect of SS implementations is at-speed testing. The high peak power during shift leads to an excessive current due to high switching activity, which can lead to a misclassification of the circuit under test (CUT). This is demonstrated by Sde-Paz et al. in [10]. Pattern reduction for at-speed tests is proposed by Pomeranz et al. in [11] by test compaction based on non-scan test sequences and the removal of transfer sequences. The problem of a slow global scan enable signal in SS is discussed by Ahmed et al. in [12] and solved by implementing a pipelined global scan enable tree. However, none of the methods fundamentally solves the problems of high switching activity, a high number of test cycles, and a slow global scan enable signal simultaneously.

Test power, test data volume, and test time can be simultaneously reduced with a modified hardware structure known as random access scan (RAS). The enhanced hardware allows the read and write of selected registers or sets of selected registers, which reduces the power problems during shift and the test time. Based on the initial idea in 1980 by Ando [13], three major groups of RAS schemes can be found.

The first group basically uses one single address decoder to select each individual register in the design, and an additional element (multiplexer, MUX) per register cell enables a hold mode of each register. Baik et al. [14] show how test power, test data volume, and test application time can be reduced. Lin et al. [15] present a two-phase approach for optimizing the pattern based


STRAUCH: SINGLE CYCLE ACCESS STRUCTURE FOR LOGIC TEST 879

on bit flipping. A test pattern generation scheme based on segment fixing counter reseeding is demonstrated in [16]. However, the fact that each cell needs to be accessed individually generates an unaffordable routing overhead.

The second group addresses each register individually by using a row and a column address decoder with an additional combinatorial element (for instance, an AND gate) per register cell. The cell values can be individually read by an additional signal driven by tristate logic added to each register cell. Hu et al. propose a variable-to-fixed run-length decoding technique applied on RAS in [17] and a clustered RAS structure in [18]. A modified T-Flip-Flop is shown in [19] to allow the overlap of the test response readout with the loading of the next test input patterns within the same memory addressing cycle. A very similar modified scan register is proposed by Mudlapur et al. in [20] to reduce the area overhead of the RAS. However, these proposals have in common that the row- and column-line select routing is unaffordable, that the individual register cells are enhanced by multiple logic elements, which generates an unaffordable area overhead, and that the readout is done using tristate logic.

The third group uses a row decoder and a column decoder to address individual registers. Additionally, the read/write mechanism is enhanced with two signals per column, driven by a tristate driver, connected to the internal latch cells of the registers via tristate logic, and an individual sense amplifier per column. Different variations are discussed. Hu et al. propose a single read/write signal in [21], called localized random access scan. Saluja et al. [22] take advantage of basis vectors and linear algebra to further optimize test application in RAS by performing the write operations on multiple bits consecutively. They also propose partitioned grid random access scan in [23] and progressive random scan in [24], and further minimize test application time in [25]. Based on this, Voyiatzis et al. present in [26] an output response compaction scheme which results in lower hardware overhead while at the same time eliminating the problem of unknown values. Baik et al. [27] enhance the register with a latch structure to test for path-delay faults. However, next to the routing and area overhead compared to standard scan approaches, the enhanced read and write mechanism with tristate drivers, cell-internal tristate logic, and a sense amplifier per column is very timing sensitive. This massive use of tristate logic connected to internal register cell nets and sense amplifiers generates timing-critical signal slopes and is not easy to integrate in today's static timing analysis flows for multimillion gate designs. Furthermore, launch-on-shift (LOS) based at-speed testing is not possible for this group of RAS implementations.

Built-in self-test (BIST) is a solution to reduce test data volume and can furthermore reduce the number of test access pins for the CUT dramatically. The embedded deterministic test (EDT, [1]) method is well established. BIST based on RAS is examined by Yao et al. in [28]. A new test implementation must therefore be usable in a BIST environment.

Next to the aspects already mentioned, the debug capabilities of chips can have an impact on the bring-up time and in-system tests. Some techniques combine the test structure with debug features, as shown in commercially available products [29]. However, additional features for debugging provided by a new test implementation can improve the bring-up time and in-system tests.

II. CONTRIBUTION AND PAPER ORGANIZATION

This paper presents a novel scan cell register for logic tests combined with a novel scan cell routing architecture. The structure allows a single cycle access (SCA) to individual register sets. This access scheme is fundamentally different from SS. It can be compared to a memory with single cycle synchronous write and asynchronous read functionality, whereas the remaining memory content (registers) does not change. Unlike shift-scan designs, which need a certain number of shift cycles, the values can be read and written within one single cycle. It will be shown that this method can easily be integrated in today's standard flows. The structure needs fewer test cycles to reach a certain or full coverage, and the power consumption during tests is in the range of the one in functional mode. This allows higher test frequencies and leads to more realistic test conditions closer to the functional chip behavior during stuck-at and at-speed tests.

The proposed structure is applicable to pattern-driven tests and to BIST. The paper provides reasonable data but is not limited to a frozen solution. It also discusses various trade-offs of different alternatives. Logic test is a wide field, and different users have different preferences. A reference example based on 992 registers is used as a guide through the paper.

This paper is structured as follows. In Section III, "SCAh-Structure with Hold Mode," the single cycle access test structure is explained. The feasibility, area, test cycles, power consumption, and debugging capabilities of this solution are compared to alternative state-of-the-art methods. Section IV, "SCA-Structure without Hold Mode," demonstrates further solutions to overcome the area disadvantage of the proposed method. The advantages of the SCAh-structure and the lower area overhead of the SCA-structure are combined and presented in Section V, "Gated SCA-Structure." Sections VI, "Running Page Tests in Parallel," and VII, "Address Controlled BIST," discuss solutions which can be applied to today's test requirements. The numbers in Section VIII, "Results," demonstrate the advantage of the lower test cycles per net resulting from the proposed solution, and Sections IX, "Discussion," and X, "Conclusion," finish the paper.

III. SCAh-STRUCTURE WITH HOLD MODE

A. SCAh-FF

The key element of the single cycle access structure with hold mode (SCAhS) is the single cycle access register (Flip-Flop, FF) with hold mode (SCAh-FF). It is based on a standard scan register (S-FF) and uses two more 2-to-1 multiplexers. The new SCAh-FF can be seen in Fig. 1.

The SCAh-FF has one more input and one more output compared to the standard scan register (S-FF). The inputs clock {clk}, data-in {di}, and scan-in {si} still exist. The scan-enable is now a 2 bit bus {se[0:1]}. An additional scan output pin {so} is added. The reset input and inverse output pins are not shown. The internal logic enables the register to run in one additional

hold mode, whereas the additional output multiplexer can bypass the register to directly drive the value of {si}. The resulting functionality is best explained by a truth table (see Table I).

Fig. 1. SCAh-FF based on an S-FF.

TABLE I
TRUTH TABLE OF SCAh-FF

In functional mode , the register captures and { } follows { } (usually stable). In read mode { } has the value of { } so that { } can be read out asynchronously. In the event of the relevant clock edge, the register captures { }. In hold mode, { } follows { }, and the register remains in the state { }, capturing its own value. When , the register captures { } and { } changes to the new value of { } (sync. write/read mode).

The slave latch of a FF is usually connected to the output driver of the data-out pin and/or an inverting driver for the inverse-data-out pin. The internal multiplexer for the SCAh-FF (shown in Fig. 1) can also be driven by this slave latch output. The fan-out number of the data-out pin (or inverse-data-out pin) refers to the number of input pins which are driven by the SCAh-FF data output drivers.

B. SCAh-FF Connectivity

Fig. 2 shows the SCAh-FF and its connectivity. The two major differences are that the scan-in { } is now connected to a dedicated scan-out { } of the preceding register in the scan chain, and that the register { } inputs on the same scan depth are connected to the same line-select { } signal, which is driven by a 1 out of decoder. SCAh-FFs connected to the same line-select signal are considered to be on one line. If { } is 0, no line is selected. { } of each SCAh-FF is connected to the global scan enable signal { } (comparable to the global scan enable signal of shift-scan structures). The output of the address decoder is connected to the { } pin of the registers on one particular line. Instead of shifting the data through the scan chain, all registers on the same scan depth, enabled by the same line-select signal, can be read or written with a single cycle access. Additionally, unselected registers remain in hold mode.

Fig. 2. SCAhS connectivity.

This structure results in four different kinds of cycle modes.
1) When { } is low and { } is 0, the design works in normal functional mode (Table I, line 2).
2) If a specific address is given (asynchronous read), the register values on the selected line are passed to the scan-out bus { }. This mode is called asynchronous read mode (Table I, line 3).
3) When { } is high and { } is 0 (no line selected), the design remains in hold mode and no register value changes during a clock edge (Table I, line 4).
4) If a specific address is given at a relevant clock edge and { } is high, the scan-in values { } are captured by the registers on the selected line (synchronous write) and scan-out { } is driven by the captured register value (read). This mode is called synchronous write/read (Table I, line 5).

The structure is backward compatible with known shift scan operations if { } is set to 1 at the beginning and automatically incremented after each shift cycle. The shift-in data can be written continuously throughout the scan area and the scan-out data can be read at the same time. A capture cycle can also be applied to all registers at the same clock edge (functional mode).

The setup time of the SCAh-FF equals the one of the S-FF, because no additional logic is added to the relevant timing path through { }. The fan-out of { } is reduced by one because { } does not drive the { } of the succeeding register, which is usually the case in SS. The new scan-out { } drives the { } of the succeeding register in the scan chain and has a constant fan-out of 1. The scan chain is decoupled from the functional logic.

C. SCAh-FF Page Organization

The SCAhS enables single cycle read/write accesses to the individual register line. The test structure is now organized in pages to achieve a read/write access at design speed or at a reasonable test speed. The page depth equals the scan chain depth (SD = number of SCAh-FFs connected to one chain on one page), which is assumed to be 31 here. Multiplied by the scan width (SW = number of scan chains on one page, for instance 32), the resulting number of SCAh-FFs is 992 per page.

In this rather extreme compact case, the page uses a global 1-out-of-31 address line decoder. A page selector { } selects the individual page and drives the scan input bus signals and line select { } signals (AND-ed) only of this particular page. { } can be driven by a register which is set by dedicated test control logic. If not selected, the page remains inactive to reduce activity. The scan output buses of all pages { } are bit-wise XOR-ed with the { } of other pages to generate the global scan-out bus { }. If the page is inactive, the XOR-tree passes the value of previous pages unchanged since all { } bits of an unselected page are 0.

With the page organization, the relevant timing paths become clear. During a read, the registers are selected by the line-select signal and drive the scan-out bus { } through a multiplexer chain of the succeeding registers and the page-scan-out bus { } through the XOR-tree. During a write, the scan-in bus { } values are passed through the AND-selector and the multiplexer chain of the trailing register to the registers of the selected line.

In order to achieve a high test speed, the test implementation can be pipelined. The scan-in bus { } and the line-select { } outputs of the global address decoder can be registered. The XOR-tree can also be pipelined with buried register sets. For eight pages, a logic depth of three XOR-cells can be reached. If an optimal test speed cannot be achieved, the scan depth SD can be reduced (to any number). It is important to note that there is no timing path between adjacent registers on the scan chain during test mode ({ } { }). Therefore, the hold time problems known from shift-scan tests do not exist, and no buffers must be inserted for hold time fixes.

D. Feasibility

It can be assumed that in today's standard flows, the chip is designed without scan test insertion. The test structure is implemented during the place and route (P&R) step. At that time, the standard registers (FF) are replaced with scan FFs (S-FF) and the additional routing for the scan-in and scan-enable pins is done. Supporting test logic (as in EDT, [1]) needs a parallel synthesis step, but this task can be considered unproblematic in today's flows.

The flow for the proposed structure differs only slightly from the one of SS. The standard FF is replaced by an SCAh-FF (instead of an S-FF). The global scan-enable signal is identical. The scan connection between registers is now made between the scan-in and the dedicated scan-out of the predecessor (instead of the data-out pin in the standard flow). The address wires of the individual register lines must be routed from the address decoder. The support logic, such as the address decoder, can be synthesized in a parallel task, but since it has a very regular structure, it can also be elaborated during the test insertion step. A scan reordering during the P&R step can be done without limitations within one page.

The proposed structure can therefore be implemented with acceptable modifications to state-of-the-art P&R tools and is feasible for today's standard design flows with standard STA tools. No tristate logic is used.

E. Area

The areas of various cores (see Table II) with the standard scan implementation and the areas of the cores with the proposed structure are compared (see Table III) using the lsi10k library. The cores are processors (CPU, OR1200), a DMA-core, and peripherals (AES, ETHER, PCI). They are taken from [30].

TABLE II
CELLS AND AREA OF CORES WITHOUT TEST INSERTION

TABLE III
AREA AND CELL PINS PER LOGIC UNIT OF CORES WITH TEST INSERTION

For the calculation of the standard shift (SS) area, each register (FF) is replaced with the corresponding scan FF (S-FF). A FF with an area of nine logic units (lu) is replaced with an S-FF of 11 logic units as defined in the lsi10k library. The two additional pins and the 2-to-1 multiplexer result in an area difference of two logic units. The resulting core area includes a buffered scan-enable tree and a simple XOR-tree for scan-out decompression and is listed in Table III. Buffers for hold time fixes of the scan chain are not considered.

The page support area of an SCAh-FF based implementation for each core is listed in Table III. This includes the XOR-tree and the two AND-selectors for scan-in and line-select. Additionally, one buffer per six registers is added for each line-select signal. The area for an SCAh-FF is set to 14 logic units. Compared to an S-FF it has two more pins and two more 2-to-1 MUXes, which results in an area difference of three logic units. The calculations consider the buffered scan-enable tree for SCAhS and SS. As can be seen in Table III, the resulting area has an overhead of 33.78% compared to the non-test area and 17.27% compared to the SS area of the core logic. It does not consider memories, MBIST-logic, power-wells, spare-cells and pad

logic. The number of cell pins per logic unit decreases from 1.25 to 1.18, indicating less congestion than in an SS-based design. An average of 1375 registers (see Table II) generates an average SCAhS area of 33 150 lu (see Table III). It is assumed that a page with 992 registers has an estimated average area of 23 916 lu. Adding a standard register (9 lu) to a page increases the area by 0.037%. An example is shown in Fig. 3.

Fig. 3. SCAh-page, global scan enable not shown.

F. Test Cycles and TPG

The number of test cycles (TC) needed to achieve a reasonable stuck-at coverage (COV) is another criterion for comparing various test implementation structures. Assume that the method of shifting data through the scan chain followed by a capture cycle is used. Then, for each capture cycle, the number of test cycles is determined by the full scan chain length, which therefore defines the test set (TS). Trivial logic can be tested very easily, but it generates a huge number of don't cares for the remaining test sets. One can argue that the most complicated and hard-to-test combinatorial logic determines the number of TS, even when compression methods are used.

The proposed structure has the advantage of a single cycle write and a single cycle read of selected register lines. This allows the test method to concentrate on uncovered areas of the combinatorial logic. The number of don't cares and test cycles is dramatically reduced.

Another aspect of shift-based testing is that the registers are overwritten at each scan shift by default. This means that each TS can be seen independently. Applying multiple consecutive capture cycles results in the loss of coverage of the trailing cycles if the coverage is not propagated.

The proposed structure can apply multiple capture cycles, since it can asynchronously read register lines in between them. This means that while the logic remains in a hold state after a capture cycle (by stopping the clock or setting the registers in hold mode), the test procedure can read multiple lines asynchronously without modifying the register content. Additionally, an optimal number of capture cycles (random fill) can follow with an optimal set of asynchronous reads.

In order to compare the effectiveness of the test structure, the indicator test cycles per net [TCPN(COV)] for a certain coverage (COV) is introduced. It is the number of test cycles divided by the number of nets (NETS) of the given netlist [see (1)]. If COV is not mentioned, it is defined as 100%.

$$\mathrm{TCPN}(\mathrm{COV}) = \frac{\mathrm{TC}(\mathrm{COV})}{\mathrm{NETS}} \qquad (1)$$

This indicator can be used to compare different test methods and algorithms in terms of the number of test cycles for equal design sizes. More importantly, it shows if and to what extent a test method depends on the design size. If the TCPN on small designs is low but increases dramatically when the design size grows beyond 10 k nets, it is questionable whether the test is useful for multimillion gate designs, considering the often very high computation time which comes with optimized TS.

In this paper, a simple algorithm is used to generate the TS for the SCAhS. An existing TS is used, which has 100% coverage and is based on a full-scan structure. To generate the reduced TS for the SCAhS, the test pattern simulator needs to be modified. When simulating the TS, the stuck-at coverage is propagated through the combinatorial logic. In the modified version, the cone input indexes of the relevant logic cones must be propagated as well, if they are relevant for the fault propagation. The cone inputs can be sequential elements or design inputs. They are added to the new TS only if their values are relevant (cares) and have changed since the last capture cycle. The simulator then dumps the test pattern for the write and read cycles as the reduced TS for the SCAhS. Additional capture cycles can be simulated to check if further coverage can be read. TS which do not add coverage can be skipped.

Due to the single cycle access structure, the number of test cycles of an SCAh-FF-based test implementation is always equal to or lower than that of a full shift scan-based technique.

Fig. 4. S-FF outputs toggle at clock edge.

Fig. 5. SCAh-FF chain, propagated scan data toggle, and selected register output toggle at clock edge.

G. Peak Power

The peak power consumption is another serious problem during shift scan. When data is shifted through the scan chain, the outputs of the registers toggle with a certain probability (see Fig. 4). This signal change is then propagated through the adjacent logic. At each clock edge, the power consumption therefore rises to a level which is much higher than in functional

mode and has the potential to destroy the device due to high current and heat generation. This problem is discussed in the literature. For instance, Almukhaizim et al. [7] propose a peak shift power reduction technique via dynamically partitioning the scan chains into multiple groups. As Sde-Paz et al. [10] show, the peak power consumption during shift has a significant impact on the capture cycle for at-speed testing. The excessive current during the shift edge leads to an excessive IR-drop, an unrealistic delay measurement, and the CUT's misclassification.

In the proposed SCAhS, the scan-in data and the scan-out data toggle asynchronously through a multiplexer chain and are therefore naturally propagated throughout the complete test cycle (see Fig. 5). Only a certain number of signals potentially change at a time, which is equal to SW (for instance 32). The scan data is decoupled from the functional logic. Only in the case of a write do the registers on the selected line potentially change their output value, which is then propagated only to the adjacent logic of the relevant registers. For a capture cycle, only some register inputs have changed since the last capture cycle, so that the activity can be limited easily.

In functional mode, the scan chain does not toggle. It can therefore be assumed that the peak power is not higher than in functional mode. It is therefore impossible to destroy or harm the circuit due to peak power consumption, and the circuit behavior is closer to the functional behavior during test. This can be achieved without any major computational effort during pattern optimizations.

H. Power Consumption

The power consumption of a digital sequential circuit can be simplified in the sense that it is directly related to the activity (ACT) on the chip (zero delay model). In this paper, the activity is defined as the number of nets in a netlist which have changed at the end of a clock cycle compared to the end of the previous cycle.

In order to compare the different power consumptions of the standard shift scan test implementation and the SCAhS, the activity of both solutions is measured during a netlist simulation on various cores. The activity of the cores in functional mode is also extracted. Fig. 6 shows an example. Table IV lists the maximum and average activity in functional mode, SCAh-test, and shift-scan-test of the cores.

TABLE IV
ACTIVITY OVERVIEW OF VARIOUS CORES

TABLE V
ACTIVITY RELATIONS

Table V sets the activities listed in Table IV in relation. The maximum and average activities of the SCAhS are compared to the equivalent numbers in functional mode and shift-scan-test. For both test implementations a random TS is simulated.

The SCAhS has only a 164% maximum activity compared to functional mode, and a 358% average activity during test. The ETHER-core handles a serial data stream and is therefore relatively inactive in functional mode. If this core is not considered, the activity during SCAhS based tests is only 197% of the one in functional mode. Compared to SS, the maximum activity is only 18% and the average activity only 12%. A circuit with an SCAhS generates much less activity during stuck-at test than an S-FF-based implementation. If the average or maximum activity is problematic, the test speed can be reduced compared to functional mode or the TS pattern can be optimized.

If multiple pages are tested at the same time, the activity can be scheduled with the page selection logic. During write cycles, the order of scan-in cycles can be optimized to reduce activity. Since the average activity of the SCAhS is only 12% (or 18% maximum) of the shift-based design, a larger area can be tested at the same time during BIST.

The shift-scan test has a significant impact on the capture cycle for at-speed testing and on the behavior of the design itself [10]. A significantly lower activity of the test logic leads to more realistic functional chip behavior during test.

I. Debugging

In a tester environment, the CUT can be stimulated with controlled clock signals. Most tests can be repeated cycle-accurately. To view internal register values of an S-FF-based test insertion in normal operation (at-speed), the test is stopped after a defined number of clock cycles and the register values are shifted out (one test per cycle). This procedure is repeated by adding one cycle at a time [29]. The SCAhS supports this procedure.

With the SCAhS, one particular line can be selected and continuously read out (one test per line). SW register values can directly be streamed out during one test. In other words, the SCAhS gives the same debug visibility as the shift structure, but allows the user to concentrate on selected signals when debugging extensive tests without stopping the test run.

Fig. 7. SCA-FF based on an S-FF.

TABLE VII
AREA OF CORES WITH TEST INSERTION
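The per-register area figures quoted in Section IV-B can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes a page area of 22 593 logic units (lu) for a 992-register page, obtained by scaling the average SCAS area of 31 316 lu for 1375 registers linearly with the register count.

```python
FF_AREA_LU = 9        # standard register area in logic units (Section IV-B)
PAGE_AREA_LU = 22593  # assumed estimated area of a 992-register page, in lu


def added_register_overhead(n_registers, page_area_lu=PAGE_AREA_LU,
                            register_lu=FF_AREA_LU):
    """Relative page-area increase, in percent, from adding n standard registers."""
    return 100.0 * n_registers * register_lu / page_area_lu


def scaled_page_area(avg_area_lu, avg_registers, page_registers):
    """Scale an average core area linearly to a page with page_registers registers."""
    return avg_area_lu * page_registers / avg_registers


if __name__ == "__main__":
    print(f"1 register : {added_register_overhead(1):.4f} %")
    print(f"5 registers: {added_register_overhead(5):.4f} %")
    print(f"page area  : {scaled_page_area(31316, 1375, 992):.0f} lu")
```

Under these assumptions, one added register costs roughly 0.04% of a page and a 5-bit pipelined address bus roughly 0.2%, in line with the values stated in the text.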
Fig. 6. Activity of PCI core.
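The activity measure behind this figure (the number of nets whose value at the end of a clock cycle differs from the value at the end of the previous cycle) can be sketched as follows. This is a minimal illustration, not the simulator flow used for the measurements; the snapshot format (one dictionary of net values per cycle) and the function names are assumptions.

```python
def activity_per_cycle(snapshots):
    """Count, for each cycle, the nets whose value changed since the previous cycle.

    snapshots: list of dicts mapping net name -> logic value at the end of a cycle.
    Returns a list with one activity count per cycle transition.
    """
    return [
        sum(1 for net, value in cur.items() if prev.get(net) != value)
        for prev, cur in zip(snapshots, snapshots[1:])
    ]


def max_and_average(activity):
    """Maximum and average activity over a simulation run."""
    return max(activity), sum(activity) / len(activity)


if __name__ == "__main__":
    # Hypothetical three-net trace over three cycles.
    trace = [
        {"n1": 0, "n2": 0, "n3": 1},
        {"n1": 1, "n2": 0, "n3": 1},
        {"n1": 0, "n2": 1, "n3": 0},
    ]
    act = activity_per_cycle(trace)
    print(act, max_and_average(act))
```

The same two statistics (maximum and average) are the ones reported per core for functional mode, SCAh-test, and shift-scan-test.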

TABLE VI
TRUTH TABLE OF SCA-FF

This becomes a major advantage during functional debugging of a chip embedded in a system or board, which is even more challenging. In almost all cases, scenarios cannot be rerun cycle-accurately. Additionally, the failure could possibly appear only after a very long run-time. If the S-FF based chip fails on a board or within a system and must be debugged, register values can be shifted out (snap-shot) but hardly accumulated to a continuous waveform. The SCAhS has the same snap-shot capability but additionally allows a continuous at-speed readout of one particular register line. This feature is sometimes embedded in the functionality (processor debugging) but is now already implemented in the SCAhS and works for any set of SW registers. The continuous data stream can then be used by on-/off-chip trigger units or stored in waveform files.
The asynchronous read feature of the SCAhS can become part of the chip's functionality. The SCAhS organizes the registers like a memory which can be read by a processor, for instance. Write cycles can also be done if the remaining registers of a page accept a hold cycle by selecting page-select { }.
The asynchronous continuous read feature of the SCAhS can already be implemented in a field-programmable gate array (FPGA) debugging tool used for system prototyping. The experience gained with register streaming during system prototyping can be shared with the test group.

IV. SCA-STRUCTURE WITHOUT HOLD MODE

A. SCA-FF
In order to reduce the area overhead of an SCAhS, a simpler SCA-FF is discussed. It adds only one 2-to-1 MUX to the standard S-FF (see Fig. 7). The truth table is shown in Table VI. It has only one { } input, which is connected to the individual line-select signal. The pin which is connected to the global enable signal in the SCAhS is removed, so that the complete global scan enable tree becomes obsolete. The SCA-structure (SCAS) connectivity and page organization equal those of the SCAhS (see Figs. 2 and 3) without the global scan enable { }.

B. Area
The area of the SCAS is calculated in Table VII and compared to a standard S-FF based implementation. The area of a register (FF) is nine logic units (lu). The corresponding area of an S-FF is 11 logic units and the area of an SCA-FF is set to 13 logic units. The support area of the SCAS equals the one of an SCAhS without the buffered scan-enable tree. The SCAS area is calculated and the area overhead is 26.02% compared to non-test and, on average, 10.51% compared to an S-FF based implementation. The cell pins per logic unit index drops from 1.25 for the SS design to 1.12 for the SCAS. An average of 1375 registers (see Table II) generates an average SCAS area of 31,316 lu (see Table VII). It is assumed that a page with 992 registers has an estimated average area of 22,593 lu. Adding a standard register (9 lu) to a page increases the area by 0.039%.

C. Test Cycles and TPG
The SCAS basically has the advantages of a single cycle synchronous write and a single cycle asynchronous read of a register line. The pSCAhS and the SCAS have in common that a majority (pSCAhS) or all (SCAS) of the registers do not have a hold mode and automatically capture the new value if a write to a register line is done. This fact makes the TPG an extremely complex task. Therefore, the TCPN for the SCAS is purely based on random pattern generation and does not always reach 100% coverage.

D. Debugging
One aspect of debugging is particularly important: how many, if not all, register values can be directly probed and read out. Usually a scenario is created by running a test in functional

TABLE VIII
TRUTH TABLE OF GATED CLOCK LOGIC
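A minimal behavioural sketch of the gated clock element (gcl) that this table describes, following the functional, hold, and write modes explained in Section V. The signal and function names are hypothetical, and treating the clock enable as a don't-care while the global scan enable is selected is an assumption drawn from the prose rather than from the table itself.

```python
def gcl_clock_out(clk, scan_enable, clock_enable, line_select):
    """Return True if the clock pulse reaches the SCA-FFs of one register line.

    scan_enable False: functional mode, the clock follows the clock enable.
    scan_enable True : test mode, the clock follows the line select only.
    """
    if not scan_enable:
        return bool(clk and clock_enable)
    return bool(clk and line_select)


def gcl_mode(scan_enable, clock_enable, line_select):
    """Name the register mode selected by the control signals."""
    if not scan_enable:
        return "functional" if clock_enable else "functional (clock gated)"
    return "write" if line_select else "hold"


if __name__ == "__main__":
    # Enumerate all control combinations with the clock asserted.
    for se in (False, True):
        for ce in (False, True):
            for ls in (False, True):
                print(se, ce, ls, gcl_mode(se, ce, ls),
                      gcl_clock_out(True, se, ce, ls))
```

Because the hold function lives in the clock gate rather than in each flip-flop, only the registers on the selected line see a clock edge in test mode; all other lines keep their values.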

mode which is then stopped at a certain point of time. In the SCAS all register values can then be read asynchronously. The missing feature is that a certain test scenario cannot be accomplished by scanning in data through a scan chain, as can be done with full scan to set the CUT in a predefined state. The additional feature of continuously tracing a register line in functional mode on the tester or in-system, as discussed in the SCAh section, still exists. It also remains possible to combine this feature with the chip's functionality.

V. GATED SCA-STRUCTURE

This section discusses the gated SCAS (gSCAS), which has all the benefits of the SCAhS but only has the area overhead of the SCAS. The hold function of the SCAh-FF is missing in the SCA-FF. It is instead built into the gated clock tree of the gSCAS. Fig. 8 shows the connectivity of the gSCAS. The scan path reaches from the scan-in AND-selector over the SCA-FF chain (by connecting the scan-out pins of each SCA-FF with the scan-in pins of the succeeding SCA-FF) and is connected with the input of the XOR-tree. The individual line-select signals { } are connected with the { } input of the SCA-FF in the relevant line. All SCA-FF on a line are clocked by a gated clock element (gcl). The gcl is driven by the clock and the line-select signal. The gated clock element can be enhanced if a clock enable signal { } generated by combinatorial logic exists. The global scan enable signal is connected with each gcl, which is already the case in SS if gated clock elements are used to propagate the clock during shift.

Fig. 8. gSCAS connectivity.

The functionality of the gated clock element (gcl) is shown in Table VIII. If the global scan enable pin { } is deselected, the clock is propagated depending on the clock enable signal (functional mode, Table VIII, line 2 and line 3). In test mode, the clock is only propagated if the address line is selected. This allows a hold function if { } is not selected (hold mode, Table VIII, line 4) and a synchronous write if the address line is selected (write mode, Table VIII, line 5).

A. gSCA Area
Since the gSCAS uses an SCA-FF, the area overhead equals the one of the SCAS, which is 10.51%, or more precisely, if the area for the gated clock logic is added, 11.14% compared to SS. One additional standard register increases the page area by approximately 0.039%.
Serious area improvements can be made if the SCA-FF (13 lu) uses an inverting MUX (SCA-FF-I, 12.5 lu), which has a lower area and only generates a gSCAS area overhead of 8.61% compared to SS. The TPG must then consider the inverting scan-chain. A two transistor MUX can also be used (SCA-FF-P, 12 lu, 6.07% area overhead), which has the disadvantage of a degrading slope over multiple pass transistors. A mixture of a few SCA-FF-P followed by an SCA-FF-I is applicable and an area overhead of 7% can be assumed.

B. gSCA Feasibility, TCPN, Power, Debugging
In Section III, SCAh-Structure with Hold Mode, various aspects are discussed. Most of them are also valid for the gSCAS. The insertion of gated clock logic is standard and any existing static timing analysis tool can be used (feasibility). The TCPN number is identical to the one of the SCAhS, because these two implementations of the single cycle access method are functionally identical. Any ATPG algorithm for SS can be used as starting point and the TS is optimized for the single cycle access technique. The problems of peak power and power consumption during tests are eliminated by using the gSCAS. The advantages of streaming out internal registers during testing and debugging are discussed in Section III, SCAh-Structure with Hold Mode, I. Debugging. The same arguments can be made for the gSCAS.

C. gSCA Routing
This section compares the routing effort of the SS with the one of the gSCAS. The following differences of the routing effort can be observed.
In SS the data-out { } is connected to the scan-in { } of the succeeding register in the scan chain. The SCA-FF now has a new output scan-out { } for that. The pin { } is connected to the { } of the succeeding register in the scan chain. It can be assumed that this modification has only a minor impact on the routing effort.
The global scan enable signal in the SS structure can be seen as a signal tree, which is connected to all S-FF in the design and to all gated clock elements if they exist. The fan-out of this tree can be calculated as the number of S-FF plus the number of gated clock elements.
In the gSCAS, the global scan enable signal is only connected to the gated clock elements but not to each SCA-FF. The local line select signals { } must be connected to each register on their register line and to the gated clock element. They generate small local trees (see

Fig. 9). The fan-out of the global select in SS is only slightly lower compared to the sum of the fan-out of the global select and line select signals in the gSCAS.

Fig. 9. Global scan enable routing in SS and global scan enable and line select routing in gSCAS.

The gSCAS uses an AND-selector for the line select and the scan-in data for each page. The AND-selector is only needed if at-speed data streaming during debugging is used, if the activity during test should be reduced to an extreme, or if at-speed test scenarios can be generated more efficiently by a more specific register write. For stuck-at testing, it is recommended to test pages in parallel, setting all AND-selectors to pass the scan-in data. In case a page has a SW of 31 and a SD of 32, the page has 992 registers and approximately 4747 combinatorial cells (considering the fact that in Table II an average of 1375 registers have 6581 combinatorial cells). The page selector adds one register per page and the AND-selector uses 63 AND-cells per page, which are all connected to the page select signal. The additional net { } connected to 65 cells does not limit the routing of a page with 992 registers and 4747 combinatorial cells. If each page uses an individual address-decoder, the AND-selector can also be located at the input of the address-decoder (5 bits), reducing the number of AND-cells used in the given example.
It is important to mention that the overall structure of the routing does not change, because the global scan enable signal is now basically replaced with local line select signals and the global scan select is already routed to gated clock elements, if they exist. Regarding the congestion, the pins per area drop from 1.25 in the SS to 1.12 in the SCAS, which is almost identical to the one of the gSCAS. This number indicates that the gSCAS does not lead to additional congestion problems.

D. gSCAS Timing
The gSCAS is used to evaluate the timing for testing and debugging of the proposed single cycle access methods discussed in this paper. Fig. 8 shows the relevant timing arcs of the proposed test implementation. ta1 is the timing arc from the line decoder to the individual registers on the line. These local trees can be very efficiently optimized and are therefore extremely fast. The area overhead estimations for the gSCAS include 6 buffers per 32 registers. ta2 is the timing arc from a global scan enable register to the gated clock elements of the gSCAS. This tree can efficiently be pipelined as discussed in [12], if necessary. ta3 is the timing arc through the MUX in the SCA-FF chain. The chain depth can be adapted to the target test speed. The scan-in can be pipelined before the AND-selector (if it exists) and the XOR-tree can be pipelined within the XOR tree logic.
The most critical feature in terms of timing is the at-speed data streaming of internal data values during debugging. If this feature is used, ta3 must cope with the design speed.
In SS-based designs, the shift cycle timing is reduced to avoid heat dissipation due to high activity. The peak power problem and the power consumption due to high activity are solved in the gSCAS. If patterns are streamed through the IOs of the design, the limiting factor can be the automatic test equipment (ATE). The critical timing arc ta3 can be optimized for the ATE to achieve the fastest possible test speed. In gSCAS-based BIST, the test speed can also be reasonably improved, due to the absence of power problems. If ta3 is fast enough, the BIST can run at design speed.
A realistic ultra-fast high speed test implementation of a low activity gSCAS on two pages implies two registers for page select, one register for scan enable, 5 registers for the address bus, 32 registers for the scan-bus-in, and 32 registers for the scan-bus-out. With 72 registers, two pages can be extremely pipelined. One additional standard register increases the area by approximately 0.039%, which results in a 1.4% area increase of a single page. This extreme case can be reduced according to the user's test preferences.

E. gSCA At-Speed Testing
During at-speed tests, the path delay of critical paths is tested. In SS the patterns are scanned in to generate a certain stimuli scenario, followed by one or two capture cycles, and a scan out of the captured data. The last capture cycle is executed at functional speed to measure the path delay. Two problems are discussed.
First, the power problem during shift can result in wrong delay measurements. A scan-in shift cycle usually results in an excessive current due to the high toggling rate of the register values. An excessive IR-drop can lead to an excessive delay and therefore to a misclassification of the CUT. For gSCAS, the same stimuli scenario can be generated but the last write is limited to the registers on the selected line; all other registers remain in hold mode. This eliminates the peak power and power consumption problem due to high activity during at-speed testing as well, allowing a more precise measurement of the critical path.
Second, the slow global scan enable { } signal generates an extended test cycle delay for each test. The scan enable signal must be disabled to enable a capture cycle. Fig. 10 shows the timing. Therefore, the value of the global scan enable signal must change during the at-speed capture cycle (Tlos). The method is called launch-on-scan (LOS), because the signal of the path-delay starts to be measured with the last scan cycle. This high speed approach requires a balanced global scan enable signal and is therefore very area intensive. The workaround is to add an additional capture cycle to the test method. This allows the { } to be propagated through a slow global scan enable tree (Fig. 10, Tloc) followed by two capture cycles. This method is called launch-on-capture (LOC), because the signal of the path-delay starts to be measured at an additional capture cycle, which results in a more complex TPG process.
Since the gSCAS is backward compatible to SS, the same pattern can be applied to gSCAS as for the SS, for example LOC

Fig. 10. LOC, LOS timing.

at-speed pattern. The difference to the SS is that the gSCAS provides an extremely fast scan-enable functionality. The global scan enable signal is only connected to the gcl and can therefore be much better balanced. The relevant timing arc ta2 in Fig. 8 can be as fast as the test speed, otherwise it can easily be pipelined as discussed in [12]. Each standard FF adds only 0.039% area overhead per page. The second relevant timing is the delay of the line select signals. These are only small and local trees and can therefore cope with the design speed. The relevant timing arc ta1 is shown in Fig. 8. The line select is driven by the address decoder. The inputs can be driven by a pipelined address select, similar to the method discussed in [12]. A pipelined address bus (5 bits) adds only 0.19% area overhead per page. Even if the timing requirements for LOS cannot be achieved, the timing for LOC is reasonably improved.
In SS the pattern must be scanned in to generate a certain stimuli scenario. The gSCAS allows a selective single cycle read and a selective single cycle write to achieve the same result. A consecutive sequence of a single cycle write, a capture cycle and a single cycle read is doable to toggle and measure the inverse path within a few cycles. The gSCAS reduces the at-speed TS or allows more tests within the same time. With a pipelined support logic, the at-speed tests can be examined on a low-cost tester.

VI. RUNNING PAGE TESTS IN PARALLEL

A common approach to reduce the TCPN is to partition the scan chain into multiple parallel and therefore shorter scan chains [1]. The problem of peak power consumption and high activity during scan shift requires additional enhancements. Peak power consumption in particular cannot be eliminated easily in shift-scan based designs. For SCAhS/gSCAS based designs, peak power consumption and activity during test are unproblematic.
The same approach of testing with a higher number of parallel scan chains can be used in single cycle access structures, when multiple pages are enabled at the same time. The write is then done with the same write value to the same line index of each enabled page. The same line is also read at each enabled page and the results are then passed through the XOR tree to the output. The activity during the write or read cycles can be adjusted by enabling only an optimal set of pages if it needs to be reduced at all. An example of testing the five largest ISCAS89 designs (s9234, s13207, s15850, s38417, and s38584) in parallel is given.

VII. ADDRESS CONTROLLED BIST

A problem in testing is the high test data volume for the ATE. This test amount is reduced by BIST implementations. Since SCAhS and gSCAS are backward compatible to shift scan implementations (by automatically incrementing the address during shift), existing BIST solutions can be used on them as well. An additional problem is the high number of test pins for testing. In this section a method is proposed to reduce the pin count of the test IOs on an address controlled SCAhS- or gSCAS-BIST.
The principal idea of BIST is not to apply write values to the DUT by the tester but instead to take them from the scan-out values of the DUT internally. These deterministic patterns generated by the DUT are used with or without (de-)compression logic as the stimuli of the DUT itself.
This BIST mechanism is used in the proposed structure. Only the address signals { } are controlled to optimize the process. With an address controlled read the internal values of a particular line of the enabled pages are then propagated through the XOR tree to the scan output. The scan-out pattern can also be used as scan-in pattern and an address controlled write can be done.
The address controlled BIST structure is as follows (see Fig. 11). An on-chip test controller can be accessed by the ATE. The controller can enable and disable individual pages. It also controls the register mode (functional/capture, write, read and hold) by applying the right values to { }, { }, and { }. The page scan-in data are taken from the scan-out data of the XOR tree. The second XOR tree minimizes the scan output to a single bit, which is then read by the ATE. The individual page select register can be set by a shift register.

Fig. 11. Address controlled BIST.

The controller has two modes. In the special mode, the { }, { }, and { } can be set and { } can be read through the ATE interface (ATE-IF) to have full control for special tests and debugging. In the BIST mode, the address and register mode information is passed through the ATE-IF. These roughly 8 bits (control and address bits) are then passed through a data multiplexed (double edged signal) ATE-IF. The resulting test inputs are then 4 bits and 1 bit for the XOR-tree output. An example of testing the five largest ISCAS89 designs (s9234, s13207, s15850, s38417, and s38584) using address controlled BIST is given.

VIII. RESULTS

This section lists the results of different implementations discussed in the previous sections. In Table IX, the numbers of registers, combinatorial cells, design inputs, and design outputs as well as the numbers of nets are listed. This is done for ISCAS89

TABLE IX
NET COUNT AND TCPN FOR ISCAS89 AND VARIOUS CORES
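As a rough illustration of how the TCPN column relates to the net count, the sketch below assumes that TCPN is simply the number of test cycles divided by the number of nets of the design. It also evaluates the ABIT/SW ratio by which Section VIII says the test amount grows because of the additional address data. The function names and the sample numbers are hypothetical and are not taken from the table; ABIT = 5 is an assumption matching a 32-line page.

```python
def tcpn(test_cycles, nets):
    """Test cycles per net, assumed here to be the plain ratio TC / NETS."""
    if nets <= 0:
        raise ValueError("nets must be positive")
    return test_cycles / nets


def address_overhead(abit, sw):
    """Relative growth of the test amount due to address data (ABIT / SW)."""
    if sw <= 0:
        raise ValueError("scan width must be positive")
    return abit / sw


if __name__ == "__main__":
    # Made-up design: 5000 test cycles on a netlist with 10000 nets.
    print(tcpn(5000, 10000))
    # Assumed 5 address bits with a scan width of 32.
    print(address_overhead(5, 32))
```

A TCPN below 1 means that fewer test cycles than nets are needed, which is the regime the paper reports for larger designs.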

designs and example cores taken from [30] using the lsi10k library. Table IX also shows the TCPN extracted from various references [4], [5], [11], [31].
In [11] test compaction for at-speed testing is used in order to reduce the test application time. The test cycles TC are taken from Table III of [11] to calculate the TCPN. The algorithm in [31] reduces switching activity during scan testing (test vectors of Table III of [31] are taken) and simultaneous test time and power reduction are shown in [5] (see Table II, TA1 is used). In [4] the reduction of test application time using limited scan operations is shown. The numbers of Table IV (column: procedure 3.5) in [4] are taken. The test cycles listed in the references are all calculated based on a single scan chain (SW = 1). Table IX lists the TCPN which can be achieved with SCAhS (gSCAS) and an SCAS using the algorithm shown in Section III, SCAh-Structure with Hold Mode, F. Test Cycles and TPG. All TCPN are extracted using a scan width of 1 in order to compare the results with [4], [5], [11], [31]. A write to a primary input or a read from a primary output also results in an additional test cycle.
Fig. 12 shows the TCPN over the number of nets listed in Table IX. Whereas the TCPN rises dramatically with increasing design sizes (NETS) for alternative approaches, the TCPN of the proposed structures is independent of the design size and tends

to decrease with rising design sizes. For smaller designs the TCPN of the proposed structures is not very favorable because a simple TPG algorithm is used. The advantage of the single cycle access technique over shift scan-based structures becomes obvious for larger designs.

Fig. 12. TCPN with SW = 1 over nets.

TABLE X
COVERAGE AND TCPN OF VARIOUS SCAS IMPLEMENTATIONS

All results listed so far are related to a single scan chain (SW = 1). This is particularly useful for the comparison of standard scan implementations and to directly calculate the test amount (TA). The TCPN for any single cycle access structure as well as for the shift based structure will decrease with a higher SW. The number of input address bits of the address decoder is defined as ABIT. For any SCA structure, the TA increases by ABIT/SW. If SW is 32, which is a more realistic case, TA increases by ABIT/32 due to the additional address data.
If pages are tested in parallel (see Section VI, Running Page Tests in Parallel) the TCPN further decreases. For a parallel test of the five largest ISCAS89 designs (s9234, s13207, s15850, s38417, and s38584), a TCPN (99.3%) of 0.088 is achieved with a scan width of 32 (see Table X).
For address controlled BIST (see Section VII, Address Controlled BIST) a TCPN (98.9%) of 0.14 is achieved for the ISCAS89 design set (see Table X).

IX. DISCUSSION

A. Summary of Single Cycle Access Implementations
As listed in Table XI, for the proposed solutions, the peak power problem is eliminated and the activity for random stimuli pattern is simulated with only 12% average activity compared to shift based tests and 358% maximum activity compared to functional mode on a single active page. This leads to more realistic test results because the tests are executed under more realistic conditions closer to the chip behavior in functional mode.
The TCPN indicator of gSCAS is reasonably lower for large designs than the one of shift-scan-based solutions and is independent of the design size. The higher test frequency and the lower TCPN of the proposed single cycle access structures clearly dominate the shift-scan based implementations by one or two orders of magnitude. Assuming the production test can run with a five times higher test frequency and the test cycles are five times fewer than in shift-scan based designs, the runtime advantage would be a factor of 25.
The logic area overhead for non-partial solutions is 17.27% higher in the SCAhS but only 10.51% (11.14%) higher for the SCAS or gSCAS compared to SS. The area overhead is related to the core logic. If memories, power wells and pad area are considered and make up 50% of the die, the area overhead is reduced to 8.64% and 5.26% (gSCAS: 5.57%) respectively. If test costs due to test time become more and more dominant over area costs, the area overhead becomes less critical.
The SCAhS and gSCAS are backward compatible to shift-scan implementations, so that all known tests can be executed on the proposed structure. A relatively minor change to the test simulator must be done to extract the single cycle access oriented TS. They can also be used together with BIST methods. Furthermore, they have additional features for debugging the silicon on- and off-chip.

B. Selecting the Best Single Cycle Access Implementation
Two cases must be considered. The first case implies that the area overhead is acceptable. As can be seen in Table XI, gSCAS is the most preferable solution to replace an SS. The area overhead of 11.14% for the gSCAS can be reduced if an SCA-FF with an inverting MUX is used (8.61%) or a MUX with 2 pass-transistors is used (6.07%). The numbers refer to the test relevant logic. They do not consider memories, power wells, pad cells, etc., so that the area overhead relative to the die size is reasonably less. The gSCAS has the following three differences to SS: 1) one additional MUX per register and a new scan-out pin; 2) the global scan enable is only routed to the gated clock elements; and 3) an address decoder selects the register on a line and the gated clock element.
The second case implies that the area overhead is not acceptable and a partial implementation should be used. This and other reasons, such as limited test IOs, can lead to a BIST implementation. Table X shows that the address controlled BIST achieves a reasonable TCPN.

C. gSCAS Compared to RAS
This section compares the gSCAS with various RAS schemes. Three different RAS schemes can be identified. The first group uses basically one single address-decoder to select each individual register in the design and an additional element (MUX) per register cell enables a hold mode of each register. The fact that each cell can be accessed individually generates an unaffordable routing overhead compared to gSCAS, which has almost the same routing requirements as SS.
The second group addresses each register individually by using an X (or row) and a Y (or column) address decoder with an additional combinatorial element (for instance AND gate) per register cell. The cell values can be individually read by

TABLE XI compared. A guide is given how to select the best implemen-


OVERVIEW OF VARIOUS SINGLE CYCLE ACCESS STRUCTURES tation. The best solution (gSCAS) is compared to RAS imple-
mentations and is superior to all known RAS solutions. If BIST
is preferable due to limited chip IOs or partial scan implemen-
tation, an address controlled BIST is discussed.
The ATPG algorithms can be enhanced with the same
methods SS implementations are optimized. Future work is
related to algorithms for reducing the test cycles per net itself,
register reordering, pattern optimization for activity reduction
and de-/compression methods for BIST using the gSCAS.

REFERENCES
[1] J. Rajski, J. Tyszer, M. Kassab, and N. Mukherjee, Embedded deter-
ministic test, IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst.,
vol. 23, no. 5, pp. 776792, May 2004.
an additional signal driven by a tristate logic added to each [2] D. Czysz, G. Mrugalski, J. Rajski, and J. Tyszer, Low-power test data
application in EDT environment through decompression freeze, IEEE
register cell. These proposals have in common, that the - and Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 27, no. 7, pp.
-line select routing is unaffordable. The individual register 12781290, Jul. 2008.
cells are enhanced by multiple logic elements which generate [3] D. Czysz, M. Kassab, X. Lin, G. Mrugalski, J. Rajski, and J. Tyszer,
Low power scan shift and capture in the EDT environment, in Proc.
an unaffordable area overhead compared to gSCAS, which Int. Test Conf., 2008, pp. 110.
only adds one multiplexer to each register. The readout is done [4] Y. Cho, I. Pomeranz, and S. M. Reddy, On reducing test application
using tristate logic, which is problematic in todays standard time for scan circuits using limited scan operations and transfer se-
quences, IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol.
design flows. The gSCAS uses no tristate logic. 24, no. 10, pp. 15941605, Oct. 2005.
The third group uses a row decoder and a column decoder to [5] J. Chen, C. Yand, and K. Lee, Test pattern generation and clock dis-
address individual registers. Additionally the read/write mecha- abling for simultaneous test time and power reduction, IEEE Trans.
Comput.-Aided Des. Integr. Circuits Syst., vol. 22, no. 3, pp. 363370,
nism is enhanced with two signals per column, driven by tristate Mar. 2003.
drives connected to the internal latch cells of the registers via [6] S. Wang, A BIST TPG for low power dissipation and high fault cov-
tristate logic and an individual sense amplifier per column. The erage, IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 15, no.
7, pp. 777789, Jul. 2007.
routing is an overhead compared to gSCAS, if two signals per [7] S. Almukhaizim and O. Sinanoglu, Dynamic scan chain partitioning
column are used. The enhanced read and write mechanism with for reducing peak shift power during test, IEEE Trans. Comput.-Aided
tristate drivers, cell internal tristate logic and sense amplifier
per column is very timing sensitive. A few registers on a single
row generate an immense area overhead due to the individual
sense amplifier per tristate signal. A larger number of registers
on a single tristate signal, driven from the internal register logic
through a pass transistor, generates an unacceptable slope of the
signal during a read cycle. During a write cycle, the latch-like
timing is also critical. The tristate data signals must
hold their values until the row select signal is disabled. The
requirement to balance all row select signals of the already critical
routing structure is hard to meet and therefore creates
a complex timing scenario. This sensitive tristate logic has been
successively removed from standard logic libraries over the last
decade and is not suited to today's well-established static
timing analysis (STA), which is key to achieving sign-off quality for
industrial test implementations of multimillion-gate designs. The
gSCAS fits easily into today's STA flows. Furthermore, LOS-based
at-speed testing is not possible for this group of RAS implementations,
whereas the gSCAS supports LOC- and LOS-based
at-speed testing due to its synchronous write capabilities and
faster scan enable logic.
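
The hold requirement on the tristate data signals described above can be illustrated with a small check. This is only a sketch under stated assumptions: the function name, the per-row fall times, the data release time and the margin are all hypothetical values chosen for illustration and are not taken from any of the RAS implementations discussed. It flags every row whose row select signal is disabled too late relative to the moment the tristate data is released.

```python
def hold_violations(data_release_ns, row_select_fall_ns, hold_margin_ns=0.0):
    """Return the rows that violate the write-cycle hold requirement.

    data_release_ns    : time at which the tristate data signals stop
                         holding their value (hypothetical, in ns)
    row_select_fall_ns : dict mapping row index -> time at which that
                         row's select signal is disabled (hypothetical)
    hold_margin_ns     : extra margin required after the select falls

    A row is safe only if its select signal has fallen, plus the margin,
    no later than the data release time.
    """
    return [
        row
        for row, t_fall in sorted(row_select_fall_ns.items())
        if t_fall + hold_margin_ns > data_release_ns
    ]


# Hypothetical example: unbalanced routing skews the row select signals.
skewed_rows = {0: 1.0, 1: 1.2, 2: 1.6}
late_rows = hold_violations(data_release_ns=1.5,
                            row_select_fall_ns=skewed_rows,
                            hold_margin_ns=0.1)
print(late_rows)
```

In this toy model, balancing the row select signals corresponds to bringing all fall times close together so that one data release time satisfies every row. The more the routing skews them, the harder that becomes.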

X. CONCLUSION

A single cycle access structure is discussed. Various implementations
with and without hold mode, as well as gated and
partial implementation methods, are presented. The aspects feasibility,
peak power consumption, switching activity during test,
area, test cycles, at-speed testing and debugging features are
STRAUCH: SINGLE CYCLE ACCESS STRUCTURE FOR LOGIC TEST 891

[19] R. Adiga, G. Arpit, V. Singh, K. Saluja, and A. Singh, Modified [28] C. Yao, K. Saluja, and A. Sinkar, WOR-BIST: A complete test solu-
T-flip-flop based scan cell for RAS, in Proc. 5th IEEE Eur. Test tion for designs meeting power, area and performance requirements,
Symp., 2010, pp. 113118. in Proc. 22nd Int. Conf. VLSI Des., 2009, pp. 479484.
[20] A. Mudlapur, V. Agrawal, and A. Singh, A random access scan ar- [29] Mentor Graphics, San Jose, CA, Silicon test and yield analysis, 2010.
chitecture to reduce hardware overhead, in Proc. Int. Test Conf., 2006, [Online]. Available: www.mentor.com/products/silicon-yield/prod-
pp. 350358. ucts/diagnosis
[21] Y. Hu, X. Fu, X. Fan, and H. Fujiwara, Localized random access scan: [30] ORSoC AB, Stockholm, Sweden, Projects, 2007. [Online]. Avail-
Towards low area and routing overhead, in Proc. Asia South Pacific able: www.opencores.org/projects
Des. Autom. Conf., 2008, pp. 565570. [31] S. Wang and S. K. Gupta, An automatic test pattern generator for min-
[22] A. A. , A. Khan, V. Singh, K. Saluja, and A. Singh, Test applica- imizing switching activity during scan testing activity, IEEE Trans.
tion time minimization for RAS using basis optimization of column Comput.-Aided Des. Integr. Circuits Syst., vol. 21, no. 8, pp. 954968,
decoder, in Proc. IEEE Int. Symp. Circuits Syst., 2010, pp. 26142617. Aug. 2002.
[23] D. Baik and K. Saluja, Test cost reduction using partitioned grid
random access scan, in Proc. 19th Int. Conf. VLSI Des., 2006, pp.
16.
[24] D. Baik and K. Saluja, State-reuse test generation for progressive
random access scan: Solution to test power, application time and data

Tobias Strauch received the Diploma (FH) from the
University of Applied Science (FH), Furtwangen,
Germany, in 1998.
He is with EDAptability, Munich, Germany. His
fields of interest include hardware-assisted verification,
TLM, high-level ATPG, FPGA debugging, and
wave-based data transfer.