
Abstract

With the advent of nanometer technologies, the design
size of integrated circuits is getting larger and
the operation speed is getting faster. As a consequence,
test cost is becoming unbearable with traditional test
methods.
The big challenge for design and test engineers is
how to guarantee the required high levels of test quality
and yield while keeping the test cost low.
From a scan-based ATPG point of view, there are two
main ways to reduce test cost. One way is to reduce test
pattern volume and test run time. The problem is how to
maintain the same test coverage with a smaller test
pattern set. The other way to reduce test cost is to use
lower-end testers, which are much cheaper but have
limited memory, data channels, and clocking
capabilities.
This paper shares the experiences of a real case of
how to significantly reduce scan test cost by using DFT
techniques.
1. Introduction
The design is the first Dual-Core programmable
CMOS digital signal processor (DSP) in a series of
products that target the audio market. Since this Audio
Processor is for an automotive application, it requires
the highest possible test coverage. It is also for the
consumer market, so the cost needs to be as low as
possible. This combination presents a common quandary
for manufacturing test plans: how to get high test
quality at low cost [1].
The design will be packaged into 4 types of parts.
The smallest package is an 80-pin quad flat pack, so
only 16 scan chains are available for the design. With
normal scan-based test methods, this requires more
than 3000 registers on each scan chain. During
scan-based testing, this configuration means that a lot
of time is spent shifting test data into and out of the
device, which drives up the test cost. As the test time
required for each device goes up, the throughput of the
manufacturing test floor goes down.
High test quality is another challenge for test cost.
At-speed testing is generally required to achieve high
test quality [2][3]. The cost of including at-speed test
comes in two parts: the high test pattern volume needed,
and the high-speed clocking sequences that must be
provided. The Audio Processor is targeted to operate at
200MHz and potentially at even higher frequencies with
overdrive in the future. A tester that can provide
1000MHz clock driving capability costs almost twice as
much as one with 100MHz clock driving capability. The
clock speed for scan shifting is much lower and can be
provided by any tester.
The basic characteristics of this design are shown in
Table 1.

This paper describes how a commercial ATPG
compression tool was used in conjunction with the test
plan to get high levels of test quality for this design
while at the same time significantly reducing the cost of
that test. Section 2 explains how the at-speed clocks
were generated and provided for the at-speed test
patterns. Section 3 describes the logic and methodology
used to compress the test data volume. The results of the
compression are provided in Section 4 and the paper is
concluded in Section 5.
Table 1. Basic information of the audio processor

Type            Number
Process         90nm CMOS
Package Types   80pin, 144pin_A, 144pin_B, 208pin
Frequency       > 200 MHz
Area (um^2)     5300^2
Transistors     ~1,290,000
Registers       ~49,000
Scan chains     16
A Real Case of Significant Scan Test Cost Reduction
Selina Sha, Freescale Semiconductor
selina.sha@freescale.com
Bruce Swanson, Mentor Graphics Corp.
bruce_swanson@mentor.com
IEEE Computer Society Annual Symposium on VLSI
978-0-7695-3170-0/08 $25.00 © 2008 IEEE
DOI 10.1109/ISVLSI.2008.32
2. On-chip PLL clock generation for at-speed test
This design includes an on-chip phase-locked loop
(PLL) for generating the high frequency functional
clocks. To get a high quality at-speed test, it is better to
use these on-chip clocks for test purposes instead of
having the high speed test clocks come from the tester
equipment. This approach is also a big cost saving
because it allows the use of less sophisticated and
hence less expensive testers.
To take advantage of the on-chip functional clocks
during test mode, a small piece of logic in the form of a
clock chopper was added to the design to generate
qualified scan clocks during scan test operation. The
clock control circuitry for this design is shown in
Figure 1. Many similar clock control circuits are
described in the literature [4][5][6][7].
Figure 1. On-chip clock generation circuitry

To test a chip at-speed, only the capture clock is
required to run at-speed. In the Audio Processor the
capture clock is generated by the PLL circuitry while
the scan shift clock remains the same as the slower
reference clock. The benefit of using the reference clock
instead of a clock divided from the PLL as the shift
clock is that it avoids synchronization problems between
the shift clock and the shift-in data.
As shown in the broadside mode timing waveform of
Figure 2, during scan shift cycles the external scan
enable (ext_scan_en) is set high and the scan clock
(scan_clk) comes from the reference clock (ref_clk).
Once ext_scan_en is cleared low for the capture cycles,
scan_clk comes from the clock gate which is only open
when the chopper controller outputs a capture enable
signal (cap_ena).
Figure 2. AC test clock waveform for
broadside mode

The chopper controller is actually a counter triggered
by the PLL output clock (pll_clk). It asserts the
cap_ena signal long enough to let a specified number of
fast pulses through the clock gate; that number is
configured in the capture count (cap_cnt) register. The
clock gate ensures that the fast pulses from the PLL
pass through without glitches. The value of the cap_cnt
register is determined for each test pattern and is
loaded into the register cells as part of the scan test
data that is shifted into the circuit.
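The counting behavior described above can be illustrated with a small behavioral model. This is only a sketch, not the actual circuit: the function names and the start_delay parameter are hypothetical, and cap_cnt is assumed to be a 3-bit value because the design supports up to seven pulses.

```python
def chopper_cap_ena(cap_cnt, pll_cycles, start_delay=1):
    """Return cap_ena (0 or 1) for each pll_clk cycle of one capture window.

    cap_cnt     -- number of fast pulses to let through (assumed 3-bit, 0..7)
    pll_cycles  -- how many pll_clk cycles to model
    start_delay -- hypothetical number of idle cycles before the first pulse
    """
    if not 0 <= cap_cnt <= 7:
        raise ValueError("cap_cnt is assumed to be a 3-bit value")
    count = 0  # the counter clocked by pll_clk
    trace = []
    for _ in range(pll_cycles):
        enabled = start_delay <= count < start_delay + cap_cnt
        trace.append(1 if enabled else 0)
        count += 1
    return trace


def scan_clk_source(ext_scan_en, cap_ena):
    """Name the clock that drives scan_clk in the current cycle."""
    if ext_scan_en:
        return "ref_clk"  # shift cycles use the slow reference clock
    # capture cycles: PLL pulses pass only while cap_ena is high
    return "pll_clk" if cap_ena else "gated off"


print(chopper_cap_ena(cap_cnt=2, pll_cycles=6))  # [0, 1, 1, 0, 0, 0]
```

With cap_cnt set to 2, exactly two PLL pulses pass the gate, which corresponds to the launch and capture pulses of a broadside pattern.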
The example diagram shown in Figure 2 has two
at-speed clock pulses for the launch and capture cycles.
This is a broadside or launch-off clock type of test
pattern, and a sequential depth of two is sufficient to
capture the vast majority of the transition faults in the
design. To detect transition faults around memories in
the design, more capture pulses are often necessary.
The clock control circuitry described here is flexible
enough to provide up to seven at-speed clock pulses for
those harder-to-detect transition faults. Targeting those
additional faults is sometimes necessary to reach the
required test coverage goals.
The scan enable circuitry of this design includes a
gated or pipelined piece of logic as shown at the
bottom of Figure 1. With this circuitry, the design is able to
use the launch-off shift type of transition pattern with
the on-chip PLL clocks. The timing waveform for
launch-off shift transition patterns for this design is
shown in Figure 3.
Figure 3. AC test clock waveform for launch-off
shift mode

In general, launch-off shift mode patterns have higher
transition test coverage than broadside patterns because
they are simpler for the ATPG tool to create. However,
much of the additional coverage is due to testing
non-functional faults [8]. The coverage is similar to
broadside once false and multicycle path analysis is
performed [9][10]. To get higher at-speed test coverage
with fewer patterns, the internal scan enable
was implemented for this design so that both broadside
and launch-off shift transition patterns could be used.
The capture mode is selected by the cap_mode register,
which is configured per pattern; its value is loaded
while the rest of the scan pattern data is shifted in.
When this register is set to 1, launch-off shift mode
is enabled.
Table 2 shows a comparison of the pattern count
required by the two types of transition patterns to reach
80% test coverage for this design.

To get the correct values loaded into the capture
counter and capture mode cells, the ATPG tool provides
a method to specify those values when those cells are
part of the scan chains. The way to accomplish this is to
use condition statements within the named capture
procedures.
Named capture procedures are user defined and tell
the ATPG tool how the clock control circuitry around
the PLL logic works. An example is illustrated in
Table 3.
A named capture procedure contains two modes: the
external mode describes the external clocks and
controls at the primary inputs, and the internal mode
describes the clocks and controls on the internal chip
side of the clock control logic. The example shown in
Table 3 creates a broadside pattern with 2 at-speed
clock pulses because of the values specified in the
condition statements (cap_mode = 0 and cap_cnt = 010).
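As an illustration of how the condition values relate to the pulse count, the following sketch generates the condition statements for a given number of at-speed pulses. It assumes that cap_cnt is a plain 3-bit binary encoding of the pulse count, which is consistent with the 2-pulse example in Table 3 but is not stated explicitly. The function name is hypothetical; the instance paths are the ones used in Table 3.

```python
def condition_lines(pulses, launch_off_shift=False,
                    base="/dsp_top/ac_config", width=3):
    """Build condition statements for a named capture procedure.

    Assumes cap_cnt holds the pulse count in plain binary (an assumption)
    and that cap_mode = 1 selects launch-off shift, as stated in the text.
    """
    if not 0 < pulses < 2 ** width:
        raise ValueError("pulse count does not fit in cap_cnt")
    lines = [f"condition {base}/cap_mode/Q {int(launch_off_shift)};"]
    for bit in range(width - 1, -1, -1):  # most significant bit first
        value = (pulses >> bit) & 1
        lines.append(f"condition {base}/cap_cnt_{bit}_/Q {value};")
    return lines


print("\n".join(condition_lines(2)))
```

For 2 pulses in broadside mode this reproduces the four condition statements listed in Table 3.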
Multiple named capture procedures can be created
and used. To test for at-speed faults around the
memories of this design, a minimum of 5 cycles is required,
so named capture procedures were written to accomplish
that. Another way to test around memories is to
use multiple-load patterns. To create test patterns more
efficiently, the general pattern creation flow was to first
run the ATPG tool with the 2 at-speed clock pulse
named capture procedures on the whole fault list. Then,
to target the remaining faults and get the highest
possible test coverage, the 6 at-speed clock pulse named
capture procedure was turned on.
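The two-pass flow can be pictured with a toy model. This is not how an ATPG tool works internally: each fault is simply tagged with a hypothetical minimum number of at-speed pulses needed to detect it, and the fault names and numbers below are invented for illustration only.

```python
def two_pass_atpg(faults, first_pulses=2, second_pulses=6):
    """Split a fault list the way the two-pass flow would.

    faults -- dict mapping a fault name to the (hypothetical) minimum
              number of at-speed pulses needed to detect it
    """
    # Pass 1: cheap 2-pulse procedure applied to the whole fault list.
    pass1 = {f for f, need in faults.items() if need <= first_pulses}
    # Pass 2: deeper procedure applied only to what is left.
    remaining = set(faults) - pass1
    pass2 = {f for f in remaining if faults[f] <= second_pulses}
    undetected = remaining - pass2
    return pass1, pass2, undetected


# Invented example data: two logic faults, two faults near a memory,
# and one fault that needs more pulses than either procedure provides.
example = {"logic_a": 2, "logic_b": 2, "mem_rd": 5, "mem_wr": 6, "deep": 8}
p1, p2, left = two_pass_atpg(example)
print(sorted(p1), sorted(p2), sorted(left))
```

The point of the ordering is that the longer capture procedure is only spent on the faults that the short one could not reach.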
Because the on-chip PLL clocks worked so well for
at-speed test, this design can be tested with a
tester that supplies clocks of 50 MHz or less instead of
requiring a 200 MHz tester. This reduced the cost of
the test equipment substantially.
Table 2. AC pattern volume for broadside vs. launch-off shift

Item            Broadside   Launch-off shift
pattern count   5856        2816
test coverage   80%         80%
Table 3. Condition statement usage

set time scale 1 ps;
timeplate cap_ext =
    force_pi 0;
    measure_po 8000;
    pulse extal 10000 20000;
    period 40000;
end;
timeplate cap_int =
    force_pi 0;
    pulse pll_clk 1250 2500;
    period 5000;
end;
procedure capture cap_broadside_dep2 =
    condition /dsp_top/ac_config/cap_mode/Q 0;
    condition /dsp_top/ac_config/cap_cnt_2_/Q 0;
    condition /dsp_top/ac_config/cap_cnt_1_/Q 1;
    condition /dsp_top/ac_config/cap_cnt_0_/Q 0;
    mode external =
        timeplate cap_ext;
        cycle =
            force_pi;
            force ref_clk 0;
            force reset_b 1;
            force ext_scan_en 0;
            pulse ref_clk;
        end;
        cycle =
            pulse ref_clk;
        end;
        cycle =
            pulse ref_clk;
        end;
        cycle =
            pulse ref_clk;
        end;
    end;
    mode internal =
        timeplate cap_int;
        cycle =
            force_pi;
            force ref_clk 0;
            force reset_b 1;
            force ext_scan_en 0;
            force int_scan_en 0;
            force scan_clk 0;
        end;
        ......
        cycle =
            pulse pll_clk;
        end;
        cycle =
            pulse pll_clk;
        end;
        ......
    end;
3. On-chip test pattern compression
The Audio Processor will be taped out once, but the
die will be packaged into 4 different products, so only
the common pins that are present in all the package
types can be used as scan control and data channels.
Only 16 scan chains are available. As seen in
Table 1, almost 49,000 registers share these 16 scan
chains, so there are more than 3000 registers on a single
scan chain in a standard ATPG scan setup.
With standard ATPG, it takes 1176 test patterns to
achieve 96% stuck-at test coverage. This means that
3.81Mbit of data storage is required for each scan chain
input/output pin on the tester. About 152.5ms is
required to execute this test set if the shift clock rate
is 25MHz. And that is only for the stuck-at faults. The
pattern set for transition faults can easily be 3 times
larger.
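The arithmetic behind these figures can be checked with a short calculation. The sketch below assumes one shift cycle per scan cell per pattern, with the unload of one pattern overlapped with the load of the next and capture cycles ignored. The chain length of 3240 cells is the value listed in Table 4; the function name is hypothetical.

```python
def scan_test_cost(patterns, cells_per_chain, shift_hz):
    """Return (bits stored per tester pin, shift time in seconds)."""
    bits_per_pin = patterns * cells_per_chain  # one bit per cell per pattern
    seconds = bits_per_pin / shift_hz          # one bit shifted per clock
    return bits_per_pin, seconds


bits, secs = scan_test_cost(patterns=1176, cells_per_chain=3240, shift_hz=25e6)
print(f"{bits / 1e6:.2f} Mbit per pin, {secs * 1e3:.1f} ms")
# prints: 3.81 Mbit per pin, 152.4 ms
```

The result is within a fraction of a millisecond of the time quoted above; a few additional cycles per pattern for capture and overhead would account for the difference, although that is an assumption rather than something stated in the text.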
Embedded Deterministic Test (EDT) is a non-intrusive
DFT technology for dramatically reducing test data
volume and test time [11]. EDT accomplishes this
reduction by applying a patented type of compression
during deterministic test pattern generation. EDT also
requires a small amount of logic on the chip that resides
only in the scan path between the scan channel interface
and the internal scan chains. This on-chip logic receives
the compressed test pattern data from the tester,
decompresses it, and feeds the scan chains. It then
compacts the captured responses on the output side before
sending that data back to the tester for comparison
against the expected results.
The benefits of using EDT on the design are twofold.
First, it reduces the volume of test data that must be
held in tester memory, allowing the use of less expensive
testers. Second, it shortens the test application
time, so higher tester throughput is possible than
with traditional ATPG [12].
EDT shortens the test time by configuring the
internal scan chains differently than in standard
scan ATPG. With EDT logic on-chip, the scan chains
are re-configured to be much shorter in length. This
creates many more scan chains, but that is not a
problem since they only interface to the decompressor and
compactor logic and not directly to the scan channel
pins on the I/O. The tester still sees the design as having
only 16 scan chains/channels, but each is much shorter
in length and the test patterns are loaded/unloaded
much faster.
Figure 4 shows a diagram of how the EDT logic and
scan chains were configured for the Audio Processor
design. It used 600 short internal scan chains.
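A quick calculation shows how much shorter the chains become. The sketch assumes perfectly balanced chains and uses the approximate register count of 49,000 from Table 1, so the long-chain result is only a lower bound on the 3240 cells per chain reported for the real design. The function name is hypothetical.

```python
import math


def cells_per_chain(registers, chains):
    """Length of the longest chain if registers are spread evenly."""
    return math.ceil(registers / chains)


long_chain = cells_per_chain(49_000, 16)    # 16 chains without EDT
short_chain = cells_per_chain(49_000, 600)  # 600 internal chains with EDT
print(long_chain, short_chain)  # 3063 82
print(f"shift cycles per pattern shrink about {long_chain / short_chain:.0f}x")
```

Because the number of shift cycles per pattern equals the length of the longest chain, shortening the chains cuts the load/unload time of every pattern by roughly the same factor.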
Figure 4. Scan configuration with compression
logic

As mentioned before, only 16 scan input/outputs are
available in this design. 15 are used for EDT scan
channels. The last one is used for the at-speed scan clock
configuration, which was discussed in Section 2.
Dedicating a channel to this is optional because these
scan cells can be made part of the regular scan chains.
The EDT logic can also include optional bypass logic
to configure the many short scan chains back into 16
long scan chains. This used to be needed for failing part
diagnosis, but the ATPG tool can now diagnose failures
directly from compressed EDT test patterns.
EDT requires 3 pins for controlling the logic. They
are edt_clock, edt_update and edt_bypass. They can be
shared with functional pins. The normal sequence for
these signals is shown in Figure 5 and described below:
- In the load_unload cycle, edt_clock pulses with
edt_update asserted to clear the old stored data in the
EDT module.
- In shift cycles, edt_clock pulses with edt_update
cleared. The test pattern data is shifted into the EDT
decompressor via the scan-in channels. The
decompressor calculates the data which is distributed to
all the short scan chains. The captured output data from
the previous pattern is compacted by the compactor
and shifted out through the scan output channels.
- In capture cycle(s), edt_clock is constrained to 0, and
the values on edt_bypass and edt_update are don't-cares.
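The signal sequence listed above can be written down as a simple per-cycle schedule. This is a descriptive sketch only; the function name and the tuple layout are hypothetical, and edt_bypass is left out because the text gives its value only for the capture cycles.

```python
def edt_pattern_cycles(shift_cycles, capture_cycles=1):
    """Yield (phase, edt_clock, edt_update) for one test pattern.

    edt_clock is "pulse" or 0; edt_update is 1, 0, or "X" (don't-care).
    """
    # load_unload: one edt_clock pulse with edt_update asserted
    yield ("load_unload", "pulse", 1)
    # shift: edt_clock pulses with edt_update cleared
    for _ in range(shift_cycles):
        yield ("shift", "pulse", 0)
    # capture: edt_clock held at 0, edt_update is a don't-care
    for _ in range(capture_cycles):
        yield ("capture", 0, "X")


for cycle in edt_pattern_cycles(shift_cycles=2):
    print(cycle)
```

In the real design the number of shift cycles per pattern would be on the order of the short chain length rather than the 2 used here to keep the printout small.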
Creating the EDT logic for the design is quite
easy. Three standard EDT flows are
supported: the external flow, the internal flow, and the
skeleton flow.
[Figure 4 labels: 15 scan channels, Decompressor, 600 scan chains, Compactor]
Figure 5. Basic EDT signals waveform

The internal and external flows both need the core
design to be at the netlist level. For the external flow,
the basic netlist core is without any I/O pads or
boundary scan logic. Those elements are inserted after the
EDT logic is created and reside at the same level of
hierarchy. If the design netlist already includes I/O pads
and boundary scan logic, the internal flow is used when
creating the EDT logic. This will include a wrapper
with the EDT logic and core inside.
Since the Audio Processor is a new design project,
the core RTL design was still changing when the DFT
work needed to begin. Because of this, the skeleton
flow was used. This flow allows the use of a skeleton
netlist as input to the EDT logic creation step as shown
in Figure 6. The skeleton netlist only contains the basic
design information such as the number of scan chains
and their clock domains. With this information, the
EDT tool is able to create the necessary EDT logic at
the RTL level.
Figure 6. Skeleton DFT flow with EDT
The design team integrated this EDT logic into the
design before synthesis and scan insertion as shown in
the diagram. The EDT logic added less than 1% to the
area of the core logic.
The use of the skeleton flow was a big advantage
because the DFT work was started before the first
netlist was ready. In addition, if the core netlist changes,
it is not necessary to re-generate the EDT logic.
Since netlist changes are common in the design phase,
this was another considerable cost saving for the
project.
4. Test pattern compression results
The test compression results obtained were quite
good. Table 4 compares the stuck-at test pattern volume
for the design with and without EDT
implemented. At 96% test coverage, 3.81Mbits are
required on the tester for each scan input/output for data
storage if the EDT logic is bypassed. With the EDT
logic in place, this per-pin tester memory requirement
drops to 0.20Mbits, a reduction of about 19X.
NOTE: The coverage shown is not especially high because, to
keep the comparison simple, we did not stress the tool or
try all possible configurations. More than 98% stuck-at
coverage is achievable when we do.
The compression achieved for broadside transition
test patterns was slightly higher, as shown in Table
5. At 80% test coverage, 11.62Mbits are required for
each scan input/output for data storage in bypass mode
while only 0.52Mbits are required with the EDT logic in
place.
[Figure 5 signals: scan_en, scan_clk, edt_update, edt_clock, edt_bypass; phases: load_unload, shift, capture]
[Figure 6 flow labels: Skeleton Netlist, Create EDT logic, EDT Logic RTL, RTL Design/Integration, Design Synthesis, Scan Insertion, Netlist With Scan, ATPG with gate level netlist, Test Patterns]
Table 4. DC pattern volume comparison

Item                           Without EDT   With EDT
scan chains                    16            601
scan cells per chain           3240          82
pattern count                  1176          2278
tester memory needed per pin   3.81M         0.20M
pattern volume                 61.00M        3.21M
test time (25MHz shift)        152.50ms      8.02ms
test coverage                  96.02%        96.02%
compression                                  ~19X
Table 5. AC pattern volume comparison

Item                           Without EDT   With EDT
scan chains                    16            601
scan cells per chain           3240          82
pattern count                  3584          5856
tester memory needed per pin   11.62M        0.52M
pattern volume                 185.91M       8.25M
test time (25MHz shift)        464.77ms      20.61ms
test coverage                  80%           80%
compression                                  ~22X
NOTE: The pattern counts shown are both calculated for
broadside mode.
NOTE: The coverage shown is not especially high because, to
keep the comparison simple, we did not stress the tool or
try all possible configurations. More than 89% transition
coverage is achievable when we do.
By combining the stuck-at and transition test patterns
with the on-chip compression logic, the amount of
required tester memory drops to just 5% of the memory
that would be required without the on-chip compression.
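The 5% figure and the compression ratios can be reproduced directly from the values in Tables 4 and 5. The sketch below only re-does that arithmetic; the dictionary layout and variable names are hypothetical.

```python
# Per-pin tester memory in Mbit, copied from Tables 4 and 5.
per_pin_mbit = {
    "stuck-at":   {"bypass": 3.81,  "edt": 0.20},
    "transition": {"bypass": 11.62, "edt": 0.52},
}
# Total pattern volume in Mbit, copied from Tables 4 and 5.
volume_mbit = {
    "stuck-at":   {"bypass": 61.00,  "edt": 3.21},
    "transition": {"bypass": 185.91, "edt": 8.25},
}

bypass_total = sum(v["bypass"] for v in per_pin_mbit.values())
edt_total = sum(v["edt"] for v in per_pin_mbit.values())
fraction = edt_total / bypass_total
print(f"combined memory with EDT: {fraction:.1%} of bypass")

for name, v in volume_mbit.items():
    print(f"{name}: {v['bypass'] / v['edt']:.1f}x compression")
```

The combined stuck-at and transition memory comes out at a little under 5% of the bypass requirement, and the individual ratios come out at about 19x and 22.5x, in line with the compression rows of the two tables.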
5. Conclusions
The Audio Processor utilized several DFT techniques
to reduce test cost significantly. The first technique
was to add a small amount of logic to enable the use of
the on-chip PLL clocks for at-speed test purposes.
Since the shift clock frequency was just 25MHz during
test, we were able to use a less sophisticated and much
less expensive tester while still testing the design at its
200MHz operational frequency. Considering that a
1000MHz Pinscale tester is almost double the cost of a
100MHz J750 tester, the tester cost is cut roughly in
half by using this technique.
Another big test cost saving comes from the
compression of the test patterns, which then require much less
tester memory. For less than 1% logic area added to the
design for EDT logic, both the stuck-at and transition
test patterns take up much less tester memory, as shown
in the previous section. Since over half
of the total area of the design is consumed by memory,
the EDT logic amounts to under 0.5% of total chip area.
Finally, the saying that time is money is really true,
especially on the manufacturing test floor. Figure 7
shows a comparison of the amount of tester time
required to test this design with the bypass (standard
ATPG) and the compressed test patterns using the EDT
logic.
Figure 7. Scan pattern test time comparison
The times in milliseconds to run the at-speed test
patterns are shown on the left and those for the stuck-at
patterns on the right. By using the EDT logic and test patterns,
we were able to dramatically reduce the test time per
device and increase the throughput of the production
test line.
6. References
[1]. J. Saxena, et al., "Scan-Based Transition Fault
Testing - Implementation and Low Cost Test
Challenges," Proc. International Test Conference,
2002, pp. 1120-1129.
[2]. X. Lin, et al., "High-frequency, At-speed Scan
Testing," IEEE Design & Test of Computers,
Sept.-Oct. 2003, pp. 17-25.
[3]. B. R. Benware, R. Madge, C. Lu, R. Daasch,
"Effectiveness Comparisons of Outlier Screening
Methods for Frequency Dependent Defects on
Complex ASICs," Proc. IEEE VLSI Test Symposium,
2003, pp. 39-46.
[4]. N. Tendolkar, et al., "Novel techniques for achieving
high at-speed transition fault test coverage for
Motorola's microprocessors based on PowerPC
instruction set architecture," Proc. IEEE VLSI Test
Symposium, 2002, pp. 3-8.
[5]. J. Boyer, R. Press, "Easily Implement PLL Clock
Switching for At-Speed Test," Chip Design
Magazine, Feb.-March 2006.
[6]. M. Beck, et al., "Logic design for on-chip test clock
generation - implementation details and impact on
delay," Proc. Design, Automation and Test in
Europe, 2005, pp. 56-61.
[7]. H. Nakamura, et al., "Low Cost Delay Testing of
Nanometer SoCs Using On-Chip Clocking and Test
Compression," Proc. IEEE Asian Test Symposium,
2005, pp. 156-161.
[8]. K. S. Kim, S. Mitra, P. Ryan, "Delay Defect
Characteristics and Testing Strategies," IEEE Design
& Test of Computers, Sept.-Oct. 2003, pp. 8-16.
[9]. V. Vorisek, B. Swanson, K.-H. Tsai, D. Goswami,
"Improving Handling of False and Multicycle Paths
in ATPG," Proc. IEEE VLSI Test Symposium, 2006,
pp. 160-165.
[10]. D. Goswami, et al., "At-Speed Testing with Timing
Exceptions and Constraints - Case Studies," Proc.
IEEE Asian Test Symposium, 2006, pp. 153-159.
[11]. J. Rajski, et al., "Embedded Deterministic Test for
Low-Cost Manufacturing Test," Proc. International
Test Conference, 2002, pp. 301-310.
[12]. F. Poehl, et al., "Industrial experience with adoption
of EDT for low-cost test without concessions," Proc.
International Test Conference, 2003, pp. 1211-1220.