
Hierarchical DFT with Enhancements for AC Scan, Test Scheduling

and On-chip Compression – A Case Study

Jeffrey Remmers Darin Lee Richard Fisette


Plexus Design Plexus Design Mentor Graphics Corp.
Solutions, Inc. Solutions, Inc. 1601 Trapelo Rd.
325-1 Boston Post Rd. 325-1 Boston Post Rd. Waltham, MA 02514
Sudbury, MA 01776 Sudbury, MA 01776

Abstract

The DFT architecture, pattern generation and application, and economic issues encountered in large ASIC designs all intensify when at-speed testing is introduced. This case study shows enhancements over the current state-of-the-art hierarchical methodology by improving transition testing and test scheduling, and by using compression on a production design.

1.0 Introduction

System-on-Chip (SoC) design focusing on the integration of IP cores is now a mainstream technology and is becoming quite mature in application. Because of the core-based nature of SoC design, the hierarchical methodology employed for DFT makes use of the natural partitioning and follows the same core-based flow. DFT and test architectures can be segmented and related to the different hierarchical logic groupings. However, due to the increasing size of modern designs, plus the inclusion of AC scan tests used to detect the more prevalent speed-related defects, pattern counts are growing at a higher rate than economics and tester memory can match.

The key technologies investigated with this case study design were the management of vector sizing by both hierarchical partitioning and the use of embedded on-chip compression; the inclusion of an enhanced set of AC transition delay vectors; and an evaluation of the methodology needed to allow post-design test scheduling.

The design that is the focus of the case study was a large SoC with multiple instances of one core. The end customer had very high quality requirements. Because these included the detection of speed-related defects, AC scan testing using transition delay fault (TDF) patterns was added to traditional single stuck-at fault (SSF) testing. The AC scan methodology that was applied was enhanced from previously implemented techniques to include TDF coverage at the boundaries of the cores.

The addition of TDF vectors resulted in a larger vector volume that would not fit on the target tester. To this end, vector compression DFT was added to the chip to reduce the memory requirements for the target tester. The most efficient method of adding compression DFT, at design time, was to implement it at the individual core level and then integrate it at the top level.

Once the DFT was in place, there were concerns about both test application time and test power consumption. DFT techniques to allow the testing of multiple cores at once were included to reduce test time. However, to provide a level of guarantee against power problems at test, enhancements were added that allow flexibility in choosing which cores, and how many, to test simultaneously. This design methodology is currently being employed in multiple designs, across multiple sites.

This paper is organized as follows: section 2 presents the motivation for the work; section 3 presents the details and requirements of the design; section 4 covers the pattern-sizing estimate that drove the DFT design; section 5 presents the core-level DFT; section 6 details the top-level DFT; section 7 briefly outlines the role of 1149.1 in the on-chip DFT; section 8 presents the specifics of TDF testing; section 9 discusses the role of the clock-chopping PLL; section 10 reviews some of the challenges encountered in the case study design; section 11 concludes the paper; and finally section 12 outlines future work to be considered.

2.0 Motivation

In this case study design, the size of the device posed a problem with high pattern count. Even considering the inherent compression benefits found in the hierarchical methodology, the introduction of transition fault patterns caused the total tester memory requirements to exceed tester limitations. The target

Paper 29.3 INTERNATIONAL TEST CONFERENCE 1


0-7803-9039-3/$20.00 © 2005 IEEE
tester was an Agilent 93K configured with 28M of vector memory (per pin). Since the decisions on the DFT methodology needed to be made early in the process, an estimate was made for the total pattern count based on previous design experience; this estimate is explained further in the paper.

3.0 Case Study Design

The focus of this case study is the multi-phase scan methodology developed by Plexus Design Solutions, Inc., which was implemented during the design of the Cobra chip. The Cobra design (Figure 1) is a 13.5M-gate device in a 0.13µ TSMC process, with embedded-core frequencies of 133 MHz and 233 MHz. Our customer recognized the importance of DFT and made a significant effort to achieve the highest possible test quality. A number of DFT features are designed into the Cobra chip, such as Full Scan, Boundary Scan, Memory BIST (both with and without repair), and some Functional Test.

Figure 1 - Case Study Design

The following is a list of the DFT features and requirements for Cobra:

• Full scan - Cobra is a nearly "full scan" design using Mux-D flops. With the exception of a small number of registers in the clock control logic, all of the 350K flops were scanned.

• Stuck-at - The target coverage for stuck-at faults was 99.6+%.

• Transition Faults - Because 0.13µ processes have exhibited an increased number of at-speed related defects, it was deemed necessary to target transition faults. While the core runs at frequencies of 130 to 240 MHz, which is achievable by available testers, at-speed clocks were generated using the on-chip PLL. This was done to prove the concept for future planned designs that will require frequencies beyond the tester's capability. It also provides a capability to target lower-cost testers if desired. The goal for transition coverage is >80%, as reported by the tools.

• Compression - Compression will be used at the core level for both TDF and SSF patterns.

• Path Delay - Path delay tests are defined for the device as determined by static timing tools. The number of paths is initially limited to 100 per core.

• Boundary Scan - Cobra has Boundary Scan on all of the I/Os except for the tri-state enable pin. That pin is used to isolate the device in board and MCM assembly.

• Memory BIST - Memory BIST was used on all the Artisan memories. Soft repair is implemented on selected, larger memories and will be executed at power-up of the device.

4.0 Vector Sizing Estimation

The DFT architecture had to be developed prior to the generation and understanding of vectors, and there was a very real concern that the vectors would exceed the capabilities of the target tester. The design, at the planning stage, consisted of a DSP core replicated eight times, plus some interface logic and other computation blocks. An estimation of vector impact was possible because a previous generation of the product family included a similar architecture. To more accurately estimate the pattern count and the required tester memory, some experiments were run on the previous-generation core (DSP1).

Again, this case study design was targeted to 0.13µ technology and therefore required both SSF patterns and TDF tests, so the tests for this generation would be significantly larger than those from previous designs.

The first step was to run a baseline ATPG for SSF (single stuck-at fault), which established the final SSF coverage goal (Table 1).

DSP1 SSF Baseline
Scan Chain Length | Number of Chains | Test Coverage | Pattern Count | Test Vector Memory*
1000              | 28               | 99.35%        | 1320          | ~1.3M per pin
*Test Vector Memory = Longest Chain x Pattern Count

Table 1



The second step was to create transition-delay patterns against DSP1. Those patterns were then fault-graded for SSF coverage, and an ATPG (SSF) top-off was run to achieve the final coverage, as determined in the baseline run (Table 1).

In other words, by understanding the SSF content of the transition delay vectors, it was possible to replace the majority of the stuck-at vectors with the transition-delay vectors. The total pattern set is a combination of TDF patterns plus SSF top-off vectors (Table 2).

DSP1 AC Scan plus SSF Top Off
Transition Patterns | Transition Coverage | SSF Top Off Patterns | SSF Coverage | Test Vector Memory*
3780                | 98.35%              | 24                   | 99.35%       | 3.8M per pin
*Test Vector Memory = Longest Chain x Pattern Count

Table 2

Taking the data from the work done on the previous-generation core, we determined a scaling factor for the full-chip pattern count, based on the DSP1 tests. In that previous design, the scaling factor was 2.73x. In other words, when the DSP1 was integrated into the top-level design, it required 2.73 times the number of patterns that were used to test it without integration into the total chip. This was because not all primary pins of the DSP1 core were visible at the chip level; delivering patterns to the core required traversing some top-level logic.

Applying this scaling factor to the transition-delay plus top-off SSF patterns (3804 patterns x 2.73 = 10,385) gave the estimated chip-level pattern count. In that previous-generation chip, the eight core-level chains were concatenated (8 scan architectures at 1000 bits of depth), so the total memory in the tester for that architecture would have been 10,385 x 8000 = 83M of vectors per pin.

Adopting the Hierarchical DFT Methodology [1], and using a single phase per core, the vector memory would be reduced to (number of cores) x (pattern count) x (length of longest chain). The plan was to have 10 cores in the device, with a chain length of 1000 registers. With 3804 patterns per core, this was approximately 38M per pin. That is a significant savings over the historical methodology, but still exceeded the per-pin memory of the tester.

The next option was to use compression technology on the cores for pattern count reduction; using the Hierarchical DFT Methodology with compression in the cores was the option under consideration.

The hierarchical DFT methodology used in previous designs provided benefits of faster runtime, no tool or machine capacity problems, early detection of testability issues, simplified verification, isolation of potential problem blocks, a universally applicable methodology for managing scan chains at the chip level, and simplified debug of scan chain failures on the tester [1]. In addition, this methodology allowed us to use tester memory efficiently, which was planned early in the process. Because the design is partitioned along functional boundaries, scan planning is done to balance the test patterns from each block. The number of chains (channels for the compression flow), the depth of the chains, and rough pattern counts are determined early in the flow.

When compression was introduced in the DSP1 core, it was configured to have 280 internal chains of length 100. Twelve external channels were brought out from the core. Running patterns with the same flow (transition, fault simulation, top-off), the total test vector memory was reduced to 600K per pin. Multiple schemes for bringing the core-level channels out to the chip level could be used.

In the existing Hierarchical DFT Methodology [1], multi-phase scan chain muxing was used. With that flow, the chains from the cores are fed into muxes in the top level. Based on scheduling plans and pin availability, those past flows used a single core per phase.

Using that same single-phase-per-core approach, the total memory was approximately (10 cores) x (600K/core) = 6M per pin. This was well within the limitations.

In the first core implementation, a number other than 12 channels was chosen due to constraints in one of the first chip designs. In that first tape-out, which used the DSP core, only sixteen pins were allocated for scan in/scan out channels. However, the analysis performed for the selection of the 'Hierarchical DFT with compression' flow was still valid.

The production design studied here was comprised of eight DSP cores (Hard Macros), some computational blocks, interface sub-blocks, and other glue logic. The DSP core was hardened and reused in multiple designs across multiple design sites, so the methodology required the block to have DFT that was easily ported and integrated.

Due to the high level of AC testing and the subsequent increase in overall pattern count, the methodology chosen was enhanced to include compression technology from a commercial EDA supplier. In this case study, Mentor Graphics' TestKompress was used.

5.0 Core

The hierarchical methodology follows a flow similar to that used in previous designs [1]. In this design, compression was added at the core level to reduce the size of the patterns stored on the tester. Using the terminology of the EDA tool, the channels were defined as the input connections to the decompressor and the output connections from the compactor (Figure 2).
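The memory arithmetic in section 4 can be collected into a short script. This is purely illustrative: the numbers come from the paper, while the function names and the script itself are ours.

```python
# Sketch of the vector-sizing estimates from section 4. Numbers are from
# the paper; helper names are illustrative, not from any EDA tool.

def flat_memory(patterns, total_chain_bits):
    # Historical flow: all core chains concatenated at the top level.
    return patterns * total_chain_bits

def hierarchical_memory(cores, patterns_per_core, chain_length):
    # One phase per core: each core's patterns shift through its own chains.
    return cores * patterns_per_core * chain_length

# DSP1 experiment: 3780 TDF + 24 SSF top-off patterns, scaled by the
# 2.73x core-to-chip factor observed on the previous generation.
core_patterns = 3780 + 24                    # 3804
chip_patterns = round(core_patterns * 2.73)  # ~10,385

# Previous-generation style: 8 concatenated chains of 1000 bits each.
flat = flat_memory(chip_patterns, 8 * 1000)          # ~83M per pin

# Hierarchical, single phase per core: 10 cores, 1000-bit chains.
hier = hierarchical_memory(10, core_patterns, 1000)  # ~38M per pin

# With core-level compression (280 chains of length 100, 12 channels),
# DSP1 dropped to ~600K per pin, so 10 cores need ~6M per pin.
compressed = 10 * 600_000

for name, bits in [("flat", flat), ("hierarchical", hier),
                   ("compressed", compressed)]:
    print(f"{name:12s} ~{bits / 1e6:.1f}M per pin (tester limit 28M)")
```

Only the compressed configuration fits under the 28M-per-pin limit of the Agilent 93K, which is what drove the choice of hierarchical DFT with core-level compression.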


The number of chip-level pins available for scan determined the channels defined in each block. Since the DSP (Hard Macro) core will be used in designs across many design groups, the first two "customers" to use the core determined the number of channels. In this case study design, eight scan in and eight scan out pins were defined. The channel-to-chain ratio was determined by the target compression combined with a target chain length. From an overall planning point of view, the chain lengths in each core were defined to be approximately equal, with a range of 150 to 200 registers deep. The compression logic was added at the block level, and then the reduced number of channels was fed to the top and multiplexed out to the pins of the device through the multi-phase scan muxes (Figure 1).

Figure 2 – Channels in and out of the core

One key element in the use of a hierarchical methodology is the registration of inputs and outputs (I/Os) in the top-level blocks of the chip. In many designs, the ports are registered to have accurately characterized blocks (for reuse). Registration is also used to minimize the constraints on global routing: since the core performance is characterized at the block level, it is not dependent on asynchronous timing from other blocks, which reduces the timing constraints on those inter-block routes. In this hierarchical methodology, the DFT flow takes advantage of those registers to isolate the blocks. This isolation provides the ability to generate patterns for the block that are independent of the logic in other blocks within the design. In the design used in this case study, one major core, the DSP, did not register all I/Os. This posed a challenge for the methodology, requiring special handling for those unregistered ports.

While the design team initially committed to register the ports in the cores, when the DSP was first checked (early synthesized netlist), there were many ports without registration. One solution was to add registers to all ports, combined with muxes to achieve the isolation required. However, this was not deemed possible due to the latency introduced in the system.

An alternate solution for registering the ports to the core was to use DFT tools to identify registers within the design. Using this technique, the tools traced from the inputs of the block through the combinational logic until the first level of registration was identified (Figure 3). However, in many cases this resulted in the identification of multiple registers per input port, which caused two issues. First, the number of registers identified for the partition chain increased significantly. In the case study design, it caused the length of the partition chain to be 6318 (the number of identified partition registers) instead of 848 (the number of ports into the block).

Figure 3 – Ports fan out to multiple Input Partition Registers

Secondly, in some cases the combinational logic had feedback to input partition registers from logic internal to the core (within the partition registers, Figure 4). This caused a problem in testing the final phase, i.e. the top-level logic. In the testing of the top-level logic, all partition scan chains were placed in scan mode. Patterns were fed into those registers while the cores (inside the partition chains) were in "don't care" states. However, when there was feedback from an internal node back to the input of a partition scan chain, it caused an unknown to be present on the partition register. This caused a loss of test coverage at that register and in the cones of logic fed by that register.

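The register-identification step described above is essentially a forward traversal of the netlist from each port, through combinational cells, stopping at the first flop on every path. A minimal sketch in Python, using an invented toy netlist (the actual flow relied on commercial DFT tools):

```python
from collections import deque

# Toy netlist: each cell maps to (kind, fanout). This representation is
# invented for illustration; real tools work on the synthesized netlist.
netlist = {
    "in_port": ("port", ["u1"]),
    "u1":      ("comb", ["u2", "ff_a"]),
    "u2":      ("comb", ["ff_b", "ff_c"]),
    "ff_a":    ("flop", []),
    "ff_b":    ("flop", []),
    "ff_c":    ("flop", []),
}

def first_level_registers(port):
    """Trace forward from a port; stop at the first flop on each path."""
    found, seen = set(), {port}
    q = deque(netlist[port][1])
    while q:
        cell = q.popleft()
        if cell in seen:
            continue
        seen.add(cell)
        kind, fanout = netlist[cell]
        if kind == "flop":
            found.add(cell)      # becomes an input partition register
        else:
            q.extend(fanout)     # keep tracing through combinational logic
    return found

# One port fans out to three partition registers, which is exactly the
# effect that inflated the partition chain from 848 ports to 6318 registers.
print(sorted(first_level_registers("in_port")))  # ['ff_a', 'ff_b', 'ff_c']
```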


Figure 4 – Feedback to Partition Register from Internal Cells

The solution to this problem required the insertion of a small number of test points (Figure 5) to control those feedback loops for use in the final phase of testing (top-level logic testing).

Figure 5 – Test Points Added for Observation

6.0 Chip Level

In the chip-level testing, the strategy of using multi-phase muxes was used [1]. In multi-phase muxing, the scan in and scan out from multiple cores are multiplexed into and out of the chip. Using this technique, multiple cores can use the same few I/O pins available for use in test. In previous designs, the selection of cores under test was controlled through primary inputs to the chip.

This methodology was enhanced by using JTAG instructions to control the selection of the blocks under test. Using JTAG for control reduced the number of dedicated package pins required for test. It does add a level of complexity in the application of test patterns to the device, but this is essentially a test procedure that is set up and automated in the EDA tools.

During the chip-level test, each of the cores was tested independently in each phase. The final phase focused on testing the top-level logic, which is outside of the defined cores. In this case study design, compression technology was used in all the major blocks but was not employed in testing the top-level logic (the final phase in the multi-phase muxing).

The other major enhancement made in this chip-level testing was the ability to test multiple cores simultaneously (selectable through JTAG). This minimized the overall test time by exercising multiple cores at once. With a selectable option, it can be tuned to ensure that the maximum power dissipation of the package-die combination is not exceeded (testing all cores at once could exceed the power capabilities). Table 3 describes how the muxes are controlled in each phase.

[Table 3 lists, for each pattern phase P0 through P15, the select settings for Mux1 through Mux4; the mux inputs cover HM0-HM7, RX, TX, CT, SA, and the top level.]
Note: HM – HardMacro, SA – Sand, CT – CompoundTop, Top – Top Level

Table 3: Phase Test

• Phase 0 - four HardMacro cores are tested in parallel.

• Phase 1 - the four other HardMacro cores are tested in parallel.

• Phase 2 - the RX/TX cores and the Sand and CompoundTop cores are tested in parallel.

• Phase 3 - the top-level logic, with all partition scan chains from each core, is tested at the same time.

• Phases 4 to 15 - each core can be tested individually.

A decision was made to have the default operation test four cores at once. As the table describes, in Phase 0 four of the DSP Hard Macros were tested in parallel: patterns for Hard Macro 0 are delivered through Mux1, patterns for Hard Macro 2 are delivered through Mux2, Mux3 provides patterns to Hard Macro 4, and Mux4 supplies patterns to Hard Macro 6.

The one exception to this strategy is the top-level testing. In the final phase, the scan chains from all of the top-level logic are brought out to package pins along with all partition chains from the cores. This is the technique which picks up the combinational logic that resides outside the partition cells in the cores. It also covers the cones of logic that reside between the registers in the top level and the ports in the cores. In this final phase, all 32



chains are from either partition chains or scan chains within the top-level logic. This phase is the only one that does not include compression technology.

To test one core at a time, the target core will have its channels described to the EDA tool (chain identification, etc.) and patterns will be generated for that core only. As an added capability, when the device is in "single core test" mode, the cores not under test will have clocks disabled and no patterns delivered to them.

The structure that is in place will also allow for testing of 2, 3, or up to 4 cores simultaneously. The only change will be how those cores are defined to the EDA tool.

To test in a non-default mode, the EDA tools must create patterns for each different option. This will require different pattern sets to be generated and delivered to the operations department if the device needs to be tested in any mode other than the default.

Following receipt of the device from fabrication, the initial tests will be run in the default mode. If power dissipation issues are encountered (IR drop, etc.), different options for the test sequencing will be used.

7.0 JTAG and its Role

The IEEE 1149.1 standard was followed in the case study design. As well as the standard functionality, additional capabilities were included employing user-defined instructions.

JTAG functionality was used to set up the chip in different test modes. Below is the list of tasks the JTAG controller performs.

• JTAG selects the phase scan muxes at the top level to select the core under test according to Table 3.

• It sets up the PLL at the different clock frequencies and controls the clock chopper to test the target core accordingly, both for transition fault patterns and single stuck-at patterns.

• Instructions schedule the Memory BIST test events, including test and repair (soft) capabilities for some memories. In the soft repair for the large memories, JTAG instructions are used. The requirement is to sequentially run their MBIST and soft repair at power-up; instructions are used to schedule and execute these events.

• 'Simple' MBIST is initiated through JTAG instructions. These tests are run in manufacturing test only.

• It disables the clocks delivered to each core that is not being tested, for power-saving purposes. Using this technique, power will not be consumed in cores not targeted in the phase-mux select approach described in Table 3.

• Sets up pin parametric testing.

• Tests connectivity of the package to the board (standard implementation).

8.0 Hierarchical Transition Fault Testing

In the case study design, transition fault patterns were created using launch-off-capture (as opposed to launch-off-last-shift). This was selected to reduce the constraint on the Scan Enable signals: when using launch-off-last-shift, SE becomes a high-speed pin, requiring it to be routed like a clock or to use retiming logic (pipeline registers) for the SE signal. Launch-off-capture requires a larger number of patterns, but with the design using embedded compression technology, this is not a concern.

Multiple scan_enable signals are necessary to implement more complete transition fault testing in a hierarchical manner. Also, to support hierarchical transition fault testing, a lock-up flop is included at the beginning of the input partition scan chain and at the beginning of the output partition chain.

To explain the reason, it is first necessary to review the requirements for detecting a transition fault. Figure 6 illustrates how a scan pattern is used to initialize a transition fault site and deliver a transition vector:
1) The last shift loads the initialization vector (V1) into the Launch Flop (FF1). The transition vector (V2) originates from the flop (FF3) driving the D input of the Launch Flop.
2) With scan shifting turned off (SE=0), V2 is launched from the D input of FF1.
3) The result of V2 is captured at the D input of FF2 with a second clock pulse.

Figure 6 – At Speed Scan Vectors

As illustrated above, it is necessary to have three registers to execute an "at-speed" scan vector. When testing a block in a hierarchical manner, the normal Transition Origin (another flop in the design) is no longer present. The following figure illustrates how to use the existing scan chain to supply the Transition Origin.

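Before moving to the hierarchical case, the three-register sequence of Figure 6 can be modeled behaviorally. This toy model is ours; only the flop roles (FF3 as Transition Origin, FF1 as Launch Flop, FF2 as Capture Flop) come from the figure.

```python
# Behavioral toy model of the launch-off-capture sequence in Figure 6:
# FF3 (Transition Origin) -> FF1 (Launch Flop) -> FF2 (Capture Flop).
# Flop names follow the figure; the modeling style is ours.

def response(v):
    # Stand-in for the combinational logic between launch and capture
    # (identity here, so the captured value equals the launched one).
    return v

def clock(ff3, ff1, ff2, se, scan_in):
    """One clock pulse: shift when SE=1, capture D inputs when SE=0."""
    if se:                      # scan shift: SI chain is FF3 -> FF1 -> FF2
        return scan_in, ff3, ff1
    # Functional capture: FF3 holds, FF1 captures FF3's Q (V2),
    # FF2 captures the response of FF1's Q.
    return ff3, ff3, response(ff1)

# 1) Last shift leaves V1 in the Launch Flop and V2 behind it in FF3.
ff3, ff1, ff2 = 1, 0, 0         # V2=1 in FF3, V1=0 in FF1

# 2) SE=0: first at-speed pulse launches the 0->1 transition at FF1.
ff3, ff1, ff2 = clock(ff3, ff1, ff2, se=0, scan_in=0)

# 3) Second at-speed pulse captures the response of V2 in FF2.
ff3, ff1, ff2 = clock(ff3, ff1, ff2, se=0, scan_in=0)

print(ff1, ff2)  # 1 1 -> transition launched and its response captured
```

In the hierarchical case sketched next, FF3 sits outside the block, which is why scan shifting must stay enabled to supply the transition from the scan input instead.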


Figure 7 shows the three steps when the Transition Origin lies outside the block:
1) The last shift loads the initialization vector (V1) into the Launch Flop (FF1). Since the D input of FF1 is driven by "X" (outside of the block), the transition vector (V2) must originate from the flop driving the scan input (FF3).
2) In order to launch V2, scan shifting must remain enabled (SE=1).
3) The result of V2 is captured at the D input of FF2.

Figure 7 – Transition Pattern Origin

The reason for the scan_enable signals and the lock-up flop is to ensure that there is a valid state during the two clock pulses required for at-speed scan testing. In a hierarchical implementation, it is necessarily assumed that Xs propagate from outside the block. After loading the scan chain for a transition fault pattern, the scan_enable signal for the partition chains is held active (shift only) during the launch/capture pulses to ensure that there is a register driving a known state into the partition cell during the second pulse.

When testing the logic inside the targeted block, this strategy is required to handle the input partition cells. The output partition cells, however, need to capture data on the D pin when testing inside the block, which means their scan_enable signal must be inactive (shift/capture controlled by the tools). When testing outside the block (targeting top-level logic and block interconnects), the roles of the partition cells reverse: the output partition cells must maintain a valid state for the second clock pulse (meaning outputreg_scan_en must be held in shift mode), but the input partition cells must capture data on the D pin (therefore inputreg_scan_en inactive). Without this technique, transition coverage would be lost at the partition chains, both input and output. This could represent a significant loss in total coverage.

In the case study design, the TDF testing was performed inside the decompressor/compactor logic. The tests were created using a simple switch within the ATPG tools. However, the clocking for the transition delay fault testing was generated on chip using a PLL clock-chopping circuit. While the clocks used in this design are at low enough speeds to allow the tester to supply them, the clock chopper provides flexibility in the future to test the devices on lower-cost equipment.

9.0 PLL Clock Chopping

The clocking used for detection of the speed-related defects in the interconnect was based on work described by Teresa McLaurin, et al. in the ITC 2000 proceedings. Internal clock choppers were used for each of the two clock domains. The basic design included edge-detect logic, for starting the clock sequences; mask registers, to set the clock positions; a shift stage, to drive the gating logic; and the gating logic, to deliver the clock pulses. The source for the clock choppers came from the PLLs used in each functional domain. The only change required was an accurate clock source on the load board of the tester. This reference clock was used by the internal PLLs for locking the frequency.

[Figure 8 shows the clock chopper: the PLL (X8) and clock dividers take REF_CLK and produce CLK, CLK_1/2, CLK_1/4, and CLK_1/8. Mask registers MASK0[7:0] through MASK3[7:0], loaded from scan-chain values, drive a shift stage whose enables (CLK_EN0 through CLK_EN3) feed the clock gating logic, producing GATED_CLK and its divided versions. The edge-detect control block (inputs BEGIN_AC & TEST_EN, SHIFT_IN_BIT; output LD_SHFT) starts the LAUNCH_CAPTURE sequence, and SCAN_SHIFT_CLK_SEL selects the SCAN_SHIFT_CLK.]

Figure 8 – PLL Clock Chopper

10.0 Challenges Encountered in the Case Study Design

In the course of working on the design, a number of issues were encountered. Some required design modifications; others caused an enhancement to the methodology; and yet others were addressed with modifications to the EDA tools.

One challenge occurred with the use of the PLL clock chopper. In this design, the PLL reference clock was also used as the scan shift clock. This allowed a single pin to be the source of the clock input during test. In SAF and TDF testing, the shift clock was the same as the reference clock. When the mode was changed from shift to capture, the output of the PLL clock chopper was the source for the capture clocks. However, the operation of the EDA tool caused faulty behavior. A report was provided to the customer support group at the vendor, and within days a patch was made available. This capability is now supported in a production release of the EDA tool.

Another challenge arose with transition delay fault testing. When transition patterns were created for the core only, the coverage was on the order of 84%, which was within our target goal. However, when the total chip was included, and the test was directed at



that same core, the coverage dropped to 74%. When it
was investigated, the source of the problem was due to
transition patterns that were launched from the input 11.0 Conclusion
partition register, traversed through combinational In this case study design, there were two major goals
path and captured into another input partition register. that drove the team to use a hierarchical DFT
But, when the core level block was tested, the Scan methodology with ac scan enhancements and on-chip
Enable for the input partition register was held in compression. One was the creation of a core that
“shift” mode. This prevented the capture of the faults in the second input partition register.

An enhancement to the methodology was made to cover the logic that appears between two input partition registers that are part of a launch-capture pair. The functional logic resides within the hierarchical block of logic, but appears between partition registers, because the target core did not register inputs to the block. In the process of identifying and inserting partition registers into partition chains, there were cases where some logic was present between two input partition registers. That logic is not covered in the normal operation of transition fault pattern generation with this hierarchical methodology. The solution is to perform two passes of pattern generation for the target core. In the first pass, the input scan enable is held to a “1” (shift). Patterns are generated. In the second pass, the pin is left unconstrained and allows the tool to control the shift/capture for the input chain. In that case, those transition faults that are in the cloud of logic between input partition registers are covered.

Note that this enhancement is not required when all inputs to the block are registered.

would be used across multiple divisions. Another was to allow for other cores to be delivered by outside design groups that would be readily integrated into the total chip.

In addition, other design elements required us to modify the previous methodology to meet the DFT needs. With nanometer design (the target technology was 0.13µ), at-speed testing was required to achieve the highest quality of test. By adding transition fault patterns at the target frequencies of the device, the volume of test data grew too large for the target tester. As a result, compression was added to the design in the hierarchical flow.

We also enhanced the methodology to pick up transition coverage at the boundaries of the blocks, and to provide some flexibility in the selection of which cores to test simultaneously. While testing all cores simultaneously would exceed the power dissipation capabilities of the device (and potentially cause IR drop issues), a selectable technique for testing one or multiple cores at once was used. The results showed that this enhanced methodology met the quality goals of our customer while allowing the tests to run on existing test equipment in a reasonable amount of time.
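The two-pass pattern generation described above can be illustrated with a toy model. This is only a sketch of the idea, not the ATPG tool's actual interface: the fault names, the `between_input_regs` tag, and the `run_atpg` function are all hypothetical stand-ins.

```python
# Toy model of the two-pass flow: faults in the cloud of logic between two
# input partition registers are only detectable when the input scan enable
# is left unconstrained, so the tool can control shift/capture itself.
def run_atpg(faults, scan_en_constrained):
    """Return the subset of faults detectable in one pattern-generation pass."""
    detected = set()
    for f in faults:
        if f.endswith("between_input_regs"):
            if not scan_en_constrained:  # needs tool-controlled shift/capture
                detected.add(f)
        else:
            detected.add(f)  # ordinary core faults are covered in either pass
    return detected

faults = {"core_a/u1 stuck", "core_a/u2 transition between_input_regs"}

# Pass 1: input scan enable held to "1" (shift) -- covers the core logic.
pass1 = run_atpg(faults, scan_en_constrained=True)

# Pass 2: pin left unconstrained -- picks up the remaining faults that sit
# between the input partition registers.
pass2 = run_atpg(faults - pass1, scan_en_constrained=False)

assert pass1 | pass2 == faults  # the two passes together cover the fault list
```

As the final assertion suggests, neither pass alone reaches the full fault list; the union of the two pattern sets provides the coverage.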
Another issue is encountered with the enhancement of testing multiple cores at once. One of the motivations for using the hierarchical methodology is to reduce the size of the problem solved by the ATPG tools: the tools run significantly faster operating on one core at a time. When four cores are tested simultaneously, the pattern generation time grows, the requirements on compute resources increase, the validation cycle (simulation time) grows, and the debug process for patterns on the tester is more difficult. However, this enhanced methodology provides the ability to test the cores individually. The requirement is to generate new pattern sets targeted at each core independently. This is accomplished by loading the fault list for only the target core and running the ATPG tool against that list. The downside to this process is the development of multiple pattern sets for the device.

In the initial bring-up of first silicon, it may be easier to debug problems by targeting a single core per test phase. As soon as the devices are successfully tested (and the load board is verified), a ‘final’ production pattern set can be generated to minimize the overall test time by targeting multiple cores per test phase.

12.0 Future work
In follow-on designs, more automation of the flow is planned. Between working with the EDA vendor and creating more scripting capabilities, much of the hand crafting that was required in this case study design will be improved.

Other future enhancements will involve the capability to deliver not just the core with scripts, but the core with a set of test patterns that can be reused in other devices.

We will also add more flexibility in the selection of cores for testing. In this case study design, a trade-off was made to hard-wire specific configurations of cores for simultaneous testing. Future efforts will be made to have more options available.

Finally, we will offer more automation in the handling of cores that do not have registered I/Os by identifying the feedback from internal logic into partition scan registers.
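The scheduling trade-off discussed above, one core per test phase for bring-up versus grouped cores for production, can be sketched numerically. The core names and cycle counts below are invented for illustration; the case study design's actual cores and pattern depths are not public. The key observation is that a concurrent phase costs roughly as many tester cycles as its longest member.

```python
# Hypothetical per-core pattern lengths, in tester cycles.
CORE_CYCLES = {"cpu": 900_000, "dsp": 700_000, "usb": 300_000, "mem": 650_000}

def schedule_cycles(phases):
    """Total tester cycles when the cores within each phase run concurrently:
    each phase lasts as long as its slowest (longest-pattern) core."""
    return sum(max(CORE_CYCLES[core] for core in phase) for phase in phases)

# Bring-up: one core per test phase -- easiest to debug on the tester.
bringup = [["cpu"], ["dsp"], ["usb"], ["mem"]]

# Production: hard-wired groups, chosen so each group stays within the
# device's test power budget.
production = [["cpu", "usb"], ["dsp", "mem"]]

assert schedule_cycles(bringup) == 2_550_000
assert schedule_cycles(production) == 1_600_000  # ~37% fewer cycles
```

Under these assumed numbers, grouping compatible cores into two phases cuts total test time by roughly a third, which is the economic argument for generating the ‘final’ multi-core production pattern set once the load board is verified.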
Paper 29.3 INTERNATIONAL TEST CONFERENCE 8
Acknowledgements
Special thanks are in order for Parag Sheth, Tony Kulesa, Ronen Habot, and Minjing Wang from Conexant Corporation. Martin Buehring from Mentor Graphics was also a significant contributor to our success on the project. Al Crouch also helped in the review and submission of this case study. His experience and insight were keys in the development and delivery of this information.

References
1. Jeffrey Remmers, Rick Fisette, Mauricio Villalba, “Hierarchical DFT Methodology – A Case Study”, International Test Conference 2004.
2. Alfred L. Crouch, “Design for Test for Digital IC’s and Embedded Core Systems”.
3. Sengupta, Kundu, Chakravarty, Parvathala, Galivanche, Kosonocky, Rodgers, Mak, “Defect-Based Test: A Key Enabler for Successful Migration to Structural Test”, Intel Technology Journal, Q1 1999.
4. Ron Press, Janusz Rajski, “Heading off test problems posed by SoC”, EE Times, October 16, 2000.
5. Hari Balachandran, Kenneth M. Butler, Neil Simpson, “Facilitating Rapid First Silicon Debug”, International Test Conference 2002.
6. Bailey, Metayer, Svrcek, Tendolkar, Wolf, Fiene, Alexander, Woltenberg, Raina, “Test Methodology for Motorola’s High Performance e500 Core Based on PowerPC Instruction Set Architecture”, International Test Conference 2002.
7. Ron Wilson, “DFT Takes on Test Cost in Final Combat”, Integrated System Design, October 2001.
8. Sematech, “Overview of Quality and Reliability Issues in the National Technology Roadmap for Semiconductors”, 98013448A-TR.
9. Sudhakar Sabada, “Keynote: Nanometer Design Challenges”, Magma Nanometer Design Seminar, October 2002.
10. Rick Fisette, Gary Gebenlian, Jeff Remmers, “Efficient DFT Verification Methodology Using Skeleton Netlist and ATPG Tool, a Case Study”, International Test Synthesis Workshop 2003.
11. Vikram Iyengar, Krishnendu Chakrabarty, Erik Jan Marinissen, “Test Wrapper and Test Access Mechanism Co-Optimization for System-On-Chip”, International Test Conference 2001.
12. Erik Larsson, Zebo Peng, “A Reconfigurable Power-Conscious Core Wrapper and its Application to SOC Test Scheduling”, International Test Conference 2003.
13. Sandeep Kumar Goel, Erik Jan Marinissen, “Effective and Efficient Test Architecture Design for SOCs”, International Test Conference 2002.
14. Rainer Dorsch, Ramón Huerta Rivera, Hans-Joachim Wunderlich, Martin Fischer, “Adapting an SoC to ATE Concurrent Test Capabilities”, International Test Conference 2002.