
DFT NOTES

WHAT IS DFT AND WHY DO WE NEED IT?


• A simple answer: DFT is a technique that makes a design testable after production. It is the extra logic we add to the normal design, during the design process, that supports its post-production testing. Post-production testing is necessary because the manufacturing process is not 100% error free. Defects in the silicon introduce errors into the physical device, and a chip with such errors will not work to specification. The question is how to detect that. Running all the functional tests on each of, say, a million manufactured devices would be far too time consuming, so a method was needed that lets us conclude, without running full exhaustive tests on the physical device, that the device has been manufactured correctly. DFT is the answer. It only detects whether a physical device is faulty or not. After the post-production test is run on a device, if it is found faulty it is discarded, not shipped to the customer; if it is found good, it is shipped. Since it is a production fault, there is assumed to be no cure, so the aim is detection only, not even localization, of the fault. That is the intended purpose of DFT. For the end customer, the DFT logic present on the device is redundant logic. To further justify the need for DFT, consider a company that must deliver one million chips to its customer. If there is no DFT logic in the chip and it takes, say, 10 seconds to test one physical device (10 seconds is a kind and liberal assumption; in practice it can be much longer), then it would take close to four months on a single tester just to test the devices before shipping. DFT is all about reducing those months of test time to perhaps a few days. Of course, in practice many testers are employed in parallel to help reduce the test time further.
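The figure above follows from a simple single-tester estimate, using the 10-second test time assumed in the note:

\[
10^{6}\ \text{devices} \times 10\ \text{s/device} = 10^{7}\ \text{s} \approx 116\ \text{days} \approx 3.8\ \text{months}
\]

With N testers running in parallel, this scales down by roughly a factor of N.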
DFT VS VERIFICATION

• Definitions
Design synthesis: Given an I/O function, develop a procedure to manufacture a device using known materials and processes.
Verification: Predictive analysis to ensure that the synthesized design, when manufactured, will perform the given I/O function.
Design for Test: A manufacturing step that ensures that the physical device, manufactured from the synthesized design, has no manufacturing defect.
SERIAL/PARALLEL SIMULATION

• In serial simulation the pattern is shifted into the FFs of the scan chain through the scan ports. So the more FFs in the chain, the more time it takes to shift in the pattern. Note that here the simulator applies the patterns at the scan ports.

In parallel simulation, the values that would have reached the D inputs of the FFs in a scan chain by shifting through the scan inputs are instead forced directly at the D inputs of the flops by the simulator. The simulator therefore saves the load/unload time, because it forces the pattern directly at the flops instead of shifting it in.

Since parallel simulation removes the time spent shifting the pattern, it is easy to decide which mode to use:
Use parallel simulation when you want to reduce the simulation time; use serial simulation when the shift path through the scan chain itself needs to be verified.
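The difference between the two modes can be illustrated with a small Verilog sketch. This is only a conceptual illustration, not a tool-generated testbench: the scan_cell/scan_chain modules, the 4-bit chain and the pattern are made up, and the "parallel" load here simply forces the scan-cell state (real ATPG parallel testbenches typically force the scan-cell inputs for one shift cycle).

// Mux-D scan cell: captures D in functional mode, SI in shift mode.
module scan_cell (input clk, scan_en, d, si, output reg q);
  always @(posedge clk)
    q <= scan_en ? si : d;
endmodule

// A 4-cell scan chain (hypothetical example).
module scan_chain (
  input        clk, scan_en, scan_in,
  input  [3:0] d,
  output [3:0] q,
  output       scan_out
);
  scan_cell c0 (.clk(clk), .scan_en(scan_en), .d(d[0]), .si(scan_in), .q(q[0]));
  scan_cell c1 (.clk(clk), .scan_en(scan_en), .d(d[1]), .si(q[0]),    .q(q[1]));
  scan_cell c2 (.clk(clk), .scan_en(scan_en), .d(d[2]), .si(q[1]),    .q(q[2]));
  scan_cell c3 (.clk(clk), .scan_en(scan_en), .d(d[3]), .si(q[2]),    .q(q[3]));
  assign scan_out = q[3];
endmodule

module tb_scan;
  reg        clk = 0, scan_en = 0, scan_in = 0;
  reg  [3:0] d = 4'b0000;
  reg  [3:0] pattern = 4'b1010;
  wire [3:0] q;
  wire       scan_out;
  integer    i;

  scan_chain dut (.clk(clk), .scan_en(scan_en), .scan_in(scan_in),
                  .d(d), .q(q), .scan_out(scan_out));
  always #5 clk = ~clk;

  initial begin
    // Serial load: one shift clock per scan cell (N cycles for an N-flop chain).
    scan_en = 1;
    for (i = 3; i >= 0; i = i - 1) begin
      scan_in = pattern[i];   // drive while clk is low
      @(posedge clk);         // shift on the rising edge
      @(negedge clk);         // move past the edge before changing scan_in
    end
    #1 $display("after serial load:   q = %b", q);

    // "Parallel" load: force the scan-cell state directly -- no shift cycles needed.
    force dut.c0.q = pattern[0];  force dut.c1.q = pattern[1];
    force dut.c2.q = pattern[2];  force dut.c3.q = pattern[3];
    #1 $display("after parallel load: q = %b", q);
    release dut.c0.q; release dut.c1.q; release dut.c2.q; release dut.c3.q;
    $finish;
  end
endmodule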
SETUP TIME & HOLD TIME

• Setup time is defined as the minimum amount of time BEFORE the clock’s active edge by which
the data must be stable for it to be latched correctly. Any violation in this minimum required
time causes incorrect data to be captured and is known as setup violation.

• Hold time is defined as the minimum amount of time AFTER the clock’s active edge during
which the data must be stable. Any violation in this required time causes incorrect data to be
latched and is known as hold violation.

• The worst-case setup path in a design (clock-to-Q delay plus combinational delay plus setup time) determines the maximum frequency at which the chip can run without any timing failures.
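As a reminder, the standard single-cycle timing relations (written with the usual textbook symbols, which are not defined in these notes) are:

\[
T_{clk} \;\ge\; t_{c \to q} + t_{comb,max} + t_{setup}
\qquad\Rightarrow\qquad
f_{max} \;=\; \frac{1}{t_{c \to q} + t_{comb,max} + t_{setup}}
\]
\[
t_{c \to q,min} + t_{comb,min} \;\ge\; t_{hold}
\]

For example, with an assumed 0.5 ns clock-to-Q delay, 3 ns worst-case combinational delay and 0.5 ns setup time, the minimum clock period is 4 ns, i.e. f_max = 250 MHz. Note that the hold check does not depend on the clock period.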
WHAT IS AN HDL?

• HDL (Hardware Description Language): a hardware description language, or HDL, is any language from a class of computer languages used for the formal description of electronic circuits. It can describe a circuit's operation, its design, and the tests used to verify its operation by means of simulation.
• HDL specifies a model for the expected behavior of a circuit before that circuit is designed
and built. The end result is a silicon chip that would be manufactured in a fab.
• A simulation program, designed to implement the underlying semantics of the language
statements, coupled with simulating the progress of time, provides the hardware designer with
the ability to model a piece of hardware before it is physically created.
• Two applications of HDL processing: Simulation and Synthesis.
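As a small illustration (not taken from these notes), the Verilog description below models a resettable D flip-flop; the same source text can be simulated to check its behaviour and synthesized to gates:

// A resettable D flip-flop described in Verilog (illustrative example).
module dff_async_rst (
  input  wire clk,
  input  wire rst_n,   // active-low asynchronous reset
  input  wire d,
  output reg  q
);
  always @(posedge clk or negedge rst_n)
    if (!rst_n) q <= 1'b0;
    else        q <= d;
endmodule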
WHY SIMULATE FIRST?
• Physical bread-boarding is not possible as designs reach higher levels of integration.
• A simulator interprets the HDL description and produces a readable output, such as a
timing diagram, that predicts how the hardware will behave before it is actually
fabricated.
• Simulation allows the detection of functional errors in a design without having to
physically create the circuit.
Logic Simulation
• The stimulus that tests the functionality of the design is called a test bench.
• To simulate, the design is first described in HDL and then verified by simulating it and checking it against a test bench, which is also written in HDL.
• Logic simulation is a fast, accurate method of analyzing a circuit by checking its functionality with a test bench before the circuit is physically built.
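A minimal test bench for the dff_async_rst example shown earlier might look like this (names and timings are illustrative; both the stimulus and the check are written in the HDL itself):

// Simple self-checking test bench for dff_async_rst (illustrative sketch).
module tb_dff;
  reg  clk = 0, rst_n = 0, d = 0;
  wire q;

  dff_async_rst dut (.clk(clk), .rst_n(rst_n), .d(d), .q(q));

  always #5 clk = ~clk;            // free-running clock for simulation

  initial begin
    #12 rst_n = 1;                 // release reset between clock edges
    #10 d = 1;                     // drive stimulus
    #10 if (q !== 1'b1) $display("FAIL: q = %b, expected 1", q);
        else            $display("PASS");
    $finish;
  end
endmodule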
WHAT IS SEQUENTIAL DEPTH IN DFT?
HOW DOES IT IMPROVE COVERAGE?
• Sequential depth is the number of capture cycles executed before unloading the scan chains.
So if your sequential depth were one, you would have a pattern sequence as follows (a simulation sketch of this sequence is shown after this list):
1) set scan_enable, load the scan chains, unset scan_enable
2) apply 1 capture clock
3) set scan_enable, unload the scan chains.
• Increasing the sequential depth allows the ATPG tool to achieve better fault coverage. The sequential depth of a circuit is calculated as the maximum number of FFs encountered on a path from a primary input (PI) to a primary output (PO).

• Sequential depth of a circuit = the distance of the longest path. The maximum allowable sequential depth
is 255. Typical depth would range from 2 to 5.
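The sketch below (reusing the hypothetical scan_chain module from the serial/parallel simulation example) shows the clocking sequence for a pattern of sequential depth 1 versus depth 2. In this toy chain the functional D inputs are just constants; in a real design they would come from the combinational logic between scan cells, which is what the extra capture cycles exercise.

// Conceptual sketch of load / capture / unload with a configurable sequential depth.
module tb_depth;
  reg        clk = 0, scan_en = 1, scan_in = 0;
  reg  [3:0] d = 4'b0110;          // stand-in for functional data at the D inputs
  wire [3:0] q;
  wire       scan_out;
  integer    i, k;

  scan_chain dut (.clk(clk), .scan_en(scan_en), .scan_in(scan_in),
                  .d(d), .q(q), .scan_out(scan_out));
  always #5 clk = ~clk;

  // One ATPG-style pattern: load, <depth> capture cycles, unload.
  task apply_pattern (input [3:0] load_val, input integer depth);
    begin
      scan_en = 1;                               // 1) load the scan chain
      for (i = 3; i >= 0; i = i - 1) begin
        scan_in = load_val[i];
        @(posedge clk);
        @(negedge clk);
      end
      scan_en = 0;                               // 2) 'depth' functional capture clocks
      for (k = 0; k < depth; k = k + 1) begin
        @(posedge clk);
        @(negedge clk);
      end
      scan_en = 1;                               // 3) unload: observe, then shift out
      for (i = 0; i < 4; i = i + 1) begin
        $display("unload bit %0d: scan_out = %b", i, scan_out);
        @(posedge clk);
        @(negedge clk);
      end
    end
  endtask

  initial begin
    apply_pattern(4'b1010, 1);   // sequential depth 1: one capture clock
    apply_pattern(4'b1010, 2);   // sequential depth 2: one extra capture clock
    $finish;
  end
endmodule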
Coverage Improvement:
Testable faults become ATPG_untestable faults because of constraints, or limitations, placed on the ATPG tool (such as a pin constraint or an insufficient sequential depth). These faults may become detectable if you remove a constraint (e.g. a pin or cell constraint) or change a limitation (e.g. the sequential depth) on the test generator. Also, when using a Named Capture Procedure, gradually add capture procedures with higher sequential depth until the test coverage goal is achieved or the pattern count limit is reached.
DEBUGGING LOW TEST-COVERAGE SITUATIONS
• Scan is a structured test approach in which the overall function of an integrated circuit (IC) is broken
into smaller structures and tested individually. Every state element (D flip-flop or latch) is replaced
with a scan cell that operates as an equivalent state element and is concatenated into long shift
registers called “scan chains” in scan mode. All the internal state elements can be converted into
controllable and observable logic. This greatly simplifies the complexity of testing an IC by testing
small combinational logic segments between scan cells. Automatic test pattern generation (ATPG)
tools take advantage of scan to produce high-quality scan patterns.
• The combination of scan and ATPG tools has been shown to successfully detect the vast majority of
manufacturing defects. When you use an ATPG tool, your goal should be to achieve the highest
coverage of defects as possible. Because high test coverage directly correlates to the quality of the
parts shipped, many companies demand that the coverage for single stuck-at faults be at least 99%
and transition delay faults be at least 90%.
• When the coverage report falls short of these goals, your task is to figure out why the coverage is not high enough and to take corrective action where possible. Debugging low defect coverage has historically required a significant amount of manual technique and intimate knowledge of the ATPG tool, as well as design experience, especially as device complexity increases.
CONTINUED.. INTERPRETING THE MYSTERIES OF ATPG STATISTICS

• The ATPG tool generates a “statistics report” that tells you what the tool has done and provides the fault
category information that you have to interpret to debug coverage problems. If you’re an expert at
using an ATPG tool, you’ll probably have little problem understanding the fault categories listed in the
statistics report.
• When debugging low coverage, you’ll need to understand some of the basic fault categories that are
listed in most typical ATPG statistics reports. The first and broadest category is what is sometimes
referred to as the “fault universe.” This is the total number of faults in a design. For example, when
dealing with single stuck-at faults, you have two faults for each instance/pin, stuck_at logic 1 and
stuck_at logic 0, where the instance is the full hierarchical path name to a library cell instantiated in the
design netlist.
• This number of total faults really is only important when comparing different ATPG tools against each
other. The total number can vary if “internal” faulting is turned on and whether or not “collapsed” faults
are used. Internal faulting extends the fault site down to the ATPG-model level, rather than limiting it to
the library-cell level. ATPG tools, for efficiency purposes, are designed to collapse equivalent faults
whenever possible. Typically, you’ll want to have the internal faults setting turned off and uncollapsed
faults setting turned on. These settings most closely match the faults represented in the design netlist.
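As a standard textbook illustration of fault collapsing (not taken from these notes), consider a 2-input AND gate with inputs a, b and output y:

Uncollapsed single stuck-at faults: a/0, a/1, b/0, b/1, y/0, y/1 (6 faults).
Collapsed faults (a/0, b/0 and y/0 are equivalent for an AND gate): y/0, a/1, b/1, y/1 (4 faults).

Reporting uncollapsed faults keeps the count aligned with the pins that actually appear in the netlist, which is why that setting is usually preferred.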
CONTINUED.. DO WE NEED TO CARE ABOUT UNTESTABLE/UNDETECTABLE FAULTS?
• Faults that cannot possibly be tested are reported as untestable or undetectable. This includes faults that are
typically referred to as unused, tied, blocked, and redundant. For example, a tied fault is one in which the
designer has purposely tied a pin to logic high or logic low. If a stuck-at-1 defect were to occur on a pin that
is tied high, you could not test for it because that would require the tool to be able to toggle the pin to logic
low. This cannot be done because of the design restriction, so the fault is categorized as “untestable.”
• Untestable/undetectable faults are significant for two reasons. First, they distinguish “fault coverage” from
“test coverage,” both of which are reported by ATPG tools. When most tools calculate coverage, fault
coverage includes all the faults in the design.
• Test coverage subtracts the untestable/undetectable faults from the total number of faults when calculating coverage (see the formulas at the end of this section). For this reason, the reported number for test coverage is typically higher than fault coverage.
• The second reason that untestable/undetectable faults are important is that nothing can be done to improve
the coverage of these faults; therefore, you should direct your debugging efforts elsewhere.
• One last thing to be aware of regarding untestable/undetectable faults is that ATPG-tool vendors vary in
how they categorize these faults. These differences can result in coverage discrepancies when comparing the
results of each tool.
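In their simplest form (the exact credit given to possibly-detected faults varies from tool to tool), the two coverage figures discussed above are:

\[
\text{Fault coverage} = \frac{\#\,\text{detected faults}}{\#\,\text{total faults}}
\qquad
\text{Test coverage} = \frac{\#\,\text{detected faults}}{\#\,\text{total faults} - \#\,\text{untestable/undetectable faults}}
\]

Because the denominator of test coverage is never larger than that of fault coverage, test coverage is always greater than or equal to fault coverage for the same pattern set.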
CONTINUED.. WHAT IS MORE IMPORTANT: TEST COVERAGE OR FAULT COVERAGE?
• This raises the question of which is the more important figure: test coverage or fault coverage? Most
engineers, but not all, rely on the higher test coverage number. The justification for ignoring
untestable/undetectable faults is that any defect that occurs at one of those fault locations will not
cause the device to functionally fail. For example, if a stuck-at 1 defect occurred on a pin that is tied
high by design, the part will not fail in functional operation. Others would argue that fault coverage is
more important because any defect, even an untestable defect, is significant because it represents a
problem in the manufacturing of the device. That debate won’t be explored here though.
• Some faults are testable, meaning that a defect at these fault sites would result in a functional failure.
Unfortunately, ATPG tools cannot produce patterns to detect all of the testable faults. These testable
but undetected faults are called “ATPG_untestable” (AU).
• Of all the fault categories listed in an ATPG statistics report, AU is the most significant category that
negatively affects test coverage and fault coverage. Determining the reasons why ATPG is unable to
produce a pattern to detect these faults and coming up with a strategy to improve the coverage is the
biggest challenge to debugging low-coverage problems.
• The most common reasons why faults may be ATPG_untestable are pin constraints, black-box models, RAMs, cell constraints, ATPG constraints, false/multicycle paths, etc.
CONTINUED.. MOST COMMON REASONS WHY FAULTS MAY BE ATPG_UNTESTABLE:
• Pin constraints: At least one input signal (usually more than one) is required to be constrained to a constant value to enable
test mode. While this constraint makes testing possible, it also results in blocking the propagation of some faults because the
logic is held in a constant state. Unless you have special knowledge to the contrary, these pin constraints must be adhered to,
which means you cannot recover this coverage loss.
• Determining the effect on coverage loss is not as simple as counting the number of constrained faults on the net. The effect on
defect coverage also extends to all the logic gates that have an input tied and whatever upstream faults are blocked by
that constraint. Faults downstream from the tied logic have limited control, which further affects coverage.
• Black-box models: When an ATPG model is not available for a module, a library cell, or, more commonly, a memory, ATPG tools treat it as a "black box," whose outputs propagate a fixed or unknown ("X") value. Faults in the "shadow" of these black boxes (i.e., faults whose control and observation are affected by their proximity to the black box) will not be detected. This includes faults in the logic cone driving each black-box input as well as the logic cones driven by the outputs. Obtaining an exact number of undetected faults is complicated by the fact that some of those faults may also lie in other, overlapping cones that are detected. The solution is to ensure that everything in the design is modeled.
• Random access memory: In the absence of either bypass logic or the ability to write/read through RAMs, faults in the shadow of the RAM may be undetected. As with black-box faults, it is difficult to determine exactly which faults are not detected because of potentially overlapping cones of logic.
• If you can make design changes, adding bypass logic may address this problem (one common bypass scheme is sketched below). Some ATPG tools are capable of producing special "RAM-sequential" patterns that can propagate faults through memories as long as the applicable design rule checks (DRCs) are satisfied. This may be a way to improve coverage without having to modify the design.
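The sketch below shows one common form of RAM bypass logic; it is an illustrative example only (module and signal names are invented), not a recommendation of a particular scheme. In test mode the memory is bypassed by a registered path from the write-data inputs to the read-data outputs, so the logic in the RAM shadow becomes controllable and observable through ordinary scan patterns:

// Illustrative RAM wrapper with a registered bypass path for scan test.
module ram_with_bypass (
  input  wire        clk,
  input  wire        test_mode,   // asserted during scan/ATPG
  input  wire        we,
  input  wire [3:0]  addr,
  input  wire [7:0]  wdata,
  output wire [7:0]  rdata
);
  reg [7:0] mem [0:15];
  reg [7:0] ram_q, bypass_q;

  always @(posedge clk) begin
    if (we) mem[addr] <= wdata;
    ram_q    <= mem[addr];   // normal synchronous read
    bypass_q <= wdata;       // registered bypass used only in test mode
  end

  assign rdata = test_mode ? bypass_q : ram_q;
endmodule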
CONTINUED.. MOST COMMON REASONS WHY FAULTS MAY BE ATPG_UNTESTABLE:
• Cell constraints: Sometimes you need to constrain scan cells regarding what values they are capable of loading and
capturing (usually for timing-related reasons). These constraints imposed on the ATPG tool will prevent some faults from
being detected. If the cell constraint is one that limits capturing, then to determine the effect, you’ll need to look at the cone
of logic that drives the scan cell and sift out faults that are detected by overlapping cones.
• If found early enough in the design cycle, the underlying timing issue can possibly be corrected, which makes cell constraints
unnecessary. However, this type of timing problem is often found too late in the design cycle to be changed. Using cell
constraints is a bandage approach to getting patterns to pass, and the resulting test coverage loss is the price to be paid.
• ATPG constraints: You may impose additional constraints on the ATPG tool to ensure that certain areas of the design are
held in a desired state. For example, you may need to hold an internal bus so that it drives in only one direction. As with all types of
constraints, parts of the design will be prevented from toggling, which limits test coverage. Similar to pin constraints, if the
assumption is that these are necessary for the test to work, the coverage loss cannot be addressed.
• False/multicycle paths: Some limitations to test coverage are specific to at-speed testing. False paths cannot be tested at
functional frequencies; therefore, ATPG must be prevented from doing so to avoid failures on the automatic test equipment.
Because transition-delay fault (TDF) patterns use only one at-speed cycle to propagate faults, multicycle paths (which
require more than one cycle) must also be masked out. Determining which faults are not detected in false paths is
complicated by the manner in which false paths are defined.
• Delay-constraint files usually specify a path by designating “-from”, “-to” and possibly “-through” to describe a start and
end point of the path. In between those points, there can be a significant amount of logic to trace and potentially multiple
paths if you don’t use “-through” to specify the exact path.
CONTINUED.. STEPS TO IDENTIFY AND QUANTIFY COVERAGE ISSUES
There are three aspects of the debug challenge:

• How you identify which coverage issues (as described above) exist,
• How you determine the effect each issue has on the coverage, and
• What, if anything, you can do to improve the coverage.
Typically, we have had to rely on a significant amount of design experience as well as ATPG tool proficiency to manually determine and quantify the
effects of design characteristics or ATPG settings that limit coverage. The usual steps that are required to manually debug fault coverage are:

• Identify a common thread in the AU faults.
• Investigate a single representative fault.
• Rely on your experience to recognize trends.
• Determine the effect of the issue on test coverage.
Let’s look at these steps in turn. When it comes to identifying a common thread in the AU faults, it is extremely difficult to identify a single problem by
looking at a list of AU faults. You must recognize trends in either the text listing of faults or graphical view of faults relative to the design hierarchy.
For example, a long list of faults that are obviously contained in the design hierarchy of the boundary scan logic may be caused by a single problem.

At some point, you’ll need to focus your analysis efforts on one fault at a time, so pick one you think might represent a larger group of faults. For
example, if a significant number of boundary scan faults are listed as AU, this may be an indication that the boundary-scan logic has been initialized
to a certain desired state and must be held in that state to operate properly.

Once an issue is identified, how you determine its significance will be different depending on the issue. As previously described, you often need to
keep track of backward and forward cones of logic fanning out from a single constrained point to determine the potential group of affected faults.
From there, you also need to evaluate each of those potential faults to assess if it is possibly observed in another overlapping cone of logic.
