
Slide 1

So far we have studied the various processes involved in integrated circuit (IC) fabrication. In
this chapter, we focus on the testing of ICs.

Basics: In this chapter we will learn the details of testing a chip during and after fabrication. We
will also learn that chips sold at different speeds, such as 2.6 GHz and 2.8 GHz, may come from
the same wafer. At the end of the manufacturing line, each chip is thoroughly tested. No single
test decides whether a chip is functioning or not. A chip contains millions of transistors, and
these transistors are not tested one by one. Instead, the chip is divided into blocks and each
block is tested thoroughly.

For example, a processor chip may be divided into a few blocks such as memory, logic, and timer or
clock. Some chips also have an analog block. The logic block or the memory block can in turn be
subdivided into smaller blocks that are tested separately. We can use an example to
understand the testing and sorting (separating) process. Consider a T-shirt manufacturing
facility that makes a particular design in large, medium and small sizes. The large needs the
most cloth, the medium less, and the small the least. Hence the large is sold at the highest
price, the medium at an intermediate price and the small at the lowest price.

If thousands of large shirts are made, all of them are checked for quality, and the ones
with slight damage may be sold at a discounted price. A similar quality check is done for
the small and the medium T-shirts. The shirts which pass all the tests (i.e. without
even the slightest damage) are sold at full price; the ones with slight damage are sold at
a discounted price. If a shirt has a lot of damage, it has to be thrown away. For example, if a large
size T-shirt is severely damaged, it cannot be sold as a medium size T-shirt (even though the
medium size T-shirt is lower priced). A “poor quality large sized T-shirt” cannot be sold as an
“acceptable quality medium sized T-shirt”. Next, if we find that, on a particular day, many T-
shirts are damaged, then we have to find out the cause of this. Then we can take necessary action
to eliminate or at least minimize the damage.

Now let us consider testing in the context of IC manufacturing. A company such as Intel or AMD
makes processor chips. They are sold with specifications such as a speed of 3 GHz, 3.2 GHz or
2.8 GHz. A chip may be designed to run at 3.2 GHz. During testing, if it runs as per design, it
will be sold as a 3.2 GHz chip. If it does not run well at 3.2 GHz but is able to run at 3 GHz,
then the same chip will be sold as a 3 GHz chip. This is equivalent to selling a slightly damaged
T-shirt, still of acceptable quality, at a discounted price. But remember that one cannot sell a
severely damaged T-shirt. Similarly, if the 3.2 GHz chip does not work at 3.2 GHz or at 3 GHz, the
manufacturing company cannot sell it as a 2.8 GHz chip. It has to be thrown away. The company will
have a separate design for a chip running at 2.8 GHz. From that design, good chips will be sold as
2.8 GHz, slightly poorer ones will be sold as 2.6 GHz, and anything which does not satisfy
either speed will be thrown away. Please note that the speed values in GHz are given only as
examples. The manufacturing companies may not reveal the actual details.
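
The speed-sorting rule described above can be written as a short sketch. This is only an illustration in Python: the design names, the table of grades and the function name are assumptions made for this example, and the GHz values are the same illustrative numbers used in the text, not real product data.

```python
# Hypothetical speed grades per chip design, fastest grade first.
# The GHz values are illustrative only, as in the text.
DESIGN_GRADES_GHZ = {
    "design_3.2": [3.2, 3.0],
    "design_2.8": [2.8, 2.6],
}


def speed_grade(design, max_stable_ghz):
    """Return the speed (GHz) a chip is sold at, or None if it is thrown away.

    max_stable_ghz is the highest speed at which the chip passed its tests.
    A chip can only be sold at one of the grades of its own design.
    """
    for grade in DESIGN_GRADES_GHZ[design]:
        if max_stable_ghz >= grade:
            return grade
    return None  # fails every grade of its own design


print(speed_grade("design_3.2", 3.25))  # runs as designed
print(speed_grade("design_3.2", 3.05))  # sold at the lower grade
print(speed_grade("design_3.2", 2.90))  # not sold as a 2.8 GHz part
```

Note that the last chip is discarded even though it could run at 2.8 GHz, because 2.8 GHz is not a grade of its design. This mirrors the "poor quality large T-shirt" rule.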

Now the good chips can be packaged and sold. What is the use of the severely damaged chip?
While one cannot use it as a processor, one can analyze it and find the cause of the failure. In the
T-shirt analogy, if we find that the damage occurs mainly in one area, then we can go to the section
where that area is made, monitor it and find the source of the problem. Similarly, for the
failed chips, we have to find whether any particular block is failing and whether it is a design
problem or a manufacturing problem (i.e. some process is not run correctly).

Failure Analysis:
In an IC manufacturing facility, there is a separate section called the “failure analysis” (FA) section.
There, very sophisticated instruments are used to remove material from the failed chips
layer by layer and find the exact location of the failure. FA may take one or two days for
a single chip. Thus, only a few chips can be analyzed by this method, and from those few
chips one has to find the major cause of failure.

Courtesy: Prof. S. Ramanathan, IIT Madras.


Slide 2

Introduction
Scope
Testing (quality control) is done at various steps
Our focus: Only electrical tests
Other tests: Defectivity test, etc
Electrical Testing
Process check (Only overall condition)
PARAMETRIC or E-Test
Product check (Pass/Fail for each chip)
BIN or SORT
RELIABILITY

2
4-May-22

Testing is not done only after a chip is fully fabricated. Because so many steps are involved in
fabrication, we have to test the chips at various stages of fabrication as well. That way, if an
issue is found in the middle of fabrication, corrective measures can be taken before moving on
to the remaining steps.

In this chapter, we focus only on electrical testing. Optical tests such as defectivity tests
(are there any particles, residues or scratches on the wafer?) are not covered.

The remaining items on this slide are discussed in the upcoming slides.
Slide 3

Review from Mask: Basics


A mask may be 100 mm x 100 mm (for example)
So, one ‘print’ will be about 25 mm by 25 mm, on the
wafer.
A chip may be only 5 mm by 4 mm
So, one mask will have perhaps 20 chips, if the chip is
small

The gaps between the ‘real chips’ are used for
alignment marks (TBD)
test structures (TBD)


Before looking at the testing procedure, recall that we discussed this slide in chapter 3 while
talking about masks. I said then that test structures sit between the actual chips on the
mask and that I would discuss them later. We discuss them now.
Slide 4

Process Test (parametric)


Process Check
Simple structures, made between the ‘real’ chips
If the structures are very bad (all shorted, all open/ broken)
Chips are also likely to be ‘dead’
Not worth processing further (if the wafers are in the midst
of processing)
Not worth testing the chips (if the wafers have completed
processing)
So, process check is done
in the ‘line’, at some standard steps (after M1 CMP and so on...)
and at the end, just before the chips are tested for pass/fail
The ‘simple structures’ used for process checks are normally called
‘scribe line’ or ‘kerf’ structures
Process check is called ‘scribe line test’ or ‘kerf test’

The process tests are also called parametric tests. They are mainly intended to check the processes
employed in IC fabrication.

For this test, we make simple structures (as shown in the next slide) between the real chips.

If the structures printed on the wafer are bad, then the real chips are most likely not printed
correctly either, as the chip designs are more complicated than these test structures.

Will all the wafers and chips go through complete testing? The answer is “no”. Here again an
example will help explain the logic behind this answer. Assume that you go to a shop to buy a
computer. Let us assume that all the computers are on a table and switched off, and that you do
not have a firm idea of the model that you want to buy. How would one make a decision about
the purchase? First you will switch on the computer, perhaps run an application, play a movie,
or connect to the internet and open a website. (You will also look at the specifications before
deciding.) We can call these initial checks superficial testing. If the computer, the monitor or
the keyboard looks broken or old, you will not test it at all, because you will assume that if it is
broken on the outside, it is probably not in good condition inside. If all the computers
on display look old or broken, even if they are functioning, you will not evaluate the computers
there. You will move on to the next shop.

In the same way, for microelectronic chips, superficial testing is done at the wafer level
based on these test structures. These structures are sometimes called kerf or scribe line
structures, and the tests are called parametric tests or process tests. The structures are tested
frequently during processing. If these structures fail, then the complete test on the chips will
not be done. A ‘batch of wafers’ may contain 25, 13 or 10 wafers, depending on the size of the
wafer: for 200 mm wafers, 25 wafers are called a batch; for 300 mm wafers, 10 or 13 wafers are
called a batch. If the first two or three wafers are tested parametrically and all of them fail,
the remaining wafers will not be tested at all, either parametrically or completely. The testing
equipment is extremely expensive, and each fab has only a limited number of testers. Testing
each chip on each wafer completely takes a significant amount of time. Thus, if
superficial testing (parametric testing) shows that the structures are likely to be severely
damaged, the company will not spend time on expensive testing equipment for each
individual chip, because the chips are likely to fail. Parametric testing is
done frequently (e.g. after the transistors are made, again after M1 is made, perhaps after M2 is
made, and at the end of the manufacturing steps, just before the complete test).
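
The batch-level decision described above (test the first few wafers, and stop if all of them fail) can be sketched as follows. This is a minimal Python illustration, not a real fab procedure: the function names, the sample size of 3 and the return values are assumptions chosen for the example.

```python
def screen_lot(wafer_ids, parametric_test, sample_size=3):
    """Decide whether a batch of wafers goes on to complete testing.

    parametric_test(wafer_id) returns True if the scribe-line structures
    on that wafer pass. sample_size is an assumed value; the text only
    says "the first two or three wafers".
    Returns ("stop" or "continue", results for the sampled wafers).
    """
    sample = wafer_ids[:sample_size]
    results = {w: parametric_test(w) for w in sample}
    if sample and not any(results.values()):
        # Every sampled wafer failed: do not spend tester time on the rest.
        return "stop", results
    return "continue", results


# Example: a batch of 25 wafers (200 mm) where every wafer is bad.
decision, sampled = screen_lot(list(range(1, 26)), lambda wafer: False)
print(decision, sampled)
```

Only the sampled wafers are ever measured, which is the point of the superficial test: the expensive tester is not tied up by a batch that is very likely dead.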
Slide 5

Process Test (parametric)


Basic check for shorts/opens:
Metal lines: Snakes, Combs
Snake & Comb for both open and
shorts
At each layer
Many structures
==> many tests, time
few structures
==> lower ability to predict fails
low resolution


Shorts and opens:


The basic tests on the scribe line structures are checks for shorts and opens. A “short” means
two metal lines which are supposed to be electrically separated are actually short circuited.
An “open” means lines which are supposed to be electrically connected are actually disconnected.
The figures show examples of the electrical structures used to check for these. They
are commonly called snakes and combs, and they are made at each conducting layer (e.g. polysilicon
snakes, M1 snakes or combs, but not via or contact). Only a few such structures are made, which
means the ability to predict failures is also low: the process may be able to make
these simple structures easily and still not be able to make all the complicated structures in the
actual chips. But since it is only a superficial test, this estimate is sufficient. A snake
structure consists of a single metal line which winds like a snake, and it is used to test for
opens. If the line is not continuous because of some problem in the process, then the resistance
will be too high, or even infinite (if the line is completely broken). There might be tens
or hundreds of test structures, and if 99 out of 100 pass, the quality is considered
‘acceptable’. If 10 or more out of 100 fail, the conclusion is that the process is not
good. The logic is that if 10 out of 100 fail even for these simple structures, then when we try
to make complicated structures (in the actual chip), most of the chips will very likely fail.

To measure shorts, the structure shown at the bottom of the figure is used. It looks like one comb
kept next to another, and the whole structure is usually called a comb structure. If the
processes are run correctly, the structure will be made properly and there will be no shorts
(leakage current) between the two combs. If there is a short circuit, there will be
leakage current between the combs.
These tests have to be performed at many stages, and many times, to predict chip failures.
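
The snake and comb checks and the pass-count rule can be sketched in a few lines of Python. The limits passed in (max_ohm, max_amp) are hypothetical parameters, and the "needs review" outcome for results between the two cases given in the text is an assumption of this sketch, since the text does not say what happens there.

```python
def snake_is_open(resistance_ohm, max_ohm):
    """A snake fails for 'open' when its resistance is above the limit."""
    return resistance_ohm > max_ohm


def comb_is_shorted(leakage_amp, max_amp):
    """A comb pair fails for 'short' when leakage current is above the limit."""
    return leakage_amp > max_amp


def process_verdict(n_fail, n_total):
    """Apply the rule of thumb from the text to a set of test structures.

    At most 1 failure in 100  -> 'acceptable'
    10 or more failures in 100 -> 'not good'
    Anything in between is not specified in the text; this sketch
    labels it 'needs review' (an assumption).
    """
    if n_fail * 100 <= n_total:
        return "acceptable"
    if n_fail * 10 >= n_total:
        return "not good"
    return "needs review"


# A completely broken snake has infinite resistance.
print(snake_is_open(float("inf"), max_ohm=1e4))
print(process_verdict(n_fail=1, n_total=100))
print(process_verdict(n_fail=10, n_total=100))
```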
Slide 6

Process Test (parametric)


Example of shorts:

What is expected in the electrical test
Very low resistance (or high leakage current)
Plot of leakage current (CDF)
sorted values
log scale to identify ppm or ppb level defects
Stop processing the wafers with shorts/opens
Helps isolate problematic modules
[Figures: example of shorts (© micro magazine); CDF plot]

In a typical fab, these tests are conducted and the results are plotted in a particular
format, for example as a cumulative distribution function (CDF) plot. These plots are helpful in
identifying even a small number of failures. A logarithmic scale is used to identify very small
defect levels, such as ppm or ppb. When these analyses are performed on test results
corresponding to individual modules (such as M1 or M2), they also help in identifying where a
problem lies.
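
As a rough illustration of how such a plot is built, the sketch below sorts a set of leakage readings and computes the cumulative fraction for each one, then expresses the failures in ppm. It is plain Python without a plotting library; the function names, the leakage values and the 1 nA limit are made up for the example.

```python
def empirical_cdf(values):
    """Return (sorted values, cumulative fractions) for a CDF plot.

    The i-th sorted value is paired with the fraction of readings
    that are less than or equal to it.
    """
    xs = sorted(values)
    n = len(xs)
    fractions = [(i + 1) / n for i in range(n)]
    return xs, fractions


def fail_fraction_ppm(values, limit):
    """Failures (readings above the limit) in parts per million."""
    fails = sum(1 for v in values if v > limit)
    return 1e6 * fails / len(values)


# Made-up leakage readings in amperes: 999 good combs and one shorted comb.
leakage = [1e-12] * 999 + [1e-6]
xs, fractions = empirical_cdf(leakage)
print(xs[-1], fractions[-1])                   # the outlier sits at the top
print(fail_fraction_ppm(leakage, limit=1e-9))  # failures in ppm
```

On a linear axis the single outlier would be invisible among 999 good readings. Plotting the sorted values on a log scale is what makes such rare failures stand out.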
Slide 7

Process Test (parametric)


VDP


Knowing only that a chip fails is not sufficiently detailed information. If we know that the
chip fails at the M1 level for shorts, it helps in identifying the likely sources of the problem
and in taking remedial action. A few other structures are also used in the parametric or
process test. One is called VDP, short for Van der Pauw. Using this structure, the sheet
resistance of a deposited material (and hence its thickness, if the resistivity is known) can be
determined. Using additional structures, even the width of a line can be estimated. The
resistance of a metal line is normally expressed in terms of the resistivity of the material and
the length and cross-sectional area of the wire. However, as we learnt earlier, sheet resistance
is used more frequently in semiconductor processing. At M1, the metal thickness is the same for
all the lines but the lengths and widths vary. Similarly, M2 lines have a different thickness
from M1 lines, but among all the M2 lines the thickness is the same (and the width and the
length vary again). Thus, in semiconductor processing the thickness is fixed for a given layer,
while the width and the length vary.

The VDP structure is also called a Greek cross.

The sheet resistance is calculated using the formula given in the slide. Here the voltage is applied
between C and D and the current is measured between A and B.
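
The slide formula is not reproduced in these notes. The sketch below assumes it is the standard Van der Pauw relation for a symmetric cross, R_s = (pi / ln 2) * (V / I), and also shows how a line resistance follows from the sheet resistance as R = R_s * L / W. The function names and the numerical values are illustrative assumptions.

```python
import math


def sheet_resistance(voltage_v, current_a):
    """Sheet resistance in ohms per square for a symmetric Greek cross.

    Assumes the standard Van der Pauw relation R_s = (pi / ln 2) * V / I,
    with V taken across one pair of adjacent arms and I through the other.
    """
    return (math.pi / math.log(2)) * voltage_v / current_a


def line_resistance(r_sheet, length, width):
    """Resistance of a line: sheet resistance times the number of squares.

    length and width must be in the same unit; the thickness is already
    contained in the sheet resistance (R_s = resistivity / thickness).
    """
    return r_sheet * length / width


r_s = sheet_resistance(voltage_v=1e-3, current_a=1e-3)  # made-up reading
print(round(r_s, 3))                                    # ohms per square
print(line_resistance(r_sheet=2.0, length=100.0, width=0.5))
```

Because the thickness is the same for every line in a layer, one sheet resistance value per layer is enough; only the length-to-width ratio changes from line to line.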
Slide 8

Tester
©Advantest
Ability to test at high temp
some may have low temp capability also

Die level vs package level

Slide 9

Test Program
C or script like
Normally well commented, reasonably readable
information on voltage applied (for example), store
the data in test name etc..
DUT (Device Under Test)
alignment and x,y movement

C-V test (for oxide) OR I-V results


time consuming
usually one result in a DUT in a die
TDDB, NBTI, ESD, EM
Time Dependent Dielectric Breakdown, Negative Bias Temperature Instability
Summary file (for wafer, for lot ...)

Complete test: A complete test program looks similar to a C program or a script. Test
programs are written in the tester's own language, but most of them are easy to read. They are
normally well commented and carry information such as the name of the test, the
voltage applied, and the device under test (DUT). Apart from the test information, they also hold
information on the alignment and the x-y movement of the tester. The test pins (wires that pass
potential or current to the chip) must be properly aligned. After a chip is tested, the wafer must
be moved slightly so that the next chip is placed correctly below the pins. Thus the alignment
and x-y movement information is also important. Some of the tests may be capacitance-voltage
(C-V) tests for oxide or current-voltage (I-V) tests. These types of tests are very time consuming
and are not done frequently. Most of the tests done on modern chips are digital tests. Some tests
are usually referred to by acronyms such as TDDB (time dependent dielectric breakdown),
NBTI (negative bias temperature instability) and IDDQ (quiescent supply current).
Slide 10

Product Test
Binning
Soft bin, hard bin
initial stages, test the least failing part
sometimes COF (complete on fail)
production stage, test the most failing part
always SOF (stop on fail)
Continuity (leakage, open)
Built In Self Test (BIST)
Functional
SCAN
Memory
Repair (yield can improve dramatically)
Fail Bit Map (FBM)
WL ==> M1 short, BL fail ==> via fail (for example) (see next
slide for figure)

Binning (Sort test) : The tested chips (also called dies) are sorted or separated based on the
failure modes. This process is called binning or sorting. Remember that a die (i.e. chip) will have
blocks such as memory, scan, clock etc. In the production stage, the block which tends to fail
more will always be tested first. If it fails, there is no use in testing the chip further. Hence, once
a chip fails, it will not be tested further. Based on the block where the failure occurs, and the
type of failure, it will be marked. One can imagine that it is thrown into a waste bin or waste
bucket; different buckets would be available and each bucket will correspond to a block and a
failure mode. There may be one bucket for chips failing in memory. There may be a second
bucket for chips failing in the scan region and a third bucket for chips having opens in the analog
region. There may be a fourth bucket for a chip failing in the clock region. Even within the
buckets, there may be different compartments. Within the scan region, if it fails for too much
current, it may go to one compartment, and if it fails for too little current, it may go to another
compartment. If a chip passes all the tests, then it is thrown in a ‘good chip’ bucket (i.e. bin).

However, in practice, none of the chips are physically thrown into buckets. Instead, a computer
program keeps track of the failure mode of each chip. The process is called binning because
one can imagine that the chips are thrown into different buckets or bins. If we say a chip fails
in the scan region, then we know that overall it has gone into the scan bucket. If we also know
the compartment within the bucket, then we have more detailed data. The bucket is called the hard
bin, and the bucket together with its compartment information is called the soft bin. Hard bin
means we know the overall reason for the failure. Soft bin means we also know the compartment
in which the chip has been placed (i.e. we have detailed information on
the failure).
The binning of failing chips helps to determine the block which fails frequently. Based on the
data, the engineer will try to determine the cause of the issue and make the improvements
necessary. The actual troubleshooting requires in depth analysis and a lot of experience.
Typically, a fab will have a team of experts to perform the ‘yield analysis’ and suggest
improvements.
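
The bookkeeping behind stop-on-fail binning can be sketched as below. This is a Python illustration only: the block names, failure-mode strings and the bin naming scheme are hypothetical, and a real test program would be written in the tester's own language.

```python
from collections import Counter


def bin_chip(test_sequence, run_test):
    """Stop-on-fail binning for one chip.

    test_sequence: block names, the most frequently failing block first.
    run_test(block): returns None on pass, or a failure-mode string.
    Returns (hard_bin, soft_bin). The hard bin is the failing block;
    the soft bin adds the failure mode (the 'compartment').
    """
    for block in test_sequence:
        mode = run_test(block)
        if mode is not None:
            return block, f"{block}:{mode}"  # stop testing this chip
    return "good", "good"


# Made-up results for three chips: block -> failure mode.
TEST_ORDER = ["memory", "scan", "analog", "clock"]
chips = {
    "chip_1": {},                                       # passes everything
    "chip_2": {"scan": "current_too_high"},
    "chip_3": {"memory": "bit_fail", "clock": "open"},  # stops at memory
}

hard_bins = Counter()
for name, fails in chips.items():
    hard, soft = bin_chip(TEST_ORDER, fails.get)
    hard_bins[hard] += 1
    print(name, hard, soft)

print(hard_bins.most_common())
```

Because testing stops at the first failure, chip_3 is counted only in the memory bin even though its clock block is also bad. The tally of hard bins is the kind of summary a yield analysis team would start from.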
Typical test sequences: Normally the very first tests are continuity tests for shorts and
opens, which catch the problem if there is a severe short or open issue. The
next test is the built-in self-test, or BIST. Third is typically the sequence of tests which looks
into the logic area (also called the functional area); these tests are usually called scan. Most
chips now have built-in (embedded) memory. Testing of memory differs from
testing of logic in one respect: the memory can be repaired if there is a failure, and
the repair can improve the yield dramatically. The memory contains a lot of identical circuits
called bits. If many bits fail in a memory part, the locations of all the failing bits can be
marked as a pattern. This is called a fail bit map (FBM). An FBM can help identify the problem in
a process without actually doing failure analysis.
Slide 11


FBM: In the memory region, the bits are arranged in rows and columns. If one or two bits fail
randomly in a chip, it is difficult to identify the cause of the failure. However, if a whole row
of bits has failed, the engineer will be able to conclude from the design that an M1 line has
shorted (as an example). If a whole column of bits has failed, the engineer can conclude that a
particular metal line is open (i.e. the line is broken). The rows are called word lines and the
columns are called bit lines, as shown in the schematic.

If we find that in many chips, the ‘column failures’ are frequently observed, it is possible to
conclude that the metal process is sub optimal and that it has to be improved. In other areas of
the chip, such as the logic, the design is not repetitive. The design is complicated in those areas
and it is not easy to identify the cause of failures in those regions.
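
A fail bit map can be scanned for the row and column patterns described above with a few lines of code. The sketch below is a simplified Python illustration: it only recognizes completely failed rows and columns, and the function name and the dictionary keys are assumptions for this example.

```python
def classify_fail_bit_map(fbm):
    """Find full-row and full-column failures in a fail bit map.

    fbm is a list of rows; each row is a list of 0/1 with 1 = failing bit.
    Rows correspond to word lines and columns to bit lines.
    """
    n_rows, n_cols = len(fbm), len(fbm[0])
    failed_rows = [r for r in range(n_rows) if all(fbm[r])]
    failed_cols = [c for c in range(n_cols)
                   if all(fbm[r][c] for r in range(n_rows))]
    # Failing bits that are not explained by a full row or column failure.
    isolated = [(r, c) for r in range(n_rows) for c in range(n_cols)
                if fbm[r][c] and r not in failed_rows and c not in failed_cols]
    return {
        "word_line_fails": failed_rows,
        "bit_line_fails": failed_cols,
        "isolated_bits": isolated,
    }


# Made-up 4 x 4 map: word line 1 fails completely, plus one random bit.
fbm = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
]
print(classify_fail_bit_map(fbm))
```

Counting how often each pattern appears across many chips is what lets the engineer point to a specific process module without opening up any single chip.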
Memory Repair: In a memory chip, millions of bits are created and there is a good chance that
at least one of them has a defect. A set of 1024 bits is called one kilobit, and a set of 1024
kilobits is called one megabit. If one designs the memory with exactly the required number of
bits, then it is likely that at least 1 in 10 chips will have a failure, i.e. the yield will be
at most 90%. However, most fabs regularly achieve much more than 90% yield for
memory.
This is achieved by the following strategy. Instead of making exactly 1024 bits (for example),
the design incorporates a few extra bits. The memory is first tested in a “pre-fuse” test. If
any of the bits in the main array fail, the connections to them are removed by a laser, i.e. the
lines connecting those bits are melted (fused), and the lines connecting the extra bits are
left alone. If all the bits of the main array are created properly and pass the ‘pre-fuse’ test,
then the lines connecting the extra bits are disconnected using the laser instead. This is
because the circuit should have the exact number of bits; it can have neither a shortage nor an
excess of bits. After fusing, the memory is tested again (post-fuse test). The yield in post-fuse
testing is usually more than 95%. This process is called ‘repair’. If many bits in the memory
part fail, then the chip cannot be repaired, because it is not worth adding many extra bits to
the chip. Each bit occupies a certain area, and adding too many extra bits costs chip area.
Thus, there is a balance between
adding extra bits to enhance the memory yield and occupying extra space.
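
To see why a few spare bits can raise the yield so much, the following worked example uses a simple probability model. It is a sketch under strong assumptions that are not in the text: bits fail independently with the same probability, the spare bits themselves are always good, and any failing bit can be replaced by any spare. The numbers are illustrative, not fab data.

```python
from math import comb


def yield_with_spares(n_bits, p_fail, n_spares):
    """Probability that at most n_spares of the n_bits main-array bits fail.

    Binomial model: each bit fails independently with probability p_fail.
    With n_spares = 0 this is the yield without any repair.
    """
    return sum(
        comb(n_bits, k) * p_fail**k * (1 - p_fail) ** (n_bits - k)
        for k in range(n_spares + 1)
    )


n_bits = 1024
# Choose the per-bit failure probability so that the unrepaired yield is 90%.
p_fail = 1 - 0.9 ** (1 / n_bits)

for spares in (0, 1, 2):
    print(spares, round(yield_with_spares(n_bits, p_fail, spares), 4))
```

In this model most failing chips have only one or two bad bits, so the first couple of spares recover nearly all of them, and each further spare adds very little. That diminishing return is the balance between extra yield and extra area mentioned above.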
Logic areas such as the processor have very complicated structures. Only when the structures
are highly repetitive, as in memory, is it possible to include excess bits and repair the array.
In the logic area, it is generally not possible to have backup or redundant circuits. In a few
areas it may be possible to have a little redundancy, but the majority of the transistors will
not be redundant and it will not be possible to repair them.
Slide 12

Others

❑ High temperature test


❑ Optical testing


High temperature test: The chip is subjected to high temperature and tested. For
military applications, the chip may also be tested at low temperature (e.g.
chips used in satellites or missiles encounter both high and low temperatures, and
hence they should be tested under harsh conditions before they are packaged and used).
Optical testing: The tests discussed above are called “electrical tests” since they are based on
electrical measurements. Other types of tests use electromagnetic waves. Although
visible light was used many years ago, modern equipment also uses electromagnetic waves in
the UV region. However, the name ‘optical’ testing is applied to all these tests regardless of the
wavelength used. KLA-Tencor is a leading manufacturer of the equipment used to identify
defects. They have a branch in India (as of 2011), to cater to the needs of the semiconductor
industry in South East Asia.
The optical tests are used at various stages of processing, (i.e. before the chip is completely
fabricated). Specifically, they are used to determine if there are any defects on the wafers.
Defects may arise due to dust particles falling on wafer, or due to poor process. For example,
during CVD, dust particles present in the chamber may fall on the wafer. If the dust particles are
observed frequently, then the CVD chamber must be cleaned before it is used. In another
example, during CMP, deep scratches may be formed on the wafer due to large abrasive particles
present in the slurry. In that case, the slurry must be filtered so that only small abrasive particles
are present.
The sizes of defects and dust particles that must be observed are very small (< 100 nm). It must
be noted that the features present in the chip, such as interconnect copper lines, are also of
similar size. Hence the equipment must be capable of distinguishing between the desired features
(such as interconnect lines) and unwanted ones, such as dust. If the circuits were simple, it
would be possible to compare the actual image with the ideal image (based on the layout).
However, in complicated circuits, it is not possible to load all the layout details and compare
with the image, in a short time. One way of handling this is to image two neighboring chips
simultaneously and compare them. If both images are the same, then it can be assumed that there
are no defects. When they are different, the presence of defect can be identified.
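
The neighbor comparison can be sketched as a pixel-by-pixel difference of two images. The Python example below is a toy illustration: real inspection tools are far more elaborate, and the function name, the tiny gray-level images and the threshold are made up for this sketch.

```python
def die_to_die_defects(image_a, image_b, threshold):
    """Return (row, col) positions where two die images differ.

    image_a and image_b are equal-sized lists of rows of gray levels.
    A position is flagged when the levels differ by more than threshold,
    which allows for small noise between otherwise identical dies.
    """
    return [
        (r, c)
        for r, (row_a, row_b) in enumerate(zip(image_a, image_b))
        for c, (pa, pb) in enumerate(zip(row_a, row_b))
        if abs(pa - pb) > threshold
    ]


# Two neighboring dies with the same pattern; die_b has a particle at (1, 2).
die_a = [
    [10, 200, 10, 200],
    [10, 200, 10, 200],
]
die_b = [
    [12, 198, 10, 200],
    [10, 200, 150, 200],
]
print(die_to_die_defects(die_a, die_b, threshold=20))
```

The comparison shows that the two dies differ at a position, but not which of the two carries the defect, and the designed features themselves never need to be loaded from the layout.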
