http://www.techonline.com/community/ed_resource/feature_article/20421?print

TechOnLine Publication Date:  Apr. 12, 2002

Electromigration for Designers: An Introduction for the Non-Specialist
J.R. Lloyd
IBM TJ Watson Research Center

Reliability is as much a key to success in the microelectronics industry as is performance. Not only
must a product perform as desired, it must also work for an extended period of time without fail,
typically 10 years or more. It does little good to make the world's fastest microprocessor, if after two
weeks of operation it fails. Except for very few applications, such as missile guidance systems that
only operate for a few seconds, anything other than superb long-term reliability would be
unacceptable.

With the complexity of today's microelectronics, a phenomenal level of reliability must be maintained.
For instance, if the probability of failure for a transistor is one in a million, and you have a million
transistors, the chance that at least one fails is about 63%. And yet, a modern IC can have more than 10 million circuit
elements. Therefore, for any acceptable reliability on the chip level, today's circuit elements must be
among the most reliable things ever built. In addition, reliability must continue to increase as the
complexity increases.
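
The arithmetic behind this argument can be sketched in a few lines of Python. This is only an illustration under a stated assumption that the article does not make explicit: every circuit element fails independently with the same probability. The function name `chip_failure_probability` is hypothetical.

```python
import math


def chip_failure_probability(p_element: float, n_elements: int) -> float:
    """Probability that at least one of n independent elements fails.

    Assumes identical, independent elements (an idealization).
    """
    if p_element >= 1.0:
        return 1.0
    # 1 - (1 - p)^n, evaluated with log1p/expm1 to stay accurate for tiny p
    return -math.expm1(n_elements * math.log1p(-p_element))


if __name__ == "__main__":
    cases = [(1e-6, 1_000_000), (1e-6, 10_000_000), (1e-9, 10_000_000)]
    for p, n in cases:
        prob = chip_failure_probability(p, n)
        print(f"p = {p:g}, n = {n:,}: chip failure probability = {prob:.4f}")
```

With a one-in-a-million element and a million elements, the result is about 0.63, and it climbs toward 1 as the element count grows, which is why per-element reliability has to improve as complexity increases.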

The reliability we have enjoyed thus far has not come without considerable cost. Billions of dollars and
the equivalent in Yen, Francs, Deutschmarks, and so on have been expended to solve the daunting
problems facing reliability engineers designing integrated circuits. The few wear-out failure
mechanisms that exist (hot carrier, time-dependent dielectric breakdown, and electromigration) have
become understood well enough that we can incorporate them into design tools.

We know the limitations to apply in order to delay any wear-out issues to long past the useful life.
However, to apply the limitations effectively, one must understand the limitations of the materials used
to manufacture ICs, and work around them. Overestimating the capabilities of the materials and the
process could spell disaster, and underestimating them could limit designs so severely that nothing of
commercial interest could be made. Striking a balance between conservatism and judicious use of the
process capabilities is necessary for continuous advancements.

ICs must work rather hard. High currents, high temperatures, and many thermal cycles eventually take
their toll. Just as any mechanical device, like an old car, boat or airplane, eventually fails from
repeated exposure to the everyday stress of operation, electrical stresses cause similar problems in
electronic components. Two types of reliability issues plague the industry: defect-related problems and
wear-out. Defect-related problems are caused by manufacturing defects, such as a missing process
step, dirt, or other unavoidable calamities. Even the best, most efficient process lines suffer from an
occasional defect-related problem. Wear-out is due to the circuit or the product just wearing out,
without any initial defects being present.

Although redundancy and insensitivity to a failure mechanism may be up to designers, defects are in
the realm of the process engineer. Improved processes and statistical process control efforts often
reduce such failures to a minimum. Wear-out, on the other hand, which occurs due to limitations in the
"perfect" material, is a problem that lies squarely with the designer. One of the principal wear-out
failure mechanisms is electromigration. Fortunately, although not completely understood in all its
subtleties, it is controllable by proper design and a firm appreciation of where one can get into trouble.

Electromigration History
Electromigration is the mass transport of a metal due to the momentum transfer between conducting
electrons and diffusing metal atoms. Discovered more than 100 years ago, it became a concern only
when the relatively severe conditions necessary for operation of integrated circuits made it painfully
visible. Although electromigration, in principle, exists whenever current flows through a metal wire, the
conditions necessary for electromigration to be a problem simply did not exist back then. In bulk wires,
such as those used for home circuitry, the maximum current density is limited to about 10,000 A/cm^2 due to
Joule heating. Any current density even modestly exceeding this value will produce enough heat to
melt a metal wire; however, the driving force from electrons colliding into diffusing metal atoms would
be insufficient to make electromigration a significant problem. Only a research scientist would pass
enough current through a bulk metal wire to observe the effects of electromigration, and only with
great experimental difficulty. Therefore, for at least 100 years, electromigration was an interesting
problem in solid state physics, fascinating grist for the research mills at universities, but of no interest
whatsoever commercially.

All of this changed in 1966 when the IC made its commercial appearance. Electromigration was
rediscovered by a much larger audience, and with a vengeance. In ICs, electricity is conducted via thin
film stripes that are in direct contact with an effective heat sink. Because most of the heat generated
by the current is conducted away into the chip, thin film conductors can withstand current densities at
least two orders of magnitude greater than traditional bulk wires. This allows current densities of nearly
10^6 A/cm^2 with minimal Joule heating. At these current densities electromigration becomes significant.

The first ICs were constructed with metal lines that were 10 µm in width or more, wide by today's
standards. At the same time they were exceedingly thin, on the order of 3000 Å. Furthermore, the
conductors were made of pure Aluminum, a material with a low melting temperature, which implies
fast diffusion at low temperatures. Very thin film contains small grains and thus many grain boundaries
that are conduits for even more rapid diffusion. This combination of high current density and fast
diffusion at low temperatures was a recipe for disaster.

ICs were supposed to be very reliable and great hope was placed in their use. When the first ICs were
placed into service, they failed within weeks. The shock to the industry was tremendous. IC
manufacturers were in a panic to understand why they failed.

Sidebar: "A billion here and a billion there, pretty soon..."
At IBM it was estimated that close to a billion 1966 dollars were spent in the effort to understand and
fix the problem of electromigration failure. This was when a billion dollars was a lot of money.

When parts returned from the field were subsequently examined, there was nothing visible, even
under a microscope. A relatively new research tool, the scanning electron microscope, was used and
failure sites were identified. The open circuits were very fine "cracks" in the metal, sometimes only a
few hundred angstroms wide. When the culprit was identified, the immediate fix was simple: make the
metal thicker. Easy with 10 µm wide lines, but not so easy today.

Since then, electromigration has not gone away, but it has come under control. The first solution was
to make the metal conductors more resistant to electromigration by alloying the Al with Copper (Cu),
initially up to 4%. Processing considerations have since reduced that amount, but about 0.5% Cu is
generally still alloyed with Al today. The addition of Cu, of course, had a deleterious effect on the
resistivity, and low resistance was available only by using relatively thick metal, 1.0 µm or so. Today,
fine pitch circuits cannot tolerate such thick metal, and other schemes are used to ensure reliability.

The Physics of Electromigration


Electromigration is due to the momentum exchange between conducting electrons and diffusing metal
atoms. Simply stated, perhaps, but how does it happen?

In a perfect lattice, there is no resistance. Electrons move about in a periodic potential with no other
interaction with the metal atoms. This may sound like superconductivity, but it isn't. The problem here
is that a perfect lattice cannot exist above absolute zero due to missing atoms ("vacancies"),
impurities, boundaries between crystals of different orientation ("grain boundaries"), and regions of
imperfection ("dislocations").

Sidebar: Designers Beware.
Many reliability engineers working in electromigration define the current exactly opposite to the way
you do. To them current is electron flow and positive current flow is in the direction the electrons are
traveling. Ben Franklin had a 50-50 chance of getting it right.

Perhaps even more important, at any temperature above 0 K, atomic vibrations occur. These
vibrations ("phonons") put a metal atom out of its perfect position about 10^13 times each second and
disturb the periodic potential, causing electron scattering. The scattering event makes the electron
change direction; any change in direction is accompanied by an acceleration; and for every
acceleration there is a force. After many collisions (another word for the scattering event), the force
averages out in the direction of electron flow.

The force due to collisions of electrons to metal atoms is called the momentum exchange. In
electromigration, momentum is exchanged between the electrons and the metal atoms and a change
in momentum with time is called a force. To provide sufficient momentum exchange to cause
measurable effects, many electrons must be available to collide with the atoms. This can only happen
in a metal. In metals, many electrons are easily accelerated in an electric field.

Semiconductors have far fewer electrons and in a true semiconductor, electromigration does not exist
because there just aren't enough charge carriers. However, electromigration can occur in
semiconductor-like materials, such as silicon, when they are so heavily doped that they act as if they
were metals. At dopant levels of around 1%, electromigration has been observed in polycrystalline
silicon, but then the temperature coefficient of resistance (TCR) is positive. A positive TCR is probably
the best definition of a metal.

Sidebar: Sign of the Charge Carriers
Heavily doped polycrystalline silicon was used to illustrate an interesting property of electromigration
physics. Both p-type and n-type polysilicon resistors doped to approximately 1% were stressed until
failure in strong Joule heating induced temperature gradients. In the n-type material, failure was near
the cathode and in p-type material failure was near the anode, thus demonstrating the role of the sign
of the charge carrier in electromigration.

The size of the momentum exchange will be proportional to the distortion in the lattice at any given
point. This distortion is greatest when there is a vacancy nearby, or in the region of a grain boundary.
This is also where diffusion occurs. Vacancies or grain boundaries must be present for metal atoms to
move from their fixed positions in the crystal lattice ("diffuse"). You can't have two things in the same
place at the same time, so for an atom to move from site A to site B, site B must be vacant. In grain
boundaries the problem is less well defined, but the concept still applies. However, a boundary is a
region of distortion and open space, and the diffusion of atoms can be accommodated in these regions
rather easily as compared to the lattice. This creates a fortuitous situation where the greatest
momentum exchange occurs only at the sites where it is possible for atoms to move.

For the design engineer, electromigration physics can be simply stated. Electrons flow through a metal
film and collide with metal atoms. The collisions produce a force on the metal atoms in the direction of
electron flow (for n-type materials, opposite for p-type materials). Electromigration is only significant at
high current densities and only in metals. The magnitude of the electromigration force is proportional
to the current density.

Materials Science
The flux of metal atoms due to electromigration can be expressed rather simply, using an electrostatic
analogue and Einstein's equation for diffusion in a potential field.

$$J = \frac{D C}{kT} Z^{*} e \rho j \qquad (1)$$

where J is the atomic flux, D is the diffusion coefficient for the appropriate mass transport mechanism,
C is the concentration of diffusing atoms, Z* is a quantity called the effective valence or the effective
charge (although it is neither a charge nor a valence) that represents the sign and the magnitude of the
momentum exchange, e is the electronic charge, ρ is the resistivity and j is the current density. kT is
the average thermal energy per atom. The important observation from Equation 1 is that the
electromigration-induced mass flux is directly proportional to the current density, to the diffusion
coefficient and to the concentration of diffusing atoms.
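
A small Python sketch may help make the proportionalities concrete. It assumes the flux relation takes the usual textbook form J = (D C / kT) Z* e ρ j; the function name and every numerical value in the example (diffusivity, effective valence, resistivity) are illustrative placeholders, not data from this article.

```python
BOLTZMANN_EV_PER_K = 8.617333e-5  # Boltzmann constant in eV/K


def electromigration_flux(diffusivity_cm2_s: float,
                          concentration_per_cm3: float,
                          z_eff: float,
                          resistivity_ohm_cm: float,
                          current_density_a_cm2: float,
                          temperature_k: float) -> float:
    """Atomic flux in atoms/(cm^2 s), assuming J = (D C / kT) Z* e rho j.

    Working in eV absorbs the electronic charge e: the driving force per
    atom is Z* * rho * j in eV/cm, because rho * j is a field in V/cm.
    """
    force_ev_per_cm = z_eff * resistivity_ohm_cm * current_density_a_cm2
    # Einstein relation: mobility = D / kT
    mobility = diffusivity_cm2_s / (BOLTZMANN_EV_PER_K * temperature_k)
    return concentration_per_cm3 * mobility * force_ev_per_cm


if __name__ == "__main__":
    # All values below are hypothetical, chosen only to exercise the formula.
    base = dict(diffusivity_cm2_s=1e-12, concentration_per_cm3=6.0e22,
                z_eff=-10.0, resistivity_ohm_cm=3e-6, temperature_k=400.0)
    for j in (5e5, 1e6, 2e6):
        flux = electromigration_flux(current_density_a_cm2=j, **base)
        print(f"j = {j:.1e} A/cm^2 -> flux = {flux:.3e} atoms/(cm^2 s)")
```

The sign of the result simply follows the sign chosen for Z*, which encodes the direction of the momentum exchange.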

Just having an electromigration-induced mass flux is not enough to cause a problem. For a problem to
exist, either more or less mass must be entering a region than leaving it. If more mass is leaving than
arriving, we can form voids and open circuits. If more mass is entering than leaving, extrusions will
form short circuits or breaks in the passivation and provide an opportunity for corrosion. These regions
are called flux divergences. Unfortunately, many opportunities exist for flux divergences in a typical IC.

A principal source of trouble is in the unavoidable contact to silicon. The diffusion of Al from Silicon (Si)
is zero, and, hopefully, the diffusion of Si into Al is the same. Therefore, since electromigration will be
driving the Al away from the Si contact and attempting to stuff it into another, a serious problem can
result. Under the right circumstances, metal atoms will leave and none will replace them, so voids will
form at contacts where electron current is entering the metal from the Si. Conversely, extrusions will
be generated where the electrons are entering the Si.

Since contacts and other similar structures are unavoidable, the potential for electromigration failure
exists in any real circuit. All we can do is design our circuits such that this inevitable problem is
delayed until it no longer matters, and this is the circuit designer's responsibility.

Sidebar: Black's Law
In the late sixties, Jim Black of Motorola was heavily involved in understanding the "cracked stripe"
problem that was later identified as electromigration. Jim's pioneering work included the first careful
systematic investigations of electromigration failure kinetics. His experiments uncovered the curious
behavior that electromigration failures followed kinetics that depended not on the inverse of the
current density, but on the inverse square.

$$t_{50} = \frac{A}{j^{2}} \exp\left(\frac{\Delta H}{kT}\right) \qquad (2)$$

where t50 is the median time to failure in an ensemble of samples, A is a constant that needs to be
empirically determined and ΔH is the activation energy for failure. The experimental values found for
the activation energy suggested grain boundary diffusion as the mass transport mechanism. For
nucleation dominated failure, this equation has proven to be adequate even to the present day. Only
small corrections, often too small to be detected experimentally, have been needed to keep Black's
Law consistent with the latest theoretical developments.

Effect of Current Density on Conductor Lifetime
From Equation 1 we see that the electromigration driving force is proportional to the current density. It
could be assumed that electromigration failure would scale in the same way, linearly with the current,
but that is not always the case. Traditionally, it has been observed that electromigration failure
followed a 1/j^2 law rather than 1/j. This has become known as Black's Law.

However, whether this empirical law holds or not depends entirely on whether the failures are
nucleation or growth dominated. This, in turn, depends heavily on the process used to construct the
metal lines. If there is no refractory "shunt layer" such as TiN or TiW under the Al line, failure is
nucleation dominated and Black's Law holds. If, however, the failures are growth dominated, such as is
usually the case for W via failure in narrow lines with shunt layers, Black's Law is not followed and
failure times are dependent on 1/j kinetics. Often, as might be expected, the failure process involves
both nucleation and growth of damage, and the behavior is more complicated and cannot be
described by a simple power law in j.

Wherever growth dominates or is a significant part of the failure time, we assume that 1/j kinetics hold.
Most recent experimental data where contacts or vias have been examined in the presence of
refractory conductive shunt layers has supported the use of 1/j kinetics, whereas most data on
conductor lines attached to bond pads has supported 1/j^2 kinetics.

To ensure that electromigration failure does not occur in the field, we need to limit the current density
such that electromigration failure will not become significant until long after the projected useful lifetime
of the circuit. This is a function of not only the current density in the metal lines and contacts, which
may behave differently, but also of temperature and often process variations.
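
As a rough illustration of how such a limit might be derived, the sketch below extrapolates an accelerated-test lifetime to use conditions. It assumes the generalized Black form t50 = A j^(-n) exp(ΔH/kT), with n = 2 for nucleation dominated and n = 1 for growth dominated failure. The function names and all numbers in the example are hypothetical, and a real design rule would target an early-failure percentile rather than the median.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333e-5  # Boltzmann constant in eV/K


def thermal_acceleration(activation_energy_ev: float,
                         t_use_k: float, t_test_k: float) -> float:
    """Lifetime ratio t50(use) / t50(test) at equal current density."""
    return math.exp(activation_energy_ev / BOLTZMANN_EV_PER_K
                    * (1.0 / t_use_k - 1.0 / t_test_k))


def extrapolate_t50(t50_test: float, j_test: float, j_use: float,
                    t_test_k: float, t_use_k: float,
                    activation_energy_ev: float, n: float = 2.0) -> float:
    """Median lifetime at use conditions, in the same units as t50_test."""
    af = thermal_acceleration(activation_energy_ev, t_use_k, t_test_k)
    return t50_test * (j_test / j_use) ** n * af


def max_current_density(t50_target: float, t50_test: float, j_test: float,
                        t_test_k: float, t_use_k: float,
                        activation_energy_ev: float, n: float = 2.0) -> float:
    """Largest use current density whose median lifetime meets t50_target."""
    af = thermal_acceleration(activation_energy_ev, t_use_k, t_test_k)
    return j_test * (t50_test * af / t50_target) ** (1.0 / n)


if __name__ == "__main__":
    # Hypothetical test: 100 h median life at 2e6 A/cm^2 and 523 K (250 C).
    test = dict(t50_test=100.0, j_test=2e6, t_test_k=523.0,
                t_use_k=378.0, activation_energy_ev=0.6)
    for n in (1.0, 2.0):
        j_max = max_current_density(t50_target=87_600.0, n=n, **test)
        print(f"n = {n:.0f}: max j for a 10-year median life = {j_max:.2e} A/cm^2")
```

Running it with both exponents shows how strongly the permitted current density depends on which kinetics are assumed, which is the reason the nucleation versus growth question matters to the designer.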

Effect of Temperature on Current Density Limits


The major effect of temperature on electromigration is in the diffusion coefficient. Diffusion is a
thermally activated process characterized by the Arrhenius relation and it possesses an activation
energy.

$$D = D_0 \exp\left(-\frac{\Delta H}{kT}\right) \qquad (3)$$

where D0 is a pre-exponential factor that depends on the diffusion mechanism and ΔH is the activation
energy, also dependent on the diffusion mechanism.

Equation 3 shows that electromigration is very sensitive to temperature. For Al, generally a change in
temperature of 20 degrees can double the rate of electromigration. Therefore, the current permitted in
a thin film conductor is a function of temperature. The higher the temperature, the less current can be
permitted and still remain safe from electromigration failure.

Sidebar: Activation Energy
The activation energy for self diffusion depends strongly on the diffusion mechanism. Diffusion can
proceed through the lattice, or grain boundaries, and along interfaces or the surface. The lattice is the
most difficult path with the highest activation energy (for Al, ΔH_lattice is about 1.4 eV), followed by
the grain boundary (for Al, ΔH_grain boundary is about 0.6 eV) and then the surface. In Al, the surface
is generally not available due to the presence of a coherent oxide film. Interfacial diffusion activation
energies differ for every interface and can be either greater or less than that for grain boundary
diffusion. Adding alloying elements generally has the paradoxical effect of decreasing the lattice and
increasing the grain boundary activation energies. The effect on interfaces is unclear.
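
The temperature sensitivity can be checked numerically. The sketch below assumes the Arrhenius form D = D0 exp(-ΔH/kT) and uses the activation energies quoted in this article for Al (about 0.6 eV for grain boundaries, about 1.4 eV for the lattice); the function names and the 150 °C starting temperature are assumptions made for illustration.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333e-5  # Boltzmann constant in eV/K


def diffusion_ratio(activation_energy_ev: float,
                    t1_k: float, t2_k: float) -> float:
    """D(T2) / D(T1) for an Arrhenius process; D0 cancels out."""
    return math.exp(activation_energy_ev / BOLTZMANN_EV_PER_K
                    * (1.0 / t1_k - 1.0 / t2_k))


def doubling_temperature_rise(activation_energy_ev: float,
                              t1_k: float) -> float:
    """Temperature increase (K) above t1_k that doubles the diffusion rate."""
    inv_t2 = 1.0 / t1_k - BOLTZMANN_EV_PER_K * math.log(2.0) / activation_energy_ev
    return 1.0 / inv_t2 - t1_k


if __name__ == "__main__":
    t_start = 423.15  # 150 C, an assumed operating point
    for label, dh in (("grain boundary", 0.6), ("lattice", 1.4)):
        rise = doubling_temperature_rise(dh, t_start)
        print(f"{label} (dH = {dh} eV): rate doubles after a rise of {rise:.1f} K")
```

The rise needed to double the rate depends on both the activation energy and the starting temperature, so the 20 degree figure is best read as a rule of thumb for grain boundary diffusion at typical stress temperatures.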

Just how much current can be permitted and still maintain reliability as the temperature is changed will
depend on whether you have nucleation or growth dominated failure and what the dominant diffusion
mechanism is. If we have growth-dominated failure and we increase the temperature such that we
double the diffusion coefficient (approximately 20 degrees for Al alloys and grain boundary diffusion),
we must reduce the current density by half. Conversely, if we want to increase the current density by a
factor of two, we must ensure that the temperature is at least 20 degrees cooler. If failure is nucleation
dominated, an approximate 30% reduction in current is needed for a similar temperature increase to
maintain equal reliability.
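
The two derating figures in this paragraph follow from holding the lifetime constant while the diffusion coefficient changes. The sketch below assumes the lifetime scales as 1/(D j^n), with n = 1 for growth dominated and n = 2 for nucleation dominated failure; the function name is hypothetical.

```python
def current_derating_factor(diffusion_ratio: float,
                            current_exponent: float) -> float:
    """Factor by which current density must change to keep lifetime constant.

    Assumes lifetime ~ 1 / (D * j**n), so j_new / j_old = (D_new / D_old)**(-1/n).
    """
    if diffusion_ratio <= 0.0 or current_exponent <= 0.0:
        raise ValueError("diffusion_ratio and current_exponent must be positive")
    return diffusion_ratio ** (-1.0 / current_exponent)


if __name__ == "__main__":
    for label, n in (("growth dominated (1/j)", 1.0),
                     ("nucleation dominated (1/j^2)", 2.0)):
        factor = current_derating_factor(2.0, n)
        print(f"{label}: allowed current density x {factor:.3f} "
              f"({(1.0 - factor) * 100.0:.0f}% reduction)")
```

A doubled diffusion coefficient gives a factor of 0.5 for 1/j kinetics and about 0.71 for 1/j^2 kinetics, matching the halving and the roughly 30% reduction described above.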

Whether failure is nucleation or growth dominated is a matter of the process used to deposit the metal
and the overlying dielectric. Almost everything that happens consists of an initiation followed by a
continuation. Electromigration is no exception. First the damage must be initiated, a void nucleated or
an extrusion formed, then the damage proceeds, such as void growth or continuing the extrusion, until
failure occurs. Sometimes nucleation is slow and takes a long time and growth is fast. When this
happens we have nucleation dominated failure. Sometimes we have the converse, and the nucleation
is either very short or non-existent, and we then have growth-dominated failure. Electromigration
exhibits both types of behavior.

Nucleation Dominated Failure


Nucleation-dominated failure will be most common in processes that do not contain a redundant
"shunt" layer. Void nucleation occurs when sufficient stress is generated. To generate stress,
significant mass transport must take place. This takes time. At a critical stress level, a void will form to
reduce the stress in the system. When the void forms, a tremendous release of strain energy occurs
that promotes very rapid void growth. In the absence of a shunt layer, an open circuit develops almost
immediately, and failure follows 1/j^2 kinetics. At least two other nucleation dominated failure
mechanisms have been identified: the stress buildup following Cu depletion in Al/Cu alloys, and
passivation cracking induced by compressive stresses which produce extrusions. In all three
scenarios, 1/j^2 kinetics prevail.

Growth Dominated Failure


If there is a redundant shunt layer, the initial rapid growth of the void will not produce an open circuit.
The shunt layer, usually of a refractory material such as W or TiN, can conduct electricity even if a void
exists in the primary Al conductor. These metals can withstand extremely high current densities at high
temperatures for very long times. If failure is defined as an open circuit, they don't fail. However, for
most realistic situations, an open circuit is not a realistic definition of failure. Since a resistance
change of about 10% in global wiring can produce timing errors, the 10% increase has often been
chosen as a failure criterion.

Using a percentage increase as a failure criterion during a test has some problems. The actual
damage that causes a failure will be a function of the precise geometry of the test structure and the
initial resistance. This is unsatisfying for evaluating real circuits that don't look like test structures. It is
recommended, therefore, that failure criteria be based on an absolute change in resistance, the
maximum that a particular circuit can withstand before problems arise.

It is necessary to use test structures that can measure a resistance change without geometric effects,
such as the Blech Length, affecting the data.

The growth of a void depends on the rate that metal atoms leave the void, or, equivalently, the rate at
which vacancies enter it. The flux of vacancies or atoms is linearly dependent on the current density,
and therefore the time required to attain a certain void size will obey 1/j kinetics. Care must be taken in
experimental measurements, however, since inappropriate test structures can result in just about any
value for the current exponent.

For a given metallization, growth dominated failure must take longer than nucleation dominated
failure, since the damage needs to nucleate before it can grow. However, the nucleation phase can be
very short, approaching zero. The kinetics of failure must be evaluated experimentally and applied
properly. This means that for electromigration damage in real conductors, we can have either 1/j or
1/j^2 kinetics. It has been observed that for wide lines, defined as those where the average grain size is
smaller than the line width, 1/j^2 kinetics usually dominate, whereas for narrow lines, 1/j kinetics
dominate.

Sidebar: The Blech Length
In the 1970s Ilan Blech of the Technion in Israel performed one of the most important series of
experiments in the history of electromigration science and technology. In these experiments he had
created a test structure that consisted of islands of gold (Au) deposited onto a refractory underlay.
When current was passed through these samples, the upstream side of the islands moved in the
direction of electron flow and the downstream edge stayed stationary. If the island was long enough,
extrusions formed on the downstream edge, but if the island was short enough, electromigration
essentially stopped. Electromigration also stopped when the longer islands shrunk to a critical level. He
discovered that there is a critical product of the current density and the length of the island, below
which electromigration ceases. This is the origin of the "Blech Length." For any given current density,
there is a length below which electromigration will not occur.

This behavior occurred because a mechanical back stress, generated by electromigration, resisted the
electromigration force. The back stress exists only in the presence of a flux divergence and it is greater
in the presence of a mechanically strong confining passivation layer. For this reason, the Blech Length
cannot be easily predetermined. It is a strong function of the process and the physical design of the
chip.

In principle, one could make a circuit immortal by designing all the lines to be shorter than a Blech
Length. However, the Blech Product j x l is only on the order of a few thousand and is a strong function
of the thermal history, so this idea has not been seriously considered.
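
The Blech criterion lends itself to a quick check. The sketch below assumes the critical product is expressed in A/cm (current density in A/cm^2 times length in cm) and uses 3000 A/cm purely as a placeholder for "a few thousand"; the real value depends on the process and must be measured. The function names are hypothetical.

```python
def blech_length_um(current_density_a_cm2: float,
                    critical_product_a_per_cm: float) -> float:
    """Line length (micrometers) below which electromigration should cease."""
    length_cm = critical_product_a_per_cm / current_density_a_cm2
    return length_cm * 1.0e4  # 1 cm = 10^4 micrometers


def is_below_blech_threshold(current_density_a_cm2: float,
                             length_um: float,
                             critical_product_a_per_cm: float) -> bool:
    """True when j * l is under the critical product for this segment."""
    product = current_density_a_cm2 * (length_um * 1.0e-4)
    return product < critical_product_a_per_cm


if __name__ == "__main__":
    CRITICAL_PRODUCT = 3000.0  # A/cm, assumed placeholder value
    for j in (2e5, 1e6, 2e6):
        limit = blech_length_um(j, CRITICAL_PRODUCT)
        print(f"j = {j:.1e} A/cm^2 -> Blech length ~ {limit:.0f} um")
    print(is_below_blech_threshold(1e6, 20.0, CRITICAL_PRODUCT))
```

At current densities near 10^6 A/cm^2 the resulting lengths are only tens of micrometers, which illustrates why designing every line below the Blech Length is impractical.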

RMS Current and Temperature Gradients


When current is passed through a conductor, the interaction of the electrons with the lattice dissipates
thermal power equal to the product of the square of the current and the resistance. This is called
Joule heating. Metal lines will heat up whenever current is passed through them. If the current is low,
the heat is effectively conducted away, but there must be some temperature increase even if it is not
detectable. If the current density approaches 10^6 A/cm^2, Joule heating can produce enough energy to
make the conductor lines heat up appreciably. At first this does not appear to be a problem, since
current densities are almost always lower than this due to limitations induced by electromigration.
However, one must realize that Joule heating is caused by root mean square (RMS) current and not
by the average current, as is electromigration. For a narrow pulse, the RMS current can be much
higher than the average current. The average current can be well within any guidelines that may be
set for electromigration considerations, yet significant Joule heating can result. This can be more
prevalent on upper level metallization, where heat must be conducted through several layers of
interlevel dielectric, which is a poor thermal conductor.
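
The gap between RMS and average current is easy to quantify for an idealized waveform. The sketch below assumes a rectangular pulse train that sits at a peak value for a fraction of each period (the duty cycle) and is zero otherwise; real signals will differ, and the function names are hypothetical.

```python
import math


def pulse_average_current(peak: float, duty_cycle: float) -> float:
    """Average current of a rectangular pulse train (same units as peak)."""
    _check_duty(duty_cycle)
    return peak * duty_cycle


def pulse_rms_current(peak: float, duty_cycle: float) -> float:
    """RMS current of the same pulse train: sqrt(mean of i^2)."""
    _check_duty(duty_cycle)
    return peak * math.sqrt(duty_cycle)


def _check_duty(duty_cycle: float) -> None:
    if not 0.0 < duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be in the interval (0, 1]")


if __name__ == "__main__":
    peak = 1.0  # normalized peak current
    for duty in (0.5, 0.1, 0.01):
        avg = pulse_average_current(peak, duty)
        rms = pulse_rms_current(peak, duty)
        print(f"duty = {duty:>4}: average = {avg:.3f}, RMS = {rms:.3f}, "
              f"RMS/average = {rms / avg:.1f}")
```

For this waveform the RMS-to-average ratio is 1/sqrt(duty cycle), so a 1% duty cycle gives an RMS current ten times the average. A line that looks safe on average current can therefore still see significant Joule heating.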

The problem with Joule heating is not the modest temperature increase, but the temperature gradients
that result. Typically, at the current densities found in modern circuitry, temperature increases would
range between a few and a few tens of degrees Celsius. This produces temperature profiles that
decay within a few microns, so that temperature gradients of 10^4 to 10^5 degrees Celsius/cm will be
found. Since electromigration is thermally activated, the temperature gradients produce flux
divergences that approach that found at absolute divergences such as at contacts or at
microstructural features.

RMS current density must then be limited to about 2 x 10^6 A/cm^2 for lower level lines and about half
that for upper level lines. Unfortunately, the reliability of metal lines in the presence of temperature
gradients cannot be accurately estimated. Temperature gradients can vary tremendously throughout a
real structure, depending on subtleties of the geometry and on the use of the underlying silicon
devices. The only way to deal with these issues is to take a conservative approach and forbid
temperature gradients by limiting the RMS current density to the levels suggested above.

Sidebar: Al/Cu
One of the first applications of electromigration engineering to solve reliability problems came about
1970. At that time, thin films were usually deposited by the high temperature evaporation of metal
films. Legend has it that when IBM was trying to solve the electromigration problem, one evaporator
was producing better material than any other. It was a mystery. After weeks of study, someone found
out that the electron beam used to melt the Al used for the conductors was misaligned. Instead of
impacting directly onto the Al charge placed in a Cu container for that purpose, the e-beam was hitting
the Cu and causing some of it to melt and be deposited along with the Al. The resulting Al/Cu alloy
proved to be remarkably resistant to electromigration failure, increasing the median time to failure by
more than an order of magnitude. It was determined that Cu slowed down the diffusion of Aluminum in
the Al/Cu grain boundaries. After this effect was understood, it was exploited.

This, however, did not eliminate electromigration failure, but served as a band-aid until the technology
caught up with the capabilities of Al/Cu. However, the use of Cu was a great breakthrough in
electromigration technology, buying several years of performance and making the high performance
IC possible. Today we live within the limitations of Cu in Al by making intelligent compromises and
choices. Searches for other alloys in a process reminiscent of alchemists looking for the Philosopher's
Stone have not turned up anything that works better.

Sometimes you just get lucky!

Microstructure and Electromigration: Line Width Effects
Electromigration is a form of mass diffusion, where the driving force is provided by the electron flow.
Therefore, things that affect diffusion will affect electromigration. Metals are composed of atomic
crystals where atoms are lined up very nearly perfectly in only a few allowable configurations. The size
of these crystals ("grains") is finite. Where the grains meet, they form a region of disorder ("grain
boundary"), and provide a pathway for easy diffusion as compared to the nearly perfect metal lattices.

In the early days of ICs, the thin film conductors used in
manufacturing were relatively wide, fine grained, and composed
of many grains. These were referred to as polycrystalline. The
grain size was about the thickness of the film, generally about one micron. Across the width of a
typical conductor several microns wide, many grain boundary pathways were available to
accommodate the electromigrating atoms. It came as no surprise that the electromigration failure rate
was inversely related to the grain size of the films: the more grain boundaries present, the more atoms
can be transported along them, and the earlier the failure time.

As line widths became smaller, the grain size of the metal films became larger. Conductor lines
became comparable in width to the grain size and took on a "bamboo" like appearance where most of
the grains spanned the line width, providing no continuous grain boundary pathway in the direction of
the current flow. When this occurred, a peculiar effect was found: failure times were strongly
dependent on line width. Narrow lines at the same current density became substantially more reliable
than wider lines, as long as the grain size was uniform.

The reason for this behavior was not hard to figure out. The lack of easy grain boundary pathways
meant that the atoms had to take more arduous paths such as the lattice or various interfaces in their
journeys. The activation energy for failure was found to be a function of line width, since the diffusion
process changed. What became even more interesting and important to reliability engineers was that
the precise arrangement and orientation of the grains had a large effect on the lifetime of the
conductor. In fact, as the ratio of grain size to line width increased, the reliability became poorer before
it got better, and then got worse again as lines entered sub-micron widths.

Today, we understand this behavior and can predict the reliability from test data, grain structure, and
particulars of the metal deposition process. New effects, due to the presence of refractory shunt
layers and W plugs, have surfaced and have also been explained well enough that they can be tamed.

However, a fundamental understanding of the process of solid state diffusion and what affects it is
essential in interpreting test results. For this reason, conservative default values for parameters
used in relating electromigration test data to real circuits should be employed until careful testing
and data interpretation justify a change.

The choice of test structures and test conditions is of critical importance in extracting meaningful
parameters to be used in interpreting the test data as it relates to actual chip performance. The wrong
test or the wrong test structure can produce fatal results. The test structure must be designed to
reflect the process and usually a single structure cannot.

Failure Distribution

The distribution of electromigration failures has recently been the subject of much discussion.
Traditionally the lognormal distribution was used, where the logarithms of the failure times are
normally distributed. But this has conceptual and practical problems, the most important of which is
that the lognormal distribution is not extendable. This means that given an ensemble of n components
and a lognormal failure distribution, if we make up a new ensemble of combinations of the components
in series so that the weakest of these "links" produces failure, the resulting distribution cannot be
lognormal. Mathematically, the probability of failure, Pf, for a chain of n links, given that the
probability of failure of a single link is known, is:

$$P_f(n,t) = 1 - \left[1 - P_f(1,t)\right]^n \qquad (4)$$

If Pf(1,t) is lognormal, Pf(n,t) cannot be for n > 1. Therefore, this earlier way of estimating the
reliability of n components must be incorrect.

We can estimate the value of Pf(1,t) from test structures and treat the chip as consisting of n
effective failure elements. Determining this is not a trivial exercise, however. The number of failure
elements in a test structure must be estimated. The good news is that once we have defined what a
failure element is, we can, in principle, decide what the probability of failure for each element is.
The probability of failure of the chip then can be estimated more accurately than Equation 4 by
substituting the failure probability for each element:

$$P_f = 1 - \prod_{i=1}^{n} \left(1 - P_i\right) \qquad (5)$$

where Pi is the probability of failure for each failure element.

Optimizing for and Ensuring Reliability

The challenge to IC designers is to ensure reliability while squeezing as much performance out of the
process as possible. Unfortunately, the requirements for these two goals are conflicting. Higher
performance means higher currents in smaller conductors, whereas reliability demands lower current
densities.

In the past, the custom has been to generate design rules based on "worst case" scenarios. In this
strategy, current densities were limited to a certain value assuming that all the lines on the chip
were to be used at this high current density. This was patently silly. The limiting values were
determined from extrapolating the failure times, usually fitted to a lognormal failure distribution, to
some required level of reliability based on the chip complexity. This approach was too confining and
designers of today's ultra-high performance microprocessors have begun to use a strategy known as
"Reliability Budgeting." To see how unrealistic the worst-case assumption is, all one needs to do is
calculate how much power would be dissipated by a chip running with every wire at the electromigration
limit. It is often kilowatts.
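
Extrapolating a fitted lognormal to chip-level reliability depends on how failure probabilities
combine across many elements. The following is a minimal sketch, assuming independent links in series
where the weakest link causes failure. The median lifetime, sigma, and link count are illustrative
values, not data from any process. It measures the slope of the chain's failure distribution on a
lognormal probability plot, where a true lognormal would be a straight line of slope 1/sigma.

```python
from math import log
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sigma 1


def lognormal_cdf(t, t50, sigma):
    """Failure probability of ONE link at time t, lognormal with median t50."""
    return STD_NORMAL.cdf(log(t / t50) / sigma)


def chain_cdf(t, t50, sigma, n_links):
    """Weakest-link failure probability of n_links independent links in series."""
    return 1.0 - (1.0 - lognormal_cdf(t, t50, sigma)) ** n_links


def probit_slope(t_lo, t_hi, t50, sigma, n_links):
    """Slope of probit(P) against ln(t) between two times.

    On a lognormal probability plot a true lognormal is a straight line,
    so this slope would equal 1/sigma whichever two times are chosen.
    """
    z_lo = STD_NORMAL.inv_cdf(chain_cdf(t_lo, t50, sigma, n_links))
    z_hi = STD_NORMAL.inv_cdf(chain_cdf(t_hi, t50, sigma, n_links))
    return (z_hi - z_lo) / (log(t_hi) - log(t_lo))


if __name__ == "__main__":
    t50, sigma = 1.0, 0.5  # illustrative values, arbitrary time units
    for n_links in (1, 1000):
        early = probit_slope(0.15, 0.20, t50, sigma, n_links)
        late = probit_slope(0.20, 0.25, t50, sigma, n_links)
        print(f"n = {n_links:4d}: slope {early:.2f} then {late:.2f}")
```

For a single link the two slopes agree with 1/sigma. For the 1000-link chain they differ from each
other, so the chain does not plot as a straight line and is not lognormal.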

To perform reliability budgeting, we need to know how much current is going through each element. In
today's complex microcircuits, this is a daunting task, but the payback is significant. The allowable
current density for critical circuit paths can be increased substantially while maintaining reliability,
since the majority of circuit elements have little to no current flowing through them and are thus
effectively immortal. In addition, if the elements with the highest Pi can be located in the circuit,
trouble spots can be eliminated and a more reliable circuit can be designed.
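
The bookkeeping behind a reliability budget can be sketched as follows, assuming the failure elements
are independent and in series, so that the chip survives only if every element survives. The chip
failure probability is then one minus the product of the element survival probabilities. The net names
and probabilities are hypothetical, chosen only to show that elements carrying no current contribute
nothing while a few heavily loaded ones dominate.

```python
import math


def chip_failure_probability(element_probs):
    """Return 1 - prod(1 - P_i) for independent failure elements in series.

    Sums log-survival terms with log1p so that very small P_i values
    are not lost to floating-point rounding.
    """
    log_survival = 0.0
    for p in element_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
        if p == 1.0:
            return 1.0  # one certain failure makes the chip fail
        log_survival += math.log1p(-p)
    # log_survival <= 0, so expm1 is <= 0 and abs() gives 1 - exp(log_survival)
    return abs(math.expm1(log_survival))


def trouble_spots(named_probs, top=3):
    """Names of the elements contributing most to chip failure."""
    ranked = sorted(named_probs.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:top]]


if __name__ == "__main__":
    # Hypothetical budget: most nets carry almost no current (P_i = 0),
    # a few heavily loaded ones dominate.
    budget = {f"signal_net_{i}": 0.0 for i in range(10_000)}
    budget.update({"clock_spine": 2e-4, "power_strap_3": 1e-4, "io_driver_7": 5e-5})
    print(chip_failure_probability(budget.values()))
    print(trouble_spots(budget))
```

Ranking the elements by Pi is one simple way to locate the trouble spots. A real flow would obtain
each Pi from extracted currents and measured failure statistics.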

Great care must be taken to ensure that the information fed into the calculation of Equation 5 is
correct. If the failure statistics are incorrect, or the input parameters such as lifetime and current
exponent are wrong, a disaster can unfold. However, optimizing for performance and reliability can be
done successfully and, in fact, the successful design and manufacture of high performance
microprocessors has been possible only by employing some form of reliability budgeting.
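
How much a wrong current exponent matters can be illustrated with a short sketch. It assumes the
familiar Black's-equation form, t50 = A * j^(-n) * exp(Ea / kT), for relating accelerated test
conditions to use conditions. The current densities, temperatures, and activation energy below are
illustrative assumptions, not measured values.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333e-5  # Boltzmann constant in eV/K


def acceleration_factor(j_test, j_use, temp_test_k, temp_use_k,
                        activation_energy_ev, current_exponent):
    """Ratio of use-condition lifetime to test-condition lifetime.

    Assumes Black's equation: t50 = A * j**(-n) * exp(Ea / (k * T)).
    The prefactor A cancels in the ratio.
    """
    current_term = (j_test / j_use) ** current_exponent
    thermal_term = math.exp(
        activation_energy_ev / BOLTZMANN_EV_PER_K
        * (1.0 / temp_use_k - 1.0 / temp_test_k)
    )
    return current_term * thermal_term


if __name__ == "__main__":
    # Illustrative conditions: test at 10x the use current density.
    conditions = dict(j_test=2e6, j_use=2e5,            # A/cm^2
                      temp_test_k=523.15, temp_use_k=378.15,
                      activation_energy_ev=0.8)
    for exponent in (1.0, 2.0):
        factor = acceleration_factor(current_exponent=exponent, **conditions)
        print(f"current exponent {exponent}: acceleration factor {factor:.3g}")
```

With the test current density ten times the use value, changing the exponent from 1 to 2 changes the
projected lifetime by a factor of ten. This is why an incorrect exponent can undermine the whole
budget.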

Summary

Electromigration has been with us since the early days of solid state devices, even before ICs took
center stage. Like an old soldier, electromigration never dies, and unfortunately it does not have the
good taste to fade away. Whenever we "conquer" electromigration, we enter new regimes where the
demands of increased performance require that interconnect be more and more reliable under
conditions where metallization is inherently less reliable. The promise of developing future
metallization schemes that will erase the problem has so far eluded us and there is no guarantee that
the future holds a panacea. Copper may help a little, but not nearly as much as was hoped, and it
still only buys a little time. Eventually the capabilities of Cu will be seriously challenged, and this is
assuming we can solve the daunting processing problems that have confronted us over ten years of
development.

Recent advances have given us hope that although electromigration will always exist and cause
problems, we can control it such that advanced microcircuits can still be designed with the reliability
we need. The use of reliability budgeting, if coupled with a detailed knowledge of manufacturing
process capabilities, can allow advances without compromising long-term performance. This
complicated task can only be accomplished with the right tools and talents.

Electromigration as a design issue will be with us until we develop a room temperature superconductor
with a critical current density of millions of amps per square centimeter that is compatible with
semiconductor processing. Such a development is far in the future, and we must exercise diligence in
controlling the beast and respect its potential.

Where is it written that life is to be easy?

About the Author


J.R. "Jim" Lloyd specializes in electromigration and metallization reliability for chip and packaging applications, reliability testing
and analysis, qualification plans, and electromigration failure modeling. His industrial experience includes reliability engineering
and R&D positions at IBM and Digital Equipment Corporation. In addition, he was visiting scientist at Max-Planck-Institut in
Stuttgart, Germany. He has published more than 60 papers on semiconductor materials science and reliability engineering, has
been invited to speak to audiences throughout the world, and has taught courses and workshops at Stevens Institute of
Technology, New York Polytechnic, MRS, ASM, IBM, Digital Equipment, IRPS and ESREF (Europe). He holds the Ph.D., M.S.,
and B.S. degrees in materials science and engineering from Stevens Institute of Technology. He can be reached through email
at jrlloyd@vinfiz.net.
