
FIT rate Aware EM analysis

Govind Saraswat1 , William Au1 , Qing He1 and Subramanian Venkateswaran1

1 All with Oracle Inc, correspondence at govind.saraswat at oracle dot com

978-1-5386-2231-5/17/$31.00 ©2017 IEEE

Abstract— We present a novel method for performing Electromigration (EM) verification for VLSI interconnects. It provides an accurate metric for EM reliability for the entire design, and thus it is not affected by the pessimism inherent in the current state-of-the-art methods used by the industry. In this method, failure rates are computed for each wire interconnect and accumulated over the design. It then becomes possible to determine the overall failure rate margin (the real pass/fail criterion). This paper delves into the relationship between EM current density, technology parameters and the failure rate for each interconnect, which culminates in the EM reliability equation. This equation is used to calculate the failure rate for each interconnect.

I. INTRODUCTION

Electromigration (EM) is a physical phenomenon where metal atoms migrate under an applied electric field [1]. This gradual migration is a result of momentum transfer due to bombardment by the conducting electrons, as shown in Figure 1. EM is emerging as a significant problem in modern integrated circuits. The rapid increase in total wire length and current densities, together with decreasing wire widths, is making it difficult to guarantee EM reliability [2]. As a result, EM sign-off is becoming increasingly difficult and will require more design iterations. Reliability of an IC is measured in FIT (Failure-In-Time) rate, where 1 FIT means 1 failure in 1 billion device hours. Traditionally, FIT rates are budgeted in advance across all wire segments somewhat uniformly (static-FIT) and a reference current limit for this fractional rate is imposed. The process of converting a product failure rate to a current limit on individual metal segments is a form of abstraction. When the failure rate is completely abstracted away, it is impossible to determine exactly how much margin is left after each EM check. This leads to excess pessimism in the EM analysis, which leads to over-design.

Fig. 1. Electromigration phenomenon is depicted where metal ions are displaced by conducting electrons.

II. THEORETICAL CONSIDERATIONS

The relation between the EM current in a wire segment and the corresponding failure rate, under given environmental conditions, is outlined next.

A. Reliability Engineering Basics

Reliability engineering deals with managing the ability of a system or component to function properly within the target conditions over its planned lifetime. Reliability management of this kind requires analysis, measurements, and predictions of failures over time. Reliability can be measured as a probability distribution of cumulative failures as well as by failure rates. Here, Time to Failure (TTF) is the random variable whose probability distribution is studied for reliability. There are four important stochastic functions which are used to quantify reliability, namely [3]:

• The probability that the unit will fail within time t is called the Cumulative Distribution Function (CDF) or Failure Function and denoted by F(t). It is defined as:

$$ F(t) = P(TTF < t) \tag{1} $$

• Conversely, the probability that the unit will survive beyond time t is called the Reliability Function and denoted by R(t). It is defined as:

$$ R(t) = P(TTF > t) = 1 - F(t) \tag{2} $$

• Also, the Probability Density Function (PDF), denoted as f(t), is defined as:

$$ f(t) = \frac{dF(t)}{dt} \tag{3} $$

• The failure rate, denoted as λ(t), is the rate of change in failure probability over the survival probability at time t and is given by:

$$ \lambda(t) = \frac{dF(t)/dt}{R(t)} = \frac{f(t)}{R(t)} \tag{4} $$

λ(t) is the PDF f(t) conditioned on no failure having occurred before time t. One of the important quantities of interest in reliability analysis is the Average Failure Rate (AFR), which is given by:

$$ AFR(t) = \frac{1}{t} \int_0^t \lambda(s)\, ds \tag{5} $$

By some abuse of notation, let us represent this AFR by λ only. Given there are no failures at time t = 0, we can solve
for λ as a function of the failure probability for any time t, which comes out to be:

$$ \lambda = \frac{-\ln(1 - F(t))}{t} \tag{6} $$

which can also be written as:

$$ F(t) = 1 - e^{-\lambda t} \tag{7} $$

B. EM Reliability

The EM failure PDF is heuristically modeled using a log-normal distribution, given by:

$$ f_{log\text{-}normal}(x) = \frac{1}{x \sqrt{2\pi\sigma^2}}\, e^{-\frac{(\ln(x) - \ln(T_{50}))^2}{2\sigma^2}} \tag{8} $$

Here x is the random variable TTF. A log-normal distribution is characterized by the median ($T_{50}$) of the random variable and the standard deviation (σ) of its natural logarithm, which is normally distributed. Integration of the PDF f(x) from 0 to t gives the Failure Function F(t):

$$ F(t) = \int_0^t f(x)\, dx \tag{9} $$

It can be shown that for a certain range of σ values, the failure rate is constant for an important range of failure time values. Thus our assumption of a constant average effective failure rate actually has a theoretical basis. Failure distribution parameters are found empirically using reference wires on test chips. The median $T_{50}$ can be calculated using one of the Acceleration Models, also known as Black's Equation [4], which is an abstract mathematical model, given by:

$$ T_{50} = A\, e^{\frac{E_a}{k_B T_m}}\, j^{-n} \tag{10} $$

Here A is Black's coefficient, $E_a$ is the activation energy, $k_B$ is the Boltzmann constant, $T_m$ is the absolute temperature in K, j is the current density and n is a process parameter. For simplicity, we transform the log-normal distribution to a standard normal (or Z) distribution by doing a change of variables from x to z by:

$$ Z(x) = \frac{\ln(x) - \ln(T_{50})}{\sigma} \tag{11} $$

Then the Failure function F(t) in terms of z can be given by:

$$ F(t) = \Phi(Z(t)) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{Z(t)} e^{-\frac{z^2}{2}}\, dz \tag{12} $$

By combining equations (10), (11) and (12), we get a relation between the Failure function F(t) and the current density j:

$$ F(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\frac{\ln(t j^n / A) - \theta}{\sigma}} e^{-\frac{z^2}{2}}\, dz \tag{13} $$

where $\theta = \frac{E_a}{k_B T_m}$. Earlier, in equation (6), we related the failure rate of a wire to the Failure function F(t). So we now have a relation between the failure rate λ at time t and the EM current density of that wire, given by:

$$ \int_{-\infty}^{\frac{\ln(t j^n / A) - \theta}{\sigma}} e^{-\frac{z^2}{2}}\, dz = \sqrt{2\pi} \left(1 - e^{-\lambda t}\right) \tag{14} $$

C. EM Reliability Equation

As mentioned earlier, EM goals are established to maintain an average failure rate over the planned lifetime of a chip. Thus, for a desired lifetime (τ) of a design:

$$ \int_{-\infty}^{\frac{\ln(\alpha j^n) - \theta}{\sigma}} e^{-\frac{z^2}{2}}\, dz = \sqrt{2\pi} \left(1 - e^{-\lambda \tau}\right) \tag{15} $$

where $\alpha = \frac{\tau}{A}$. Equation (15) can be further simplified by using the failure function Φ of a standard normal distribution, giving the EM reliability equation as:

$$ \Phi\!\left(\frac{\ln(\alpha j^n) - \theta}{\sigma}\right) = 1 - e^{-\lambda \tau} \tag{16} $$

The EM reliability equation (16) relates the EM current density of a wire to its failure rate. For practical application, the foundry provides the various parameters, and we use those parameters to calculate the failure rate for each wire segment from the EM reliability equation. Φ can be calculated using look-up tables.

D. Chip Reliability

In VLSI, reliability is measured as the failure rate (FIT) of the entire chip. Mathematically, 1 FIT = 1 failure in 1 billion device hours. We establish EM design goals in order to maintain an overall average product failure rate below a target FIT over the planned lifetime of the chip. Here, we make the assumption that if one wire fails, the whole system fails; this may be pessimistic, but it is the safe assumption. Then the chip failure rate is the sum of the failure rates of all wires, given that the wires fail independently. That is,

$$ \lambda_{chip} = \sum \lambda_{wire} \tag{17} $$

E. Effect of temperature

Temperature plays an important role in meeting the FIT rate targets, as a small change in temperature can have a significant effect on FIT rate. The temperature of a wire interconnect can change depending on its own Joule heating as well as on heating due to nearby wires and devices. If $T_{ref}$ is the reference temperature and $T_{des}$ is the actual temperature of the wire at the time of operation, then by using equation (11), we can write:

$$ Z_{des} - Z_{ref} = \frac{\ln(T_{50,ref}) - \ln(T_{50,des})}{\sigma} \tag{18} $$

Fig. 2. Here the hierarchical FIT aware EM analysis is outlined. FIT is calculated for each library cell and then propagated up the hierarchy while
calculating total FIT for every block/cluster of the chip. Similarly cluster/block FIT is propagated up while calculating total FIT of the chip.

Fig. 3. FIT is calculated and accumulated for each block and then compared with the set target FIT rate for that block. Here it is assumed that the target
FIT rate for the entire chip is 10.
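
As a rough illustration of how the EM reliability equation (16) and the chip-level sum (17) could be evaluated, the following minimal sketch solves (16) for λ and converts the accumulated rate to FIT. The function names, the parameter values, and the unit choices (hours for τ, eV for $E_a$) are assumptions made for this example and are not foundry data. Φ is evaluated here with the complementary error function rather than a look-up table.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def std_normal_cdf(z):
    """Phi(z) of the standard normal, written with erfc to keep the lower tail accurate."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def wire_failure_rate(j, tau, A, n, Ea, T, sigma):
    """Average failure rate (per hour) of one wire over lifetime tau (hours).

    Solves equation (16) for lambda:
        Phi((ln(alpha * j**n) - theta) / sigma) = 1 - exp(-lambda * tau)
    with alpha = tau / A and theta = Ea / (k_B * T).
    """
    theta = Ea / (K_B_EV * T)
    z = (math.log(tau / A) + n * math.log(j) - theta) / sigma
    failure_prob = std_normal_cdf(z)            # F(tau), equation (12)
    if failure_prob >= 1.0:
        return math.inf                          # failure within tau is certain
    return -math.log1p(-failure_prob) / tau     # equation (6) at t = tau


def chip_fit(wire_rates):
    """Chip FIT from per-wire rates (per hour), using the sum in equation (17)."""
    return 1e9 * math.fsum(wire_rates)          # 1 FIT = 1 failure per 1e9 device hours


# Hypothetical parameters, chosen only to exercise the functions.
params = dict(tau=1.0e5, A=5.0e5, n=2.0, Ea=0.9, T=378.15, sigma=0.3)
currents = [1.0e6, 1.2e6, 1.5e6]                # current densities, arbitrary units
rates = [wire_failure_rate(j, **params) for j in currents]
print("chip FIT:", chip_fit(rates))
```

With these assumed values the three wires differ by orders of magnitude in failure rate, which is why accumulating per-wire rates gives a different picture than a single uniform current limit.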

Further using (10), we get:

$$ Z_{des} = Z_{ref} + \frac{E_a}{k_B \sigma} \left( \frac{1}{T_{ref}} - \frac{1}{T_{des}} \right) \tag{19} $$

Now we can calculate the design FIT rate $\lambda_{des}$ corresponding to the operating temperature $T_{des}$ by using equations (6), (12) and (19):

$$ \lambda_{des} = \frac{-\ln(1 - \Phi(Z_{des}))}{\tau} $$

III. STATE-OF-ART EM ANALYSIS

The main aim of EM analysis is to achieve a fixed target FIT rate ($\lambda_{chip}$) for the entire chip. There are two strategies predominantly used:

• Static FIT: Traditionally, FIT rates are budgeted in advance across all wire segments somewhat uniformly (static-FIT) and a reference current limit for this fractional rate is imposed for each wire.
• Critical Length Ratio: A critical length ratio ($LR_{crit}$) is chosen for the chip. Here it is assumed that the number of wires operating at the maximum allowed current density $j_{max,ref}$ is $LR_{crit}$. This corresponds to a failure rate of $\lambda_{ref}$, which depends on $LR_{crit}$ and is given by $\lambda_{chip} / LR_{crit}$. Thus for a given target failure rate $\lambda_{chip}$, we can calculate $\lambda_{ref}$ and thus calculate $j_{max,ref}$, or vice versa. But to know the optimum $LR_{crit}$, we have to know $j_{max,ref}$, and we need $LR_{crit}$ to find $j_{max,ref}$, thus presenting a classic 'chicken vs egg' problem. The wires which are not designed at $j_{max,ref}$ still contribute to the total failure rate. Thus, this method is very approximate in nature and does not provide the real picture. Furthermore, a lot of design iterations are needed to find the optimum $LR_{crit}$.

Another strategy, proposed in [5], [6] and [7], gives an approximate value of the total reliability of the chip. There, wire segments are classified in discrete classes, and then the total length of wires of each class is calculated (similar to the Critical Length Ratio method). Thus the strategies which are used in industry are either overly pessimistic or very approximate. We next outline the new method for EM analysis.

IV. NEW METHOD

We propose a new FIT aware EM analysis where the failure rate is computed for each wire segment and accumulated over the design. With this method, we can determine the overall failure rate margin (the real pass/fail criterion). No assumptions are made on the reliability performance (as is
the case in the Critical Length Ratio method), and thus an
accurate picture is provided by this method. FIT aware EM
analysis is implemented in a hierarchical manner, as depicted
in Figure 2.
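
The hierarchical roll-up can be sketched as a recursive sum over a design tree. This is only an illustration under assumed names: the `Block` class, the hierarchy, and the FIT values are hypothetical, and per-wire FIT values are taken as already computed.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Block:
    """A node in the design hierarchy (library cell, block/cluster, or chip)."""
    name: str
    wire_fits: List[float] = field(default_factory=list)   # FIT of wires owned here
    children: List["Block"] = field(default_factory=list)  # instantiated sub-blocks


def accumulate_fit(block: Block, report: Dict[str, float]) -> float:
    """Return the total FIT of `block` and record the total of every level in `report`."""
    total = math.fsum(block.wire_fits) + math.fsum(
        accumulate_fit(child, report) for child in block.children
    )
    report[block.name] = total
    return total


# Hypothetical hierarchy and FIT values, for illustration only.
cell_a = Block("cell_a", wire_fits=[0.001, 0.002])
cell_b = Block("cell_b", wire_fits=[0.5])
cluster = Block("cluster0", wire_fits=[0.25], children=[cell_a, cell_b])
chip = Block("chip", wire_fits=[1.0], children=[cluster])

report: Dict[str, float] = {}
chip_total = accumulate_fit(chip, report)
for name, fit in report.items():
    print(f"{name}: {fit:.3f} FIT")
```

Recording a total per level means each block can be compared against its own target while the chip total is still available for the final pass/fail check.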
An EM analysis which is FIT aware also allows for
dynamic FIT budgeting and makes it easy to take the thermal
map of the chip into account. Dynamic FIT budgeting recognizes
that not all wires require the same FIT target. It
attempts to redistribute the remaining balance from wires
that do not need more margin to wires that need it, thus
reducing pessimism in the analysis. The use of a thermal
map provides further relaxation to the EM analysis. Dynamic
FIT budgeting happens during analysis instead of before, by
performing real-time failure rate calculation and rate target
budgeting for each block of the chip (see Figure 3). If the total
FIT of the chip is more than the target FIT rate (10, for
example), the target FIT rate for each block can be dynamically
changed depending on the calculated FIT rate of the block.
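
One possible way to picture dynamic budgeting is sketched below. The paper does not specify the redistribution rule, so the proportional rule used here (each block's budget scaled by its share of the computed FIT) is an assumption, as are the function name, the block names, and the FIT values.

```python
import math
from typing import Dict


def rebalance_budgets(block_fits: Dict[str, float], chip_target: float) -> Dict[str, float]:
    """Split chip_target across blocks in proportion to their computed FIT (assumed rule)."""
    total = math.fsum(block_fits.values())
    if total <= 0.0:
        # Nothing contributes: fall back to a uniform split.
        share = chip_target / len(block_fits)
        return {name: share for name in block_fits}
    return {name: chip_target * fit / total for name, fit in block_fits.items()}


# Hypothetical per-block FIT values against a chip target of 10 FIT.
block_fits = {"core": 6.0, "cache": 1.0, "io": 2.0}
chip_target = 10.0

# Static budgeting: a uniform budget fixed before analysis.
static_budget = chip_target / len(block_fits)
static_fail = [b for b, fit in block_fits.items() if fit > static_budget]

# Dynamic budgeting: budgets set during analysis from the computed FIT.
dynamic = rebalance_budgets(block_fits, chip_target)
dynamic_fail = [b for b, fit in block_fits.items() if fit > dynamic[b]]

print("static failures:", static_fail)
print("dynamic failures:", dynamic_fail)
```

Under this assumed rule a block fails its rebalanced budget only when the accumulated chip FIT exceeds the chip target, so unused margin in one block is effectively lent to another.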
As mentioned earlier, the effect of temperature on the failure
rate is of great importance, and the thermal map of a chip is
extremely useful in quantifying the actual failure rate of
the device [8], [9]. We can use the method outlined in
Section II-E to relate changes in FIT rate to
changes in the temperature of a wire. These changes are
calculated using some of the parameter values provided by the
foundry and plotted in Figure 4. Figure 4 clearly depicts the
near-exponential dependence of failure rate on temperature
while everything else is kept constant. A mere change of
around 7°C results in a change of two orders of magnitude in
FIT rate. Thus the thermal map of a chip can be used for
calculating the accurate FIT rate for each wire segment.
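
The temperature rescaling of Section II-E can be sketched as follows, assuming equation (19) with the Boltzmann constant written explicitly (as implied by (10)) and equations (6) and (12) evaluated at the lifetime τ. The function name and all parameter values are hypothetical, so the printed numbers will not reproduce Figure 4.

```python
import math
from statistics import NormalDist

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def fit_at_temperature(fit_ref, tau, Ea, sigma, T_ref, T_des):
    """Rescale a wire's FIT from T_ref to T_des (kelvin) for lifetime tau (hours).

    fit_ref must be positive so that the reference failure probability can be inverted.
    """
    lam_ref = fit_ref * 1e-9                              # FIT -> failures per hour
    F_ref = -math.expm1(-lam_ref * tau)                   # equation (7) at t = tau
    z_ref = NormalDist().inv_cdf(F_ref)                   # invert equation (12)
    z_des = z_ref + Ea / (K_B_EV * sigma) * (1.0 / T_ref - 1.0 / T_des)  # equation (19)
    F_des = 0.5 * math.erfc(-z_des / math.sqrt(2.0))      # Phi(z_des), tail-accurate
    if F_des >= 1.0:
        return math.inf                                   # failure within tau is certain
    return -math.log1p(-F_des) / tau * 1e9                # equation (6), back to FIT


# Hypothetical wire: 1e-3 FIT at a 105 C reference, swept over a few degrees.
T_REF = 105.0 + 273.15
for delta in range(0, 8):
    fit = fit_at_temperature(1e-3, tau=1.0e5, Ea=0.9, sigma=0.3,
                             T_ref=T_REF, T_des=T_REF + delta)
    print(f"+{delta} C: {fit:.3e} FIT")
```

Feeding each wire's local temperature from a thermal map into such a function, instead of a single worst-case temperature, is what allows the per-wire FIT to be relaxed where the chip runs cooler.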
Fig. 4. Variation of FIT with temperature when everything else is kept constant. The relation is almost exponential.

V. SIMULATION RESULTS

We now present the simulation results for two examples where we show the efficacy of the new method. Let's assume we have a chip design which has 10,000 interconnects and the chip has a target FIT rate of 10. Figure 5 (a) and (b) show the histograms of FIT rates for all wires in a worst-case current simulation for two cases. In both cases, it is evident that a lot of wires (500) fail the Static-FIT threshold (10/10,000 = 1e-3), which, as explained in the previous section, is an extremely pessimistic method. If we calculate the $LR_{crit}$ for this example chip for both test-cases, we get it to be around 30, i.e. there are 30 wires which have a FIT rate of ≈ 10/30 = 0.33. With this critical length ratio, there are 5 wires in the first case which are higher than 0.33 and are failing. Now if we use the proposed method of FIT accumulation, we get the total FIT rate for this case to be 9.98, which is within the limit.

Fig. 5. Log-log graph of the number of interconnects plotted against the FIT rate of interconnects for two examples where (a) total FIT rate is less than 10 and (b) total FIT rate is greater than 10.

In the second test-case (different data, as plotted in Figure 5 (b)), the $LR_{crit}$ method shows that all the critical wires are within the limit, i.e. no interconnect has a FIT rate of more than 0.33. But for this example, if we use our method and do FIT accumulation, we see that the total FIT rate of the chip is 10.44, which is above the limit. This means that the $LR_{crit}$ method is inaccurately concluding that the chip meets the target FIT rate. These two examples illustrate how the proposed method helps in reducing the pessimism inherent in the Static-FIT method but at the same time does not suffer from
inaccuracies of the $LR_{crit}$ method.

VI. CONCLUSION AND FUTURE DIRECTIONS

The new method is summarized as follows:

• The failure (FIT) rate is calculated for each wire segment.
• The FIT rate is accumulated and rolled up the hierarchy for the entire chip.
• The FIT rate is then compared with the target FIT rate for the chip. This will be the final EM sign-off tool.

This is the first time that the FIT rate is computed for each wire segment in a chip and the total FIT rate is accumulated and reported for the entire chip. This method can be used for any Integrated Circuit and can incorporate any EM reliability model [10], as long as we can calculate the FIT rate for each wire. The complexity of the new method is only slightly higher than that of the current methods because, for EM analysis, the bottleneck is the circuit simulation step. This step is where the current density for each wire segment is calculated. Traditionally, the EM ratio ($J/J_{max}$) for a wire segment is defined as the ratio of the current density in the wire over the maximum allowed current density. Currently this result is reported in the reports generated by EM analysis tools. The new technique includes the FIT rate for each wire along with the EM ratio in the generated report. The accumulated FIT rate for each cell/block and the entire chip is also reported. Improvements over existing methods are summarized below:

• The new method is able to significantly reduce the pessimism in EM analysis inherent in the Static FIT method. This is extremely important as chip design is becoming more and more EM limited.
• The new method significantly improves the accuracy of EM analysis compared to the Critical Length Ratio method.
• The new method also incorporates the thermal map of the chip, which further increases the accuracy of the FIT rate calculated for each wire segment.

REFERENCES

[1] James R Black. Mass transport of aluminum by momentum exchange with conducting electrons. In Reliability Physics Symposium, 1967. Sixth Annual, pages 148–159. IEEE, 1967.
[2] William R Hunter. The implications of self-consistent current density design guidelines comprehending electromigration and joule heating for interconnect technology evolution. In Electron Devices Meeting, 1995. IEDM'95., International, pages 483–486. IEEE, 1995.
[3] Paul A Tobias and David Trindade. Applied reliability. CRC Press, 2011.
[4] James R Black. Electromigration failure modes in aluminum metallization for semiconductor devices. Proceedings of the IEEE, 57(9):1587–1594, 1969.
[5] John Kitchin. Design for reliability in the Alpha 21164 microprocessor. In Reliability Symposium, 1996. Reliability-Investing in the Future., IEEE 34th Annual Spring, pages 64–69. IEEE, 1996.
[6] John Kitchin. Statistical electromigration budgeting for reliable design and verification in a 300-MHz microprocessor. In VLSI Circuits, 1995. Digest of Technical Papers., 1995 Symposium on, pages 115–116. IEEE, 1995.
[7] Chanhee Oh, Haldun Haznedar, Martin Gall, Amir Grinshpon, Vladimir Zolotov, Pon Ku, and Rajendran Panda. A methodology for chip-level electromigration risk assessment and product qualification. In Quality Electronic Design, 2004. Proceedings. 5th International Symposium on, pages 232–237. IEEE, 2004.
[8] Syed M Alam, Donald E Troxel, and Carl V Thompson. Thermal aware cell-based full-chip electromigration reliability analysis. In Proceedings of the 15th ACM Great Lakes symposium on VLSI, pages 26–31. ACM, 2005.
[9] Ted Sun, Ayhan Mutlu, and Mahmud Rahman. A new statistical electromigration analysis methodology that incorporates across-chip temperature variation. In Quality Electronic Design (ASQED), 2011 3rd Asia Symposium on, pages 115–118. IEEE, 2011.
[10] RL De Orio, Hajdin Ceric, and Siegfried Selberherr. Physically based models of electromigration: From Black's equation to modern TCAD models. Microelectronics Reliability, 50(6):775–789, 2010.
