4B.5.2
Figure 4. Abnormal mold compound

Stress test: component-level TC (temperature cycling), HTSL (high temperature storage life) and humidity tests, followed by validation on the PCB. Note that suppliers may already have done these tests in the qualification phase, but they run them per process family group. We, as a telecom equipment manufacturer, run these tests on the specific part code to be used in our product. Another reason is that some suppliers are not able to, or do not want to, do these tests.

E. IC process control

Many methods are used to control the IC process at the supplier site, such as regular audits of the IC production line, regular review of supplier process reports, remote online monitoring of real-time process data, and, if necessary, sending people to the supplier site.

Control covers key processes and parameters, such as Vt, Poly-CD (Critical Dimension) and metal line resistance for the wafer process, and die attach, bond pull and shear strength, and molding voids for the package process. Monitored processes and parameters are kept under control, with Cpk (Process Capability Index) above 1.33 or 1.67 for key processes. Enough check points and online sampling are maintained.

F. IC incoming quality inspection and monitor

After an IC is qualified, it can be used in any telecom product. To make sure every incoming lot is acceptable and under control, lot-by-lot quality inspection and monitoring are necessary.

Samples, for instance 5 pcs (pieces) per lot, are inspected. If one or more samples fail inspection, the lot is rejected and the supplier concerned must review its process. Tightened, reduced or skip-lot sampling is used depending on component type, lot size and AQL (Acceptable Quality Level).

Incoming inspection and monitor items also cover electrical, mechanical and physical characteristics, the same as in the qualification phase.

Seasonal reliability monitor reports from IC suppliers are reviewed to make sure IC reliability is under control. Normally, the monitor items are the same as those in qualification, but with a reduced sample size.

Here is an example of SDDV (Stress Driven Diffusive Voiding), also called SM (Stress Migration): a field failure after 1~2 years of operating time, with a failure rate of ~5%, no function and no output, and Tc (case temperature) over 70ºC at room temperature. The failure mechanism is SDDV, or stress migration/voiding, which increases via resistance and causes local high temperature. The root cause is an Al-Si metal line and via that are sensitive to stress. The improvement measure is adding Cu to Al-Si. In test and verification, stress tests at over 150ºC for thousands of hours show no failure with Al-Si-Cu.

Figure 5. Metal via void

IV. BOARD LEVEL RELIABILITY APPROACHES

Board level reliability approaches come down to three ways:

• Accepting-Absorbing: accept the reality that IC parameter variation is inevitable, and absorb and tolerate IC process variation through board level design.
• Avoiding-Compensating: every IC has its weakness; avoid it, or else improve on it by adding a small circuit or designing in a special way to compensate for the weakness.
• Derating-Improving: improve reliability and extend life by derating environmental stress, assuming that the IC is poor in quality even though it is actually strong enough.

A. Board level design for IC process variation

The IC process and all of its parameters vary lot by lot; in theory, they should stay far from the spec limits. Unfortunately, in practice there is some probability that certain lots or individual units are very near a spec limit (SL: spec limit, LSL: lower spec limit, USL: upper spec limit) without enough margin, and there are even some outliers that fall outside the spec limits. If IC process control at the supplier site and incoming inspection at the telecom system manufacturer's site do not work well, these outliers will bring boards or equipment down.
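As a rough illustration of the Cpk thresholds (1.33 or 1.67) and the LSL/USL spec limits mentioned above, the sketch below computes Cpk from sample measurements using the standard definition Cpk = min(USL − μ, μ − LSL) / (3σ). The function name, the sample readings and the limits are hypothetical values chosen only for illustration; they do not come from any supplier data.

```python
import statistics

def cpk(samples, lsl, usl):
    """Process capability index: min(USL - mean, mean - LSL) / (3 * sigma)."""
    mean = statistics.mean(samples)
    sigma = statistics.stdev(samples)  # sample standard deviation
    return min(usl - mean, mean - lsl) / (3 * sigma)

# Hypothetical threshold-voltage readings (V) for one lot; the limits are assumed.
vt_samples = [0.70, 0.71, 0.69, 0.70, 0.72, 0.68, 0.70, 0.71]
value = cpk(vt_samples, lsl=0.60, usl=0.80)
print(f"Cpk = {value:.2f}, meets 1.33: {value >= 1.33}, meets 1.67: {value >= 1.67}")
```

A lot whose mean drifts toward either limit, or whose spread widens, shows a falling Cpk before any unit actually goes out of spec, which is why it is useful as a monitor of margin.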
Corner devices are for internal use only at the supplier site. A telecom manufacturer can also use corner devices at board level to verify circuit design margin. This method works well for circuits with both analog and digital ICs. It is better if temperature and voltage stress are applied at the same time. Combinations of SSS with high temperature and low voltage, and FFF with low temperature and high voltage, are recommended.

C. Board level stress derating design for IC reliability

For electrical components and equipment, a low failure rate and long lifetime can be achieved by reducing stress, such as temperature, voltage and mechanical stress.
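To make the effect of temperature derating concrete, the sketch below uses the Arrhenius acceleration model, which is commonly applied to temperature-driven failure mechanisms. The source does not state which model or parameters apply, so the activation energy (0.7 eV) and the two temperatures are assumptions chosen only for illustration.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K

def arrhenius_acceleration(t_use_c, t_stress_c, ea_ev):
    """Ratio of the failure rate at t_stress_c to that at t_use_c (Arrhenius model)."""
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_stress_k))

# Assumed values: Ea = 0.7 eV, derating the junction from 85 C down to 65 C.
factor = arrhenius_acceleration(t_use_c=65.0, t_stress_c=85.0, ea_ev=0.7)
print(f"Modeled failure rate at 85 C is about {factor:.1f}x the rate at 65 C")
```

Under these assumed values, a 20ºC reduction lowers the modeled failure rate by a factor of a little under 4. The real benefit depends on the dominant failure mechanism and its activation energy.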
Figure 10. Failure rate with time with different level stress

Field data show that, as temperature stress increases, the failure rate of boards or modules in telecom equipment also increases. Industry data from Telcordia standard SR-332 [2] show that IC failure rate also increases with high temperature stress.

a. Board failure rate from 2000+ sites, temperature measured at equipment air exhaust vent

In addition, learning from failures, feeding information back to board level, and forming a reliability methodology for later equipment versions are also targets in this phase.

• FMEA (Failure Mode and Effects Analysis): reliability prediction, FMEA, and reliability allocation from equipment, board and module down to IC level.
• FIT: fault injection test.
• Reliability growth: finding board weaknesses and then improving them in the R&D phase, for example with HALT (Highly Accelerated Life Test) or other destructive tests.
• Environmental test: making sure the board or equipment is suited to the target market environment, such as high or low temperature, high humidity, low air pressure, salt fog, vibration and so on.
• Assembly inline monitor: keep variation under control.
• Burn-in and screening: screen out early-life failures.
• Ongoing reliability test: as equipment is shipped lot by lot, samples are tested to make sure reliability is monitored.

There are also higher level approaches, not described in detail here since they are less related to IC component reliability, such as backup at equipment level, system level, and then telecom network level (two or more units mirror each other; if the active one goes down, the standby one takes over).
A. Temperature spec definition

Almost every IC’s temperature spec is expressed in the datasheet as Ta (ambient), Tc (case) or Tj (junction); sometimes Tb (ball or board) is used. These are widely used and well accepted. Accordingly, the thermal resistances θja, θjc and θjb are defined, and their measurement methods introduced, in the JESD51 standards.

As electronic equipment becomes more integrated, IC ambient temperature increases, leaving little margin between the IC's real-time operating temperature and its spec limit. Sometimes an IC may run beyond its spec limit.

1) Engineering challenge:

Without Ta, Tc or Tj definitely defined, it is difficult to know the IC operating conditions accurately, and it is not easy to judge whether or not the IC runs beyond its temperature spec limit.

Below is a running board. A temperature scan with an infrared thermal imager shows a temperature gradient on its surface, especially around the power module and IC components. Some suppliers only provide Ta as the IC temperature spec, and roughly define Ta as ‘air temperature surrounding device’, but which point can be defined as ‘ambient’: point A, B, C or D in this case?

Figure 12. Board temperature scanning with infrared thermal imager

Below is a MOSFET. Thermal simulation results and real test data show that it has a large temperature gradient on its surface. If Tc is defined as point A on the package mold compound, it is far below the spec limit; but if it is defined as point B on the metal pad, which is about 30ºC hotter, it may be beyond the spec limit.

(MOSFET: TO220AB, Rg=24.3Ω, Vd=20V, Vg=3.37V, Id=0.05A, P=1.0W; Simulation result, Tc=67ºC at point A and Tc=100ºC at point B; Real time test data, Tc=71ºC at point A and Tc=104ºC at point B)

Below is a widely used model to calculate Tj, that is, Tj=Tc+θjc×Qc or Tj=Ta+θja×Qc, where Qc is the part of the power P dissipated from the IC top case. The problems are: the exact proportion of Qc in the power dissipation is not known; θja is obtained per JESD51 and does not reflect the real environment, since thermal resistance varies with air flow; and Ta and Tc are not definitely defined, as mentioned above. With all these uncertainties, accurate calculation of Tj is impossible for users.

Figure 14. IC with Tj calculation model

Figure 15. Thermal resistance and air flow

For most ICs, two parameters are used as the temperature spec, such as Ta and Tc, or Ta and Tj. Users may be confused if Ta runs beyond spec while Tj is still far below spec, since they have always been taught that the important parameter is Tj, which reflects and links with IC reliability.

Additionally, some ICs do not definitely define or distinguish between the recommended operating rating and the absolute max rating.

2) Requirement for IC:

Temperature spec should be definitely defined AND easy to use.

Principles for ‘definitely defined and easily used’:

• Only one parameter is used as the temperature spec for the user.
• What the user measures is what the user needs, with no calculation, or as little calculation as possible, for the user.

Good choices following the principles above:

• Tc is the best one and can be defined as the temperature at the center of the IC package top surface. If a heatsink is used, it can be defined by the supplier case by case, at any point, as long as the point is definite for the user;
• Tj can be considered for highly complicated and integrated ICs, like CPUs and FPGAs, only when there is a temperature sensor integrated in the IC and the temperature can be read out directly by a board level circuit.
• Ta can be considered only when it is defined by industry standards and widely accepted by suppliers. SR-332 defines Ta as “the temperature 0.5 inch above the surface of the device or …the average board temperature may be used…”. SR-332 would be better if more definite information were added; for example, the definitions in fig b or c below can be used if there is a heatsink attached to the IC or forced air flow in the equipment. Distance d is proportional to airflow, like 10mm with 1m/s downwind airflow, or is simply made a constant.

a. Tc test point b. Tc test point with heatsink
Figure 16. Tc definition and test point

a. Ta in SR-332

B. Temperature Stress

With temperature rising in equipment, some ICs become the bottleneck in board level thermal design, like most commercial grade FPGAs, with a junction temperature limit of only 85ºC, or industrial grade at only 100ºC.

1) Engineering challenge:

It is difficult to cool down equipment; thermal design cost is getting higher, and ICs under high temperature stress have a high failure rate.

Board level approaches are employed to reduce power density and make thermal design easier. IC related approaches include: shut down IC modules that do not need to work; reduce operating frequency as far as possible; use ICs with low supply voltage; use ICs with lower thermal resistance packages; use ICs with a wide operating temperature range.

2) Requirement for IC:

The IC can withstand higher temperature stress or has a wide operating temperature range. It has high intrinsic reliability even when operated close to its spec limit for an extended term.

IC level approaches can be employed at the supplier site, such as improving IC design and process for low power consumption, and reducing static and leakage current/power with substrate insulator technology.

a. Heatsink dropping off b. Package lid dropping off
Figure 20. Poor adhesive failures
1) Engineering challenge:
Select and evaluate the adhesive between IC and heatsink in board level thermal design, and avoid mechanical stress during board assembly, equipment transportation and installation.

2) Requirement for IC:

Enough adhesive strength, as below.

• Enough surface adhesive strength, given that the user may attach a heatsink to the IC, to avoid the heatsink dropping off under mechanical stress. Adhesive strength is a function of package molding surface energy, roughness, the Logo/Mark process and so on.
• Enough adhesive strength between lid and die/substrate to avoid the lid dropping off.
• A lid with four pillars is not a good design; the lid should be attached all around the substrate, and should not be attached to the die, to avoid die crack failure.
• The adhesive strength spec is written into the IC datasheet to make board level design easy.

a. Compressive load c. Substrate crack
Figure 21. Compressive load and failures

1) Engineering challenge:

Evaluate the compressive load effect, relieve pressure with adhesive or an elastic pad, and avoid excessive compressive stress.

2) Requirement for IC:

Enough strength to endure compressive load; the max compressive load spec is written into the IC datasheet; form an industry standard if possible.

E. Mechanical Stress - Bending strain

An IC may suffer bending strain if compressive force is not uniformly loaded on it. PCB bending also causes strain on the IC.

a. Force not uniformly loaded
b. PCB bending
Figure 22. Bending strain resulted

Solder ball crack, die fracture and die crack are the main failure modes under excessive bending.

b. Die fracture c. Die crack

1) Engineering challenge:

Evaluate the bending strain effect, relieve strain with adhesive or an elastic pad, lay out the IC perpendicular to the PCB bending direction and far from the PCB edge, fix the PCB and reduce its bending, and avoid excessive mechanical stress during board assembly, equipment transportation and installation.

Bending failure risk increases if the IC is laid out in the same direction as the PCB bending, like device a in the figure below, or too close to the PCB edge, like device b.

Figure 24. Layout and PCB bending
2) Requirement for IC:

Enough strength to endure bending strain; the max bending strain spec, like 300uε (micro-strain), 500uε or 1000uε, is written into the IC datasheet.

F. Electrical Stress

Telecom equipment may have its supply power switched on and off frequently, like equipment driven by solar power, or equipment that only works during the daytime and can be switched off at night. Sometimes, to save energy, some modules in the equipment are switched off temporarily and switched on for a while, repeatedly, in off-on-off-on cycles.

1) Engineering challenge:

In the cases above, ICs in telecom equipment and modules also suffer frequent power on/off cycling. How does cycling affect IC reliability? Can the IC work long enough to meet the equipment service life requirement?

Frequent power on/off cycling affects IC reliability mainly in two ways. One is thermal shock or temperature stress: temperature rises at power on, and the IC cools down at power off. The other is electrical stress: its effect can be ignored for the few cycles of normal use, but not for vast numbers like 100,000~1,000,000 cycles.

IC level and board level power on/off tests can be run at high frequency, like 10 on/off cycles per hour, to evaluate the risk, but a suitable acceleration model is needed. Power on/off cycling is not exactly the same as Temperature Cycling (JESD22-A104) or Power and Temperature Cycling (JESD22-A105).

2) Requirement for IC:

Specify the max number of power on/off cycles allowed.

Wear out life generally declines with IC process shrinking, from hundreds of years to decades. Now, it is only about 10 to 20 years at the 45nm, 28nm and 22nm nodes.

Several suppliers’ data show that their devices' wafer level EM wear out life is less than 10 years; however, this is at the absolute max temperature of 125ºC, and the device can still work for 10 years under normal conditions.

Figure 25. Wear out life declines with IC process shrinking

1) Engineering challenge:

Some equipment has a service life of 10~20 years, like high-end routers, or 20~30 years for equipment that is difficult to maintain, like submarine equipment. If an IC’s wear out life is less than 10 years, it will be a challenge for telecom equipment design.

2) Requirement for IC:

Keep wear out life above 10 years.

CONCLUSION

The telecom system and its reliability are introduced, and reliability approaches are summarized. To communicate anywhere and anytime while staying highly available and reliable, the telecom system is confronted with engineering challenges, which in turn raise requirements for IC components.