Reliability of Electronic Components
Springer-Verlag Berlin Heidelberg GmbH
T. I. Băjenescu, M. I. Bâzu
Reliability of
Electronic Components
A Practical Guide to Electronic
Systems Manufacturing
Springer
Prof. Eng. Titu I. Băjenescu, M. Sc.
13, Chemin de Riant-Coin
CH-1093 La Conversion
Switzerland
ISBN 978-3-642-63625-7
This work is subject to copyright. All rights are reserved, whether the whole or part of the material
is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilm or in other ways, and storage in data banks. Duplication
of this publication or parts thereof is permitted only under the provisions of the German
Copyright Law of September 9, 1965, in its current version, and permission for use must always
be obtained from Springer-Verlag. Violations are liable for prosecution under the German
Copyright Law.
© Springer-Verlag Berlin Heidelberg 1999
Originally published by Springer-Verlag Berlin Heidelberg New York in 1999
Softcover reprint of the hardcover 1st edition 1999
The use of general descriptive names, registered names, trademarks, etc. in this publication does
not imply, even in the absence of a specific statement, that such names are exempt from the
relevant protective laws and regulations and therefore free for general use.
stress. Hence the process of screening and/or burn-in to weed out the weak parts is a
universally accepted quality control tool for achieving high-reliability systems.
Technology plays an important role in the reliability of a component, because
each technology has its own advantages and weaknesses with respect to both
performance parameters and reliability.
Moreover, for integrated circuits, for example, the selection of the package form
is particularly important [insertion-mounted or surface-mounted devices; plastic quad
flatpack, fine pitch; hermetic devices (ceramic, cerdip, metal can); thermal
resistance, moisture problems, passivation, stress during soldering, mechanical
strength], as well as the number and type of pins.
Electronic component qualification tests are peremptorily required, and cover
characterisation, environmental and special tests as well as reliability tests; they
must be supported by intensive failure analysis to investigate relevant failure
mechanisms. The science of parts failure analysis has made much progress since
the recognition of reliability and quality control as a distinctive discipline.
However, a new challenge is born - that of computer failure analysis, with
particular emphasis on software reliability. Clearly, a computer can fail because of
a hardware failure, but it can also fail because of a programming defect, though the
components themselves are not defective. Testing both parts and systems is an
important, but costly part of producing reliable systems.
Electrostatic discharge (ESD) induced failures in semiconductor devices are a
major reliability concern; although improved process technology and device design
have enhanced the overall reliability levels achieved by every device family, the
failure mechanisms due to ESD, especially those associated with the Charged
Device Model (CDM), the Machine Model (MM), etc., are still not fully
understood.
Recent reliability studies of operating power semiconductor devices have
demonstrated that the passage of a high-energy ionising particle, from cosmic or
other radiation sources, through the semiconductor structure may cause a
permanent electrical short circuit between the main terminals of the device.
In electromigration failure studies, it is generally assumed that electromigration-
induced failures may be adequately modelled by a log-normal distribution; but
several research works have shown the inadequacy of this model and have
indicated the possible applicability of the logarithmic distribution of extreme
values.
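Whatever the failure-time distribution, the median electromigration lifetime itself is usually computed with Black's equation, MTF = A · J^(-n) · exp(Ea/kT). A minimal sketch of that relation; the prefactor A, the exponent n and the activation energy below are illustrative assumptions, not values taken from this book:

```python
import math

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K

def black_mtf(j_a_per_cm2, temp_k, ea_ev=0.7, n=2.0, a_const=1e10):
    """Black's equation: MTF = A * J**(-n) * exp(Ea / (k*T)).
    A, n and Ea here are illustrative assumptions, not book values."""
    return a_const * j_a_per_cm2 ** (-n) * math.exp(ea_ev / (K_BOLTZMANN_EV * temp_k))

# A hotter line or a higher current density shortens the predicted life:
print(black_mtf(1e6, 398.0) < black_mtf(1e6, 373.0))  # True
print(black_mtf(2e6, 373.0) < black_mtf(1e6, 373.0))  # True
```

The exponential temperature term is why accelerated life tests at elevated temperature can be extrapolated back to use conditions.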
The reliability problems of electronic devices, the parameters influencing their
lifetime and the degradation processes leading to failure have rapidly gained
importance. The natural enemies of electronic parts are heat, vibration
and excess voltage. Thus a logical tool in the reliability engineer's kit is derating -
designing a circuit, for example, so that semiconductors operate well below their
permitted junction temperatures and maximum voltage ratings.
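A derating rule of this kind is easy to check in a few lines of code. The sketch below assumes the simple steady-state thermal model Tj = Ta + P · Rth(j-a) and an illustrative 70% derating factor; neither the limits nor the factor are values from this book:

```python
def junction_temp(ambient_c, power_w, rth_ja_c_per_w):
    """Steady-state junction temperature: Tj = Ta + P * Rth(j-a)."""
    return ambient_c + power_w * rth_ja_c_per_w

def is_derated(tj_c, tj_max_c, v_applied, v_max, factor=0.7):
    """Accept the operating point only if junction temperature and
    applied voltage both stay below `factor` times the absolute
    maximum ratings (the 0.7 factor is an illustrative choice)."""
    return tj_c <= factor * tj_max_c and v_applied <= factor * v_max

tj = junction_temp(ambient_c=40.0, power_w=0.5, rth_ja_c_per_w=60.0)
print(tj)                                 # 70.0
print(is_derated(tj, 150.0, 12.0, 30.0))  # True
```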
Concerning the noise problem and reliability prediction of metal-insulator-
metal (MIM) capacitors, the MIM system may generally be a source of partial
discharges if inhomogeneities such as gas bubbles are present. If a ramp voltage is
applied, a number of current fluctuations can be observed experimentally in many
capacitors. In the time domain, the current fluctuations occur with random
amplitude and random time between two consecutive pulses. Electric charge is
transferred through this system and its value reaches as much as 1
Foreword VII
The authors
To my wife Andrea - for her patience and encouragement throughout this
project - and daughter Christine - a many faceted gem whose matchless
brilliance becomes more abundant with every passing year.
The last decades have generated extremely strong forces for the advancement of
reliable products beyond the current state of the art. The obvious technical
requirements of the American military forces (during World War II and the Korean,
Vietnam and Gulf conflicts), but also of the American and European space
programmes, have resulted in vastly improved reliability of machinery components.
New approaches to component as well as to system reliability will be required for the
next generation. Product reliability can only be realised by combining the proper use
of compatible materials, processes, and design practices.
While it is not possible to test reliability into a product, testing can be
instrumental in identifying and eliminating potential failures while not adversely
affecting good components. Unfortunately, product reliability is often compromised
by economic considerations. Optimising product reliability involves special
consideration of each of the three life intervals: the infant mortality period, the
useful life and wearout.
Infant failures should be eliminated from the device population by controlled
screening¹ and burn-in procedures. Two major types of defects are dominant in the
infant mortality period: quality defects and latent defects². Adequate derating
factors and design guidelines should be employed to minimise stress-related
failures during the normal operating lifetime of the product. Finally, the effects of
component wearout should be eliminated by timely preventive maintenance.
¹ The most proficient method for determining representative efficiency factors for conventional
screening tests - and subsequently an optimal screening effectiveness programme - involves a
five-step approach: (i) Determine the dominant failure modes experienced in each technology
and package configuration, as well as the impact of variables such as device complexity on
those failure mode distributions. (ii) Investigate the types and magnitudes of stress that activate
the various failure mechanisms and associated failure modes. Relate these stresses and stress
magnitudes to those specified in conventional screens. (iii) Examine screening data for each
technology to establish the range and the average reject rates actually experienced in the
conventional screening tests. (iv) Analyse the field experience of screened devices to determine for
each technology the screening escape rates and the types of failure modes that escape screening.
(v) Combine all reliability information to formulate efficiency factors for individual screening
tests for various technologies and package configurations; these efficiency factors can be
merged with screening cost information to determine overall screening effectiveness.
² A quality defect is one that may be found by employing normal quality control inspection
equipment and procedures, without stressing the component. A latent defect is one that will
escape the normal quality control procedures and requires component stressing in order to be
detected by inspection at the propagated failure level. A well-planned inspection station utilising
detailed criteria, proper instrumentation and trained personnel will exhibit high inspection
efficiency. However, no inspection is perfect, and 100% efficiency is impossible to attain.
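The three life intervals discussed above are commonly described by the Weibull hazard rate h(t) = (β/η)(t/η)^(β-1): β < 1 gives the decreasing hazard of the infant mortality period, β = 1 the constant hazard of useful life, and β > 1 the increasing hazard of wearout. A minimal sketch; the parameter values are illustrative assumptions, not data from this book:

```python
def weibull_hazard(t, beta, eta):
    """Instantaneous failure rate h(t) = (beta/eta) * (t/eta)**(beta - 1).
    beta < 1: decreasing hazard (infant mortality)
    beta = 1: constant hazard (useful life)
    beta > 1: increasing hazard (wearout)"""
    return (beta / eta) * (t / eta) ** (beta - 1)

# Infant mortality (beta < 1): the hazard falls as weak parts are weeded out.
print(weibull_hazard(100.0, 0.5, 1000.0) > weibull_hazard(1000.0, 0.5, 1000.0))   # True
# Wearout (beta > 1): the hazard rises with accumulated operating time.
print(weibull_hazard(5000.0, 3.0, 10000.0) > weibull_hazard(1000.0, 3.0, 10000.0))  # True
```

Superimposing the three regimes produces the familiar bathtub-shaped failure-rate curve.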
XIV Preface
crocircuits on the same chip, the key element of the so-called "second silicon
revolution". The monolithic integration of sensors, actuators, optical devices and
valves leads to new devices - the microsystems - having a higher reliability,
because the failure mechanisms linked to bond wires are virtually eliminated. And the
higher degree of integration reduces not only bond pads and bond wires, but also the
number of system interconnects, with beneficial effects on the overall reliability.
On the road of continuously decreasing structure sizes, the physical and
technological limits of semiconductor nanostructures point to the use of molecules
and atoms in information science. In particular, organic molecules are very
attractive because they can be engineered with very large complexity, and their
electronic and optical properties can be controlled technologically.
Our book should be viewed as a "matter of fact" text - a practical reliability
guide to the manufacturing of complex electronic systems rather than a work on the
theory of component reliability - and, as such, it constitutes only a partial survey
(thus, for example, it ignores RF and microwave devices and circuits, which are the
heart of wireless products) of some of the more common practical reliability
problems. The aim of this book is to contribute to new approaches and to the
understanding and development of electronic component reliability. The underlying
objective of the book is to better understand why components fail, addressing the
needs of engineers who will apply reliability principles in component design,
manufacture, testing, and field service.
This book is designed to present such information at a level suitable for students
in final-year and diploma courses, but it is very useful, too, both for electronic
systems manufacturing specialists and users, and for doctoral candidates. Although
the material of the book is not developed to the level generally reached in
postgraduate studies, it is a suitable introduction to the subject, to be followed by a
more detailed examination of particular topics. This book took a very long time to
write; much of the material put together over several years has been discarded and
new chapters have been added for the final English version.
Our book is the first attempt to compile a volume specifically focused on the
reliability problems of electronic and/or telecommunications components; it presents
an ample synthesis of specific reliability information in the field, and is addressed
to the electronic engineer who is concerned with equipment and component
reliability, and who will encounter a variety of practical, mathematical and scientific
problems in addition to those arising from his own particular branch of engineering.
The result is a reference work that should be invaluable to those involved in the
design and/or testing of these highly challenging and interesting types of complex
electronic systems.
The book tries to take stock of this domain and attempts to summarise the
present knowledge on semiconductor failure modes, degradation and failure
mechanisms, knowledge derived from the studies of numerous workers in the field.
For completeness, the book also includes a survey of accelerated testing, achieving
better reliability, total quality problems, screening tests and prediction methods
(allowing the reliability of a future electronic system to be evaluated starting from
predictions of the reliability of each component). A detailed alphabetical index, a
glossary, two acronym lists (international organisations and useful abbreviations),
three reliability dictionaries, a rich specific bibliography at the end of each chapter,
and a general one at the end of the book round off the picture of the information
offered by the book.
The authors
Contents
1 INTRODUCTION
References 37
2.4 Standardisation 87
2.4.1 Quality systems 87
2.4.2 Dependability 87
References 87
3.2 Resistors 94
3.2.1 Some important parameters 97
3.2.2 Characteristics 98
3.2.3 Reasons for inconstant resistors [3.8] ... [3.10] 100
3.2.3.1 Carbon film resistors (Fig. 3.4) 101
3.2.3.2 Metal film resistors 101
3.2.3.3 Composite resistors (on inorganic basis) 101
3.2.4 Some design rules 101
3.2.5 Some typical defects of resistors 102
3.2.5.1 Carbon film resistors 104
3.2.5.2 Metal film resistors 104
3.2.5.3 Film resistors 105
3.2.5.4 Fixed wirewound resistors 105
3.2.5.5 Variable wirewound resistors 105
3.2.5.6 Noise behaviour 105
References 141
References 169
References 193
References 213
7.5 Comparison between the IC families TTL Standard and TTL-LS 240
References 241
References 275
References 310
References 327
References 336
References 359
References 379
References 410
15 APPENDIX 413
15.1 Software-package RAMTOOL++ [15.1] 413
15.1.1 Core and basic module RJ Trecker 413
15.1.2 RM analyst 414
15.1.3 Mechanicus (Maintainability analysis) 414
15.1.4 Logistics 414
15.1.5 RM FFT-module 415
15.1.6 PPoF-module 415
15.7 Typical costs for the screening of plastic encapsulated ICs 421
15.8 Results of 1000 h HTB life tests for CMOS microprocessors 421
15.9 Results of 1000 h HTB life tests for linear circuits 422
15.10 Average values of the failure rates for some IC families 422
References 424
INDEX 501
List of figures and tables
Figures
Fig. 4.1 Comparison between failure rates of silicon rectifier diodes, for
different stresses: d.c. loading on the barrier layer and operation under capacitive load
Fig. 4.2 The failure causes (in %) of the silicon rectifier diodes
Fig. 4.3 Failure rate versus normalised temperature of the barrier layer,
according to MIL-HDBK-217; 1 - silicon diode; 2 - germanium diode; 3 - Z-diode
Fig. 4.4 "Superrectifier" technology with "glass" of plastic materials (General
Instrument Corp.). 1 - brazed silicon structure; 2 - sinterglass passivation; 3 - non
inflammable plastic case
Fig. 4.5 The "double plug" technology. 1 - glass tube; 2 - structure; 3 - plug
Fig. 4.6 Planar structure in the standard technology. 1 - silver excrescence
assuring the anode contact; 2 - SiO2 passivation assuring the protection of the pn
junction, at the surface; 3 - metallisation of the cathode contact
Fig. 4.7 Standard technology with the two plugs (FeNi alloy). 1 - connection; 2
- structure; 3 - hermetically closed glass body; 4 - plug; 5 - silver outgrowth
assuring the anode contact; 6 - cavity about 200 µm wide; 7 - welding
Fig. 4.8 Technology "without cavity", with mesa structure. 1 - metallisation of
the anode contact; 2 - metallisation of the cathode contact; 3 - SiO2 passivation
assuring the protection of junction on the lateral parts of the structure
Fig. 4.9 Technology "without cavity", with the two silvered tungsten plugs. 1 -
structure; 2 - welded contact; 3 - hermetically sealed glass body
Fig. 4.10 Intermediate technology between "standard" and "without cavity":
this is a planar structure, but of bigger dimensions. 1 - (passivate) oxide; 2 -
glassivation; 3 - cathode contact (metallisation)
Fig. 4.11 Intermediate technology: the glass body is in contact with the
glassivation
Fig. 4.12 Behaviour of different Z diodes during ageing after storage at +70°C.
Beyond 20,000 hours, the 6.3 V Z diode no longer operates reliably
Fig. 4.13 Ageing behaviour of the breakdown voltages of Z diodes, measured
at -ID = 1 mA and 20 mA: A) Tj = 135°C; B) Tj = 90°C
Fig. 4.14 Impatt diode chip in hermetically sealed package, with copper stud at
bottom serving as terminal and heatsink. Other terminal is at top
Fig. 4.15 Effect of junction temperature on failure rate for Ea = 1.8 eV
Fig. 4.16 The influence of circuit load resistance on output power for either a
pulsed or CW Impatt diode in a circuit which resonates the diode at a single frequency
f0. The pulsed or d.c. operating current is kept fixed at I0
Fig. 5.1 Failure rate vs. virtual junction temperature [5.10]
Fig. 5.2 Correlation between the damage speed, expressed by the failure rate
(λ, in 10^-5/h), and the reciprocal of the temperature, 1/T (in 10^-3/K)
Fig. 5.3 Voltage dependence of the median time (lognormal distribution).
Experimental data were obtained from four samples withdrawn from the same
batch of bipolar transistors undergoing a life test at the same temperature, at the
same dissipated power (Pmax), but at different combinations Ui, Ii (where Ui × Ii =
Pmax for all samples)
Fig. 5.4 Temperature range vs. number of cycles till failure (for power
transistors encapsulated in package TO-3)
Fig. 5.5 Temperature range vs. number of cycles till failure (for power
transistors encapsulated in package TO-220)
Fig. 5.6 Correlation between failure rate and normalised junction temperature.
For transistors with dissipated power higher than 1W at an environmental
temperature of 25°C, the values must be multiplied by 2
Fig. 5.7 Failure rate vs. junction temperature for various reliability levels of
power transistors
Fig. 6.1 Two-transistor analogue of pnpn structures
Fig. 6.2 Passivation and glassivation (National Semiconductor document).
Passivation is a process that protects against humidity and surface
contaminants with a doped vitreous silicon oxide film: 1 diffusion; 2 substrate; 3
glassivation; 4 conductive line; 5 metal; 6 passivation
Fig. 6.3 Estimated λ of a standard SCR depending on junction temperature,
reverse and/or forward voltage, and failure definition, for a maximum rated
junction temperature of +100°C
Fig. 6.4 Estimated λ of a standard SCR depending on junction temperature,
reverse and/or forward voltage, and failure definition, for a maximum rated
junction temperature of +125°C
Fig. 6.5 Estimated λ of a standard SCR depending on junction temperature,
reverse and/or forward voltage, and failure definition, for a maximum rated
junction temperature of +150°C
Fig. 6.6 Simplified structural simulation model of SITH
Fig. 6.7 Potential distribution in SITH along channel axis
Fig. 6.8 Electron energy distribution along channel axis
Fig. 6.9 Barrier height versus gate bias
Fig. 7.1 Evolution of the metallisation technology and corresponding allowed
current densities
Fig. 7.2 Main sequences of the planar process: a starting material; b deposition
of an epitaxial n layer; c passivation (with an oxide layer); d photolithography; e
diffusion of a p+ layer; f metallisation
Fig. 7.3 A log(ΔV/V0) vs. log t plot for the hot-carrier degradation mechanism
Fig. 7.4 Plot of the Arrhenius model for A = 1 and Ea = 1.1 eV
Fig. 7.5 Comparison of data referring to early failures and long-term failures: a)
typical domain of long-term failure mechanisms for commercial plastic
encapsulated ICs; b) domain of early failures for commercial bipolar SSI/MSI; c)
domain of early failures of commercial MOS LSI [7.21]
Fig. 7.6 Replacement rate of commercial TTL ICs in plastic package (in RlT,
during infant mortality period) [7.21]
Fig. 7.7 Monte-Carlo reliability simulation procedure for ICs
Fig. 7.8 Failure distribution for bipolar monolithic ICs
Fig. 7.9 Failure distribution for MOS ICs
Fig. 7.10 Failure distribution for COS/MOS ICs
Fig. 8.1 The place of hybrid circuits in the general framework of
microelectronics
Fig. 8.2 Drift of tantalum nitride resistors, under load, is smaller than 0.1%
after 10^3 working hours
Fig. 8.3 Stability of tantalum nitride resistors depending on the number of
damp-heat cycles
Fig. 8.4 Results of high-temperature storage of tantalum nitride resistors,
at various temperatures
Fig. 8.5 Noise characteristics of Birox 1400 pastes before and after laser
adjustment, depending on the resistor surface (for Birox 1400, 17S, and 17G
pastes of Du Pont better noise figures may be obtained)
Fig. 8.6 Evaluation of the relative costs for the thick- and thin-film integrated
circuits
Fig. 8.7 The experience of users (A ... L) versus predicted failure rates
Fig. 8.8 Primary causes of failures of small power hybrid circuits
Fig. 8.9 The primary causes of the failures (power hybrid circuits)
Fig. 8.10 Statistical reliability data for hybrid circuits
Fig. 8.11 Without a cooling radiator, the enamelled layer operates at a lower
temperature than that of an equivalent aluminium oxide chip. As a consequence, for
aluminium oxide a cooling radiator gives better power dissipation. 1 -
enamelled layer; 2 - aluminium oxide; 3 - beryllium oxide
Fig. 8.12 A good example of thick-film circuit: a band filter (Ascom Ltd.,
Berne)
Fig. 8.13 Conductive lines printed on ceramic substrate: drying at +150°C;
baking of the conductive lines at +850°C
Fig. 8.14 Printing of the first resistor paste; drying at +150°C
Fig. 8.15 Printing of the second resistor paste; drying at +150°C; baking of the
pastes at +850°C
Fig. 8.16 Printing the protection layer (glazing); drying at +150°C; baking the
glazing at +500°C
Fig. 8.17 Printing of the solder paste (which remains wet for component mounting);
mounting of capacitors; reflow soldering
Fig. 8.18 Measuring of all capacitors; calculation of nominal values of resistors
(97% of nominal value); ageing of substrate (70 hours at +150°C)
Fig. 8.19 Fine adjustment of resistors at nominal value
Fig. 8.20 Mounting of the active components; mounting of connections
Fig. 8.21 Pre-treatment of integrated circuits for thick-film hybrids
Fig. 8.22 Chip mounting
Fig. 8.23 Beam lead attachment requires thermocompression bonding or
parallel gap welding to the substrate metallisation
Fig. 9.1 Decrease of device dimensions from 1970 to 2010 [9.3]
Fig. 9.2 Development of molecular electronics/photonics from conventional
electronics and optics [9.3]
Fig. 9.3 Trend of DRAM device parameters [9.5]
Fig. 9.4 Increase of process steps due to device complexity [9.5]
Fig. 9.5 Record density trend in DRAM and other media [9.5]
Fig. 9.6 Another possible classification of semiconductor memories. (PLA:
programmable logic array)
Fig. 9.7 Illustration of a soft error
Fig. 9.8 Defects in digital MOS and in linear and digital bipolar technology ICs
[9.20]
Fig. 9.9 Generation of electron-hole pairs in the gate and field oxides (PG =
polysilicon gate)
Fig. 10.1 Classification of optoelectronic semiconductor components
[10.1][10.2]
Fig. 10.2 A typical red LED cross-section
Fig. 14.9 Integrated circuit 936. Electrical overstress: pads of the output
transistors are melted
Case 4:
Fig. 14.10 DTL integrated circuit 9946, found defective at the electrical testing of
assembled boards (inputs 1 and 2 overstressed)
Case 5:
Fig. 14.11 Optocoupler: the failure mode is an open circuit of the
phototransistor; the emitter bonds are interrupted. Because the optocouplers
passed a 100% electrical test, it seems that no mechanical defects occurred. To
reach the aluminium pad (leading to the emitter windows), the glass passivation
layer was removed and the failure mechanism was discovered: the metallisation
surrounding the emitter area was burned by an overstress current produced by a
scratch on the pad during the manufacturing process. Only a small portion of the
pad remained good, allowing it to pass the electrical test. When the
optocoupler was used, the pad was burned and the failure occurred
Case 6:
Fig. 14.12 Aluminium and oxide removal during ultrasonic soldering
Case 7:
Fig. 14.13 Local damage of the protection layer during ultrasonic soldering
Case 8:
Fig. 14.14 TTL IC 7410: two inputs are found defective at the electrical function
test of assembled boards. The silicon was broken under the contact zone (a rare
defect, produced by incorrect handling during the manufacturing process)
Case 9:
Fig. 14.15 Local removal of aluminium during testing, below a thermocompression
area
Case 10:
Fig. 14.16 Break of an aluminium wire (ultrasonic bond)
Case 11:
Fig. 14.17 Crack in a crystal
Case 12:
Fig. 14.18 Break of a die
Case 13:
Fig. 14.19 TTL IC 7400 (X170): output 8 is found defective at the electrical testing
of assembled boards. One may notice the short circuit between the contact wires
bonded to pins 8 and 7, respectively
Case 14:
Fig. 14.20 Failures of diodes after a temperature cycling test [14.34]. Causes:
wrongly centred dies and misalignment during diode mounting
Case 15:
IC TTL 7475 (flip-flop with complementary outputs). Normal operation was
observed only for temperatures between 25 and 40°C. At temperatures higher than
40°C, the output level is unstable. The phenomenon is produced by contact
windows that are insufficiently opened at the open-collector output transistors. (Fig.
14.21 ... 14.23 Metallised dies. Fig. 14.24 Dies with metallisation removed.)
Case 16:
Bipolar LSI IC type HAI-4602-2: electrostatic discharges. There are no
differences between the handling precautions for bipolar and MOS ICs, because
both categories are sensitive to electrostatic discharges. SEM pictures show the
areas affected by electrostatic discharge (Fig. 14.25... 14.27)
Case 17:
Partial view of the metallisation layer of a ROM die, longitudinal section
(Fig. 14.28... 14.31)
Case 18:
Fig. 14.32 Notches formed during metallisation corrosion
Case 19:
Fig. 14.33 Excellent metallisation of a collector contact window of a TTL IC
(X5000)
Case 20:
Fig. 14.34 Excellent covering of the metallisation over an oxide step (X9000)
Case 21:
Fig. 14.35 Faulty thinning of a metallisation pad over an oxide step (X10000)
Case 22:
Hybrid circuit voltage regulator with power transistor at the output. Melted
connection at the emitter of the power transistor. This failure mechanism may be
avoided if the manufacturer does not forget to specify in the data sheet that a
capacitor with good high-frequency characteristics must be mounted at the
regulator input (Fig. 14.36 ... 14.38)
Fig. 14.38 An error occurred: the output voltage is higher than the input voltage.
To avoid the failure, a blocking diode must be mounted between the input and
output (a detail not mentioned by the manufacturer).
Case 23:
Small signal transistors with wire bonding defects
Fig. 14.39 Bad solder of a connection wire
Fig. 14.40 Edge solder joint
Fig. 14.41 Short circuit of the base wire to the crystal
Case 24:
Fig. 14.42 Electrical opens of a metallic pad (RAM chip), produced by
electromigration
Case 25:
Fig. 14.43 Typical example of popcorn noise in an operational amplifier
Case 26:
Fig. 14.44 Silicon dissolution in aluminium (X 11000)
Case 27:
Fig. 14.45 Dissolution of silicon in aluminium. Note the change of
orientation in the horizontal (100) plane (X 1700)
Case 28:
Fig. 14.46 Hole in a gate oxide, leading to a short circuit between metallisation
and substrate (X 5000)
Case 29:
Fig. 14.47 Hole in a gate oxide, leading to a short circuit between metallisation
and substrate (X 5000)
Case 30:
Fig. 14.48 Crystallisation of a point defect in thermally grown SiO2 (X 4400)
Case 31:
Tables
Table 2.15 SYRP prediction vs. accelerated life test (ALT) results
[SYRP/ALT in each column]
Table 2.16 Comparison of reliability prediction procedures
Table 3.1 Resistors; fixed; power
Table 3.2 Resistors; variable; power
Table 3.3 Comparison between metal film and carbon film resistors (general
specifications; load 0.1 ... 2 W)
Table 3.3 Correlation between storage duration and new forming process
(reactivation) for wet aluminium electrolytic capacitors, for different nominal
voltages and diameters
Table 3.4 Criteria for aluminium electrolytic capacitors drift failures (DIN
41240,41332)
Table 3.5 Tantalum capacitor impedance as a function of frequency
Table 3.6 Correction factor OR for various values of the series resistance Rs
Table 3.7 Aluminium electrolytic capacitors versus tantalum capacitors
Table 3.8 Tested quantities and failures in life testing at +85°C, 1.5 UN, max.
7000h
Table 3.9 Estimated λ under derated conditions
Table 3.10 Tested quantities and catastrophic failures in climatic tests
Table 3.11 Percentages outside requirements after the damp heating test
without load: 40°C, RH 90-95%, 21 days
Table 3.12 Percentages outside requirements after the damp heating test
without load: 40°C, RH 90-95%, 21 days
Table 3.13 Percentages outside requirements after the accelerated damp
heating test preceded by the rapid temperature change test: 55°C, RH 95-100%, 2
days
Table 3.14 Breakdown voltage and field strength at breakdown
Table 4.1 Results of a comparative reliability study on 400 mW Z diodes,
alloyed and diffused, respectively
Table 4.2 Compared reliability of Z diodes (% defects, after 168 hours
operation, at Pmax)
Table 4.3 Mean temperature coefficient (in %/°C) of the Z diodes, between
+25°C and +125°C
Table 4.4 Reliability comparisons at the component level
Table 4.5 Failure rates, predicted and observed
Table 4.6 Catastrophic failures
Table 4.7 Degradation failures
Table 4.8 Catastrophic failures, FRD cards.
Table 4.9 The distribution of the typical failure modes
Table 5.1 The main technologies used to manufacture silicon transistors
Table 5.2 Main bonding techniques for silicon transistors
Table 5.3 Technological variants for power transistors
Table 5.4 Bipolar vs. VMOS transistors
Table 5.5 Dilatation coefficients
Table 5.6 Failure sources (in %) for power transistors encapsulated in TO-3
and TO-220
Table 5.7 Testing conditions for temperature cycling testing of cases TO-3
and TO-220
Table 12.4 The effect of humidity on the time to pad interruption (that is,
50% corrosion); the pad has a width of 4 µm and a thickness of 1 µm
Table 12.5 Relationship between the duty cycle and the equilibrium state (test
conditions: over-temperature of 20°C, duty cycle 0.15, 85°C and 85% r.h.)
Table 12.6 A history of failure rate improvements (in FITs) for plastic
encapsulated ICs
Table 12.7 Results of a reliability test program: high humidity testing in a non-
saturating autoclave (108°C, 90% RH). SOIC = small outline IC package, SLCC =
silicone junction-coated IC, CerDIP = ceramic dual-in-line package (hermetic)
Table 12.8 Results of reliability tests performed by IEEE Gel Task Force
Table 13.1 Classification of defects depending on their effects [13.1][13.2]
Table 13.2 Average indicative figures for the parameters A ... F and the unit
cost for discrete components, linear and digital ICs [13.5]
Table 14.1 Working plan for a failure analysis for semiconductor components
Table 14.2 Trap characterisation from DLTS spectra
Table 14.3 Examples for the usage of a Scanning Electron Microscope (SEM)
Chap. 15
15.2 Failure rates for components used in telecommunications
15.3 Failure types for electronic components [15.2]
15.4 Detailed failure modes for some components
15.5 Storage reliability data [15.3]
15.6 Typical costs for the screening of plastic encapsulated ICs (in Swiss
francs) [15.4]
15.7 Failure criteria. Some examples
15.8 Results of 1000 h HTB life tests for 8-bit CMOS microprocessors
encapsulated in ceramic, type NSC 800 [15.5]
15.9 Results of 1000 h HTB life tests for linear circuits encapsulated in plastic
[15.5]
15.10 Average values of the failure rates for some IC families
15.11 Activation energy values for various technologies
15.12 Failures at burn-in [15.8]
1 Introduction
1.1
Definition of reliability
Reliability is a relatively new concept, which rounds off quality control and is
linked to the study of quality itself. Simply explained, reliability is the ability
of an item to work properly; it is its property of not failing during operation. One
may say that reliability is the operational certainty over a stated time interval.
This definition is however imperfect, because, although it contains the time factor,
it does not describe a precisely measurable quantity.
As the first reliability studies were made in the USA, the American definition
was adopted at the beginning: the reliability is the probability that a
certain product does not fail for a given period of time, and for certain opera-
tional and environmental conditions. The reliability of an element (or of an en-
semble) is today defined as the probability that an item will perform its required
function under given conditions for a stated time interval.¹ Component reli-
ability involves the study of both reliability physics and reliability statistics. They
contribute to a better understanding of the ways in which the
components fail, and of how the failures develop in time. This provides an
invaluable background for understanding and assessing the real-world failure
patterns of component reliability that come to us from field failure studies. The
effort of the researchers has been concentrated on establishing lifetime patterns for
individual component types (or for individual failure mechanisms). Reliability is a
collective name for those measures of quality that reflect the effect of time in the
storage or the use of a product, as distinct from those measures that show the state
of the product at the time of delivery.
In the general sense, reliability is defined as the ability of an item to perform a
required function under stated conditions for a stated period of time.
¹ Although this definition corresponds to a concept rich in information, it has however one
disadvantage. Because of the need to specify a defined operation time of the respective
item, the reliability has different values for each time interval. That is why it is necessary
to define other measures, depending not only on the operation time, but also on the mean in-
terval between failures (MTBF - mean time between failures) and on the failure rate
per hour (λ). Nevertheless, it is not sufficient to indicate the failure rate for a cer-
tain constructive element if the operational and environmental conditions (on which the
failure rate depends) are not given at the same time.
The stated conditions include the total physical environment (mechanical,
electrical and thermal conditions included). Perform means that the item does not fail. The
stated time interval can be very long (twenty years, as for telecommunication
equipment), long (a few years) or short (a few hours or weeks, as for space re-
search equipment). This parameter might also be, for example, the mileage (of
an automobile) or the number of cycles (of a relay unit).
1.2
Historical development perspective
The first studies concerning electronic equipment and its reliability were
made with the purpose of improving military avionics and the radar
systems of the army. The mathematical formulation of reliability and its utili-
sation for material tests originate in ideas born during the Second World War,
when Wernher von Braun and his colleagues worked on the V1 missiles. They
started from the idea that a chain cannot be stronger than its weakest link. The
studied object was a simple rocket; nevertheless they registered failure after fail-
ure, each time a constructive element failing, although the components had been
submitted to detailed control tests. The difficulties appeared less because of syste-
matic errors, and rather because of the multiple error possibilities arising from the
combination of different aspects concerning the numerous component parts, which acted
simultaneously. So they came up with the idea that all the constructive elements
must play a role in the reliability evaluation.
The reliability of individual parts is usually characterised by their failure rate λ,
giving the number of failures per time unit. The mathematician Erich
Pieruschka was invited to this debate and he stated - for the first time - that, if the
chance of one element to survive is 1/x, the chance of an ensemble composed of n
identical elements to survive is 1/xⁿ. In exponential terms, we can write, for a
constant failure rate: the reliability of an isolated constructive element is exp(-λt)
and, consequently, the reliability of n elements is exp(-nλt). In the general case
(in exponential form or not), the reliability of an element is calculated with:

R = 1/x. (1.1)

The reliability of an ensemble formed of n elements connected in series will be:

Rs = Rⁿ = 1/xⁿ. (1.2)

Therefore, the reliability of a series circuit, formed of n elements, will be:

Rs = R₁·R₂·R₃·...·Rₙ = ∏_{i=1}^{n} Rᵢ. (1.3)
This equation is known as the "theorem of the product of reliabilities". It was also
established that the reliability of one constructive element must be much greater
than the required reliability of the system. That is why new constructive elements,
with higher reliability, have been developed, and finally - for the V1 rocket - an overall
reliability of 75% was obtained.
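The relations (1.1)-(1.3) can be sketched in Python; the function names below are ours, chosen for illustration:

```python
import math

def series_reliability(reliabilities):
    """Reliability of a series circuit of n elements: the product of the
    individual reliabilities R_1 * R_2 * ... * R_n (Eq. 1.3)."""
    r = 1.0
    for r_i in reliabilities:
        r *= r_i
    return r

def series_reliability_exponential(lam, n, t):
    """For a constant failure rate lam, each element has R(t) = exp(-lam*t),
    so n identical elements in series have R_s(t) = exp(-n*lam*t) (Eq. 1.2)."""
    return math.exp(-n * lam * t)
```

For ten elements of individual reliability 0.99, the series reliability already drops to about 0.90, which illustrates why each constructive element must be much more reliable than the required system reliability.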
Since that time, the complexity - especially that of electronic systems - has
been growing continuously. This explains why all engineers - if they desire to
reach and remain at the top of the new technologies - and all manufacturers - if
they do not want to lose collaborations because of different interpretations of the
reliability concept - must learn how to use the new methods.
1.3
Quality and reliability
To clarify the problems from the beginning - although they are inseparably
bound - we distinguish some very important properties of electronic systems.
The German Society for Quality (DGQ) defines quality as the condition that
makes an object or a functional element correspond to the pre-established re-
quirements. Another definition says: quality is the measure in which a compo-
nent corresponds to the properties guaranteed by the manufacturer, beginning
with the moment of delivery to the client. In the following, we understand by
quality a measure of the degree to which a device conforms to applicable specifi-
cation and workmanship standards. The quality is characterised by the acknowl-
edged percentage of defects in the studied batch. The quality of the components is
determined by the design quality and the manufacture quality, taking into account an
optimum compromise between requirements and costs. We distinguish, too, be-
tween "design quality" and "quality of the finished object".
Product testing must assure that each unit satisfies the requirements. These
tests can be made on the entire lot, or on samples. If the subsequent costs (which can
appear after the utilisation of defective elements) substantially exceed the test
costs, using a programmable tester instead of sample testing can increase the
certainty of the test results for the entire lot.
Since an operational defect can never be excluded for a given time interval, an
operation without errors can be foreseen only with a certain probability. There-
fore, the bases of reliability theory are probability theory and statistics. It
must be taken into account that reliability depends directly on the
manufacturing manner, and also greatly depends on the utilisation mode of the
item. This is underlined by the fact that, for reliability, not only the number of
elements from the first series which fail is important, but also the deviations of
their characteristics. We must know for what time period the initial characteristics
are preserved, how great the variation over time of the deviations is, what the
percentage of failures during the first operation hours is, what the failure
rate over the operation time is, what the shape of the survival function is, and finally
what statistical distribution can be associated with it. All these characteristics are
represented in Fig. 1.1.
Reliability is the decision criterion for a component which fulfils all the
quality requirements. Do not forget that the user can have an important contribution
to prolonging (or shortening) the lifetime of the component. In the past, the system
designers imposed drastic quality conditions, trying to obtain a greater certi-
tude that the constructive elements satisfy the specifications of the certificate
Fig. 1.1 Elements of the product quality (% defects: AQL, LTPD, ppm; parameter distribution, independent of time; parameter stability, dependent on time; evaluation)
Fig. 1.2 The factors influencing the purchasing of an equipment (quality, service and price, in %): a some years ago; b today
of guarantee. Today, the designers demand acceptance tests that complete the
quality inspection; this is requested to make sure that the manufacturer's specifi-
cations are valid and applicable initially, at input inspections, but also later, after a
longer operation time. Some years ago, the factors influencing the purchase of an
equipment or of a system had the ratios shown in Fig. 1.2a. Today, these ratios
have changed into those shown in Fig. 1.2b. It can be seen that reliability and
quality make together a total of 50%.
1.4
Economics and optimisation
It is known that improving system reliability leads to diminishing maintenance
costs. In accordance with the DIN 40042 standard, maintainability is defined
as a measure estimating the extent to which a studied element is able to be
maintained or restored to a state permitting it to fulfil the specified function.
Another definition (MIL-STD 721 D) of maintainability is: a characteristic of
design and installation expressed as the probability that an item will be retained
in or restored to a specified condition within a given period of time, when the
maintenance is performed in accordance with the prescribed procedures and
resources.
Fig. 1.3 The optimum zone of the best compromise price/reliability (price versus reliability, from 0 to 1.0): a first investment costs; b operation costs; c total costs
Already in the planning phase or in the design phase of a new product, the maximisa-
tion of the probability that the desired product will stay within the limits of the general
planned costs must be taken into account. Not only an optimal reliability, but also
an optimum compromise between price and reliability (Fig. 1.3) is sought. It can
be seen that if the pursued goal is correctly established, reliability acts in the
sense of price reduction.
1.5
Probability; basic laws
Modern reliability principles are mainly based upon statistics and probability.
Therefore, in the following some elementary concepts are reviewed.
There are two main definitions of probability: the classical definition and
the relative-frequency definition. In the classical definition, if an event can occur
in N mutually exclusive and equally likely ways, and if n of these outcomes are of
one kind A, the probability of A is n/N. For example, the probability of a head or a
tail in the toss of a coin is 1/2. The classical definition is not widely used in real
applications.
In the relative-frequency definition of probability, a random experiment is re-
peated n times under uniform conditions, and a particular event E is observed to
occur in f of the n trials. The ratio f/n is called the "relative frequency" of E for
the first n trials. If the experiment is repeated a sufficiently large number of times,
the ratio f/n for the event E approaches the value P, the probability of the event
E. This definition indicates that the probability is a number between 0 and 1:

0 ≤ P ≤ 1. (1.4)
There are three basic laws (for complementation, for addition and for multipli-
cation).
Law of complementation. If the probability that the event A occurs is P(A) and
the probability that A does not occur is P(Ā), then:

P(A) + P(Ā) = 1. (1.5)
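The relative-frequency definition and the complementation law can be illustrated with a short simulation; the event chosen (a uniform draw below 0.3) is an arbitrary example of ours:

```python
import random

def estimate_probability(event, trials=100_000, seed=1):
    """Relative-frequency estimate of P(E): repeat the random experiment n
    times and return f/n, where f counts the trials in which E occurred."""
    rng = random.Random(seed)
    f = sum(1 for _ in range(trials) if event(rng.random()))
    return f / trials

# Event A: a uniform draw falls below 0.3, so P(A) = 0.3 exactly.
p_a = estimate_probability(lambda u: u < 0.3)
# Law of complementation (1.5): P(A) + P(not A) = 1.
p_not_a = 1.0 - p_a
```

As the number of trials grows, the estimated relative frequency approaches the exact probability 0.3.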
1.5.1
Probability distributions
Fig. 1.4 Relationship between the probability density function f(x) and the cumulative distribu-
tion function F(x)
∫_{-∞}^{+∞} f(x)dx = 1. (1.16)

For the Weibull distribution, the probability density function is

f(t) = (β/η)[(t - γ)/η]^(β-1) exp{-[(t - γ)/η]^β} (1.17)

and the cumulative distribution function is

F(t) = 1 - exp{-[(t - γ)/η]^β},
where η is the scale parameter, β is the shape parameter, and γ is the location
parameter. If the failures can start as soon as the devices are operated, then γ = 0.
The β parameter of the Weibull distribution is important in determining the failure
rate:
For β < 1, the failure rate is decreasing; for β = 1, the failure rate is constant;
and for β > 1 the failure rate is increasing. Therefore, the Weibull distribution can
be used to characterise components that are subject to infant mortality, random
failures, or wearout failures.
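The role of β can be checked numerically with a small sketch; the parameter values below are arbitrary illustrations:

```python
def weibull_failure_rate(t, beta, eta, gamma=0.0):
    """Weibull failure rate z(t) = (beta/eta) * ((t - gamma)/eta)**(beta - 1),
    with shape beta, scale eta and location gamma."""
    x = (t - gamma) / eta
    return (beta / eta) * x ** (beta - 1)

# beta < 1: the failure rate decreases with time (infant mortality).
early = weibull_failure_rate(10.0, beta=0.5, eta=1000.0)
late = weibull_failure_rate(1000.0, beta=0.5, eta=1000.0)
```

For β = 1 the expression reduces to the constant 1/η, and for β > 1 the failure rate grows with time, matching the three regimes described in the text.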
1.5.2
Basic reliability distribution theory
Almost every discussion on reliability begins and ends with the statement of fail-
ure rates for either components or systems. Some very basic and interesting reli-
ability equations can be developed. For example, if the probability of a successful
event is represented by R(t) and the probability of an unsuccessful event (a failure)
is represented by F(t):

F(t) = ∫_0^t f(t)dt (1.24)

and the probability of success is:

R(t) = 1 - ∫_0^t f(t)dt. (1.25)
F(t) is the distribution function for the probability of failure (the probability
that a device will fail by the time moment t). R(t) is the distribution function for
the probability of success (the probability that a device will not fail by the time
moment t). The probability that failures will occur between any times t₁ and t₂ can
be calculated from the probability function

P = ∫_{t₁}^{t₂} f(t)dt (1.26)

and, since all devices and systems have a finite lifetime:

P = ∫_0^∞ f(t)dt = 1. (1.27)
The density function f(t) may be derived from (1.25), by differentiating:

f(t) = -dR(t)/dt = -R'(t). (1.28)
Another expression that is always part of every reliability discussion is the mean time
to failure, MTTF, used for non-repairable systems. The mean time between (suc-
cessive) failures, MTBF, is used if the system recovers to the same state after
each failure [1.33]. MTBF values must be computed with different reliability
distributions for different time periods between failures. By using the mathema-
tical expectation theorem, MTTF can be expressed as:

MTTF = ∫_0^∞ t f(t)dt. (1.29)

The reliability function of the exponential distribution, R(t) = exp(-λt),
where λ is the failure rate and t is the time, can be derived from the Poisson dis-
tribution by using the first term of this distribution (for x = 0). The probability
density function of the exponential distribution is:

f(t) = λ exp(-λt). (1.32)

The MTTF can be calculated with (1.29). Making the substitution for f(t):

MTTF = ∫_0^∞ t λ exp(-λt)dt = 1/λ. (1.33)
Fig. 1.5 Relationship of shapes of failure rate (A), failure density (B), and reliability function
(C)
One can see that the MTTF of the negative exponential is equal to the recipro-
cal of the failure rate; this relationship holds only for the negative exponential
distribution. The failure rate of the negative exponential distribution is:

Z(t) = f(t)/R(t) = [λ exp(-λt)] / exp(-λt) = λ. (1.34)
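The results (1.32)-(1.34) can be checked numerically; the failure rate value below is an arbitrary assumption:

```python
import math

LAM = 2.0e-6  # assumed constant failure rate, in failures per hour

def f(t):
    """Failure density of the exponential distribution, Eq. (1.32)."""
    return LAM * math.exp(-LAM * t)

def r(t):
    """Reliability function R(t) = exp(-LAM * t)."""
    return math.exp(-LAM * t)

def mttf_numeric(steps=200_000):
    """Trapezoidal approximation of MTTF = integral of t*f(t)dt (Eq. 1.29);
    the result should approach 1/LAM."""
    upper = 20.0 / LAM  # integrate far into the tail
    h = upper / steps
    total = 0.0
    for i in range(steps):
        t0, t1 = i * h, (i + 1) * h
        total += 0.5 * h * (t0 * f(t0) + t1 * f(t1))
    return total
```

The ratio f(t)/r(t) evaluates to λ at any moment, confirming (1.34), and the numerically integrated MTTF comes out close to 1/λ.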
Fig. 1.6 Reliability R(t) and probability of failure F(t); R(t) + F(t) = 1
1.6
Specific terms
To avoid misunderstandings, we must clarify from the beginning the cha-
racteristic terms and expressions of the reliability vocabulary. At the end of this
book you will find a glossary with the most frequently used reliability terms. Here
only some important notions will be presented.
A device or an item is any component, electronic element, assembly or equip-
ment that can be considered individually. It is a functional or structural unit, which
is considered as an entity for investigations. It may consist of hardware and/or
software and also include, if necessary, human resources.
1.6.1
The generalised definition of the failure rate (λ) and of the
mean time between failures (MTBF)
The failure rate can be deduced considering that the test begins at the moment t =
0 with n₀ components. After a time t, nₛ components survive and n_f compo-
nents have failed:

n₀ = nₛ + n_f. (1.38)
The main reliability relations can be summarised as:

R(t) = 1 - F(t)

F(t) = ∫_0^t f(x)dx

R(t) = exp[-∫_0^t z(x)dx]

f(t) = Z(t) exp[-∫_0^t z(x)dx]

Z(t) = f(t)/R(t)

m = ∫_0^∞ R(t)dt = ∫_0^∞ t f(t)dt
The failure rate is given by dn_f/dt. This ratio can be interpreted as the number of
components which fail in the unit of time. As nₛ components survived, the failure
rate per component is

λ = (1/nₛ)(dn_f/dt). (1.39)
The reliability at the time t can be expressed as the probability of non-failure for
the interval (0, t]. Since from the initial n₀ components only nₛ remain:

R(t) = nₛ/n₀ = (n₀ - n_f)/n₀. (1.40)

Differentiating, we obtain:

dR(t)/dt = -(1/n₀)(dn_f/dt) (1.41)

and

dn_f/dt = -n₀[dR(t)/dt]. (1.42)

From (1.39) and (1.42) it results:

λ = (1/nₛ)[-n₀(dR/dt)]. (1.43)

But, in accordance with (1.40), λ(t) - in s⁻¹ - becomes:

λ = -(1/R(t))[dR(t)/dt]. (1.44)
This relation has a general validity if nothing is known about the variation in
time of λ. The unique restrictive condition is that λ must always be positive, while
R(t) must be a monotonically decreasing function. By integrating the relation
(1.44) between 0 and t, we obtain:

∫_0^t λ dt = -∫_1^R dR(t)/R = -ln R(t). (1.45)

With R = 1 for t = 0, we have

R(t) = exp[-∫_0^t λ dt]. (1.46)
In electronics, the problem is simplified if we consider λ constant; in this case:

R(t) = exp(-λt) (1.47)

and - in accordance with (1.28) - we have

f(t) = λ exp(-λt). (1.48)

It can also be proved that, for a working interval (t, t+Δt), the reliability is given
by:

exp[-λ∫_t^{t+Δt} dt] = exp(-λΔt). (1.49)
Obviously, the working moment (the age) in the expression (1.49) is not im-
portant, but only the time interval Δt, measured from a certain reference moment at
which the item was still in operation. If Δt represents the duration of an experi-
ment, then for this experiment the components have the same reliability at differ-
ent ages. In statistics it is considered that the mean value of a given distribution
f(t) is obtained from the moment of the first order of f(t), namely t f(t), the integral
being calculated from t = 0 to t = ∞. From the mean of the failure times, the good
operation time - MTBF (for repairable systems) or MTTF (for non-repairable sys-
tems) - is calculated. The general expression of MTBF (respectively MTTF) is:

m = ∫_0^∞ t f(t)dt. (1.50)
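The age-independence expressed by (1.49) can be verified directly; the numeric values are arbitrary:

```python
import math

LAM = 1.0e-5  # assumed constant failure rate per hour

def interval_reliability(t_start, dt):
    """Probability of surviving the interval (t, t + dt], given survival up
    to t: R(t + dt) / R(t); for constant LAM this equals exp(-LAM * dt),
    as in Eq. (1.49)."""
    return math.exp(-LAM * (t_start + dt)) / math.exp(-LAM * t_start)

# The same interval length gives the same reliability at any age:
r_new = interval_reliability(0.0, 1000.0)        # a new item
r_aged = interval_reliability(50_000.0, 1000.0)  # an item already 50 000 h old
```

This is the memoryless property of the exponential distribution: only the interval Δt matters, not the age at which it starts.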
With the aid of the relation (1.28), we can write m = -∫_0^∞ t [dR(t)/dt]dt;
integrating by parts and taking into account that

lim_{t→∞} R(t) = 0, (1.55)

it follows that

m = ∫_0^∞ R(t)dt. (1.56)

If λ = constant, then

m = ∫_0^∞ exp(-λt)dt = 1/λ. (1.57)
1.7
Failure types
One may distinguish three failure types. (To be noted that handling, trans-
port and misuse failures are not taken into account.) These failures appear even if
the user does not make any error.
First, there are failures that appear during the early period of the component life
and are called early (infantile) failures. They can be explained by faulty
manufacture and insufficient quality control in production. They can be
eliminated by a systematic screening test.
Wearout failures, the second category, constitute an indicator of component
ageing.
1.7.1
Failures classification
• Depending on the cause of failure: failure caused by inherent weakness
• Depending on emergence and test: failure revealed by an interruption of
operation, failure revealed by a test programme
• Depending on the nature of failure:
- partial failure
- intermittent failure
• Depending on emergence manner:
- catastrophic failure
- degradation failure.
Figure 1.7 gives a general picture of the most usual failure categories. Being fa-
miliar with the real failure mechanisms facilitates both the selection of the best compo-
nents and their correct use, and helps reliability growth in general.
1.8
Reliability estimates
Two methods are generally used to make reliability estimates: (i) parts counts
method and (ii) parts stress analysis method.
The parts count method requires less information, generally that dealing with
the quantity of different part types, the quality level of the parts, and the operational
environment. This method is applicable in the early design phase and during
bid/proposal formulation.
Parts stress analysis requires the greatest amount of detail and is applicable
during the later design phases, where actual hardware and circuits are being de-
signed.
Whichever method is used, the objective is to obtain a reliability estimate that
is expressed as a failure rate; from this basic figure, R(t) and MTBF may be devel-
oped. Calculation of the failure rate of an electronic assembly, unit or system requires
knowledge of the failure rate of each part contained in the item of interest. If we
assume that the item fails when any of its parts fails, the failure rate of the item
will equal the sum of the failure rates of its parts. This may, in general, be ex-
pressed as:

λ_item = Σ_{i=1}^{n} λᵢ, (1.58)

where λᵢ = failure rate of the ith part, and n = number of parts.
Parts count reliability prediction [1.3][1.4]: the information needed to use the
method is: (i) generic part types (including the complexity of microelectronics) and
quantities; (ii) part quality levels; and (iii) equipment environment. The general
expression for the equipment failure rate with this method is:

λ = Σ_{i=1}^{n} Nᵢ(λ_G·π_Q)ᵢ, (1.59)

for a given environment, where:
λ = total equipment failure rate (failures/10⁶ h)
λ_G = generic failure rate for the ith generic part (failures/10⁶ h)
π_Q = quality factor for the ith generic part
Nᵢ = quantity of the ith generic part
n = number of different generic part categories.
Fig. 1.8 Part base failure rate versus stress and temperature (stress levels 1 to 3)
It should be noted that there are certain fundamental limitations associated with
reliability estimates. The basic information used in part failure rate models is
averaged over a wide database involving many persons and a variety of data col-
lection methods and conditions, which prevents exact co-ordination and cor-
relation. The user is cautioned to use the latest part failure data available, as part
failure rates are continuously improving.
1.9
"Bath-tub" failure curve
The time between successive failures is a continuous random quantity. From the
probabilistic standpoint, this random quantity can be fully determined if the distri-
bution function is known. These failure models are related to life test results and
failure rates via probability theory.
Figure 1.9 shows a typical time versus failure rate curve, the well-known "bath-
tub" curve. In the region of infant mortality, the high failure rate is attributed to
gross built-in flaws which soon cause the parts to fail. After this zone - under
certain circumstances - the failure rate remains constant; this is the useful operat-
ing life. These part failure rates are usually summed up to calculate the inherent
system reliability. Finally, whatever wear or ageing mechanisms are involved,
they occur in the wearout time (here the failure rate increases rapidly).
Fig. 1.9 The "bath-tub" failure curve of a large population of statistically identical items, for two
ambient temperatures θ₂ > θ₁, for electronic components
The "bath-tub" failure curve gives a good insight into the life cycle reliability
performance of an electronic system. Depending on the physical meaning, the
random quantities obtained can have different probability distribution laws (ex-
ponential, normal, Weibull, gamma, Rayleigh, etc.). Over the burn-in period of
operation, the bath-tub curve can be represented by gamma and/or Weibull laws;
over the normal period of operation, by the exponential distribution; over the
wearout period of operation, by gamma and normal distributions. Thus, most
component failure patterns involve a superposition of different distribution laws.
Consequently, with the aid of the above laws, a failure density function, a relia-
bility function and an MTBF expression can be obtained. In practice, this is a very
difficult task, hence approximation and much judgement are involved. Each ob-
server may consequently give a different solution to any distribution.
Voices claiming that the "bath-tub" failure rate curve does not hold water any-
more [1.112] must also be reviewed.
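The superposition idea above can be sketched as a hazard-rate sum: a decreasing Weibull term for infant mortality, a constant term for the useful life and an increasing Weibull term for wearout. All parameter values are invented for the illustration:

```python
def bathtub_failure_rate(t, lam_random=2.0e-6,
                         beta_infant=0.5, eta_infant=5.0e4,
                         beta_wear=4.0, eta_wear=2.0e5):
    """Illustrative bath-tub hazard: superposition of a decreasing Weibull
    term, a constant (random failure) term and an increasing Weibull term."""
    infant = (beta_infant / eta_infant) * (t / eta_infant) ** (beta_infant - 1)
    wearout = (beta_wear / eta_wear) * (t / eta_wear) ** (beta_wear - 1)
    return infant + lam_random + wearout

z_early = bathtub_failure_rate(10.0)    # infant mortality region
z_useful = bathtub_failure_rate(1.0e5)  # useful life region
z_late = bathtub_failure_rate(4.0e5)    # wearout region
```

Evaluating the sum at an early, a mid-life and a late moment reproduces the high-low-high shape of Fig. 1.9.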
As has been seen, the task of reliability modelling may be difficult, and the
best a reliability engineer can do is to analyse a system through a simple model-
ling configuration.
1.10
Reliability of electronic systems
1.10.1
Can the batch reliability be increased?
1.10.2
What is the utility of screening tests?
Fig. 1.10 Variation of the failure rate as a function of IC complexity (amount of failures per
1000 circuits versus number of IC gates, from SSI and MSI to LSI and VLSI)
All the bibliographical sources agree that the selection level is the one that al-
lows an economical approach to electronic systems reliability. Table 1.2
presents a comparison of the costs for four selection levels and three product
categories. It follows that it is more economical to identify and eliminate a defective
component at the input controls than at the controls of the equipped PCBs.
An empirical rule says that these costs grow by an order of magnitude at each
successive control level. The more advanced the selection level, the more impor-
tant the costs. As a result, it is recommended [1.5] to use 100% input con-
trols, justifying this unusual procedure through a detailed economical analysis.
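The empirical order-of-magnitude rule can be written down explicitly; the base cost and the level numbering are illustrative assumptions of ours, not the figures of Table 1.2:

```python
def defect_cost(base_cost, level):
    """Cost of identifying and eliminating a defective component, growing by
    an order of magnitude at each successive control level (level 0 being
    the input control of the components)."""
    return base_cost * 10 ** level

costs = [defect_cost(1.0, lvl) for lvl in range(4)]  # four selection levels
```

A defect costing one unit to eliminate at incoming inspection thus costs a thousand units if it is only caught three control levels later.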
Fig. 1.11 Failure mechanisms detectable with the aid of screening tests. The failure mechanisms
considered are: electrical instability, thermal mismatch, external failures, encapsulation failures,
seal failures, contamination, wire and solder failures, surface and substrate failures,
manufacturing failures and mounting substrate failures. The screening tests are: optical (internal
visual inspection, external visual inspection), mechanical (centrifugation, shock, vibration),
thermal (high temperature storage, thermal cycles, thermal shocks, burn-in) and electrical
(X-rays, waterproof tests)
At present, the greatest part of the available data concerning pre-ageing
(or selection) refers deliberately to components and, particularly, to ICs. The
principal result (Fig. 1.11) is the revelation of some failure mechanisms and -
implicitly - of some new procedures for selecting the defective, non-
satisfactory or marginal items, or those with likely early failures (potential defect
items). All these definitions are presented in MIL-S-19500.
Until the end of the 1980s, the plastic encapsulated devices were used only if the
environmental variations were relatively reduced and the required reliability
performance was reasonable. The progress obtained in the 1990s produced the
so-called "Acquisition reform" (see Section 2.1.5).
The tests constituting the screening must have the best effectiveness/cost
ratio. An analysis of these tests is given in Table 1.3.
Besides these aspects, there are other elements that have a certain influence on
the cost/reliability ratio [1.37]:
• the relations between the manufacturer and the user;
• the confidence level granted to the provider;
• the inspections performed by the user at the provider;
• the utilisation of a unique set of specifications;
• the centralised supply, on the basis of a plan that contains several providers.
1.10.3
Derating technique
One of the most used methods to improve the reliability of equipped printed
circuit boards (PCBs) is the derating technique: the mounted component is ex-
posed to voltages, currents and temperatures far below the nominal operating
values; in this way an increase of the lifetime of the respective compo-
nent is obtained. The derating values can be found from the manufacturer or in
failure rate handbooks such as the CNET Handbook [1.36] or MIL-HDBK-217
[1.76]. These data - in which the values corresponding to the prescriptions are taken
as parameters - can provide specific failure rates for each one of the operating
conditions. So one must begin with the study of the operating conditions of the
system, by evaluating - as a percentage of the nominal values - the voltage, the load
and the temperature for each component. With the aid of the given tables, the
value for the specific operating conditions can be determined and the sum of the
failure rates, with a tolerance of approximately 10%, can be found, allowing to take
into account the solder joints, the connections, etc.
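The failure-rate summation described above can be sketched as follows; the component rates are invented values, and the 10% margin is the tolerance mentioned in the text:

```python
def board_failure_rate(component_rates, margin=0.10):
    """Sum of the specific (derated) failure rates of the mounted components,
    increased by about 10% to account for solder joints, connections, etc."""
    return sum(component_rates) * (1.0 + margin)

# Illustrative derated rates, in failures per 1e6 h:
rates = [0.02, 0.05, 0.01, 0.04]
board_rate = board_failure_rate(rates)
```

In practice each rate would be read from the handbook table matching the actual voltage, load and temperature fractions of the nominal values.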
On demand, special selection tests can be foreseen (thermal cycles, high tem-
peratures, thermal shocks, vibrations).
By using a minimum number of components operating far below the nominal
values, the circuit designer himself may settle the circuit reliability.
If the reliability problem is correctly treated, any apparatus, device or equip-
ment can be decomposed into modules, subsystems and units, ensuring for each ele-
ment the best reliability level, so that the desired reliability of the ensemble can be
obtained.
1.10.4
About the testability of electronic and telecommunication systems
Today, tests represent 35-45% of the production costs (testing is not a productive
operation, because the tested PCB has no added value before, but only after, the tests). The
enterprises, being exposed on the market to international competition, must:
• design more quickly (to be present early on the market)
• produce more rapidly (to shorten the start of fabrication)
• produce cheaper (to be competitive)
• produce the best quality (to reduce the cost of non-quality), to maintain the
commercial position on the market and to enlarge the sphere of sales.
The solution of the problem: (i) to select a testability policy that permits the
achievement of all these objectives; (ii) to design products that are easily testable.
For the future, the following tendencies are important:
1.10.5
Accelerated ageing methods for equipped boards
These methods are complementary to the screening performed at the component
level.
1.10.6
Operational failures
In the past, the reliability of a system was usually quantified based on the results
obtained from laboratory tests, the testing conditions being chosen to simulate, as
closely as possible, the real operational conditions. Unfortunately, various con-
straints - such as equipment cost and lack of knowledge of the real operational con-
ditions - mean that the results of laboratory tests are rather far from the real
operational results. This explains why direct research on the operational behav-
iour is always desirable. But this operation is not as simple as it seems at first
sight. Before performing the study of system reliability, some other problems
must be solved, allowing to obtain results as close as possible to the real case.
These problems can be divided into three categories:
• practical problems
• mathematical problems
• data processing problems
Further on, we will try to describe these problems and to find viable solutions.
Practical problems
Theoretically, the collection of information on system or equipment operation
is simple. One must only fill in a form each time the system is connected or dis-
connected, and each time a failure occurs.
However, experience shows that this procedure is not a simple one when
the entire life of the equipment must be covered. Moreover, the form is often too
sophisticated for the personnel required to fill it in, or the time constraints are impor-
tant. In all these cases, the obtained information is affected by serious doubts. The
solution is to be extremely cautious and careful when defining the required
information. It is important to correlate this information with the defined purpose
and to instruct the personnel not only on how to fill in the form, but also on the purpose
of this operation, and to make them aware that a high degree of confidence in the
information is extremely important.
However, even for well-organised collecting systems, with well-trained and
motivated personnel, some uncertainties may arise: writing errors, or mis-
interpretations of the handwriting. Other problems are connected with the real
cause of the replacement of some components, not to mention the time elapsed
between the moment of failure and the moment the failure is reported.
Finally, it happens quite frequently that no explanation can be found for a
system failure: at the subsequent repair of the system, no defect is identified
(this case is not included in Fig. 1.12). There are many possible explanations for
such a situation but, essentially, lack of information is the most likely cause.
28 1 Introduction
[Bar chart: relative frequencies of defects for connectors, capacitors,
semiconductors, resistors, ICs, solders and various other elements; horizontal
axis from 0 to 40]
Fig. 1.12 Typical defects in an electronic system, arising during the useful life
• a file identifying the system structure and describing details on the system
components,
• a file containing details on the observed failures.
Normally, the information about system operation is not structured, and individual
"translating" software must be created for each company. If the company has a
well-organised system, in accordance with the requirements of the analysed sys-
tem, this problem can be easily solved.
1.10.7
FMEA/FMECA method
Whenever the failure rate (predicted reliability) of critical components of a
system, especially of systems using redundancy, is to be analysed, a failure
analysis must be performed. The method, known as FMEA (Failure Mode and
Effect Analysis) or FMECA (Failure Mode, Effect and Criticality Analysis), is a
systematic investigation of the influence of possible defects on the reliability of a
component, and of the influence of this component on other elements of the
system. The investigation takes into account the various failure modes and their
causes, allowing the potential dangers to be determined. The efficiency of the
measures proposed for reducing the probability of appearance of these failures is
also investigated. The FMEA/FMECA method takes into account not only
failures, but also errors and mistakes.
A development engineer, with the help of a reliability engineer, performs
FMEA/FMECA upstream. Further on, the details of the procedure are presented.
Step 1. A description of the function of the studied element (such as a transistor, a
resistor, etc.) is given. If possible, references to the reliability block diagram of
the system are made.
Step 2. A hypothesis about a possible failure mode is made. Here, the phase of the
mission of the studied system must be taken into account, because a failure or a
mistake in an early operational period can be easily avoided. For each element, all
possible defects must be considered, one by one.
Step 3. The possible cause must be described for each possible defect identified at
step 2. This is used for calculating the probability of appearance (step 8) and for
elaborating the necessary protection measures (step 6). A failure mode (short
circuit, open circuit, parameter drift, etc.) may have various causes. Moreover, a
primary defect or a secondary defect (produced by another defect) may arise. All
independent causes must be identified and carefully investigated.
Step 4. The symptoms of the failure mode presumed at step 2 and the possibilities
for localising the failure must be given. Also, a short description of the repercus-
sions of the failure for the studied element and for other elements must be made.
Step 5. A short description of the effects of the failure mode (presumed at step 2)
on the reliability of the entire studied system must be given.
Step 6. A short description of the proposed measures for reducing the effect of the
failure and the probability of its appearance, and allowing the continuance of
system mission, must be given.
Step 7. The importance of the presumed failure mode for the reliability of the
whole system must be estimated. The estimation figures usually cover the fol-
lowing range:
1 - no influence (safe)
2 - partial failure (noncritical)
3 - total failure (critical)
4 - overcritical failure (catastrophic)
This fuzzy-type estimation is based on the skill of the reliability engineer.
Step 8. For each presumed failure mode (step 2), the probability of failure (or the
estimated failure rate) must be calculated, taking into account the causes identified
at step 3. The usual evaluation range contains the following fuzzy-type items:
• A - frequent
• B - probable
• C - less probable
• D - improbable
• E - very improbable
Step 9. The previous observations are reviewed and new ideas are stimulated,
especially about the necessary corrective actions.
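A minimal sketch of how steps 7 and 8 can be combined into a criticality ranking. The numeric weights assigned to the probability classes A-E, and the worksheet rows themselves, are illustrative assumptions, not values given in the text:

```python
# Minimal FMECA worksheet sketch based on steps 1-9 above.
# The severity scale (1-4) and the probability classes (A-E) come from
# steps 7 and 8; the numeric weights for A-E are assumptions.

PROBABILITY_WEIGHT = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1}

def criticality(severity, probability_class):
    """Combine severity (1..4) with a probability class (A..E)."""
    return severity * PROBABILITY_WEIGHT[probability_class]

# Hypothetical worksheet rows: (element, failure mode, severity, probability)
worksheet = [
    ("transistor T1", "short circuit",   3, "C"),
    ("resistor R5",   "parameter drift", 2, "B"),
    ("capacitor C2",  "open circuit",    4, "E"),
]

# Rank the presumed failure modes by criticality, as an aid to the
# step 9 review of corrective actions.
ranked = sorted(worksheet, key=lambda row: -criticality(row[2], row[3]))
for element, mode, sev, prob in ranked:
    print(f"{element:14s} {mode:16s} severity={sev} "
          f"prob={prob} criticality={criticality(sev, prob)}")
```

The ranking only orders the presumed failure modes; the protection measures of step 6 still require engineering judgement.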
1.10.8
Fault tree analysis (FTA)
1.10.8.1
Monte Carlo techniques
There is more than one fault tree simulation programme developed to describe
systems and provide quantitative performance results. The Monte Carlo technique
has the ability to include considerations that would be very difficult to include in
analytical calculations. The programme views the system represented by the fault
tree as a statistical assembly of independent basic input events. The output is a
randomly calculated time to failure (TTF) for each basic block, based on the as-
signed MTBF. The system is tested, as each basic input event fails, to detect sys-
tem failure within the mission time. A time to repair (TTR) is predicted, based on
the MTTR values with detection times, and a new TTF value is assigned to each
failed basic input event to permit failure after repair (Fig. 1.14).
[Diagram residue, partially recoverable: system life cycle from feasibility studies
and logistics concepts, through system analysis, optimisation, synthesis and
definition, detailed equipment design (layouts, parts lists, drawings, support data)
and fabrication, assembly, test, inspection and deployment, to operation and
maintenance of the equipment in the field, with conceptual, system, equipment
and in-service design reviews at the corresponding stages; fault tree top event:
system failure]
The process continues until the mission period is reached or the system fails. A
new set of randomly selected values is assigned to the basic blocks and the pro-
gramme is rerun. After a significant number of such trials, the user obtains:
• the system probability of failure;
• the probability of success;
• the subsystem/component contributions to system failure;
• recorded subsystem failures, for performance comparisons.
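The procedure above can be sketched as follows, with repair (TTR) omitted for brevity. The MTBF values, the mission time, the number of trials and the 2-out-of-3 top-event logic are illustrative assumptions, not data from the text:

```python
# Sketch of the Monte Carlo fault-tree procedure described above.
# Each basic input event gets a randomly drawn time to failure (TTF)
# from its assigned MTBF; the top event is tested within the mission.
import random

random.seed(42)

MTBF = {"A": 5000.0, "B": 8000.0, "C": 12000.0}  # hours (assumed)
MISSION = 1000.0                                  # hours (assumed)
TRIALS = 20_000

def draw_ttf(mtbf):
    # Exponential TTF, consistent with lambda = 1/MTBF.
    return random.expovariate(1.0 / mtbf)

failures = 0
for _ in range(TRIALS):
    failed = {e for e in MTBF if draw_ttf(MTBF[e]) <= MISSION}
    # Assumed top-event logic: the system fails if at least two of the
    # three basic input events fail within the mission time.
    if len(failed) >= 2:
        failures += 1

p_fail = failures / TRIALS
print(f"estimated system probability of failure: {p_fail:.4f}")
print(f"probability of success: {1 - p_fail:.4f}")
```

A full implementation would also draw TTR values from the MTTR data and assign new TTFs after each repair, rerunning until the mission period is reached.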
1.10.9
Practical recommendations
• Prepare an initial list of all components. One knows that, roughly, the reli-
ability of a system is determined by the reliability of its components. Conse-
quently, the supplier of the components must be carefully chosen. For economi-
cal reasons (time and money), the number of components must be drastically
diminished. Often, to determine the component quality, only a check of the
producer specifications is needed, and this check can be made through a data
bank. For doubtful cases, reliability tests must be organised (damp heat, tem-
perature cycling, thermal shocks, vibrations). For memories, microprocessors
and, generally, for LSI and VLSI circuits, sophisticated and expensive test
systems are needed. Moreover, even during manufacturing, one must establish
whether the input control is made 100% or by samples. In the latter case, one
must state whether the LTPD method or the AQL method is used, and the exact
control values for each component.
• State the quality and reliability requirements before starting manufacturing.
It is advisable to predetermine the MTBF value of the future product, the
implications for the warranty costs and for the market chances of the future
product, etc. The best product may not be taken into account if a failure arises
after 2-3 weeks and, consequently, the producer must restart manufacturing
once more. Even during the manufacturing process, the order of magnitude of
the future failures must be estimated.
• State a control strategy, prepare all the details of the control specifications, and
demand a manufacturing process with easy access to the measurement and con-
trol points, with reduced maintenance and small costs.
• Organise reliability analyses periodically. Even during manufacturing,
analyses must be performed, to determine the potential reliability of the proj-
ect.
• Perform early tests with an increased stress level on some prototypes. The pur-
pose of these tests is to identify the weak points of the design, for operational
conditions, but also for all the higher stresses stated in the product specification.
This stress catalogue (shocks, vibrations, high temperatures and humidity, du-
ration, etc.) must be prepared before starting manufacturing.
• Form work teams and regularly inform the manufacturing department, the sales
department and the public relations department about the specified problems
and about the progress obtained in manufacturing the new product.
• Make a design review, involving the head of the manufacturing department,
the sales engineer, the control team, etc.
• If the inherent reliability of the components is too small, the derating tech-
nique must be used. The result will be a decrease of the system failure rate
and an increase of the lifetime. If the price of the common component is taken
as unity, the price of a component with high reliability (tested 100%, ac-
cording to MIL-STD-883D) increases at least 1.2...1.5 times. For very high
reliability components (military use, etc.), the cost may be multiplied up to
5...20 times.
1.10.10
Component reliability and market economy
Two factors are the most important in the development of a new product: mar-
keting and manufacturing. Marketing is a strategic activity, because the life cycle
must be correlated with the cost. In this respect, the duration of the design phase
and the number of iterations required for developing a high-quality product must
be drastically reduced. In the manufacturing field, the number of iterations until
the development of a new process must also be diminished. For both factors, an
important problem is that of testing.
[Figure labels, partially recoverable: in situ testing; analysis of fabrication
defects]
Fig. 1.15 Possible testing scenario, from input control to system testing. To reduce the duration
required for each development step, specific testing methods will be developed
For each component family, an acceptable quality level (AQL) must be defined.
This AQL may be assured in three ways: (i) quality certification at the provider,
allowing the component user to avoid component testing; (ii) control of a limited
number of samples, if the quality level is close to the specifications; (iii) 100%
testing, when the required quality level is far superior to the quality level assured
by the provider. In the last two cases, a testing programme is needed. For LSI
circuits, no standard testing programme is available and, consequently, the user's
own specialists must develop a specific testing programme.
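A sampling plan of type (ii) above can be judged by its operating characteristic: the probability that a lot with defect fraction p is accepted when n parts are sampled and at most c defectives are tolerated. The plan (n = 80, c = 1) and the defect fractions below are illustrative assumptions, not values from the text:

```python
# Operating-characteristic sketch for a single attribute sampling plan.
# P(accept) is the binomial probability of observing at most c
# defectives in a random sample of n parts.
from math import comb

def p_accept(n, c, p):
    """Probability of at most c defectives in a sample of n, defect fraction p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c + 1))

n, c = 80, 1  # hypothetical sampling plan
for p in (0.001, 0.01, 0.05):
    print(f"defect fraction {p:.3f}: P(accept) = {p_accept(n, c, p):.3f}")
```

Good lots (defect fraction near the AQL) should be accepted with high probability, while lots near the LTPD should be rejected with high probability; the plan (n, c) is chosen to satisfy both conditions.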
Further on, the quality of the equipped boards must be carefully controlled, by
computer-aided methods. For each new equipped board, the development depart-
ment creates a new file. The testing engineer creates in situ tests, based on func-
tional testers. A new concept is to design the testability of the system itself, with
self-control programmes inserted in the system. Finally, the system is tested
before delivery, with the software that will be used in operational life. With this
method, short installation times may be obtained.
1.11
Some examples
To understand better how to apply some of the preceding notions, several practi-
cal examples are given in the next pages. They will help the reader to form a
better and more complete image of the reliability problems. It will be considered
that λ = constant.
Example 1.1 - A certain number of tape recorders has been operated for a total of
20 000 hours. During this time, 8 repairs have been made. If λ = constant, then the
MTBF is 20 000 : 8 = 2500 hours, and the failure rate is 8 : 20 000 hours = 0.0004
failures per operating hour.
Example 1.2 - For a tested sample, the failure rate will have a likely value
evaluated on the basis of the sample data; λ is calculated with the ratio

λ = (number of failures) : (total operation time) (1.61)

A sample of 10 items was taken, and after 250 operation hours 2 failures were
recorded; the remaining 8 items survived - without failures - a 2000 hours test.
We may write:

λ = 2 : [(2 × 250) + (8 × 2000)] = 2 : 16 500 = 0.0001212 failures/hour =
= 12.12%/1000 h (1.62)
These parameters should not exceed the limit values indicated in Table 1.4. The
measured data (before and after the reliability tests) are given in Table 1.5.
Before we calculate λ, we must remember some rules:
• If - at the end of the reliability test - an item exceeds the maximum prescribed
limit value, the item must be considered defective.
• The items which exceed the prescribed limits before the reliability test will not
be considered in the calculation of the failures.
• If - for an item - several parameters have been affected, it will be considered
that a single failure (and only one) has occurred.
• If - during the intermediate controls - some items are identified as exceeding
the failure limits, they will also be counted as failed items, even if later they
no longer exceed the prescribed limit values.
For the calculation of the operation hours, it is considered that the respective
item failed immediately after the last measurement.
Table 1.5 Experimental data, before and after reliability tests (RT)
With these rules in mind, and taking into account that items 1 and 3 failed
after 200 hours, for a 1000 hours test the result is:

λ = 2 / [(2 × 200) + (12 × 1000)] = 16.12 × 10⁻⁵ failures/hour =
= 16.12%/1000 h (1.63)
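The counting rules above can be sketched as follows. The measurement records and the limit value are hypothetical, since the data of Table 1.5 is not reproduced here; for simplicity, each item is represented by a single parameter:

```python
# Sketch of the failure-counting rules: items out of limits before the
# test are excluded; an item exceeding the limit at any intermediate
# control or at the end counts as failed, but only once per item.

def count_failures(items, limit):
    failures = 0
    for before, intermediates, after in items:
        if before > limit:
            continue            # out of limits before the test: excluded
        if after > limit or any(v > limit for v in intermediates):
            failures += 1       # counted once, even if several checks fail
    return failures

# (before, [intermediate controls], after) for four hypothetical items
items = [
    (1.0, [1.1, 1.2], 1.1),     # always within the limit
    (1.0, [2.5], 1.2),          # failed at an intermediate control
    (2.6, [2.7], 2.8),          # defective before the test: excluded
    (1.0, [1.3], 2.9),          # failed at the end of the test
]

print(count_failures(items, limit=2.0))
```

Note that the second item counts as failed even though its final value is back within the limit, exactly as the intermediate-control rule requires.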
Example 1.4 - A reliability test with 100 items gives the following result after
5000 hours: one failure after 2000 h; two failures after 4000 h. What is the
value of the mean operation time?

λ = 3 / [(1 × 2000) + (2 × 4000) + (97 × 5000)] = 6.06 × 10⁻⁶ failures/hour

and the mean operation time is m = 1/λ ≈ 165 000 hours.
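A short check of the arithmetic in Examples 1.1, 1.2 and 1.4, under the stated assumption λ = constant (so the mean operation time is 1/λ); the helper function is ours, not part of the text:

```python
# Verifying the worked examples above with eq. (1.61):
# lambda = (number of failures) / (total operation time).

def failure_rate(failures, total_hours):
    """Eq. (1.61): failures per cumulative operating hour."""
    return failures / total_hours

# Example 1.1: 8 repairs in 20 000 cumulative hours.
lam1 = failure_rate(8, 20_000)
mtbf1 = 1 / lam1                      # 2500 hours

# Example 1.2: 2 failures at 250 h, 8 survivors to 2000 h.
lam2 = failure_rate(2, 2 * 250 + 8 * 2000)

# Example 1.4: one failure at 2000 h, two at 4000 h, 97 survivors at 5000 h.
lam4 = failure_rate(3, 1 * 2000 + 2 * 4000 + 97 * 5000)
mtbf4 = 1 / lam4                      # mean operation time

print(f"Ex. 1.1: lambda = {lam1:.6f} /h, MTBF = {mtbf1:.0f} h")
print(f"Ex. 1.2: lambda = {lam2:.7f} /h")
print(f"Ex. 1.4: lambda = {lam4:.2e} /h, mean time = {mtbf4:.0f} h")
```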
2 State of the art in the reliability of electronic components
These changes determine a new attitude toward the reliability field, expressed by the approaches in the main domains concerning the reliability of semiconductor devices; these domains are listed in Table 2.2. Further on, the new trends in each of these domains (cultural features, reliability building, reliability evaluation, and standardisation) will be identified.
2.1
Cultural features
Firstly, the basic approach describing the new wave in reliability and the cultural
features of the present period will be presented.
2.1.1
Quality and reliability assurance
Quality assurance means all the organisational and technical activities that assure the quality of design and manufacturing of a product, taking also into account economic constraints.
Traditionally, quality assurance performs this function through inspection and sorting operations. This strategy implicitly accepts that large amounts of nonconforming material are produced. Consequently, the quality assurance department assumes a police role, guarding against the nonconforming material. The new quality assurance function, based on prevention, by eliminating the sources of nonconforming material, arose in the early 1970s.
The nonconforming material has two major causes: inadequate understanding of the requirements and unsatisfactory processes. The quality assurance team must determine, analyse and disseminate the requirements, both at the manufacturer and at the user.
The reliability problems found during the field use phase can also be taken into
account for corrective and preventive actions. A very reliable link must be created
between all the teams involved in the quality and reliability assurance.
It is important that a reliability assurance program contains the following
elements:
• A set of strategic and tactical objectives.
• A reliability program with objectives for different organisational segments.
• Measurement process of the global system, which is complementary to the reli-
ability measurements performed by each organisational segment (design, manu-
facturing etc.).
• A very strong feedback process based on corrective and preventive actions.
The system for quality and reliability assurance must be described in an appropriate handbook supported by the company management. In any case, the reliability team must report only to the quality assurance manager (Fig. 2.3). Further details about this subject are given in [2.15].
Fig. 2.2 Information flow between the quality assurance department and other departments (designing, material purchasing, planning, management, sales, service, sub-suppliers)
Fig. 2.3 An example of the structure for quality and reliability activity
2.1.2
Total quality management (TQM)
At the end of the 80's, a new approach, called total quality management (TQM), was introduced. The definition given in August 1988 by the US Department of Defense, reported by Yates and Johnson [2.74], considers TQM the application of management methods and human resources to control all processes, with the aim of achieving continuous improvement of quality. This is the so-called total quality approach. TQM demands teamwork, commitment, motivation and professional discipline. It relies on people and involves everyone. In fact, as Birolini [2.15] said, TQM is a refinement of the concept of quality assurance. TQM is based on four principles, presented in Table 2.3.
The relationship between the customer and the manufacturer changes its content.
A real partnership is created (see Fig. 2.4) but this change must occur also at the
level of the other relationships, inside the company:
Principles: Explanations
Customer satisfaction: Total quality means satisfaction of the needs and expectations of a customer.
Plan - do - check - act: Known also as the Deming circle: plan what to do, do it, check the results, act to prevent further error or to improve the process.
Fig. 2.4 The relationship between supplier and customer in a total quality system
Recently, a new tendency has appeared, trying to replace the term TQM with other terms, such as CI (constant improvement) and TQL (total quality leadership) [2.25].
2.1.3
Building-in reliability (BIR)
control of the input parameters, the reliability must be tested and monitored on the manufacturing flow¹.
Element: Details
Proact rather than react: Identify and eliminate or control the causes of reduced reliability rather than test for and react to the problem.
Control the input parameters: Control the input parameters of the process rather than test the results of the process.
Integrate the reliability: Integrate the reliability-driven considerations into all phases of manufacturing.
Assess the reliability: Assess the reliability of the product on the basis of a documented control of critical input parameters and of the reliability-driven rules.
2.1.4
Concurrent engineering (CE)
Robust design
¹ The BIR focus is on uncovering and understanding the causes of reduced reliability and on finding ways to eliminate or control them. In doing so, the approach offers not only new measures of product reliability, but also a methodology for attaining ever-greater product reliability.
cause the developers, from the outset, to consider all elements of the product life
cycle from conception through disposal, including quality, cost, schedule and user
requirements (MIL-HDBK-59, Dec. 1988).
As Hoffman [2.41] points out, CE must include business requirements, human variables and technical variables. All these elements are presented in Fig. 2.5 and must be taken into account starting with the design phase. The design team contains specialists from various fields (designing, manufacturing, testing, control, quality, reliability, service) working in parallel; in fact, another name for CE is parallel engineering. Each specialist works part-time on a project and is involved at each phase of the development process. A synergy of the whole team must be achieved: the final result exceeds the sum of the individual contributions.
With CE, the number of iterations on a project is diminished and the time required to obtain a new product is shortened. An important change in mentality must be performed: from "toss it over the wall" to a synergetic team. A strong supporter of CE is the DoD, which encourages its contractors to lead the way.
2.1.5
Acquisition reform
In June 1994, the Department of Defence (DoD) of USA abolished the use of mili-
tary specifications and standards in favour of performance specifications and com-
mercial standards in DoD acquisitions [2.25]. Consequently, in October 1996,
MIL-Q-9858, Quality Program Requirements, and MIL-I-45208 A, Inspection
System Requirement, were cancelled without replacement. Moreover, contractors will have to propose their own methods for quality assurance, when appropriate. It
is likely that ISO 9000 will become the de facto quality system standard. The DoD
policy allows the use of military handbooks only for guidance. Many professional
organisations (e.g. IEEE Reliability Society) are attempting to produce commercial
reliability documents to replace the vanishing military standards [2.35]. Besides
them, there are a number of international standards produced by IEC TC-56, some
NATO documents, British documents and Canadian documents. In addition to the
new standardisation activities, Rome Laboratory (USA) is also undertaking a number of research efforts to help implement acquisition reform. However, there are voices,
such as Demko [2.32], considering that a logistic and reliability disaster is possible,
because commercial parts, standards and practice may not meet military require-
ments. For this purpose, IIT Research Institute of Rome (USA) developed, in June 1997, SELECT, a tool that allows users to quantify the reliability of commercial off-the-shelf (COTS) equipment in severe environments [2.53]. Also, beginning with April 1994, a new body, called GIQLP (Government and Industry
Quality Liaison Panel), made up of government agencies, industry associations and
professional societies, is intimately involved in the vast changes being made in the
government acquisition process [2.63].
A great effort was made for reliability evaluation of Plastic Encapsulated
Microcircuits (PEM), which are considered typically commercial devices. The
current use of these devices is an example of reliability engineering responding to
both technology trends and customer policy [2.24]. The acquisition reform policy
encouraged the U.S. military to use PEM over other packages. On the technical side,
users of PEM are employing Highly Accelerated Stress Testing (HAST) and
acoustic microscopy to screen out flawed devices. While the reliability of PEM is
constantly improving, the variability between suppliers remains a problem. More
details are given in chapter 12.
2.2
Reliability building
Reliability is built in during the design phase and during manufacturing. This means that reliability concerns must be taken into account both in the design of the process/product (the so-called design for reliability) and in manufacturing (process reliability). Special attention must be given to the last step of the manufacturing process, the screening (or burn-in).
Component reliability is influenced by the materials, the design concept and the manufacturing process, but it also depends strongly on the incoming-inspection conditions; hence not only the component manufacturer, but the equipment manufacturer too must contribute to the reliability growth of the equipment. If the failure rate is constant during the operation period, this is a consequence of a good component selection during the manufacturing process. But there are also components that fail frequently, without any previously observed wearout effect. The early failures - usually produced by an inadequate manufacturing process - must be avoided from the beginning, in the interest of both the manufacturer and the user. Unfortunately, this wish is not always feasible, above all because physical and chemical phenomena with unknown action can produce hidden flaws which appear as early failures.
2.2.1
Design for reliability
This new concept is an important step in the implementation of the cultural changes, being linked with concurrent engineering. First, the customer's voice is to be considered in the design, being translated into an engineering function [2.49]. Then, the design must be immune to the action of perturbing factors, and this can be done with the so-called Taguchi methods. This means: (i) to develop a metric capturing the function while anticipating possible deviations downstream and (ii) to design a product that ensures the stability of the metric in the presence of deviations. Finally, the design team must use reliable prediction methods. In principle, design for reliability means passing from evaluate and repair to anticipate and design. An important contribution to the development of design for reliability was given by the special issue on this subject of IEEE Transactions on Reliability, June 1995, with papers covering the various aspects of the subject. Taguchi [2.65] talked about developing a stable technology by taking into account not only the predictable variations in manufacturing and operation, but also the unknown or unproved. Other papers treated logic synthesis to handle electromigration and hot-carrier effects.
2.2.2
Process reliability
2.2.2.1
Technological synergies
Particle contamination is a good example of technological synergies, with two effects inducing failure risks for the future device:
• The physical effect: the particles mask an area of the chip, hindering the deliber-
ate impurity doping process or producing the breakdown of the processed layer.
• The chemical effect: the particle contaminant diffuses into the crystal, producing electrical effects, such as soft I-V characteristics or premature breakdown; the electrical effect may appear later, after the contaminant has migrated into the active area, during device operation.
For the physical effect, a failure risk synergy is obvious at the subsequent
manufacturing steps:
• at photolithography, the dust particles reaching the transparent areas of the masks transfer their images onto the wafer.
(Figure: sources of particle contamination - static charge, wafer handling, personnel, masks, equipment - with their relative contributions on a 0 to 40 scale.)
In the 70's, the use of new test structures for process monitoring was initiated. By stressing these reliability test structures (used earlier in the process and sensitive to specific failure mechanisms), more accurate information about the reliability of the devices can be obtained, and in a shorter time than with traditional methods. Because test structures are used, the extrapolation of the results to the device level must be cautious. From 1982, Technology Associates organised annually the wafer level reliability (WLR) workshop, where the WLR concept was developed. Tools were created for investigating the reliability risks at the wafer level and for monitoring the process factors affecting reliability. In a more general sense, WLR problems are included in the process reliability concept. Hansen [2.40] determined with a Monte Carlo simulation model the effectiveness of estimating wafer quality, in particular in terms of wafer yield. Reliability predictions can be obtained from wafer test-chip measurements.
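Hansen's model itself is not reproduced in the text; the toy Monte Carlo below (all numbers invented) merely illustrates the idea of estimating wafer yield by simulation: defects land at random on the dice of a wafer, and a die is good only if it catches none.

```python
import random

# Toy wafer-yield Monte Carlo (hypothetical numbers): n_defects defects
# land uniformly at random on n_dice dice; a die is good only if no
# defect lands on it. The analytic yield is (1 - 1/n_dice)**n_defects.
def simulated_wafer_yield(n_dice=400, n_defects=40, n_wafers=5000, seed=42):
    rng = random.Random(seed)
    good = 0
    for _ in range(n_wafers):
        hit = {rng.randrange(n_dice) for _ in range(n_defects)}
        good += n_dice - len(hit)
    return good / (n_dice * n_wafers)

print(round(simulated_wafer_yield(), 3))  # close to (1 - 1/400)**40, i.e. about 0.905
```

In a real WLR study the defect density would come from test-chip measurements rather than being assumed, but the estimation principle is the same.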
Details about the process reliability for particular types of electronic compo-
nents will be given in chapters 3 to 10.
2.2.3
Screening and burn-in
To better understand the role of screening tests in reliability estimation, an example concerning the failure causes will be given. Assume that a printed circuit board (PCB) carries 60 integrated circuits (ICs), that the probability of failure for an IC is 2%, and that all the ICs are statistically independent. It results that the probability of finding at least one defective IC is 1 - 0.98^60 ≈ 0.7. Some causes of component failure, for example very old or overloaded components, make screening tests pointless. Other defects result from the intrinsic weaknesses of the components. These weaknesses are surely unavoidable and - within well defined limits - are accepted even by the manufacturer. With the aid of electrical tests and/or operating tests (during fabrication or before delivery) these components with defects can be identified and eliminated. Nevertheless, a small percentage² of components with hidden defects remains, which - although still operational - have a low reliability and negatively influence the reliability of the component batch. The role of the screening tests is to identify the partially unreliable components, with defects that do not lead immediately to non-operation. For each lot, the time dependence of the failure rate λ has the form - already presented - of the bathtub failure curve (Chap. 1). From this point of view, the screening tests signify:
² It is considered that the early failures vary between 1% and 3% for SSI/MSI ICs, and between 4% and 8% for LSI ICs [2.2(1996)]. The defective probability of a PCB with about 500 components and 3000 solder joints can have the following average values [2.15]: 1-3% defective PCBs (1/3 assembling, 1/3 defective components, 1/3 components out of tolerance) and 1.2 to 1.5 defects per defective PCB.
or functional control (performed 100%), with the aim of eliminating the defective items, the marginal items or the items that will probably show early failures (potentially unreliable items).
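The probability quoted in the PCB example above is easy to verify with a few lines of code:

```python
# Probability that a board with n independent components, each failing
# with probability p, contains at least one defective component.
def p_at_least_one_defect(n, p):
    return 1.0 - (1.0 - p) ** n

# The example from the text: 60 ICs, each defective with probability 2%.
print(round(p_at_least_one_defect(60, 0.02), 2))  # -> 0.7
```

The same formula, applied per solder joint or per component, underlies the board-level figures quoted in the footnote.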
By definition, an accelerated test is a trial during which the stress levels applied to the components are higher than those foreseen for operation; this stress is applied with the aim of shortening the time necessary to observe the behaviour of the component under stress.
Accelerated life testing is used to obtain information on the component lifetime distribution (or on a particular component reliability parameter) in a timely manner. To do this, a deep knowledge of the failure mechanisms - essential in all reliability evaluations - is needed. In practice, the thermal test alone is not sufficient for the reliability evaluation of a product; it is necessary to perform other stress tests too (supposing that the stress is not "memorised", and consequently no wearout exists).
The accelerated thermal test has an important disadvantage: there is a great probability that the stress levels create failure mechanisms which do not usually appear under normal operating conditions. On the other hand, it is true that for the comparative evaluation of different component series this disadvantage does not exist. At any rate, the accelerated thermal test is not a panacea for saving time or for elaborating economical tests concerning the life testing and the behaviour of electronic components.
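The section does not state an acceleration model, but the relation commonly used for thermal acceleration is the Arrhenius law; the activation energy below (0.7 eV) is an illustrative assumption, since it depends on the failure mechanism.

```python
import math

K_B = 8.617e-5  # Boltzmann constant in eV/K

# Arrhenius acceleration factor between a use and a stress temperature.
# ea_ev (activation energy, eV) is mechanism-dependent; 0.7 eV is only
# an illustrative assumption, not a value given in the text.
def arrhenius_af(t_use_c, t_stress_c, ea_ev=0.7):
    t_use = t_use_c + 273.15       # convert degC to K
    t_stress = t_stress_c + 273.15
    return math.exp((ea_ev / K_B) * (1.0 / t_use - 1.0 / t_stress))

# Stress at +125 degC vs use at +55 degC: one stress hour is "worth"
# roughly 78 use hours under these assumptions.
print(round(arrhenius_af(55.0, 125.0), 1))
```

The strong sensitivity of the factor to the assumed activation energy is one reason why, as the text warns, a thermal test alone cannot carry the whole reliability evaluation.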
The goal of screening tests can be achieved in two ways: (a) the utilisation of the maximum allowed load, since the components predestined to fail in the early failure period are very sensitive to overloading; (b) the utilisation of several efficient physical selection methods which can give information concerning any potential weaknesses of the components (noise, non-linearity, etc.). In general, it can be said that all selection tests and practical methods are described in MIL-STD 883. The methods described in this handbook are too expensive for usual industrial purposes. It has been proved that the combination of different stresses to produce the early failures of the elements, followed by a 100% electrical test, is optimal and efficient, especially if the costs must be taken into account. Establishing the optimal stresses (their sequence and duration) is a delicate problem, since the failures depend on the integration degree, on the technology and on the manufacturing methods. In the following, the most important test groups and their shortcomings will be mentioned, without discussing the mechanical tests (acceleration, shocks, vibrations).
2.2.3.1
Burn-in
The burn-in method (no. 1015.2 of MIL-STD 883D) belongs to the first test category. Its goal is to detect latent flaws or defects that have a high probability of showing up as infant mortality failures under field conditions. Although the major defects may be found and eliminated in the quality and reliability assurance department of the manufacturer, some defects remain latent and may develop into infant mortality failures over a reasonably short period of operation time (typically between some days and a few thousand hours). It is not so simple to find the optimum load conditions and burn-in duration³, so that nearly all potential infant mortality components are eliminated. There must be a substantial difference between the lifetime of the infant mortality population and the lifetime of the main (or long term) wearout population under the operating and environmental conditions applied in burn-in [2.42]. The situation may differ depending on today's components, on the new technologies and on custom-designed circuits. The trend is towards monitored burn-in [2.59]. The temperature should be high, without exceeding +150°C for the semiconductor crystal.
A clear distinction must be made between test and treatment. A test is a sequence of operations for determining the manner in which a component is functioning; it is a trial with previously formulated questions, without expecting a detailed response. That is why the test time is short and the processing of the results is made immediately. It is an attributive trial, which gives information of the type good/bad. As a treatment, the burn-in must eliminate the early failures, delivering to the client the rest of the bathtub failure curve. We distinguish three types of burn-in:
• Static burn-in: temperature stresses and electrical voltages are applied; all the component outputs are connected through resistors to a high or a low level.
• Dynamic burn-in: temperature stresses and dynamic operation of components (or groups of components).
• Power burn-in: operation at maximum load and at different ambient temperatures (0...+150°C), plus the function test within the limits foreseen by the data sheet for +25°C.
It is often difficult to decide whether a static or a dynamic burn-in is more effective. Should surface, oxide and metallisation problems be dominant, a static burn-in is better; a dynamic burn-in activates practically all failure mechanisms. That is why the choice must be made on the basis of practical results.
The static burn-in is used as control selection by the manufacturers and by the users. Usually, according to MIL-STD 883D, a temperature of +125°C is applied for 168 hours. Of the six basic tests specified by method 1015.2, conditions A and D are the most utilised (min. 168 h at the specified temperature). Condition A foresees a static burn-in (only the supply voltages are present, so that many junctions can be biased). This type is applied particularly together with cooling, to bring out the surface defects. Condition D is frequently utilised for integrated circuits. The clock signal is active during the whole burn-in period and exposes all the junctions to both forward and reverse voltages. All outputs are loaded to the maximum allowed value. The direction in which the bias is applied will influence the power dissipation and consequently the junction temperature of the device. However, in complex devices there is very little distinction between the stresses resulting from the two biasing methods, since it becomes increasingly difficult to implement a clear-cut version of either option.
³ Any application of a load over any length of time will use up component lifetime; there can easily be situations where burn-in uses up an unacceptable portion of the main population lifetime.
The static burn-in is particularly adequate for the selection of great quantities of products, and is at the same time an economical procedure. The failure distribution is dominated by the surface-, oxide- and metallisation-defect categories, resulting from some type of contamination or corrosion mechanism⁴.
The continuously growing number of LSI and VLSI ICs (memories, microprocessors) has lately contributed to disseminating the dynamic burn-in, since the load can be easily regulated, the tests can be programmed, continuously supervised and memorised, and the test results can be automatically and statistically processed. The selection temperature usually varies between +100°C and +150°C. Beyond a certain duration (normally between 48 and 240 hours, depending on component and selection parameters), no further diminishing of failures occurs. The applied burn-in voltage also depends on the duration; for example, the same result can be obtained with the nominal voltage applied for 96 hours, or with a higher voltage applied for only 24 hours. But, as in the case of temperature, the limit values must not be exceeded.
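The text does not say which model links voltage and duration; a common assumption is an exponential voltage-acceleration factor AF = exp(gamma * (V_stress - V_nom)). Both gamma and the voltages below are invented values, chosen so that a 0.5 V overstress gives AF of about 4, roughly matching the 96 h vs 24 h example.

```python
import math

# Equivalent burn-in duration under an assumed exponential
# voltage-acceleration model: AF = exp(gamma * (v_stress - v_nom)).
# gamma (per volt) and the voltage values are illustrative only.
def equivalent_burn_in_hours(hours_nominal, v_nom, v_stress, gamma=2.8):
    af = math.exp(gamma * (v_stress - v_nom))
    return hours_nominal / af

# 96 h at the nominal 5.0 V vs a 5.5 V overstress: about 24 h.
print(round(equivalent_burn_in_hours(96.0, 5.0, 5.5), 1))
```

As with temperature, the model only holds while the overstress does not trigger failure mechanisms absent at nominal conditions, which is why the limit values must not be exceeded.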
Another parameter of dynamic burn-in is the resolution, which determines the maximum frequency of the stimuli sent to the components (for example, in the case of ICs, a resolution of 100 ns corresponds to a frequency of 10 MHz). The best solution is to come close to the effective operating frequency of the component.
MIL-STD 883 specifies clearly defined methods: class B (168 h / +125°C), class S for high reliability and special applications (240 h), etc., without any mention of the particular manufacturers' methods or of the methods of IC users. Table 2.5 shows the screening sequence according to MIL-STD-883, for ICs of class B quality.
⁴ Other defects include wirebond problems resulting from intermetallic formation and oxide breakdown anomalies. Dynamic operation results in higher power dissipation, current densities and chip temperature than the static burn-in configuration.
2.2.3.2
Economic aspects of burn-in
It is often asked whether one may replace the component burn-in with a burn-in of equipped PCBs. The answer is negative, for three essential reasons:
• most equipped PCBs cannot be exposed to or operated at high temperatures;
• the hunting out of early failures would have to be made through a repair and renewal process, waiting for the failures to appear;
• at a reduced temperature, the acceleration time cannot be extended to cover the early failure period; by testing the equipped PCBs, the component itself cannot be tested in accordance with the complete data sheet specification.
Consequently, the component burn-in is the key to component reliability problems. Burn-in at the system level is recommended as a first step for burn-in optimisation; by analysing the defects that appear at this level, the utility of burn-in for certain components can be better assessed. In fact, in most cases, the optimal solution consists of a combination of burn-in at component level and at system level. Although complementary, the equipped-PCB level is seldom utilised.
Theoretically, presuming that the environmental and selection conditions are unchanged, a burn-in at system level must be optimised with respect to reliability and with respect to costs. In the first case, the situation has some ambiguities, since it is virtually impossible to eliminate all the weak components with a burn-in. If, on the contrary, one wishes the batch⁵ to contain, after the burn-in, only 1% of the potential failures, it is possible to determine the optimal duration with the aid of a combination of analytical and graphical methods [2.42].
Concerning the burn-in optimisation costs, we can distinguish the following parameters:
CT - the total costs, in cost/equipment units;
C1 - constant costs, expressed in units of costs per system (or units of costs per equipment), independent of the burn-in duration and of the number of failures recorded in this period (for example, the burn-in installation and taking-down costs);
C2 - costs that appear each time the equipment fails;
C3 - costs depending on time, such as: a) costs/equipment/day of ovens; b) costs due to the delay of total production, for the number of days in which the systems are submitted to burn-in; c) test and failure control costs (failure monitoring costs);
C4 - costs/failure/equipment for the systems under guarantee (repair costs at the clients);
Np - number of failures during the burn-in period;
Nb - number of failures after burn-in, during the guarantee period;
n - duration (number of days) of the equipment burn-in.
⁵ The assumptions of a good selection [2.2(1983)] are: (i) homogeneous batches; (ii) accelerated ageing eliminates the early failures; (iii) accelerated ageing also eliminates the components which normally should not fail during the first years of operation.
Fig. 2.7 Typical curves for the difference CT - CS (cost/equipment versus burn-in duration). Curve A shows a situation where burn-in does not pay off, i.e. the total costs using burn-in are always greater than the costs without burn-in, irrespective of the burn-in period; curve B demonstrates that a burn-in lasting about two days (48 h) gives the maximum economic benefit [2.42]
It can be seen that the value of the total costs CT has a linear dependency on the number of days during which the equipment is on burn-in, while the value of the total guarantee costs without burn-in, CS, is a constant. If the difference CT - CS is calculated using n as an independent variable, one obtains the curves plotted in Fig. 2.7, corresponding to two different equipments.
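The shape of these curves can be sketched numerically. Everything below is a hypothetical illustration: the parameter values and the exponential model for how many latent failures the burn-in catches are assumed, not taken from [2.42].

```python
import math

# Hypothetical sketch of the burn-in cost trade-off described in the text.
# c1..c4 mirror C1..C4; 'latent' is the expected number of latent failures
# per equipment and 'rate_per_day' how quickly burn-in catches them. All
# numbers are invented, chosen only to reproduce the shape of curve B.
def cost_difference(n_days, c1=100.0, c2=20.0, c3=70.0, c4=500.0,
                    latent=2.0, rate_per_day=1.5):
    caught = latent * (1.0 - math.exp(-rate_per_day * n_days))  # Np
    escaped = latent - caught                                   # Nb
    c_total = c1 + c2 * caught + c3 * n_days + c4 * escaped     # CT
    c_no_burn_in = c4 * latent                                  # CS
    return c_total - c_no_burn_in                               # CT - CS

# With these numbers the difference has a minimum around n = 2 days,
# i.e. the curve-B situation of Fig. 2.7.
for n in range(8):
    print(n, round(cost_difference(n), 1))
```

A curve-A situation (burn-in never pays off) is obtained, for instance, by making the guarantee repair cost c4 small relative to the daily costs c3.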
For curve A, the problem is to know whether the expected number of failures (without burn-in) is acceptable. If the response of the manufacturing firm is negative, burn-in must be introduced at the system level, with a duration of 2-3 days, as being more efficient. Certainly, the number of failures expected after this burn-in period, and during the guarantee period, must be evaluated.
Any burn-in policy must be closely evaluated for each specific product leaving the company.
2.2.3.3
Other screening tests
etc.). Usually, the tested components (the ICs are placed, pins down, on a metal tray in the oven) remain for 24 hours at a temperature of +150°C (for an IC, this temperature is much greater than the maximum allowed limit in operation).
The third group of tests is formed by the thermal cycles (method 1010.2, MIL-STD 883D). This is a process that causes mechanical stresses, as the components are alternately exposed to very high and very low temperatures. This explains why the method can easily bring out the potential defects of each tested entity (package, marking, semiconductor surface, contact wires, structure soldering defects, structure cracks). Thermal cycles are performed air-to-air in a two-chamber oven (transfer from the high- to the low-temperature chamber, and vice versa, using a lift). The non-biased ICs are placed on a metal tray (pins on the tray, to avoid thermal voltage stress) and exposed to at least 10 thermal cycles (in the temperature range -65°C...+150°C), but 20 cycles are often used.
A typical cycle consists of a dwell time at the extreme temperatures (~10 minutes), with a transfer time of less than one minute. Should solderability be a problem, an N2 protective atmosphere can be used. Normally, after the thermal cycles a stabilisation at high temperature is made, with the aim of better localising the defects.
The thermal shock belongs to the fourth group of methods (MIL-STD 883D, no. 1011.2). It is utilised to test the integrity of the connection wires (with important dilatation coefficients, positive and negative). This method is similar to the thermal cycles, but much harsher, since the thermal transfer medium is not air but a transfer fluid able to produce the shock. The extreme temperatures must be selected with care, because the thermal shock can destroy many constructive elements, e.g. ceramic packages of ICs. We recommend not exceeding the extreme temperatures of 0°C and +100°C. Even for these limits, the manufacturer must be consulted.
The seal test (fine leak and gross leak) is performed to check the seal integrity of the cavity around the chip in hermetically packaged ICs. For the fine leak, the ICs are placed in a vacuum (1 h at 0.5 mm Hg) and stored in a helium atmosphere under pressure (4 h at 5 atm), then placed under normal atmospheric conditions, in open air (30 minutes), and finally a helium detector (required sensitivity 10⁻⁸ atm cm³/s, depending on the cavity volume) identifies any leakage.
For the gross leak, the ICs are placed in a vacuum (1 hour at 5 mm Hg) and then stored for 2 hours under 5 atm in the fluorocarbon FC-n. After a short exposure (2 minutes) in open air, the ICs are immersed in an FC-40 indicator bath at 125°C, where the hermeticity is tested; the presence of a continuous stream of small bubbles, or of two large bubbles from the same place within 30 seconds, indicates a defect.
2.2.3.4
Monitoring the screening
Consequently, the remainder of the lot has a better reliability. This is the ideal
case [2.15][2.2(1996)]. However, components damaged after screening have often
been reported. There are two sources for such an unwanted event: (i) the screening
sequence contains destructive tests; (ii) the electrical characterisation does not
succeed in eliminating the weak components.
(Fig. 2.8 flow chart, from the design of the screening sequence to the screened lot)
To overcome these problems, a method was recently proposed [2.11]. The method
was called MOVES, an acronym for Monitoring and Verifying a Screening
sequence. MOVES contains five procedures: VERDECT, LODRIFT, DISCRIM,
POSE and INDRIFT. One can say that, with MOVES, low-reliability items are
moved away from a lot passed through a screening sequence. In Fig. 2.8, the flow
chart of MOVES is presented. From the designed screening sequence, VERDECT
(VERifying the DEstructive Character of a Test) identifies the destructive tests.
This category of tests must be substituted at the design review by non-destructive
tests activating the same failure mechanisms (e.g. thermal cycles may replace
thermal shocks). Then, the screening sequence is performed for all the N
components and the failed items are withdrawn. For the remainder of the lot,
LODRIFT (LOt DRIFT) can say whether the drift of the lot - described by the mean of
each main electrical parameter - reaches the failure limit during the lifetime. If
so, the lot must be rejected.
If the answer is negative, the behaviour of individual items is to be investigated.
DISCRIM sets apart, by optimal discrimination, and eliminates the items which do
not follow the tendency of the whole lot; POSE (POSition of the Elements) identifies
the components which change their position in the parameter distribution at each
measuring moment; and INDRIFT (INDividual DRIFT) analyses the individual drift
of the main parameters of each component. Eventually, the failed items (nf) are
eliminated and, for the remainder of the lot (N - nf), a higher reliability is obtained.
An improvement of the POSE method, by using fuzzy logic, was recently
proposed [2.77].
Basically, with POSE, the drift of each electrical parameter of each item is
carefully analysed after each test of the screening sequence. For each
electrical parameter of the device, the value range is divided into five zones. The
position of the parameter value is noted at the beginning of the screening sequence
and then identified after each screening step. With an appropriate rule, the
movement of a parameter from one zone to another may be linked to the reliability of
the device. But the analysis is difficult to perform. Fuzzy logic may be useful in
this respect and, in the following, a method allowing the proper selection (and
removal) of the items which might fail in the future is presented.
The "mobility" of the parameter value after each screening test is investigated. A
triangle-shaped membership function with five regions (called very small, small,
medium, high and very high, referring to the mobility value, m, with core values
from 0.1 to 0.5) is used (Fig. 2.9), given by:
μi(x) = (x - ri,l) / (ri - ri,l)   for ri,l ≤ x ≤ ri, and
μi(x) = (ri,u - x) / (ri,u - ri)   for ri < x ≤ ri,u   (2.3)
where: ri,l = ri - 0.1; ri,u = ri + 0.1; r1 = 0.1 (for very small), up to r5 = 0.5 (for very high).
The "movement" of the parameter value from one zone to another is quantified by
the following rules:
• Initially, a "very small" mobility (m) is assigned to each device, with the core
value 0.1.
• Each "jump" from a zone to the next one is penalised by a doubling of m. This
multiplication factor becomes 3 and 4 for jumps over two or three zones, respectively.
• A "jump back" from the next zone does not modify m. If this jump back is longer
than the initial jump (two zones, instead of one), m is doubled. For shorter jumps
back (e.g. one zone, instead of two), m is diminished by 50%.
• If the parameter value remains in the same zone, m is diminished by 30%
each time.
• Usually, the final screening test is a burn-in. The failures arising at this test
seem to be indicative of the reliability. So, if a jump of one or two zones arises
at this final test, a value of 0.1 or 0.2, respectively, is added to m.
• Finally, the overall mobility (m) of each device over the screening sequence is
obtained. If this value is higher than 0.3, the device must be removed, because its
reliability is not high enough. Certainly, for various applications, other removal
limits may be established.
Table 2.6 Selection of the reliable items at screening, for a batch of 15 items (fuzzy method with
5 regions)
The procedure will be detailed for a case study. For a batch of 15 devices
undergoing a screening sequence with three tests (temperature cycling, acceleration
and burn-in), an electrical parameter is measured initially (i) and after each test
(T1, T2, T3). The results are presented in Table 2.6, together with the mobility
values (m) calculated following the rules presented above. As a conclusion, their
"mobility" being higher than 0.3, the devices no. 4, 8 and 11 must be removed.
A new methodology to select an effective burn-in strategy for ICs used in
automotive applications is given by Tang [2.68]. The key is to analyse the failure
mechanisms for different technologies and to use the results, together with the IC
family data, to determine appropriate burn-in conditions for new ICs. The results
have shown that burn-in is useful for detecting wafer processing defects rather than
packaging defects.
2.3
Reliability evaluation
and thermal stress follows an Arrhenius model. A burn-in period (tB) is followed
by a functioning test at accelerated thermal stress (with the duration tA) till the end
of life (tE). On the basis of the parameter measurements - the initial value (pI) and
the values measured at tB (noted pB) and at tB + tA (noted pA) - the model gives
the drift at the end of life (tE), noted ΔpE/pI. The following relation is
obtained:
ΔpE / pI = [ΔpA / (pI ln r)] ln [r^2 + (r - 1)(tE/tB)] -
2.3.1
Environmental reliability testing
Finding a correct definition of the environment is the first step in environmental
reliability testing. For this purpose, an international document, namely IEC 721
"Classification of environmental conditions", may be used. The environmental
conditions are codified with three characters: a figure (from 1 to 7) indicating the
mode of use, a letter indicating the environmental conditions, and again a figure
(from 1 to 6) indicating the severity degree. As examples, Table 2.7 gives the
climatic conditions for use at a fixed post unprotected from bad weather, and
Table 2.8 those for a fixed post protected from bad weather.
One may notice that, for the same severity degree, the climatic conditions for
use at an unprotected post are more severe. For instance, the maximum air
temperature is 40°C for 3K4, and 55°C for 4K4, respectively. Now we have all the
elements for expressing the environment of a device. First, the type of use must be
settled, among the seven categories: 1 - storage, 2 - transport, 3 - use at a fixed post
protected from bad weather, 4 - use at a fixed post unprotected from bad weather,
5 - use in a terrestrial vehicle, 6 - use at sea, 7 - use in portable sets. Then,
the environmental conditions are indicated by letters: K - climatic, Z - special
climatic, B - biological, C - chemically active substances, F - contaminant fluids, M
- mechanical. Eventually, the severity degree (from 1 - small, to 6 - high) is
indicated. Some examples: 3Z1 - negligible heat irradiation from the environment,
3C1 - chemically active substances, 3M3 - mechanical conditions of vibrations /
shocks.
Table 2.7 Climatic conditions for use at a fixed post unprotected from bad weather
Table 2.8 Climatic conditions for use at a fixed post protected from bad weather
Environmental agent | Unit | Category
Fig. 2.10 Failure rate ratios of different component families at environment temperatures of
+40°C and +70°C (λ at 70°C / λ at 40°C): 1 - integrated circuits, 2 - capacitors, 3 - hybrid
circuits, 4 - transistors, 5 - connectors, 6 - resistors, 7 - relays, 8 - coils [2.70]
Concerning the activity of humidity, it must be observed that, on reaching the dew
point, a water deposit is formed which produces surface corrosion. The more
ionised particles the condensed water contains (producing modifications of the
insulation resistance, the capacitances and the wafer dimensions, and water
diffusion leading to a growth of the failure rate of plastic-encapsulated
components), the more important the corrosion.
The air pressure influences the ventilation (heat evacuation) and the air exchange
(sensitivity to too-rapid variations).
The solar radiation influences the material composition (through photochemical
processes) and thus also leads to a supplementary heating of the environmental air
(dilatation, mechanical effects, etc.).
2.3.1.1
Synergy of environmental factors
for the reliability. The involved stresses are carefully analysed. As an example, for
the weapons stored by the U.S. Army at Anniston, the temperature varies daily by
at most 2°C and the humidity is 70% [2.82]. There are also storage areas in
tropical or arctic zones. For systems exposed to solar radiation, temperatures higher
than 75°C were measured, with temperature variations exceeding 50°C. For checking
the component reliability in these situations, studies on the behaviour at
temperature cycling were performed (see Section 2.3.1.2). Other stress factors, such
as rain, fog, snow, fungus or bacteria, may act and must be investigated. During
transport, the same stress factors (temperature cycling, humidity) or specific ones
(mechanical shocks, etc.) may arise.
For all these factors, studies of the involved synergies were performed. An
example is given in [2.83], where the behaviour at temperature and vibrations of an
electronic equipment protecting the airplane against surface-to-air missiles is
investigated. Operational data were collected by means of a complex system (elaborated
by specialists from Westinghouse). This system contains 64 temperature sensors
(AD 590 MF, from Analog Devices) and 24 vibration sensors (PCB Piezotronics
303 A02 quartz accelerometers), mounted on two ALQ-131 systems used on the
F-15 fighter plane. The tests were performed between December 1989 and August
1990. The data were processed and, based on the obtained information, laboratory
tests were designed for the components with abnormal behaviour. Eventually,
corrective actions were used for improving the component reliability. The result
was that during the Gulf War (January 1991) the ALQ-131 equipment had a higher
reliability than previously.
Functioning. The essential difference between the storage and transport
environment and functioning is the presence of the bias. At first sight, it seems
that the only effect of the electrical factor is an increase of the chip temperature,
following the relation:
Tj = Ta + rth j-a Pd
where Tj is the junction temperature, Ta - the ambient temperature, rth j-a - the
thermal resistance junction-ambient and Pd - the dissipated power. If the effect of
the electrical factor meant only a temperature increase, then its effect would be the
same as an increase of the ambient temperature. Experimentally, it has been shown
that this hypothesis is not valid. The electrical factor has a thermal effect, but also
a specific electrical effect due to the electrical field or the electrical current.
Often, the components have to work in an intermittent regime. In these cases, the
phenomenon limiting the lifetime is the thermal fatigue of the metal contact,
produced by the synergy between the thermal factor (thermal effect of functioning)
and the mechanical factor, modelled with [2.84]:
(2.6)
where N is the number of functioning cycles and Δεp is the thermo-mechanical stress,
given by:
(2.7)
where L is the minimum dimension of the contact, Δα is the average of the
dilatation coefficients of the two interfaces, ΔT is the temperature variation and x is
the width of the contact. Experiments about intermittent functioning of rectifier
2.3.1.2
Temperature cycling
Fig. 2.11 The median number of temperature cycles producing the failure of 50% of a component
batch (Nm) vs. temperature range (ΔT), for components in TO-39, TO-18 metal, TO-72 and
TO-18 plastic packages
The components were measured initially and after 50, 100, 200, 400, 500 and 1000
cycles. The failed components were carefully analysed and the populations affected
by each failure mechanism were established. For the component encapsulated in the
TO-39 case (a bipolar RF transistor), a degradation of the chip solder was observed,
produced by the different dilatation coefficients of the silicon and of the header.
Fig. 2.12 Failure mechanisms at temperature + vibrations, at accelerated and normal stress
levels. Appearance of the second failure mechanism after 10^4 temperature cycles
2.3.1.3
Behavior in a radiation field
The most harmful environment for semiconductor components is the nuclear one.
In Table 2.10, the sensitiveness in a radiation field is shown for components
manufactured with various technology types.
Various failure mechanisms were investigated. Fast neutrons produce
current gain degradation and an increase of the saturation voltage for bipolar
transistors, by creating defects in the crystalline structure. Ionising radiation
generates photocurrents in all reverse-biased PN junctions, producing modifications
of the logic states [2.88].
In 1992, a team of researchers from Hitachi elaborated two models for the
evaluation of the threshold drift and of the leakage current increase for CMOS
devices irradiated by γ rays (Co-60). They stated that the defects are produced by
the trapping of hole charge in the MOSFET gate oxide and by the increase of the
state density at the Si/SiO2 interface.
For the threshold drift (ΔVTO), a linear model was proposed, described by:
ΔVTO(t) = - TC + A log t + IS   (2.10)
where TC is the threshold drift generated by the charge trapped in the oxide per
unitary dose, A is a coefficient linked to this phenomenon and IS is the threshold
drift generated by the charge of the interface states. So, a synergy of two failure
mechanisms is modelled.
The increase of the leakage current (IL) was modelled with the formula:
IL(t) = K1 exp(-A1 t) + K2 exp(-A2 t)   (2.11)
where K1 and K2 are the leakage currents generated by the unitary dose and A1 and
A2 are constants.
Table 2.10 Comparison of the sensitiveness in a radiation field, for components manufactured by
various technology types
Note that both models take into account the synergy between irradiation
and the thermal factor, because the coefficients depend on temperature following an
Arrhenius model. For instance, for the coefficient A from (2.10):
A = A0 exp(-Ea/kT).   (2.12)
So, the superposition of temperature and ionising radiation is accomplished.
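For illustration, relations (2.10)-(2.12) can be combined in a few lines of Python. This is our own sketch: the numerical coefficients are placeholders, not Hitachi's fitted values, and the logarithm in (2.10) is taken as base-10 (an assumption; the book does not state the base):

```python
import math

K_BOLTZ = 8.617e-5  # Boltzmann constant, eV/K

def coefficient_A(A0, Ea, T):
    """Arrhenius temperature dependence of the coefficient A, relation (2.12)."""
    return A0 * math.exp(-Ea / (K_BOLTZ * T))

def threshold_drift(t, TC, A, IS):
    """Threshold drift vs. time, relation (2.10); log assumed base-10."""
    return -TC + A * math.log10(t) + IS

def leakage_current(t, K1, A1, K2, A2):
    """Leakage current increase, relation (2.11): one decaying
    exponential per failure mechanism."""
    return K1 * math.exp(-A1 * t) + K2 * math.exp(-A2 * t)

# the synergy with temperature: a hotter device has a larger A,
# hence a faster threshold drift (all values illustrative)
A_cool = coefficient_A(1.0, 0.6, 300.0)
A_hot = coefficient_A(1.0, 0.6, 350.0)
```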
2.3.2
Life testing with noncontinuous inspection
The reliability tests are performed on samples withdrawn from a batch of compo-
nents. If the components are measured at foreseen inspection moments, when the
life tests are stopped, this is the method of noncontinuous inspection. On the con-
trary, if the components are measured permanently during life testing, the method
is called continuous inspection. In most cases the noncontinuous inspection, a
much cheaper method, is used. With this method, the failure moment is not accu-
rately known, being assimilated to the subsequent measuring moment. Further
on, a method for increasing the accuracy of the noncontinuous inspection will be
presented.
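The loss of accuracy can be made concrete with a small Python sketch (the function name is ours): under noncontinuous inspection, the recorded failure moment of each item is the first inspection moment following its true failure moment.

```python
import bisect

def observed_moments(true_failures, inspections):
    """Noncontinuous inspection: each failure is detected only at the
    next inspection, so the recorded time is the first inspection
    moment that follows the true failure moment."""
    recorded = []
    for t in true_failures:
        i = bisect.bisect_left(inspections, t)
        # items still alive at the last inspection are censored (None)
        recorded.append(inspections[i] if i < len(inspections) else None)
    return recorded

print(observed_moments([0.3, 1.2], [0.5, 1.0, 2.0]))  # → [0.5, 2.0]
```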
If n items were withdrawn from a batch of N components, (ak, bk), k = 1, 2, ..., i is
the time period between two successive inspections, i is the total number of
inspections and m1, m2, ..., mi are the failures in each time period (ak, bk), then:
m1 + m2 + ... + mi = n
- sample volume: n
2. In a zero approximation, one assumes that all items fail in the middle of the
time period (ak, bk):
tj(0) = (ak + bk) / 2   (2.19)
where:
k = 1, for 1 ≤ j ≤ m1,
k = 2, for m1 + 1 ≤ j ≤ m1 + m2, ...
3. The Weibull distribution parameters in the zero approximation (β(0) and θ(0)) are
calculated by using the Maximum Likelihood Estimation (MLE), with the iterative
Newton-Raphson method [2.93], from the equation:
[Σ(j=1..n) tj^β ln tj] / [Σ(j=1..n) tj^β] - 1/β = [Σ(j=1..n) ln tj] / n   (2.20)
4. Further on, the failure moments for the first approximation are calculated with
the relations (2.13)...(2.17).
5. The Weibull distribution parameters in the first approximation (β(1) and θ(1)) are
calculated.
6. The iterative process continues until |β(r) - β(r-1)| < ε, where r is the order number
of the approximation and ε is the foreseen accuracy.
7. For the values β(r) and θ(r), the failure rate is calculated with the formula:
λ(t) = (β(r) / θ(r)) (t / θ(r))^(β(r) - 1).   (2.22)
The method (called NIMLE = Noncontinuous Inspection with Maximum Likelihood
Estimation) will be used for an example used for the first time by Menon [2.91]: a
sample of 20 items, withdrawn from a Weibull population of 1000 items, with β =
0.5 and θ = 2.7183. The failure moments for these 20 items were:
0.001 0.030 0.071 0.185 0.345 0.435 0.469 0.470 0.505 0.664
0.806 0.970 1.033 1.550 1.550 2.046 3.532 7.057 9.098 57.628
A noncontinuous inspection was simulated and the results are presented in Table 2.11.
The results obtained with the NIMLE method were compared with the results obtained
by Menon [2.91] and by Cohen [2.92] (the MLE method), with the moment
method and with the graphical method (Table 2.12). From Table 2.12 one may note
that the NIMLE method, although it starts from incomplete data, allows obtaining
surprisingly accurate results (especially referring to the β value). By using this method,
the handicap of the noncontinuous inspection (compared with the continuous
inspection) may be almost overcome.
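The MLE core of steps 3-7 can be sketched in Python for Menon's complete-data example. This is our own illustration, not the NIMLE implementation: the interval-censoring refinement of steps 2-6 is omitted, and only the Newton-Raphson solution of the shape equation (cf. relation (2.20)) is shown:

```python
import math

def weibull_mle(t, beta0=1.0, eps=1e-6):
    """Weibull shape (beta) and scale (theta) by maximum likelihood,
    Newton-Raphson on the shape equation; complete (uncensored) data."""
    n = len(t)
    mean_log = sum(math.log(x) for x in t) / n
    beta = beta0
    for _ in range(200):
        s0 = sum(x ** beta for x in t)
        s1 = sum(x ** beta * math.log(x) for x in t)
        s2 = sum(x ** beta * math.log(x) ** 2 for x in t)
        g = s1 / s0 - 1.0 / beta - mean_log            # shape equation
        dg = (s2 * s0 - s1 * s1) / (s0 * s0) + 1.0 / beta ** 2
        step = g / dg                                  # Newton-Raphson step
        beta = max(beta - step, 1e-3)                  # keep beta positive
        if abs(step) < eps:
            break
    theta = (sum(x ** beta for x in t) / n) ** (1.0 / beta)
    return beta, theta

# Menon's sample of 20 failure moments (true beta = 0.5, theta = 2.7183)
times = [0.001, 0.030, 0.071, 0.185, 0.345, 0.435, 0.469, 0.470, 0.505,
         0.664, 0.806, 0.970, 1.033, 1.550, 1.550, 2.046, 3.532, 7.057,
         9.098, 57.628]
beta, theta = weibull_mle(times)   # beta comes out close to the true 0.5
```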
2.3.3
Accelerated testing
The first accelerated tests were made in the early 1960s and tried to shorten the time
period necessary to obtain significant results from life tests. The failure mecha-
nisms must be investigated with great care, because it is essential that the failure
mechanism acting at the higher stress level be the same as that acting at the normal
stress level. Accelerated life tests (ALT) with bias and temperature as stress factors
have been developed since the early 1960s [2.57]. Data from ALT are processed with
the aid of life & stress models: Weibull or lognormal for the life models and Ar-
rhenius or inverse-power for the stress models [2.71][2.58]. Constant-stress tests,
with bias and temperature as stress factors, are used for quantitative determinations
of the failure rate. The activation energy (calculated from at least three constant-
stress tests performed at three different electrical and environmental conditions) is
the key parameter, allowing the extrapolation of ALT results to normal operational
conditions.
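The Arrhenius bookkeeping behind this extrapolation is compact. A minimal Python sketch (ours; two test temperatures only, for clarity of the algebra, whereas at least three are used in practice, and the function names are illustrative):

```python
import math

K_BOLTZ = 8.617e-5   # Boltzmann constant, eV/K

def activation_energy(t1, T1, t2, T2):
    """Activation energy Ea (eV) from the median lives t1, t2 observed
    at absolute temperatures T1, T2, assuming t = A * exp(Ea / kT)."""
    return K_BOLTZ * math.log(t1 / t2) / (1.0 / T1 - 1.0 / T2)

def acceleration_factor(Ea, T_use, T_test):
    """Factor by which life at T_test is shorter than life at T_use."""
    return math.exp((Ea / K_BOLTZ) * (1.0 / T_use - 1.0 / T_test))

# example: Ea = 0.7 eV, use at 55 degC (328.15 K), test at 125 degC (398.15 K)
AF = acceleration_factor(0.7, 328.15, 398.15)
```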
A step-stress test used for a qualitative determination of the failure rate was
proposed by Bazu [2.4], based on previous works [2.39][2.62]. This test was called
the reliability fingerprint and allows obtaining a measure of the lot reliability. The
fingerprint of the lot reliability may be obtained as the response of the device to a
step-stress test: a single sample of 30-50 items undergoes 4-10 hours at each
stress level, progressively increased, the number of failed devices at each stress
level giving the fingerprint. As one can see from Fig. 2.13, by comparing this
fingerprint with a reference fingerprint, obtained for a reference lot with the
reliability level determined from constant-stress tests (3 samples of 30-50 items,
each sample at a given stress level, 1000 hours), the failure rate level of the lot can
be estimated. The key of this method is to have a robust procedure for comparing
the fingerprint of the current batch with that of the reference batch. Currently, this
analysis is performed qualitatively by a human expert. But fuzzy logic allows
developing a better comparison method [2.77]. The range of the ten steps is
fuzzified by a triangle-shaped membership function with five regions, as shown in
Fig. 2.13 Comparison between: a the reliability fingerprint (RF) for a current batch and b the
fingerprint of the reference batch (RFref). Both histograms show the percentage of failed
devices at each of the ten stress steps
Table 2.13 Rapid estimation of the reliability level for the current batch presented in Fig. 2.13
(fuzzy comparison method with 5 regions)
Fig. 2.9 and described by formula (2.3): very low, low, medium, high, very high,
referring to the reliability level of the batch. The method has been used for the two
RFs from Fig. 2.13. They are compared in order to evaluate the reliability level
of the current batch. The results are presented in Table 2.13. As a conclusion, the
current batch has a higher reliability level than the reference batch.
Eventually, a System for Rapid Estimation of the Reliability (SRER) was
created, based on constant- and step-stress tests [2.7].
Lately, new tendencies have been observed in ALT. One is to increase the accelerated
stress level up to the highest possible value, counterbalanced by the idea of Barton
[2.3], who proposed a method for optimising ALT by minimising the maximum test
stress. Another tendency is to perform accelerated tests early in the manufacturing
process (even at the wafer level).
It seems that the Arrhenius model must be corrected: the temperature is no
longer sufficient as an acceleration stress. The electric field must also be taken into
account. In an experiment performed on many samples withdrawn from the same
bipolar transistor batch, life tests at the same junction temperature but at different
electric fields were made [2.94]. Instead of the same reliability level for every
sample, as predicted by the Arrhenius model, an electric field dependence of the
median time of each sample (describing the lognormal distribution) was found:
tm = k U^1/2   (2.23)
where k is a constant and U is the applied voltage.
The failure mechanism was the formation of a diffusion channel and short-circuiting
by spikes. These spikes depend on the width of the collector-base space charge
region, which is directly proportional to U^1/2. So, the formula (2.23) is verified
by the physics of failure. A generalised Arrhenius model, taking into account the
electric field dependence, was given by Bazu [2.96]:
tm = A U^1/2 exp (Ea/kT).   (2.24)
This model proved to have a fair accuracy, being also used for other types of
semiconductor devices.
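Under the stated assumptions, relation (2.24) is a one-liner; A and Ea are batch-specific constants fitted from the accelerated tests, and the values used below are purely illustrative:

```python
import math

K_BOLTZ = 8.617e-5   # Boltzmann constant, eV/K

def median_life(U, T, A, Ea):
    """Generalised Arrhenius model, relation (2.24):
    tm = A * U**0.5 * exp(Ea / (k*T))."""
    return A * math.sqrt(U) * math.exp(Ea / (K_BOLTZ * T))

# the U**0.5 factor means tm scales by 2 when the voltage is
# quadrupled at a fixed junction temperature:
ratio = median_life(4.0, 398.0, 1.0, 0.7) / median_life(1.0, 398.0, 1.0, 0.7)
```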
2.3.3.1
Activation energy depends on the stress level
In recent years, experiments have shown that a dependence of the standard
deviation on the stress level exists. The current procedure for processing statistical
data from a lognormal distribution (as is the case for most electronic
components) is based on the assumption that the standard deviation σ has the same
value at any stress level [2.97]. In fact, it has been proved that the standard
deviation (and, as a consequence, the activation energy) is dependent on the stress
level.
An example is the experiment on the effect of temperature and humidity on the
reliability of metallic pads [2.98]. Three test structures (metallic pads with a width
of 8 μm and an inter-pad distance of 8 μm, unprotected, in order to be more
sensitive to the effect of the environment) underwent tests at 70...130°C and
10...70% relative humidity. The leakage currents induced by humidity in the
presence of temperature were measured. The phenomenon leading to the increase of
the leakage current is the absorption of water molecules on the structure surface.
Three models tried to explain the dependence of the surface conductivity (a
parameter linked to the structure reliability, because a high surface conductivity
produces failure by short-circuiting) on the temperature and humidity:
ln γ = ln A + B (1/T) + C ln H   (2.25)
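In exponentiated form this first model reads γ = A · exp(B/T) · H^C. A small Python sketch with placeholder coefficients (A, B and C must be fitted to the measured leakage data; the values passed in below are not from [2.98]):

```python
import math

def surface_conductivity(T, H, A, B, C):
    """First humidity model, relation (2.25), exponentiated:
    gamma = A * exp(B/T) * H**C,
    i.e. ln(gamma) = ln A + B*(1/T) + C*ln H."""
    return A * math.exp(B / T) * H ** C

# for C > 0, higher relative humidity gives higher surface conductivity,
# hence a higher risk of failure by short-circuiting
g_dry = surface_conductivity(300.0, 10.0, 1.0, 100.0, 2.0)
g_wet = surface_conductivity(300.0, 70.0, 1.0, 100.0, 2.0)
```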
2.3.4
Physics of failure
(2.29)
where Ta is the ambient temperature, rth j-a - the thermal resistance between junction
and ambient, Pd - the dissipated power, and ci and di - coefficients. The coefficients ai,
bi, ci and di may be calculated from experimental data. For instance, if there are
three stress factors (temperature, bias, humidity), i = 1, and the relations (2.29) and
(2.30) become:
(2.31)
(2.32)
From this generalised model, previously developed models may be obtained (see
Table 2.14).
Table 2.14 Models obtained from the model described by the relations (2.31) and (2.32)
S1 | a1 | b1 | c1 | d1 | Model
0  | -  | -  | -  | -  | Arrhenius
S1 | 0  | 0  | 1  | 1  | Hakim-Reich [2.100]
S1 | 1  | 2  | 0  | -  | Lawson [2.100]
S1 | 1  | m  | 0  | -  | Peck [2.101]
The model can be used for calculating the failure rate at various environmental and
electric stresses, but also for designing accelerated tests. Such tests may be useful for
screening or for evaluating the reliability level of a batch of components. In Fig.
2.14, two examples are presented:
• Screening: the point S1 is for normal test conditions; the equivalent duration of a
test performed at a higher stress level (described by the point S2) is obtained: the
point S3.
• Reliability evaluation: from an accelerated test performed at the stress level E1,
a lognormal failure rate distribution, described by the parameter tm (point E2),
was obtained; by using the model, at the normal test condition described by the
point E3, the value of tm (point E4) may be obtained.
Fig. 2.14 Screening and reliability evaluation performed by using the model described by the
relations (2.29) and (2.30)
Fig. 2.15 Emergence possibilities of the semiconductor defects. The flow chart distinguishes:
absolutely faultless components; electrically faultless components with constructive &
technological weaknesses; electrically faultless components with structure imperfections; and
electrically faultless components with structure imperfections caused by exceeding the
technological parameters. The last three are potentially unreliable components, whose defect
mechanisms have structural causes and electrical causes
It is important to note that the model described by the relations (2.29) and (2.30)
can be used for various stress factors, such as temperature cycling, pressure and
mechanical stress.
Accelerated testing is now the main tool for the determination of the
reliability level. Recent progress has been obtained in this field. Clark et al. [2.23]
presented an approach to designing ALT experiments using multiple stresses,
applicable to low-cost, high-volume production items. Another method, developed
by Klyatis [2.45], is based upon physical modelling to demonstrate the influence of
mechanical and environmental factors under operating conditions.
In Fig. 2.15, a possible classification of the semiconductor defects depending on
their origin can be seen.
Initially, for an allowed load, almost always a constructive fault or a fabrication
failure occurs. Consequently, an ageing mechanism, i.e. a latent structural
weakness of the component, may come out. This is why the component structure is
continuously modified under the influence of its load, until finally its electrical
parameters exceed the allowed limits. When performing the defect analysis, the
causal relations are gradually discovered, starting the examination from the causes
of the electrical fault.
Because of their high package density and reduced utilisation voltages,
semiconductor components are exposed to various influences that produce failures
in operation and, unfortunately, often lead to their destruction. One must
distinguish between internal and external influences.
The external influences operate through direct inductive, capacitive or chemical
effects, or reach the very sensitive structure through component operation, i.e.
during commutation, current flow or connection to the electrical ground. In this
respect, the actions of short-circuits, of the connection and disconnection shocks,
of the atmospheric conditions, and of the inductive and capacitive influences are to
be mentioned. Other possible effects are produced by UV and X-ray irradiation (for
example of EPROMs), by radio-wave irradiation and, under special conditions [2.65],
through the electromagnetic pulse (EMP). Since the EMP effect is complex [2.13]
and its action mode is multiple, the protection measures against EMP are not
simple. Up to a certain point, they are similar to the protection measures against
lightning. Taking into account the differences between the rise times, frequencies,
field intensities and energies, and taking into account that the involved domain can
cover some millions of km², it can be noticed that the anti-EMP protection has
greater requirements and its realisation is more expensive.
The internal influences can act inside an electronic constructive element or on a
semiconductor structure, through the introduction of induced currents, short-
circuits and, at any rate, by inductive and particularly capacitive influences.
2.3.4.1
Drift, drift failures and drift behaviour
the useful lifetime and in the wearout period, when the parameters exceed the al-
lowed tolerance limits for the nominal values corresponding to the normal opera-
tion of the component. As a consequence of this parameter drift, overloads may
emerge and some of them can lead to total failures. Drift failures can be tracked
through periodical electrical characterisations of the components, before the general
failure occurs (for example, the growth of the resistance of carbon resistors, the
diminution of the capacitance of electrolytic capacitors, the growth of the residual
current of semiconductor components, etc.).
The drift behaviour can be tracked through long-term studies and often
emphasises a dependency on the loading value. The drift can be eliminated through
ageing, so that during the useful lifetime a more stable behaviour can be obtained. If
the drift behaviour of the component is known, the useful lifetime can be deter-
mined by correspondingly selecting the load and the loading conditions. Utilising
these indirect methods to determine the reliability data needs only a short time,
which is why the long trials can be renounced.
At the design and re-design phases, the drift behaviour must be considered. This
enables a reduction of the elaboration time and of the costs.
Failure mechanism (FM) identification is essential for reliability accelerated
testing, because the obtained degradation laws must be extrapolated beyond the
time period of the test, and the extrapolation must be made separately for each
population affected by an FM. The subject is still modern, taking into account that
new failure mechanisms are discovered and even the old ones are not completely
explained. Consequently, a series of tutorial papers on FM has been published since
1991 by the IEEE Transactions on Reliability, in almost each issue. Most of these
papers were written by specialists from the CALCE Electronic Packaging Research
Centre, a research team led by Michael Pecht, from the University of Maryland (USA).
The following FM were investigated: overstress and wearout, including quantitative
models [2.26], excessive elastic deformation [2.27], irreversible plastic deformation
[2.28], brittle fracture [2.28], ductile fracture [2.29], buckling [2.30], mechanical
wear [2.34], creep, cycling fatigue [2.31], material ageing due to interdiffusion,
electromigration [2.75], irradiated polymeric materials [2.1], popcorning of plastic
encapsulated components [2.38].
A method predicting the effect of particular defects on the failure rate of metal
interconnections due to electromigration was proposed by Kemp et al. [2.44]. The
defect of interest is a missing-material defect that reduces the effective cross section of the conductor at the point of the defect.
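The effect of such a defect can be illustrated with Black's equation, the standard electromigration lifetime model (MTTF = A·J⁻ⁿ·exp(Ea/kT)). This is only a sketch with invented constants, not a reproduction of Kemp's actual method:

```python
from math import exp

K_BOLTZMANN = 8.617e-5  # Boltzmann constant, eV/K

def em_mttf(j_a_per_cm2, temp_k, a_const=1.0e3, n=2.0, ea_ev=0.7):
    """Median time to failure per Black's equation: MTTF = A * J^-n * exp(Ea/kT).
    A, n and Ea are process-dependent; the values here are purely illustrative."""
    return a_const * j_a_per_cm2 ** (-n) * exp(ea_ev / (K_BOLTZMANN * temp_k))

def defect_mttf_ratio(cross_section_fraction, n=2.0):
    """A missing-material defect leaving a fraction f of the nominal cross
    section raises the local current density by 1/f, so MTTF scales as f^n."""
    return cross_section_fraction ** n

# A defect removing half of the line's cross section (f = 0.5) doubles the
# local current density and, with n = 2, cuts MTTF to 25% of nominal:
ratio = defect_mttf_ratio(0.5)
```

The current-exponent n and activation energy Ea would in practice be fitted to accelerated-test data for the metallisation in question.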
Chick and Mendel [2.21] proposed a method for incorporating prior information on FM into a lifetime model. The key idea of this approach is that wear, stress and strain are more directly linked to the failure than is the component age. A lifetime model based on prior information about wear allows lifetime data to be used for similar components operated under various conditions.
Trends in microsystems integrating electronic, microelectromechanical, electro-optical and micro-fluidic devices are bringing miniaturisation close to its physical limits, creating in this way a need for an extensive reliability-physics effort to identify and counter failure mechanisms in new devices [2.24].
2 State of the art in reliability 83
2.3.5
Prediction methods
We are today far from the situation of the late 1960s, when the poor intrinsic quality
and reliability of components dominated the failures in electronic equipment. In-
trinsic reliability is the reliability a system can achieve based on the types of de-
vices and manufacturing processes used. The models are oriented to describe fail-
ures as originated by the intrinsic characteristics of the components in relation to
their use and application. If intrinsic failures in the useful life period are really due
to inherent defects (a residual of defects that did not surface in early life), the track
record of the component vendor is more meaningful in this respect than the generic
handbook values of λ. In fact, it is quite difficult to separate the contribution due to
the component from that induced by the application. In the 1960s, two approaches
were prevalent: a) a constant hazard rate is assigned to the components; b) the
influence of the environmental and loading conditions can be modelled using sim-
ple formulas or correction factors. In other words, the systems are assumed to be
simple series systems in a reliability sense, and the hazard rate of the system is the
sum of the hazard rates of all contributing components. The actual operating and stress conditions of the individual components were largely ignored. Assigning a constant
hazard rate to a component is a fallacy. When a component fails, it is either because
it has been subjected to a freak overload during its useful life period, or because it
has reached a long-term wearout phase. The majority of real-life failures are caused by external events; they are in fact extrinsic failures (rather than being caused by any inherent deficiency of a component). In practice, the available data almost always contain a small number of non-intrinsic failures that cannot be positively identified as non-intrinsic. All the field failures are caused by randomly
occurring defects accelerated by operational stress. Component failures (even those
considered of a random nature) are due to manufacturing defects or misuse that
show their effects during operation.
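The series-system, constant-hazard-rate assumption described above reduces prediction to a parts-count sum. A minimal sketch, with invented component rates and correction factors:

```python
# Parts-count prediction under the series-system, constant-hazard-rate
# assumption: the system hazard rate is the sum of the component rates,
# each scaled by quantity and an environmental correction factor (pi_E).
# All numeric values below are invented for illustration only.

FIT = 1e-9  # one FIT = one failure per 10^9 hours

components = [
    # (name, base hazard rate in FIT, quantity, pi_E correction factor)
    ("resistor",   2.0, 40, 1.0),
    ("capacitor",  5.0, 25, 1.5),
    ("ic",        50.0,  8, 2.0),
]

# Series system: any component failure is a system failure.
lambda_system = sum(rate * FIT * qty * pi_e for _, rate, qty, pi_e in components)
mtbf_hours = 1.0 / lambda_system
```

This is exactly the kind of simple additive model whose limitations the text goes on to criticise: it hides the actual stress conditions behind a single correction factor per part.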
Fig. 2.16 Superposition of physics-of-failure intrinsic reliability models with field failure data, in the useful life period
• General estimations;
• Per function estimations;
• Analytical estimations.
For a dependable prediction of field component reliability we need information
about composite reliability (the sum of intrinsic and extrinsic failures - see Fig.
2.16). Failure rate prediction requires knowledge of event statistics as well as of
device robustness. The benefits of dependable predictions can be summarised as
follows:
• forecasting the field reliability of a system;
• comparing the reliability of similar designs, with the ability to make trade-offs aimed at enhancing reliability through derating or design changes;
• predicting spare parts provisioning;
• predicting warranty costs;
• predicting logistic support (repair and maintenance facilities);
• surveying the company's competitiveness.
2.3.5.1
Prediction methods based on failure physics
Fig. 2.17 The physics-of-failure modelling approach (hazard rate Z(t) versus time)
Bazu [2.9] proposed an improved methodology called SYRP - for predicting the
failure rate of a lot - with the following characteristics: (i) a lognormal distribution
for each FM involved is used and (ii) the interaction (synergy) between the
technological factors depending on the manufacturing and control techniques, is
considered. Failure risk coefficients (FRC) - assessed at each manufacturing step -
are fuzzy sets (triangle membership function) and they are corrected at the
subsequent manufacturing steps by considering the synergy of the manufacturing
factors. From the final FRC, the parameters of the lognormal failure distribution are
calculated for each potential FM. A comparison of SYRP predictions and
accelerated life test results is shown in Table 2.15.
Table 2.15 SYRP prediction vs. accelerated life test (ALT) results [SYRP/ALT in each column]
One may notice that, although the data obtained from ALT are experimental data, the SYRP prediction, made before performing the ALT, appears to be reasonably accurate.
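The per-mechanism lognormal modelling used in SYRP can be illustrated as a competing-risks combination. The mechanism names and parameters below are invented, and the fuzzy failure-risk-coefficient correction step of SYRP is not reproduced:

```python
from math import erf, log, sqrt

def lognorm_cdf(t, median_h, sigma):
    """Lognormal failure CDF parameterised by median life (hours) and shape sigma."""
    return 0.5 * (1.0 + erf((log(t) - log(median_h)) / (sigma * sqrt(2.0))))

# One lognormal distribution per failure mechanism (illustrative parameters):
mechanisms = {
    "electromigration": (2.0e5, 0.8),  # (median life in hours, sigma)
    "corrosion":        (5.0e4, 1.2),
}

def unreliability(t):
    """Competing risks: the device survives only if no mechanism has failed."""
    survival = 1.0
    for median_h, sigma in mechanisms.values():
        survival *= 1.0 - lognorm_cdf(t, median_h, sigma)
    return 1.0 - survival
```

Extrapolating each mechanism's distribution separately, then combining them, is precisely why FM identification matters for accelerated testing: a single pooled distribution would be wrong outside the test window.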
A comparison of RPP, also including the work done by Talmor [2.67], is given in Table 2.16.
2.3.5.2
Laboratory versus operational reliability
What kind of link exists between the reliability measured in the laboratory and the operational reliability of components? It is known that the values established by the component manufacturer depend on the test conditions, and the operational values depend on the incoming inspection conditions of the components used in electronic systems. In practice, the difference between the two results can reach one or two orders of magnitude. Of course, only the operational reliability includes all the stresses against which the component must demonstrate sufficient reliability. As long as exact knowledge of the operational reliability is not available, the ratio between the reliability measured in the laboratory and the operational reliability will remain a permanent subject of discussion between manufacturer and user. The reduction of component defects/failure rate can be obtained in the following ways:
2.4
Standardisation
2.4.1
Quality systems
2.4.2
Dependability
Dependability is the official title of the Technical Committee (TC 56) of the Inter-
national Electrotechnical Commission (IEC), producing international standards on
reliability, maintainability and availability. A fruitful co-operation [2.14] was initiated between the ISO and the IEC, at the level of the committees: ISO TC 176 (quality), IEC TC 56 (dependability), and ISO TC 69 (statistics). Consequently, the IEC 300 series on dependability management is directly linked with the ISO 9000 family [2.55].
References
2.1 Al-Sheikhly, M.; Christou, A. (1994): How radiation affects polymeric material. IEEE Trans. Reliability, vol. 43, no. 4, December, pp. 551-556
2.2 Bajenescu, T. (1996): Fiabilitatea componentelor electronice (The reliability of electronic components). Editura Tehnică, Bucharest, Romania
Băjenesco, T. I. (1978): Microcircuits. Reliability, incoming inspection, screening and optimal efficiency. International Conference on Reliability and Maintainability, Paris, June 19-23
Băjenesco, T. I. (1981): Problèmes de la fiabilité des composants électroniques actifs actuels. Masson, Paris
2.26 Dasgupta, A.; Pecht, M. (1991): Material failure mechanisms and damage models. IEEE
Trans. Reliability vol. 40, no. 5, Dec., pp. 531-536
2.27 Dasgupta, A.; Hu, J. M. (1992): Failure mechanism models for excessive elastic deformation. IEEE Trans. Reliability vol. 41, no. 1, March, pp. 149-154
2.28 Dasgupta, A.; Hu, J. M. (1992): Failure mechanism models for plastic deformation. IEEE Trans. Reliability vol. 41, no. 2, June, pp. 168-174 and Dasgupta, A.; Hu, J. M. (1992): Failure mechanism models for brittle fracture. IEEE Trans. Reliability vol. 41, no. 3, June, pp. 328-335
2.29 Dasgupta, A.; Hu, J. M. (1992): Failure mechanism models for ductile fracture. IEEE Trans. Reliability vol. 41, no. 4, Dec., pp. 489-495
2.30 Dasgupta, A.; Haslach Jr., H. W. (1993): Mechanical design failure models for buckling. IEEE Trans. Reliability vol. 42, no. 1, March, pp. 9-16
2.31 Dasgupta, A. (1993): Failure mechanism models for cyclic fatigue. IEEE Trans. Reliability vol. 42, no. 4, Dec., pp. 548-555
2.32 Demko, E. (1996): Commercial-Off the Shelf (COTS): challenge to military equipment
reliability. Proceedings of the Annual Reliability and Maintainability Symposium, Jan. 22-
25, Las Vegas, Nevada (USA), pp. 7-12
2.33 Düll, H. (1976): Zuverlässigkeit und Driftverhalten von Widerständen. Radio Mentor no. 7, pp. 73-79
2.34 Engel, P. (1993): Failure mechanism models for mechanical wear. IEEE Trans. Reliability vol. 42, no. 2, June, pp. 262-267; Florescu, R. A. (1986): A New Approach to Reliability Prediction is Needed. Quality and Reliability Eng. Int., vol. 2, pp. 101-106
2.35 Ermer, D. (1996): Proposed new DoD standards for product acceptance. Proceedings of the
Annual Reliability and Maintainability Symp., Jan. 22-25, Las Vegas, Nevada (USA), pp.
24-29
2.36 Frost, D. F.; Poole, K. F. (1989): RELIANT: A Reliability Analysis Tool for VLSI
Interconnects. IEEE Solid-State Circuits, vol. 24, pp. 458-462
2.37 Gălăţeanu, L. et al. (1996): Stress and strain in automotive diodes - a RVT, IR and XR study. Proceedings of the International Semiconductor Conference CAS'96, Oct. 9-12, Sinaia, pp. 361-364
2.38 Gallo, A. A.; Munamarty, R. (1995): Popcorning: a failure mechanism in plastic-encapsulated microcircuits. IEEE Trans. Reliability vol. 44, no. 3, Sept., pp. 362-367
2.39 Hakim, E. (1963): Step-stress as an indicator for manufacturing process change. Solid State
Design, March, pp. 115-117
2.40 Hansen, C. (1997): Effectiveness of yield-estimation and reliability-prediction based on
wafer test-chip measurements. Proceedings of the Annual Reliability and Maintainability
Symp., Jan. 13-16, Philadelphia, Pennsylvania (USA), pp. 142-148
2.41 Hoffman, D. (1997): An overview of concurrent engineering. Proceedings of the Annual
Reliability and Maintainability Symp., Jan. 13-16, Philadelphia, Pennsylvania (USA), pp.
1-6
2.42 Jansen, F.; Petersen, N. (1982): Burn-in - an engineering approach to design and analysis of
burn-in procedures. John Wiley & Sons, Inc., Chichester
2.43 Jensen, F. (1995): Electronic component reliability. John Wiley & Sons, Inc.
2.44 Kemp, K. G. et al. (1990): The effects of particular defects on the early failure of metal interconnects. IEEE Trans. Reliability, vol. 39, no. 1, April, pp. 26-29
2.45 Klyatis, L. M. (1997): One strategy of accelerated testing technique. Proceedings of the Annual Reliability and Maintainability Symp., Jan. 13-16, Philadelphia, Pennsylvania (USA), pp. 249-253
2.46 Knight, C. R. (1991): Four decades of reliability progress. Proceedings of the Annual Reliability and Maintainability Symp., Jan. 29-31, Orlando, Florida (USA), pp. 156-160
2.47 Kross, E. J.; Sicuranza, M. A. (1996): Commercial-components initiative: ground benign
systems - plastic encapsulated microcircuits. IEEE Trans. Reliability vol. 45, no. 2, June,
pp. 180-183
2.48 Kuehn, R. E. (1991): Four decades of reliability experience. Proceedings of the Annual Reliability and Maintainability Symp., Jan. 29-31, Orlando, Florida (USA), pp. 76-81
2.49 Kuo, W.; Oh, H. (1995): Design for reliability. IEEE Trans. Reliability vol. 44, no. 2, June, pp. 170-171
2.50 Li, J.; Dasgupta, A. (1993): Failure mechanism models for creep. IEEE Trans. Reliability vol. 42, no. 3, Sept., pp. 339-353
2.51 Li, J.; Dasgupta, A. (1994): Failure mechanism models for material ageing due to interdiffusion. IEEE Trans. Reliability vol. 43, no. 1, March, pp. 2-10
2.52 Lukis, L. W. F. (1972): Reliability assessment - myths and misuse of statistics.
Microelectronics and Reliability vol. 11, no. 11, pp. 177-184
2.53 Nicholls, D. (1996): Selection of equipment to leverage commercial technology (SELECT).
Proceedings of the Annual Reliability and Maintainability Symp., Jan. 22-25, Las Vegas,
Nevada (USA), pp. 84-90
2.54 O'Connor, P. D. T. (1993): Quality and reliability: illusions and realities. Quality and
Reliability Engineering Internat., vol. 9, pp. 162-168
2.55 O'Leary, D. J. (1996): International standards: their new role in a global economy.
Proceedings of the Annual Reliability and Maintainability Symp., Jan. 22-25, Las Vegas,
Nevada (USA), pp. 17-23
2.56 Pecht (1994): Quality Conformance and Qualification of Micro Electronic Package and
Interconnects. John Wiley & Sons, Inc.
2.57 Peck, D. S. (1961): Semiconductor reliability predictions from life distribution data.
Semiconductor Reliability, Reinhold Publishers, pp. 51-63
2.58 Peck, D. S.; Zierdt Jr., C. H. (1974): The reliability of semiconductor devices in the Bell System. Proceedings of the IEEE, vol. 62, no. 2, Feb., pp. 185-211
2.59 Robineau, J. et al. (1992): Reliability Approach in Automotive Electronics. ESREF '92, pp.
133-140
2.60 Roy, K.; Prasad, S. (1995): Logic synthesis for reliability: an early start to controlling
electromigration & hot-carrier effects. IEEE Trans. Reliability, vol. 44, no. 2, June, pp.
251-255
2.61 Rudra, B.; Jennings, D. (1994): Failure mechanism models for conductive-filament formation. IEEE Trans. Reliability vol. 43, no. 3, Sept., pp. 354-360
2.62 Ryerson, I. (1978): Reliability testing and screening: a general review paper.
Microelectronics and Reliability, no. 3, pp. 112-118, London
2.63 Schneider, C. (1997): The GIQLP-Product integrity's link to acquisition reform.
Proceedings of the Annual Reliability and Maintainability Symp., Jan. 13-16, Philadelphia,
Pennsylvania (USA), pp. 26-28
2.64 Jensen, F.; Petersen, N. E. (1982): Burn-In. Wiley, New York
2.65 Taguchi, G. (1995): Quality engineering (Taguchi methods) for the development of
electronic-circuit technology. IEEE Trans. Reliability, vol. 44, no. 2, June, pp. 225-229
2.66 Silberhorn (1980): Äussere, einschränkende Einflüsse auf den Einsatz von VLSI-Bausteinen. Bulletin SEV/VSE vol. 71, no. 2, pp. 54-56
2.67 Talmor, M.; Arueti, S. (1997): Reliability prediction: the turn-on point. Proceedings of the
Annual Reliability and Maintainability Symp., Jan. 13-16, Philadelphia, Pennsylvania
(USA), pp. 254-262
2.68 Tang, S.-M. (1996): New burn-in methodology based on IC attributes, family IC burn-in
data and failure mechanism analysis. Proceedings of the Annual Reliability and
Maintainability Symp., Jan. 22-25, Las Vegas, Nevada (USA), pp. 185-190
2.69 Tretter (1974): Zum Driftverhalten von Bauelementen und Geräten. Qualität und Zuverlässigkeit (Germany), vol. 19, no. 4, pp. 93-79
2.70 Le Traon, J.-Y.; Treheux, M. (1977): L'environnement des matériels de télécommunications. L'écho des recherches, Oct., pp. 12-21
2.71 Tseng, S.-T.; Hsu, C.-H. (1994): Comparison of type-I & type-II accelerated life tests for selecting the most reliable product. IEEE Trans. Reliability, vol. 43, Sept., pp. 503-510
2.72 Wong, K. L. (1993): A change in direction for reliability engineering is long overdue. IEEE
Trans. Reliability, vol. 42, no. 2, June, pp. 261-266
2.73 Yang, K.; Xue, J. (1997): Reliability design based on dynamic factorial experimental
model. Proceedings of the Annual Reliability and Maintainability Symp., Jan. 13-16,
Philadelphia, Pennsylvania (USA), pp. 320-326
2.74 Yates, W.; Johnson, R. (1997): Total Quality Management in U. S. DoD electronics
acquisition. Proceedings of the Annual Reliability and Maintainability Symp., Jan. 13-16,
Philadelphia, Pennsylvania (USA), pp. 571-577
2.75 Young, D.; Christou, A. (1994): Failure mechanism models for electromigration. IEEE
Trans. Reliability vol. 43, no. 2, June, pp. 186-192
2.76 Lall, P. (1996): Temperature as an input to microelectronics-reliability models. IEEE
Trans. Reliability, vol. 45, no. 1, pp. 3-9
2.77 Bazu, M. (1999): Reliability assessment based on fuzzy logic. International Conf. on
Computational Intelligence for Modelling, Control and Automation, CIMCA'99, Vienna,
Austria, February 17-19
2.78 Bosch, G. (1979): Model for failure rate curves. Microelectronics and Reliability, vol. 19, pp. 371-379
2.79 Hallberg, O. (1977): Failure-rate as a function of time due to log-normal distributions of
weak parts. Microelectronics and Reliability, vol. 17, pp. 155-161
2.80 Moltoft, J. (1980): The failure rate function estimated from parameter drift measurement. Microelectronics and Reliability, vol. 20, pp. 787-791
2.81 Ash, M.; Gorton, H. (1989): A practical end-of-life model for semiconductor devices. IEEE Trans. on Reliability, October, pp. 485-493
2.82 Livesay, B. R. (1978): The reliability of electronic devices in storage environment. Solid
State Technology, October, pp. 63-68
2.83 Calatayud, R.; Szymkowiak, E. (1992): Temperature and vibration results from captive-store flight-tests provide a reliability improvement tool. Proceedings of the Annual Reliability and Maintainability Symp., Las Vegas, Nevada, January 21-23, pp. 266-271
2.84 Popa, E. et al. (1986): Thermal fatigue - a limitation of the reliability of medium power
semiconductor devices. Proceedings of the Annual Conference for Semiconductors,
October, Sinaia (Romania), pp. 247-250
2.85 Udrea, M. et al. (1988): Intermittent functioning - an efficient method for evaluating the
reliability of soldering systems. Proceedings of the Annual Conference for Semiconductors,
October, Sinaia (Romania), pp. 219-222
2.86 Bazu, M. et al. (1989): Behaviour of semiconductor components at temperature cycling.
Revue Roumaine des Sciences Techniques, January-March, pp. 151-155
2.87 Hu, J. et al. (1992): Role of failure mechanism identification in accelerated testing.
Proceedings of the Annual Reliability and Maintainability Symp., Las Vegas, Nevada,
January 21-23, pp. 181-188
2.88 Băjenescu, T. I. (1985): Zuverlässigkeit elektronischer Komponenten. VDE Verlag
2.89 Bazu, M. (1982): A mathematical model for the reliability of semiconductor devices. Elec-
tronics and Automatics, no. 4, pp. 151-157
2.90 Bazu, M. (1982): Reliability prediction for a Weibull population of semiconductor compo-
nents: the non-continuous inspection case. International Conference on Reliability, Varna
(Bulgaria), May
2.91 Menon, M.V. (1963): Estimation of the shape and scale parameters of the Weibull
distribution. Technometrics, no. 2, pp. 175-181
2.92 Cohen, A. C. (1965): Maximum likelihood estimation in the Weibull distribution based on complete and on censored samples. Technometrics, no. 4, pp. 579-585
2.93 Thoman, D. et al. (1969): Inference on the parameters of the Weibull distribution.
Technometrics, no. 3, pp. 445-453
2.94 Bazu, M. et al. (1987): Failure mechanisms accelerated by thermal and electrical stress.
Proceedings of the Annual Conference for Semiconductors, October, Sinaia (Romania), pp.
53-56
2.95 Bazu, M. (1992): Accelerated life test when the activation energy is a random variable.
Proc. of the Int. Semicond. Conf. CAS '92, October 5-10, Sinaia, pp. 245-248
2.96 Bazu, M. (1990): A model for the electric field dependence of semiconductor device
reliability. 18th Int. Conf. on Microelectronics (MIEL), Ljubljana, Slovenia, May
2.97 Peck, D. S.; Trapp, O. D. (1987): Accelerated Testing Handbook. Technology Associates,
Portola Valley, California
2.98 Weick, W. (1980): Acceleration factors for IC leakage current in a steam environment.
IEEE Trans. on Reliability, June, pp. 109-114
2.99 Schwartz, J. A. (1987): Temperature dependent standard deviation of log (failure time)
distributions. J. Appl. Phys., vol. 61, pp. 801-805
2.100 Reich, B.; Hakim, E. (1972): Environmental factors governing field reliability of plastic
transistors and integrated circuits. International Reliability Physics Symp., pp. 82-87
2.101 Peck, D. S. (1986): Comprehensive model for humidity testing correlation. International Reliability Physics Symp., pp. 44-50
2.102 Klinger, D. J. (1991): Humidity Acceleration Factor for Plastic Packaged Electronic-
Devices. Quality and Reliability Engineering International, vol. 7, pp. 365-370
2.103 Bazu, M. (1982): Temperature dependence of the reliability of semiconductor components.
National Conference of Electronics, Telecommunications and Computers (CNETAC), No-
vember, pp. 1.81-1.85
2.104 Bazu, M. et al. (1987): Reliability of semiconductor components in the first hours of func-
tioning at high temperature. Electrotechnics, Automatics and Electronics, no. 1, pp. 10-15
2.105 Bazu, M.; Ilian, V. (1990): Accelerated testing of integrated circuits after storage. Scandi-
navian Reliability Engineers Symp., Nykoping, Sweden, October
2.106 Bazu, M. (1992): Synergetic effects in reliability. Optimum Q, no. 2, April, pp. 32-35
2.107 Bazu, M. (1992): Accelerated life test when the activation energy is a random variable.
Proc. of the Int. Semicond. Conf. CAS '92, October 5-10, Sinaia, pp. 245-248
2.108 Bazu, M.; Bacivarof, I. (1989): On the Validity of the Arrhenius Model in the Accelerated
Testing of Semiconductor Devices Reliability. In: Aven, T. (ed.) Reliability Achievement,
Elsevier Science Publishers Ltd., pp. 151-157
2.109 Birolini, A. (1996): Reliability Analysis Techniques for Electronic Equipment and Systems. Proceedings of EuPac'96, Essen, January 31-February 2, 1996
2.110 Birolini, A. (1996): Reliability Engineering: Cooperation between University and Industry
at the ETH Zurich. Quality Engineering 8(4), pp. 659-674
2.111 Bora, J. S.: Limitations and Extended Applications of Arrhenius Equation in Reliability
Engineering. Microelectronics and Reliability vol. 18, pp. 241-242
2.112 Bowles, J. B.; Klein, L. A. (1990): Comparison of Commercial Reliability-Prediction Programs. Proc. Ann. Rel. & Maint. Symp., pp. 450-455
2.113 Howes, M. J. and Morgan, D. V. (Eds.) (1981): Reliability and Degradation. J. Wiley, New
York
2.114 Pieruschka, E. (1963): Principles of Reliability. Prentice-Hall, Englewood Cliffs
2.115 Schaeffer, R. L. (1971): Optimum Age Replacement Policies With an Increasing Cost
Factor. Technometrics no. 13, pp. 139-144
2.116 Sinnadurai, N. (1991): Environmental Testing and Component Reliability Observations in Telecommunications Equipment Operated in Indian Climatic Conditions. Proceedings ESREF'91, Bordeaux, pp. 55-63
3 Reliability of passive electronic parts
3.1
How parts fail
1 A device can fail in a catastrophic, degradation, or intermittent mode. Electrical failures are usually opens, shorts, or parameter drift out of specification. The failure mechanism is the basic chemical or physical change that results in an identifiable failure mode. Similar definitions apply to mechanical failure modes and mechanisms. Part failures can be labelled as early fallout, stress-related, and wearout. Early failures are often linked to design and manufacturing flaws or to reliability screening escapes, but can be stress-related. Normal operating stresses cause most failures that occur after the early ones. Wearout failures are linked to ageing and deterioration.
2 Hazard rate is defined as the rate of change of the number of parts that have failed at a
particular time, divided by the number of surviving parts.
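The footnote's definition of hazard rate can be computed directly from interval failure counts. A short sketch, with illustrative counts:

```python
def empirical_hazard(failure_counts, n_start):
    """Empirical hazard rate per interval, following the footnote's definition:
    failures occurring in an interval divided by the number of parts still
    surviving at the start of that interval."""
    hazards, survivors = [], n_start
    for failed in failure_counts:
        hazards.append(failed / survivors)
        survivors -= failed
    return hazards

# 1000 parts observed over four equal intervals (illustrative counts):
# the high-low-high pattern reproduces the bathtub shape of Fig. 3.1.
z = empirical_hazard([50, 10, 5, 20], 1000)
```

Dividing by survivors rather than by the starting population is what distinguishes the hazard rate Z(t) from a simple failure fraction.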
Fig. 3.1 Overall life characteristic curve (λ(t), Z(t) versus time)
3.2
Resistors
where a distinct breakpoint may be identifiable, so that reducing stress beyond the
breakpoint will not significantly affect device failure rate.
In any organisation - even one engaged in purely commercial manufacturing - preferred parts lists (PPL) have an important place, because they offer the component specialist the opportunity of controlling the use of components of known quality level. An ideal PPL should include an outline of the relative costs of the components [3.5]. It is vital that the designers give, at an early stage, a clear statement of the components to be used. Non-standard part procurement is very costly, particularly when qualification data have to be supplied.
conclusion that a lack of balance will be produced (in comparison with the initial
state, or in comparison with the output state), and that this lack of balance will
increase with time. We can presume that, when a lot of electronic parts (initially intended to have the best quality and reliability) is manufactured, an error during production may prevent the quality requirements from being satisfied by all prescribed parameters. As the "error" is known, the reliability is not necessarily negatively influenced. In this case, we sort out all the items that satisfy all high quality and reliability requirements, without knowing why they have behaved this way. Due to a superposition effect, it is possible for a second error to have been introduced. In this latter case, the sorting process will eliminate the high-quality components, too.
The parameter list, given to the user by the component manufacturer, must
contain the following important points:
• the foreseen type (possibly with special operating requirements)
• the smallest / highest value of the operating resistance
• the maximal charge during operation
• the maximal temperature of the environment in which the resistor operates, taking into account the heating produced by the other parts mounted on the equipped card
• the maximum value of relative humidity during operation
• the operating mode (pulsed / uninterrupted); for pulsed operation, the exact form of the pulse and its repetition frequency are necessary
• the maximal foreseen duration of operation
• the mean operating time of the system per 24 hours
• the data concerning the eventuality of a particularly unfavourable operation
• the foreseen failure criteria (what is understood by "failure" and the maximal accepted limits)
• the desired statistical safety. (To what percentage must the electronic parts correspond to the data supplied by the producer? The data must generally be as exact as possible but - for sudden variations - it is difficult to guarantee more than 99.73%, i.e. 3σ, in a statistical sense. The specified confidence limit cannot be greater than 91%.)
3.2.1
Some important parameters
The size of the resistor value variation depends on temperature (compared with the reference temperature) and is generally expressed in 10⁻⁶/°C. If the variation is linear over the range of operating temperatures, it is called the temperature coefficient; if the variation is non-linear, it is described by a resistance/temperature curve.
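For a linear temperature coefficient, the resistance change follows directly from the definition above. The resistor values in this sketch are illustrative:

```python
def resistance_at(r_ref_ohm, tcr_ppm_per_c, temp_c, t_ref_c=20.0):
    """Linear temperature model: R(T) = R_ref * (1 + TCR * (T - T_ref)),
    with the TCR given in 10^-6/°C (ppm/°C), as in the text."""
    return r_ref_ohm * (1.0 + tcr_ppm_per_c * 1e-6 * (temp_c - t_ref_c))

# A 100 kΩ resistor with an assumed TCR of 50 ppm/°C, heated from 20 °C
# to 70 °C, drifts by +0.25%:
r_hot = resistance_at(100_000.0, 50.0, 70.0)  # 100 250 Ω
```

For a non-linear resistance/temperature curve, this single coefficient would have to be replaced by an interpolated characteristic.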
The so-called inherent noise is the consequence of the voltage produced inside the resistor. Any element with a resistance R that is in thermal equilibrium exhibits an inherent white noise, common to all construction elements.
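This white noise of a resistance in thermal equilibrium is the Johnson-Nyquist thermal noise, with RMS voltage v_n = √(4kTRB). A short sketch:

```python
from math import sqrt

K_B = 1.380649e-23  # Boltzmann constant, J/K

def thermal_noise_vrms(r_ohm, bandwidth_hz, temp_k=300.0):
    """RMS thermal (Johnson-Nyquist) noise voltage of a resistance R in
    thermal equilibrium over a measurement bandwidth B:
    v_n = sqrt(4 * k * T * R * B)."""
    return sqrt(4.0 * K_B * temp_k * r_ohm * bandwidth_hz)

# A 1 MΩ resistor over a 10 kHz bandwidth at room temperature:
v_n = thermal_noise_vrms(1e6, 1e4)  # about 12.9 µV rms
```

Since v_n grows with √R, high-value resistors set a hard noise floor regardless of construction quality; only excess (current) noise beyond this floor distinguishes resistor types.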
The voltage coefficient of the resistor (expressed as a percentage) is the measure of the variation of the resistance under applied voltage. The resistor value is influenced by time, mechanical factors, humidity and operating conditions.
Fig. 3.2 Time behaviour domain of 100 carbon film resistors (1 MΩ/0.25 W; nominal power): ΔR/R (%) versus time up to 10,000 h. Prescribed limit value ΔR/R = 1%
3.2.2
Characteristics
Table 3.3 Comparison between metal film and carbon film resistors (general specifications;
charge 0.1 ... 2 W)
Fig. 3.3 Drift data for metal film resistors in accordance with MIL-R-10509: t = operating time (h); θK = body temperature (°C); ΔR = resistance variation (%)
c) Wirewound resistors:
• smaller temperature coefficient; very reliable if proper care is taken to keep the operating temperature within reasonable limits;
• hardly measurable temperature coefficient (if one intends to obtain quality resistors);
• very low inherent noise;
• susceptible to induction;
• poor properties at high frequency, but high dissipation power.
3.2.3
Reasons for resistor instability [3.8] ... [3.10]
Fig. 3.4 Parameter variation (ΔR/R, %) by ageing over 0 ... 50,000 h, depending on the following parameter: a nominal value; b operating power; c nominal charge [3.9]
3.2.3.1
Carbon film resistors (Fig. 3.4)
Certain impurities, which cannot always be avoided during the manufacturing process, affect the quality of the film resistors. For example, since the ceramic support, the carbon film and the encapsulation cannot be kept entirely free of ionic substances, these - like humidity - are the premises for ion migration and for the indirect destruction of the film resistors as a result of electrolytic phenomena [3.8].
Another cause of instability is the irregular local feldspar concentration on the surface of the porcelain. These concentration differences may lead to a smaller thickness of the carbon film, which has a higher resistance in these regions. The results are greater specific charges and strong local heating, which lead to the destruction of the carbon film and to early failures. The best preventive measure is undercharging (derating).
3.2.3.2
Metal film resistors
The predominant factor is oxidation, which follows the Arrhenius law. Other parameters that influence the stability characteristics are the surface roughness, the chemical reactions between the materials used, the proportion of alkaline ions, etc. The evaporation speed, the substrate temperature, the resistance value, the temperature coefficient, etc. are the determinant factors on which the film formation depends. A certain temperature level must be exceeded to obtain significant variations.
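The Arrhenius law mentioned above gives the acceleration factor between two temperatures. The activation energy in this sketch is an assumed, illustrative value; the real one must be measured for the specific oxidation process:

```python
from math import exp

K_EV = 8.617e-5  # Boltzmann constant, eV/K

def arrhenius_af(t_use_c, t_stress_c, ea_ev=1.0):
    """Arrhenius acceleration factor between a use temperature and a stress
    temperature (°C): AF = exp(Ea/k * (1/T_use - 1/T_stress)), T in kelvin.
    Ea = 1.0 eV is an illustrative assumption, not a measured value."""
    t_use, t_stress = t_use_c + 273.15, t_stress_c + 273.15
    return exp(ea_ev / K_EV * (1.0 / t_use - 1.0 / t_stress))

# Oxidation-type degradation tested at 125 °C versus 55 °C use conditions
# proceeds several hundred times faster at the stress temperature:
af = arrhenius_af(55.0, 125.0)
```

This is also why "a certain temperature level must be exceeded to obtain significant variations": below it, the exponential term makes the oxidation rate negligible over practical test durations.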
3.2.3.3
Composite resistors (on inorganic basis)
The only ageing causes are the local thermal processes at the terminal contacts.
3.2.4
Some design rules
3.2.5
Some typical defects of resistors
Resistor failures can be explained by one or more of the following factors:
• fatigue, interruptions
• design errors
• manufacturing errors
• inadequate utilisation.
Fig. 3.5 Minimisation charging curves (charge level versus temperature, °C) for: a) carbon film resistors; b) metal film resistors; c) wirewound resistors. P = permitted region; the area with the best reliability/cost ratio and with optimal safety working reserves. Utilisation of resistors in this area is very frequent, since a reliability deterioration is normally not expected. D = doubtful region; in this area the resistors work without going beyond the nominal values, but not with the optimum reliability. F = forbidden region; in this area the nominal values are exceeded and the resistors are overcharged
Fig. 3.6 200 kΩ carbon film resistor time behaviour at different normal operating temperatures, over 0 ... 6000 h (mean values, alternating voltage)
Fig. 3.7 Failure rate λ (%/1000 h) dependence on the operating temperature, for different derating ratios (operating power / nominal power = 0.2 ... 1.0) and at a relative humidity ≤ 60%
substrate layer and by the oxidation of the constitutive elements of the resistors, after uninterrupted use over several years.
Defective design is rarely encountered in operating products. Random manufacturing errors may appear if the producer employs a new material without sufficient previous testing. Inadequate utilisation of the resistors can be blamed on the user only.
3.2.5.1
Carbon film resistors
3.2.5.2
Metal film resistors
The metal film is thinner than the layers used for the manufacture of carbon
film resistors. Therefore the probability that fissures and "hot spots" arise -
which can lead to open-circuit failures - is much higher (among the early
failures, interruptions are the most frequent phenomenon). Other frequent
defects are non-homogeneities of the resistive layer; for example, a too thin
layer in wirewound resistors, caused by a defective disposition of the
helixes and by badly cut grooves, can lead to intermittent contacts between two
neighbouring helixes and, as a consequence, to instability and to a high noise
level.
For thick film resistors, Russel [3.12] indicates the following failure rates (at a
60% confidence level):
• sudden failures: λ = 9·10⁻⁸ h⁻¹
• drift failures: λ = 2.5·10⁻⁷ h⁻¹.
Fig. 3.7 shows the variation of the failure rate λ (in % per 1000 hours) versus the
operating temperature (°C), for different derating ratios (operating power /
nominal power) and at a relative humidity ≤ 60%. The final λ value is
obtained by multiplying the values from Fig. 3.7 by the product π₁·π₂ of the
correction factors (see Table 3.7).
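The multiplication step described above can be sketched as follows; the chart value and the correction factors used here are illustrative assumptions, not values from Fig. 3.7 or Table 3.7.

```python
# Hypothetical sketch: the base failure rate read from the derating chart
# (%/1000 h at a given temperature and power ratio) is multiplied by the
# correction factors. The chart value 0.04 and factors 1.5, 0.8 are
# illustrative, not values from Fig. 3.7 or Table 3.7.

def final_failure_rate(lambda_chart: float, pi_factors) -> float:
    result = lambda_chart
    for pi in pi_factors:
        result *= pi
    return result

print(round(final_failure_rate(0.04, [1.5, 0.8]), 4))  # 0.048
```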
3.2.5.3
Film resistors
In the case of film resistors (whose film is neither metallic nor carbon) there are
some possibilities to reveal early failures before putting the resistor into
operation. For well-manufactured resistors, and for a smaller charge, a diminution
of λ during the whole observation period can be recorded; some resistors show -
at small charge - open-circuit failures only after 30 years of operation. Resistor
reliability depends not only on the charging, but also on the resistance value.
3.2.5.4
Fixed wirewound resistors
The most frequent failure causes are short-circuits between two neighbouring
helixes and bad contact between wire and terminal, especially for thin wires.
3.2.5.5
Variable wirewound resistors
3.2.5.6
Noise behaviour
Figure 3.8 depicts the noise behaviour of the three main resistor types. Resistors
may be purchased off-the-shelf as established reliability (ER) devices for most
parts. Various screening programmes, established on the basis of life test criteria,
are available. Details may be found in [3.13].
3.3
Reliability of capacitors
3.3.1
Introduction
There are few electronic components which - while carrying out one and the same
function - can differ so much in their constitutive materials as the capacitor.
Concerning the constitutive materials, capacitors are built from two electrodes
isolated by a dielectric medium, serving to store an electrical charge. They are
characterised by the capacity (with its tolerance), by the dielectric constant of
the dielectric medium used, by the insulating time constant (in the
(Fig. 3.8: noise (µV/V), 0 to 2.0, vs. resistance R (MΩ), 0.001 to 0.1)
Fig. 3.8 Noise variation for 1) metal resistor; 2) carbon resistor and 3) wirewound resistor
What are the criteria for selecting the right capacitor for the right application?
Some important parameters are:
• the required reliability for the entire device
• the system complexity
• the component failure rate and its time dependence
• the costs of a system failure
• the required precision of life duration prediction for different operating voltages
and temperatures
• the environmental component stress (chemical, electrical and mechanical
influences)
• the limitations concerning the dimensions and weight of components.
The capacitors can have fixed or variable capacities. In the first category there are
capacitors with paper, plastic (KS, KP, KT, KC), metallised plastic (MKP, MKT,
MKC, MKU), metallised paper (MP, MKV), mica³, synthetic film, polyester
film/foil, ceramic⁴, electrolytic, tantalum and special capacitors. The trimmers
belong to the second category.
3 Mica as a dielectric can withstand temperatures up to about 400°C before dehydration occurs,
but mica capacitors are limited by the sealing material. Silvered mica capacitors in Mycalex
cases will operate at about 130°C. Vitreous-glaze capacitors should operate satisfactorily at
150°C in sizes comparable to the mica capacitors.
4 High-permittivity ceramic dielectric capacitors cannot - in general - operate beyond 100°C
because of a degradation effect known as creep, which becomes apparent as a change in
capacitance with temperature; the mechanism of the change is not fully understood, but is
under study.
The prime objective of any system is that it must meet the basic operational
performance. High reliability, good maintainability, electromagnetic compati-
bility and other desirable goals are of course important, but they are secondary
factors. The struggle between the basic performance requirements and the
reliability and maintainability requirements is often reflected in the part selection
problems. Choosing the latest types of parts can improve performance and
sometimes leads to an increase of the reliability level. Great care is needed to
ensure the following: a) the new part (range) is indeed superior; b) the new part
will become a de facto standard and thereby multisourced; c) the new part is
qualifiable to a degree equal to standard parts of roughly similar function.
Capacitors may be purchased off-the-shelf as established reliability (ER)
devices for most parts. Various screening programmes, established on the basis of
life test criteria, are available. As an example, ceramic capacitors may be screened
at twice rated voltage for a specified time period, at a maximum rated temperature.
Details may be found in [3.14].
3.3.2
Aluminium electrolytic capacitors
Half-dry electrolytic wound capacitors (the most used) are formed from an
oxidised aluminium foil (anode and dielectric) and a conducting electrolyte
(cathode). A second aluminium foil is utilised as a covering cathode layer. They are
also available with two formed, non-polarised foils; they have a large loss factor
(frequency and temperature dependent), a limited useful life, and are not very
reliable (λ = 10 to 50 FIT; drift, shorts, opens). If their utilisation cannot be
avoided, it is better to choose the types built to high quality requirements. In the
case of aluminium electrolytic capacitors for high requirements, besides the
unavoidable early failures there are nearly always wearout failures, too. The
onset of these wearout failures limits their usability. A reliability dependence
on the size of the case and on the electrolyte quantity was proved: the smaller the
capacitor, the shorter the useful life and the higher its failure rate.
The operating capability of these types is limited by the existence of an
adequate electrolyte quantity. As a consequence of diffusion, of ageing or of
decomposition, the active electrolyte quantity of a capacitor system diminishes
and leads to a growth of the loss factor (tan δ) or to a diminution of the
capacity. These modifications are important in the case of fluid systems; for the
solid semiconductor electrolytes, the changes are insignificant and - generally -
do not lead to the failure of the capacitor. The growth of the leakage current over
a certain limit serves as the failure criterion for this capacitor type, since
tan δ and the capacity show only unimportant variations. Due to the structure and
to the operating mode of electrolytic capacitors, voltage stresses above the nominal
value cannot be used; the solution for accelerated testing is operation at high
temperatures. In the case of operation at temperatures over the guaranteed limit
value given by the manufacturer, supplementary electrolyte
losses appear. It follows that the life duration of a capacitor is inversely
proportional to the specific loading:
q = U · C / V (3.1)
where U = nominal voltage, in volts; C = capacity, in µF; V = volume, in cm³.
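A minimal sketch of Eq. (3.1); the 68 µF / 15 V / 1.5 cm³ example values are illustrative.

```python
# Sketch of Eq. (3.1): specific loading q = U * C / V.
# U in volts, C in microfarads, V in cm^3; example values are illustrative.

def specific_loading(u_volts: float, c_uf: float, vol_cm3: float) -> float:
    return u_volts * c_uf / vol_cm3

# A 68 uF / 15 V capacitor in a 1.5 cm^3 can:
print(specific_loading(15.0, 68.0, 1.5))  # 680.0
```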
Studies have demonstrated that this relationship is valid only for a capacitor
batch having the same geometric dimensions and manufactured with the same
technology. The same studies have led to the conclusion that, for evaluating the
natural lifetime of electrolytic capacitors, the endurance test at nominal
voltage is inadequate. Since these capacitors are rarely used under permanent
operating conditions, the so-called "life duration" studies undertaken at nominal
voltage and maximum operating temperature (with the aim of estimating the total
natural lifetime) will lead to completely false results.
The natural behaviour can be faithfully reproduced with periodic testing
methods [3.15][3.16]. According to these methods, the capacitors are subjected to
voltage and high temperature with a certain periodicity, and then stored without
voltage. High temperatures accelerate the test. To this day, aluminium
electrolytic capacitors remain the classical example of components with an
increasing failure rate and a clear ageing phenomenon. One can say that the end
of the useful life is programmed in advance. Many manufacturers - taking these
circumstances into account - indicate, for high temperatures, a maximum
life duration, in years.
3.3.2.1
Characteristics
The miniaturisation has led to the reduction of the foil surface. To increase the
effective foil surface, chemical or electrochemical roughening (etching) is used,
which allows obtaining higher capacities for the same volume, under relatively
harder reliability conditions. The capacity of this capacitor type strongly depends on
voltage, temperature and frequency. Because of the poor electrolyte conductivity at
temperatures below 0°C, the operational capacity is strongly affected (growth of
the capacitor impedance, expressed by increased dissipation factor and apparent
series resistance values). The alternating current passing through the equivalent
series resistance can heat the aluminium electrolytic capacitor so much (in spite
of a reduced environmental temperature) that it cannot maintain the capacitive
properties necessary for the system operation.
As a result, the capacity variation with temperature is an important quality
criterion. To increase the lifetime and the reliability of a capacitor, it is
recommended to operate it at the lowest possible temperature. For the same reason
it is recommended to mount this capacitor type in the zones with the lowest
environmental temperature. The highest storage temperature is +40°C, but an
operating temperature between 0 and +25°C should be preferred. Other
disadvantages of the electrolytic capacitors are: an important leakage current (as a
consequence of the imperfect blocking action), and a strong dependence on
(Fig. 3.9: impedance vs. t (×24 h) and residual current vs. t (10³ h) curves)
Fig. 3.9 Impedance and residual current variation for a 68 µF / 15 V electrolytic capacitor at
an environmental temperature of +70°C: charged with nominal d.c. voltage; without charge
(environmental temperature +70°C)
process, reactivating). One may say that a capacitor is in a conserving state if the
applied voltage is smaller than 0.15·UN (UN = nominal voltage). For storage
temperatures between 15 and 40°C the re-forming process (reactivation) must be
applied after the number of years specified in Table 3.3.
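The conservation criterion can be expressed as a simple predicate; the example voltages are illustrative, and the reactivation intervals themselves would come from Table 3.3.

```python
# Minimal sketch of the conservation criterion above: a capacitor counts as
# "conserving" when the applied voltage stays below 0.15 * U_N. The actual
# reactivation intervals depend on nominal voltage, diameter and storage
# duration (Table 3.3) and are not reproduced here.

def is_conserving(applied_v: float, nominal_v: float) -> bool:
    return applied_v < 0.15 * nominal_v

print(is_conserving(2.0, 25.0))   # True  (2 V < 3.75 V)
print(is_conserving(10.0, 25.0))  # False
```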
Table 3.3 Correlation between storage duration and new forming process (reactivation) for wet
aluminium electrolytic capacitors, for different nominal voltages and diameters
3.3.2.2
Results of reliability studies
Reliability studies have led to a long series of results [3.15]...[3.29].
Concerning the lifetime, the wet aluminium electrolytic capacitors can be
classified [3.17] into the following seven categories:
Class A - guaranteed 1000 h at 70°C
Class B - guaranteed 2000 h at 70°C
Class C - guaranteed 1000 h at 85°C
Class D - guaranteed 2000 h at 85°C
Class E - guaranteed 5000 h at 85°C
Class F - guaranteed 10 000 h at 85°C
Class G - guaranteed 2000 h at 125°C.
In Fig. 3.10 the guaranteed lifetime of these capacitors, independent of
voltage and encapsulation, is shown. Fig. 3.11 gives the possible lifetime for
different case studies.
To calculate the failure rate of the dry aluminium electrolytic capacitors,
Durieux [3.17] - starting from the relationship λ = λb·ΠU·ΠQ·10⁻⁹ h⁻¹ (where ΠU
and ΠQ represent the environmental factor and the quality factor, respectively) -
proposed an adequate nomogram. λb depends on the charge p and on the
temperature.
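A sketch of this failure rate relationship; the factor values below are placeholders, since λb, ΠU and ΠQ would in practice be read from the nomogram and its tables.

```python
# Sketch of the relationship lambda = lambda_b * Pi_U * Pi_Q * 1e-9 per hour.
# lambda_b depends on the charge p and on the temperature; Pi_U and Pi_Q are
# the environmental and quality factors. The numbers below are placeholders,
# not values read from Durieux's nomogram.

def failure_rate_per_hour(lambda_b: float, pi_u: float, pi_q: float) -> float:
    return lambda_b * pi_u * pi_q * 1e-9

lam = failure_rate_per_hour(lambda_b=20.0, pi_u=2.0, pi_q=1.5)
print(f"{lam:.1e}")  # 6.0e-08
```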
3.3.2.3
Reliability data
3.3.2.4
Main failures types
During the operating time, the electrolytic capacitors are subjected to a multitude
of stresses. To evaluate the quality and reliability, we must consider not only the
electrical stresses due to voltage and current, but also the mechanical and
microclimatic influences, caused mainly by the temperature and humidity of the
air [3.6][3.19][3.21].
Table 3.4 Criteria for drift failures of aluminium electrolytic capacitors (DIN 41240, 41332)

Elements                                             Severe           Normal
                                                     specifications   specifications
* Growth of tan δ vs. the initial value              3                3
* Diminution of nominal capacity:
  - at UN up to 6.3 V                                40%              50%
  - at UN between 10 and 25 V                        30%              40%
  - at UN between 40 and 100 V                       25%              30%
  - at UN between 160 and 450 V                      20%              30%
* Growth of nominal capacity (vs. the upper limit)   XU               XU
* Impedance increase by a factor:
  - at UN ≤ 25 V                                     4                4
  - at UN > 25 V                                     3                3
* Leakage current                                    Up to the        Up to the
                                                     initial limit    initial limit
                                                     value            value
The main factors that influence the reliability are the oxide layer, the
impregnation layer, the foil porosity and the paper (the last two factors are
common to all types of capacitors). At oxide forming - for example - various
hydrate modifications can appear. The conductivity of the impregnating
electrolyte acts directly on the loss factor and impedance, on the chemical
combinations and on the stability of the electrical values.
Capacitors with great stability, reduced dimensions and reduced corrosion
sensitivity, having at the same time reduced dissipation factors and impedances,
may be obtained by using electrolytes with great ionic mobility, even in
water-poor media. The deposited volume of electrolyte directly influences the
lifetime. Leaky encapsulation leads to a rapid modification of the electrical
parameters: a diminution of the electrolyte quantity or a modification of its
consistency leads to a growth of the loss factor, a diminution of the capacity and
a growth of the impedance.
3.3.2.5
Causes of failures
3.3.3
Tantalum capacitors
3.3.3.1
Introduction
In the last two decades [3.30]...[3.36], tantalum capacitors with solid electrolyte
have conquered large utilisation domains and - due to their superiority - have
partially replaced the aluminium electrolytic capacitors. In comparison with
aluminium electrolytic capacitors, the indisputable qualities of tantalum capacitors
are a very good reliability, favourable temperature and frequency behaviour, a
large temperature range, and relatively reduced dimensions. The factors that
influence the reliability are the environmental temperature TE, the operating
voltage UE, the series resistance Rs, and - for plastic encapsulated capacitors - the
air humidity.
Because drift failures appear only at limits known to each user, the drift
failure rate calculation is optional, since for other applications drift failures
do not arise. Until now, wearout processes have not been observed for tantalum
capacitors. In most cases, a diminution of the failure rate in time has been observed.
(Fig. 3.13: residual current vs. voltage U (V), from about -20 V (reverse polarisation) to
+50 V (normal polarisation))
Fig. 3.13 The residual curve of the tantalum capacitor CTS 13 (10~ 125V)
3.3.3.2
Structure and properties
(Fig. 3.14: cumulative frequency (%), 2 to 95, vs. residual current IR (µA), 0.02 to 0.1)
Fig. 3.14 Time dependence of the residual current for a group of tantalum capacitors operating at
an environmental temperature of +85°C. A) After zero operating hours; B) after 1000 operating
hours; C) after 4000 operating hours; D) after 8000 operating hours
(Figs. 3.15 and 3.16: time (h), residual current IR (mA) and nominal voltage U (V) axes)
Fig. 3.15 Reliability of tantalum capacitors (the hatched zones are theoretically estimated).
Fig. 3.16 ΔC/C₀ variation between 25 and 85°C, at nominal voltages from 6 to 40 V
That is why in such cases it is proper to select tantalum capacitors with a higher
nominal voltage, since the greater dielectric thickness can assure a higher
reliability for the applied stress. When such a capacitor is submitted to a voltage
with sudden variations (switching-on, switching-off, over-voltages, etc.),
momentary overcurrents arise. These great local current densities lead to thermal
modifications of the crystallographic state of the dielectric.
Fortunately, dielectric breakdown seldom occurs. On the contrary, for great
stresses in pulse operation (source with interpolated filter in a circuit without
series resistances) it is recommended to utilise fluid-electrolyte tantalum
capacitors, due to the "self-repair" of the dielectric under voltage and to the
thermal exchange favoured by the electrolyte mobility. Lately some progress has
been made in this respect.
The other disadvantage of tantalum capacitors is a consequence of the operating
principle: the capacitor is equivalent to a diode with a very great surface, used in
the blocking direction. The residual current curve (Fig. 3.13) is a proof in this
respect. Since, in reality, the data are more complex, the limits of the reverse
voltage depend on structures and technologies.
Some useful recommendations:
• Do not overheat the connection wires.
• Bind the case rigidly to the printed circuit board (to avoid the deterioration
produced by vibrations).
• Reserve sufficient space for component cooling.
• Do not mount a tantalum capacitor with reversed polarity.
3.3.3.3
Reliability considerations
dimensions, a straight-line plot was obtained. If this diagram is not used, the
time variation of the 50% value, represented by curve A (Fig. 3.15) [3.28], is
followed. Curve B represents the mean residual current variation of a capacitor
group operating at nominal voltage and 55°C. It is very likely that no manifest
deterioration will appear before 100 000 hours. Curve C represents the variation
corresponding to the 95% value.
3.3.3.4
ΔC/C₀ variation with temperature
(Figs. 3.17 and 3.18: curves vs. the product CU, at a 60% confidence level, and impedance
curves at 100 kHz)
Fig. 3.17 Interdependence of CU and λM (λM = mean failure rate). Fig. 3.18 Measured values of
tantalum capacitor impedance, at different nominal voltages (f = 100 Hz)
5 No matter how surprising this artifice may appear, it is nevertheless valid. The exact
mathematical expression of the distribution law is not important if a unique straight line is
obtained; it is only a question of clarity of the graphic representation. It must still be observed
that the selected scale is satisfactory only between 2% and 98%, and Fig. 3.14 covers this
range.
3.3.3.5
The failure rate and the product CU
3.3.3.6
Loss factor
The typical value of the loss angle tangent tan δ is about 2% (at normal
environmental temperature and 100 Hz) for small CU values, and increases up
to about 8% for very great capacities. Because the dielectric losses are small, the
value of tan δ depends essentially on the series resistance elements of the capacitor.
Small values of the series resistance are a proof that the manufacturer masters
the fabrication problems well.
3.3.3.7
Impedance at 100Hz
This parameter is more important than tan δ, because it allows a better evaluation
of the constructive qualities of tantalum capacitors (Fig. 3.18).
3.3.3.8
Investigating the stability of 35 V tantalum capacitors
The investigation [3.33] includes capacitors of widely different types which have
all undergone the same test plan. For the user of capacitors it is important to be
able to compare the stability and quality of different types with the prices. The
purpose of the investigation is to compare the stability of 35 V tantalum
capacitors from different manufacturers. With this in view, a 35 V capacitor with a
high (47 µF) and a low capacity value (1 µF) was tested for each manufacturer. The
greatest part of the attention has been given to the long-term stability when
exposed to humidity conditions and to the load life test at high temperature.
The following exposures have been used.
Group number 1:
a) Transient test. 10 capacitors are connected in parallel and exposed to a 10 Hz
square-wave voltage with a low level of 0 V and a high level of 35 V. The rate of
increase and decrease of the loading current is 20 A/µs, and the loading current is
limited by a resistance of 0.1 Ω, corresponding to a maximum of 350 A. The
capacitors are exposed to 3000 cycles (5 minutes).
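The arithmetic behind this transient test can be checked directly: with a 35 V level across the 0.1 Ω limiting resistance the peak current is the quoted 350 A, and 3000 cycles at 10 Hz last the quoted 5 minutes.

```python
# Arithmetic check of the transient test parameters: 35 V across the 0.1 ohm
# current-limiting resistance gives the quoted 350 A maximum, and 3000 cycles
# of a 10 Hz square wave last 300 s, i.e. the quoted 5 minutes.

peak_current_a = 35.0 / 0.1     # V / R
duration_s = 3000 / 10          # cycles / frequency (Hz)
print(round(peak_current_a, 6), duration_s / 60)  # 350.0 5.0
```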
b) Climatic sequence. This is a block of tests often used for components in the
IEC specifications. The various tests are carried out without measuring the
capacitors between the tests. In this sequence, there is no bias on the components.
The following five steps are carried out in the sequence:
Fig. 3.19 The main type of graphical display for the obtained results
Fig. 3.20 Results of the stability investigations of tantalum capacitors from various
manufacturers (L, M, N, O)
eluded. c) The key point is that all the capacitors have undergone the same test.
Consequently, only a few types of capacitors have been exposed to harder
conditions than those recommended by the manufacturer (mainly in the damp heat
test and the surge test).
(Figure: curves vs. voltage in multiples of Unom, 0 to 3)
3.3.3.9
The failure rate model
In accordance with the standard 005 ITT 10300, the following failure rate is
indicated, estimated for tantalum capacitors in stationary operation under normal
conditions, at an environmental temperature of +40°C, 50% charge and a
confidence level of 60%:
(3.2)
where ΠR is a correction factor for the series resistance, indicated in Table 3.6.
Under these conditions, the useful lifetime of tantalum capacitors exceeds 25 years.
3.3.4
Reliability comparison: aluminium electrolytic capacitors
versus tantalum capacitors
for the same conditions - the tantalum capacitors show much greater values than
the electrolytic capacitors. Moreover, contrary to the situation encountered for
electrolytic capacitors, an important reliability growth arises in the case of
operation at reduced voltage (Fig. 3.23). The consequence of reducing the
operating voltage to 75% of the nominal value is a doubling of the lifetime; at a
voltage representing 50% of the nominal value, the life duration increases by an
order of magnitude. Concerning the environmental temperature, Ackmann showed
that, starting from +85°C, for each temperature diminution of 10...12°C the
lifetime increases by a factor of ten. One must also mention that the median
lifetime of the aluminium electrolytic capacitors is - in a first approximation -
inversely proportional to the area of the capacitor.
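A hedged sketch of these derating rules, combining the voltage and temperature gains into a single lifetime multiplier; the power-law voltage term is our own smoothing of the two quoted points (it gives about 2.6× rather than 2× at 75% of nominal), so treat it as an approximation, not a formula from the text.

```python
# Hedged sketch of the derating rules quoted above. Relative to operation at
# +85 deg C and full nominal voltage: 50% of nominal voltage gains roughly a
# factor of ten in lifetime, and each 10...12 deg C of temperature reduction
# gains another factor of ten. The voltage power law is an assumption that
# smooths the quoted points; it is not a formula from the text.

def lifetime_factor(v_ratio: float, temp_c: float,
                    deg_per_decade: float = 11.0) -> float:
    """Lifetime multiplier vs. the +85 deg C, full-voltage baseline.
    v_ratio: operating voltage / nominal voltage (0 < v_ratio <= 1).
    deg_per_decade: deg C of cooling per tenfold lifetime gain (10...12)."""
    voltage_gain = (1.0 / v_ratio) ** 3.32    # 2**3.32 ~ 10 at v_ratio = 0.5
    temp_gain = 10.0 ** ((85.0 - temp_c) / deg_per_decade)
    return voltage_gain * temp_gain

# Half voltage alone: ~10x; 11 deg C cooler alone: 10x.
print(round(lifetime_factor(0.5, 85.0)))   # 10
print(round(lifetime_factor(1.0, 74.0)))   # 10
```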
Table 3.6 Correction factor ΠR for various values of the series resistance Rs

Rs (Ω/V)           ΠR
Rs ≥ 3             1
2 ≤ Rs < 3         1.5
1 ≤ Rs < 2         3
0.8 ≤ Rs < 1       4
0.6 ≤ Rs < 0.8     6
0.4 ≤ Rs < 0.6     9
0.2 ≤ Rs < 0.4     12
0.1 ≤ Rs < 0.2     15
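The table lookup can be sketched as follows (boundaries as printed in Table 3.6); the function name is our own.

```python
# Sketch of the Table 3.6 lookup: correction factor Pi_R as a function of the
# series resistance R_s (in ohm/V). Boundaries follow the table as printed.

def pi_r(rs_ohm_per_v: float) -> float:
    table = [            # (lower bound of R_s, Pi_R)
        (3.0, 1.0),
        (2.0, 1.5),
        (1.0, 3.0),
        (0.8, 4.0),
        (0.6, 6.0),
        (0.4, 9.0),
        (0.2, 12.0),
        (0.1, 15.0),
    ]
    for lower, factor in table:
        if rs_ohm_per_v >= lower:
            return factor
    raise ValueError("R_s below 0.1 ohm/V: outside the tabulated range")

print(pi_r(2.5))  # 1.5
print(pi_r(0.5))  # 9.0
```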
Concerning aluminium electrolytic capacitors, it is known that the critical
parameter is the increase of impedance produced by the growth of the series
resistance, as a consequence of electrolyte loss. On the contrary, for tantalum
capacitors the corresponding diminution is smaller and - on the other hand - the
loss factor varies very slightly, so that such ageing criteria are meaningless.
(Fig. 3.23: median lifetime in hours, 1000 to 10 000, vs. operating voltage (V), 15 to 35;
0.7 UN marked)
Fig. 3.23 Increase of the median lifetime with the reduction of the operating voltage, at +85°C.
Criterion: A: IR > 0.04 µA/(µF·V); B: IR > 0.02 µA/(µF·V)
3.3.5
Another reliability comparison: aluminium electrolytic
capacitors (miniature type) versus tantalum capacitors
be obtained (depending on volume and electrolyte); however, for f > 10 kHz the
series resistance requirements will be exceeded, unlike for the tantalum capacitors.
For aluminium capacitors, the diminution of capacity with growing frequency
produces high values of the resonance frequency. At low temperatures a still
clearer increase of the impedance of aluminium electrolytic capacitors was
observed: in the worst cases, 5 to 10 times, at -40°C. However, at 10 kHz and
-40°C, the lowest impedance values of these capacitors are already 3 to 30 times
greater (for 1 µF and 47 µF, respectively) than those of tantalum capacitors, and
at 100 kHz this ratio cannot even be calculated. One must remember that at low
temperatures and high frequencies the aluminium electrolytic capacitors behave
merely as resistors.
Today, for the miniature aluminium electrolytic capacitors with connections on
one side only, the time behaviour is improved. Depending on volume and
electrolyte, the expected lifetime varies between 2000 and 7000 hours, at +85°C
and nominal voltage, taking as reference the criteria given in DIN 46910/124. At
5% failures, a lifetime representing 85% of the foreseen values can be expected.
The typical causes of failure are the increase of tan δ and the modification of the
impedance and (rarely) of the capacity, or a greater residual current.
Because of the typical ageing behaviour of aluminium fluid-electrolyte
capacitors, the process of parameter degradation with operating time is rapid in
comparison with that of tantalum capacitors. Particularly at low temperatures
and/or high frequencies, no resemblance to the behaviour of the tantalum capacitor
may be observed. The utilisation of miniature aluminium electrolytic capacitors
seems justified, due to dimensions and price, if a long lifetime is not necessary,
if there are no high temperatures in the environment, and if the behaviour at high
frequencies or at lower temperatures can be neglected.
3.3.6
Polyester film/foil capacitors
3.3.6.1
Introduction
3.3.6.2
Life testing
These tests are carried out under extreme conditions of load and temperature. The
components are loaded for a maximum of 7000 hours at 150% of the rated d.c.
voltage and at the maximum allowable temperature of +85°C. The failure rate λ is
calculated from the number of failures and the available number of component
hours. In accordance with IEC Publication 271, catastrophic failures are short-
circuits and interruptions. The degradation failures - after a test duration of 1000
hours - are as follows:
• ΔC greater than 2 × the required value;
• tan δ greater than 2 × the required value;
• Rinsul less than 0.1 × the required value.
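These degradation criteria can be expressed as a simple check; the function and its argument names are our own, not from IEC Publication 271, and the example requirement values are illustrative.

```python
# Sketch of the degradation criteria quoted above: a part fails by degradation
# after 1000 h if delta-C or tan(delta) exceeds twice its required value, or
# if the insulation resistance drops below one tenth of its required value.
# Names and example values are our own, not taken from the standard.

def is_degradation_failure(delta_c, tan_d, r_insul,
                           req_delta_c, req_tan_d, req_r_insul) -> bool:
    return (delta_c > 2 * req_delta_c
            or tan_d > 2 * req_tan_d
            or r_insul < 0.1 * req_r_insul)

# Example with illustrative requirement values:
print(is_degradation_failure(0.05, 0.004, 1e5,
                             req_delta_c=0.02, req_tan_d=0.004,
                             req_r_insul=1e5))  # True (delta-C too large)
```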
In analogy with these two types of failures, two failure rates λ are calculated:
λc: failure rate where only catastrophic failures are taken into account.
λc+d: failure rate where both catastrophic and degradation failures are taken into
account.
λ is quoted with a confidence level of 60% and - in accordance with MIL-
HDBK-217 - in failures per million hours (×10⁻⁶ h⁻¹). To ensure this confidence
level, the number of failures actually observed is artificially raised by adding a
quantity C60 (based on the Poisson distribution). λ is then calculated from:
λ60% = (number of failures + C60) / (number of component hours). (3.3)
The calculated failure rate is an average over the relevant testing period, and is
valid only for the conditions under which the tests were carried out. These failure
rates are further analysed below. Table 3.8 gives a survey of some test results.
The tests were all performed at an overload of 50% and a temperature of +85°C;
the maximum test duration was 7000 hours; 7714 capacitors were tested
for a total of 19 242 000 component hours.
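A sketch of Eq. (3.3); since the exact C60 table is not reproduced here, the chi-square quantile behind the Poisson bound is approximated with the Wilson-Hilferty formula, so the computed C60 values are approximations.

```python
# Sketch of Eq. (3.3): lambda(60%) = (failures + C60) / component-hours.
# C60 is derived from the Poisson (chi-square) 60% upper confidence bound;
# the chi-square quantile is approximated with the Wilson-Hilferty formula,
# so treat the C60 values as approximations (for 0 failures, C60 ~ 0.92).

def chi2_quantile_wh(z_p: float, k: int) -> float:
    """Wilson-Hilferty approximation of the chi-square quantile with k
    degrees of freedom, where z_p is the standard-normal quantile."""
    return k * (1.0 - 2.0 / (9.0 * k) + z_p * (2.0 / (9.0 * k)) ** 0.5) ** 3

def lambda_60(failures: int, component_hours: float) -> float:
    """Failure rate (per hour) at a 60% confidence level."""
    z60 = 0.2533  # standard-normal 0.60 quantile
    c60 = chi2_quantile_wh(z60, 2 * failures + 2) / 2.0 - failures
    return (failures + c60) / component_hours

# Life test of this section: 19 242 000 component hours; with zero
# catastrophic failures, lambda comes out near 4.8e-8 per hour.
print(f"{lambda_60(0, 19_242_000):.1e}")  # 4.8e-08
```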
Table 3.8 Tested quantities and failures in life testing at +85°C, 1.5 UN, max. 7000h
3.3.6.3
λ as a function of temperature and load
The λ values given in Table 3.8 are valid only for the specific testing conditions:
50% overload at +85°C. The acceleration factors can be derived from MIL-
HDBK-217, where λ is given as a function of temperature and load (type MIL-
27287). By using the same acceleration factors for the capacitors with PETP film,
we can calculate λ under derated conditions. The estimated λ for some derated
conditions is given in Table 3.9.

Conditions                    λc          λc+d
Temp. (°C)    Load (%)        (10⁻⁶/h)    (10⁻⁶/h)
For the climatic tests at high relative humidity (RH) and for temperature changes,
Table 3.11 gives a survey of all the quantities of capacitors tested under the
various conditions. According to Table 3.10, the average reject rate - with a
confidence level of 60% - was less than 0.02% (1 failure out of 12 428). The only
failure occurred in the accelerated damp heat test (not preceded by the rapid
change of temperature test) on the 100 V capacitor.
Damp heat test without load,   Q (quantity)   460   855   505   458
40°C, RH 90-95%, 21 days       F (failures)     0     0     0     0
3.3.6.4
Reliability conclusions
Table 3.11 Percentages outside requirements after the damp heat test without load: 40°C, RH
90-95%, 21 days

Quantity tested            460    855    505    458
Percentage     ΔC            0    0.1      0    0.7
outside        tan δ         0    0.1      0      0
requirements   Rinsul        0    0.1      0    0.2
(Fig. 3.24: ΔC (%) distribution curves, requirement limits marked)
Fig. 3.24 Distribution of ΔC during the damp heat test without load at 40°C, RH 90-95%, for
21 days. 100 V: x̄ = 2.1%, n = 460; 250 V: x̄ = 2.6%, n = 855; 400 V: x̄ = 2.8%, n = 505;
600 V: x̄ = 2.8%, n = 458
The tan δ drift level was slightly positive versus the level measured after the
temperature change testing. Table 3.13 gives a survey of the quantities tested and
the percentages outside the requirements. In this test, with a confidence level of
60%, the total percentage outside requirements is under 2.2%.
Breakdown voltage
The breakdown voltage shows a normal distribution for all four ratings. All the
tested capacitors met the requirement that the breakdown voltage should be at
least equal to twice the rated operating voltage. Table 3.14 gives a survey of the
average breakdown voltage and the average field strength at breakdown.
Insulation resistance
Rinsulis found to be dependent on the operating voltage and no values outside
requirement (requirement: Rinsul :2: 105 MQ).
(Fig. 3.25: tan δ × 10⁻⁴ vs. cumulative frequency (%), 0.1 to 99.9, for the 400 V rating)
Fig. 3.25 Distribution of tan δ after the damp heat test without load at 40°C, RH 90-95%, for 21
days: (a) before the test, x̄ = 36 × 10⁻⁴, n = 505; (b) after the test, x̄ = 38 × 10⁻⁴, n = 505
Table 3.13 Percentages outside requirements after the accelerated damp heating test preceded by the rapid temperature change test: 55°C, RH 95-100%, 2 days

Voltage rating (V)   100   250   400   630
3.3.7
Wound capacitors
This type of capacitor (with or without case) comprises plastic foil capacitors in which an aluminium foil serves as the coating and polystyrene as the dielectric. The terminal connections are joined to the coating (thickness: 10 to 40 µm), so that the capacitors can be used at high frequencies. The classical paper types with various impregnations (wax, oil, chlorinated naphthalene, epoxy resin) undergo strong ageing phenomena. By contrast, no parameter degradation by ageing was observed for the plastic foil capacitors. The better behaviour in time of these capacitors is one of the reasons that contributed to the disappearance of the paper capacitors from the market. They have remarkable characteristics: small losses at high frequencies, high capacitance stability, insensitivity to overload and to mechanical over-stresses, a well-defined temperature coefficient, and relative insensitivity to humidity and temperature.
Fig. 3.26 Capacitance variation ΔC/C₀ (%) of the 100 nF polystyrene capacitor with plastic cover, over 0 to 1000 hours
Failure types:
• Bad contacts, particularly between foil and terminals; after a long operating time, they can be recognised by strong oxidation;
• Bad soldering;
• Dielectric ionisation, as a result of high alternating voltages;
• Mechanical instability (bad adhesion of the aluminium foil, loose winding of the foil, etc.).
In the case of styroflex capacitors, the main failure modes are due to capacitance changes.
3.3.8
Reliability and screening methods of multilayer
ceramic capacitors [3.37] [3.38]
Multilayer ceramic capacitors are increasingly used in hybrid circuits for telecommunications, but they still have reliability problems. This explains why many users are interested in a non-destructive screening method that can detect the early failures of these capacitors, particularly those occurring at low voltage.
In 1983, Standard Telecommunications Laboratories (STL) presented a screening method using an ionised solvent (methanol), which produces a temporary increase of the electric conductivity of the zone that fails at low voltage, a zone of the capacitor structure situated between the two electrodes of the ceramic capacitor.
Silver particles migrating between the electrodes (or between the porous terminations) towards constructive defects, cracks or voids are identified as the causes of the low-voltage failures. Humidity, considered a necessary condition for a failure to occur, penetrates into the capacitor through the electrodes and terminations (which are porous to a certain extent). Strangely enough, delamination (that is, the separation of the electrode from the substrate) is not a failure cause [3.38].
The method proceeds as follows: for example, for a 100 V capacitor, a voltage of only 10 V is applied and, after 10 seconds, the crossing current I₁ is measured. The capacitors are then heated to +85°C and afterwards immersed in methanol for 1...15 hours. The longer the immersion, the better the methanol penetrates the failure zones. Then the capacitor is dried for a time ≥ 60 seconds; once more a voltage of 40 V is applied and, after 10 s, the loading current I₂ is measured. If the tested component contains an important defect, I₂ will normally be greater than 10⁻⁸ A and the ratio I₂/I₁ will span several orders of magnitude. For identical dimensions, the capacitors with the smallest current values will exhibit the lowest failure rates.
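As a minimal sketch of the decision rule described above: the 10⁻⁸ A level comes from the text, while the ratio threshold below is an assumed illustrative value, since the text only states that the ratio spans several orders of magnitude for a defective part.

```python
def screen_capacitor(i1, i2, i2_limit=1e-8, ratio_limit=1e3):
    """Flag a capacitor as a low-voltage failure risk, based on the
    current I1 measured before methanol immersion and the current
    I2 measured after immersion and drying."""
    ratio = i2 / i1 if i1 > 0 else float("inf")
    return i2 > i2_limit or ratio > ratio_limit

# A defective part: I2 jumps well above 1e-8 A after immersion.
print(screen_capacitor(2e-11, 5e-7))   # True
# A sound part: I2 stays small and close to I1.
print(screen_capacitor(2e-11, 4e-11))  # False
```

The two-criterion form reflects the text: an absolute current limit catches gross defects, while the ratio catches parts whose conductivity rose sharply only after the methanol reached an internal flaw.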
The semi-automatic testing installations of STL can test 2000 components per hour. The test results indicate that all the capacitors that fail the screening process will fail the life tests (for structures: +85°C, RH 97%, over 2600 hours, with an applied voltage of 4.5 V; for encapsulated capacitors: +85°C, RH 85%, 1200 hours, 1.5 V).
According to the statistical data published by STL, only 1% of all tested capacitors pass the screening tests but fail the lifetime testing. Many failures are attributed to short-circuits due to methanol penetration. A number of failures originate from defects too small to be detected. Strangely enough, some components rejected at the screening are found good at the end of the lifetime test.
For an effective detection it is recommended to test the capacitors at the structure level, and not as encapsulated capacitors.
The low-voltage failure depends on ion migration through or along a physical defect (crack, void, porous region) that brings into contact two opposed electrodes. For this effect to occur, certain environmental conditions must be present [3.38].
In the presence of a defect, silver particles can migrate from a termination to an electrode, even if the internal electrode is not made of silver. In addition, palladium, platinum or gold complexes are transported against the electrostatic field, which explains why the movements can be greater for smaller fields. A high humidity level is unfavourable. For encapsulated capacitors, a bad-quality solder joint allows humidity to penetrate, so that ions can migrate between the terminations. This explains why an encapsulated capacitor can fail even if its structure is faultless.
One of the first reactions to the announcement of the STL screening method concerned the role of humidity: is it a sine qua non condition for low-voltage failures, or is it an accelerating agent?
According to the STL statistics, of more than 800 structures that passed through the burn-in (which precedes the screening), none failed in an environment with reduced humidity. It seems that the research performed at STL has not revealed the fundamental failure mechanism. It is known that a defect, a fault, is necessary to lead to failure. But, in our case, is the essential element a crack or a pinhole?
This confusion results from the fact that certain low-voltage defects are self-healing. Even if a microscopic analysis is performed, no physical defect can be revealed. This leads to the conclusion that the STL research has isolated only the defects which appear in certain operating conditions and which lead to a low-voltage failure. Further research will allow a greater and wider number of defects to be investigated physically and electrically. By correlating the manufacturing process with the defects, it should be possible to establish the improvements of the technological process needed to remove a given defect type.
3.4
Zinc oxide (ZnO) varistors [3.39]...[3.45]
For the suppression of contact sparks, RC combinations, diodes, and selenium overvoltage protections are mounted in parallel with the coils or with the contacts. Recently, varistors are more and more utilised for the same purpose. Zinc oxide varistors are voltage-dependent resistors with symmetrical U-I characteristics, having properties very similar to anti-series connected Z diodes, but with a much greater loading capacity, and therefore serve for the protection of circuits (with a response time < 25 ns). At the occurrence of high-energy voltage peaks, the varistor leaves its previous high-resistance state and goes into a conductive state, until the voltage peak diminishes to a non-dangerous value. The pulse voltage energy is absorbed by the varistor, and so the voltage-sensitive components are protected against destruction.
One must note that the Trans-Zorb diode cannot simply be replaced by a varistor, because their purposes and functions are not identical. For some applications, the varistor can protect the circuit as well as the diode, but each of the two products has its own specifications. The specific and concrete application decides which of the two products will be selected.
Fig. 3.27 Comparison between the limitation voltages Uc (V) for different peak pulse currents (0.001 to 100 A, log scale): a) 39 V metal-oxide varistor; b) 39 V Trans-Zorb
Fig. 3.28 The mean decrease of breakdown voltage BV after the pulse tests (measured after 10 pulses, each having a duration of 1 µs): a) 39 V metal-oxide varistor; b) 39 V Trans-Zorb
Fig. 3.29 Oscilloscope pictures: a) 39 V Trans-Zorb; b) 39 V metal-oxide varistor. Pulse test conditions: 50 A/1 µs with a rise time of 4 kV/µs. Vertical scale: 50 V/div.; horizontal scale: 2 ns/div
3.4.1
Pulse behaviour of ZnO varistors
The characteristics of ZnO varistors change if they are exposed to a pulse current load. We will try to explain how these effects depend on the amplitude, the duration and the number of pulses. In particular, an influence on the leakage current and on the lifetime of the varistor was observed.
ZnO characteristics. To specify a ZnO varistor exactly, its characteristic values have been defined at the international level [3.45]. In Fig. 3.30 the voltage-current characteristic of a ZnO varistor and the principal measurement points are shown. The maximum current refers to a normalised stress of 8/20 µs. For repeated stresses (or other pulse forms), a maximum current in the form of a derating curve is assumed. A measurement point situated in the middle of the measured domain serves to obtain the "clamping voltage" or "terminal voltage". These current values depend on the diameter of the varistor and are defined in the R10 series of preferred numbers (ISO 3).
The advantage of ZnO varistors is given by the large nonlinearity coefficient "α". In practice, this nonlinearity coefficient is perceived not as a differential slope, but as a measure of a mean slope between two currents that have to be defined. As the principal value of the U-I characteristic, the "varistor voltage", in other words the voltage measured at a current of 1 mA, has established itself. It can be measured simply, has a weak temperature dependence, and therefore can be utilised as a criterion for climatic and environmental tests.
Fig. 3.30 Some typical electrical values of a varistor on the U-I curve (U from 10 to 500 V, I from 10⁻⁵ to 10⁴ A, log-log scale)
fissures. The cause of failure is, in the first case, a too high pulse current and, in the second case, a too great energy absorption.
Corresponding to these results, a "normalised" form of pulses has been selected. As the short high-current pulse, the 8/20 µs exponential surge has been standardised; as the high-energy pulse, the 2 ms rectangular pulse and the 10/1000 µs exponential surge, respectively.
Degradation by pulses, without service voltage. Responsible for the characteristic modifications due to the pulses are the pulse amplitude, the pulse duration, the number of pulses and the pulse rate.
The mean pulse load must not be greater than the continuous load rating. If this limit value is exceeded, the varistor lifetime will be shortened by this supplementary stress factor. The following correlation can be noted from the study of the maximum surge current (having a certain form, as a function of the pulse number at which the modification of the varistor voltage Uv should remain constant):
n(ΔUv = const.) = (I/In)^v    (3.4)
where v = nonlinearity coefficient of the pulse load; I = current amplitude at which, after one pulse, the varistor voltage Uv is modified by a constant percentage (for example: 10%); In = current amplitude at which, after n pulses, the varistor voltage is modified by the same constant percentage.
The nonlinearity coefficient v is a measure of the dependence of the pulse amplitude on the number of pulses, but it does not represent a measure of the maximum number of pulses until the varistor is destroyed.
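Equation (3.4) can be inverted to estimate the pulse amplitude that may be repeated n times while producing the same varistor-voltage shift as a single larger pulse. A sketch with hypothetical values; the coefficient v is device-specific and must be taken from the manufacturer's data:

```python
def allowed_pulse_current(i_single, n, v):
    """Invert Eq. (3.4), n = (I / I_n)**v: the amplitude I_n that n
    pulses may have for the same constant varistor-voltage change
    that a single pulse of amplitude i_single produces."""
    return i_single / n ** (1.0 / v)

# Hypothetical: one 1000 A pulse shifts Uv by 10%; with v = 30,
# 100 pulses of about 858 A produce the same shift.
i_n = allowed_pulse_current(1000.0, 100, 30.0)
print(f"{i_n:.0f} A")
```

Because v appears in the exponent, a large v means the allowed amplitude drops only slowly with the pulse count, which is the derating-curve behaviour mentioned above.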
Pulse polarisation. The above-mentioned differences between the positive and negative characteristics after a pulse load can probably be explained by ion migration in the intergranular zone [3.49][3.50]. In particular the interstitial zinc, which migrates under the influence of the electric field, contributes to this effect. The polarisation degree depends on the current density of the pulse (Fig. 3.31).
Fig. 3.31 The varistor polarisation: a) the U-I characteristic; b) modification of the varistor voltage ΔUv (+ in the pulse direction; − in the opposite direction)
Degradation by pulses, with service voltage. For the "operation test", the ZnO varistors, connected to the service voltage, are subjected to pulses. Until now it has not been possible to indicate precisely whether a pulse degradation is
Fig. 3.32 Evolution of the leakage current I (µA) during the operating test (0 to 6 h): 1 - opposite to the pulse direction; 2 - in the pulse direction; 3 - comparative curve, without pulses
3.4.2
Reliability results
A lifetime test was performed on a batch of 260 varistors of type VP 130A10, during 200 hours at 100°C and an operating voltage of 184 V / 60 Hz [3.52]. Four catastrophic failures and 12 derating failures⁶ arose. This corresponds to a mean lifetime of 59 000 hours and of 28 000 hours, respectively, at a confidence level of 95%.
According to the data of the firm General Electric, the operating life at the same confidence level is greater than 40 × 10⁶ hours.
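The mean-lifetime figures above are quoted at a 95% confidence level. A standard way to obtain such one-sided bounds from a time-terminated test is the chi-square method, sketched below with a pure-Python Wilson-Hilferty approximation of the chi-square quantile. The numbers are illustrative only and do not attempt to reproduce the book's figures, whose derating-failure accounting is not detailed here.

```python
# z-quantiles for the one-sided confidence levels used here
_Z = {0.60: 0.2533, 0.90: 1.2816, 0.95: 1.6449}

def chi2_quantile(p, dof):
    """Wilson-Hilferty approximation of the chi-square p-quantile."""
    h = 2.0 / (9.0 * dof)
    return dof * (1.0 - h + _Z[p] * h ** 0.5) ** 3

def mtbf_lower_bound(device_hours, failures, cl=0.95):
    """One-sided lower confidence bound on the MTBF for a
    time-terminated test: MTBF_L = 2T / chi2(cl; 2k + 2)."""
    return 2.0 * device_hours / chi2_quantile(cl, 2 * failures + 2)

# Illustrative: 260 parts x 200 h = 52 000 device-hours, 4 failures.
print(f"{mtbf_lower_bound(52_000, 4):.0f} h")
```

The bound uses 2k + 2 degrees of freedom because the test was stopped on time, not on the last failure; for a failure-terminated test, 2k would be used instead.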
3.5
Connectors
⁶ We speak of a derating failure if the voltage modification (measured at 1 mA current) is greater than 10%.
The evaluation of these functions depends on the given application and on the form of the connector. Although some functions may seem "important" and others "unimportant", irreproachable overall functioning is nevertheless only guaranteed if all functions are performed in the correct proportion. That is why the form of the contact elements, the materials used, and the galvanic coating are particularly important.
In the last 40 years, the connection technique was strongly influenced by three important innovations:
• 1958: printed circuits;
• 1975: optical conductors for the digital transmission of information;
• 1986: the realisation of electronic connections without soldering.
Fig. 3.33 Distribution of connectors on the users' market (partial legend: 5 - consumer electronics; 6 - domestic industry)
Connectors are used in each of the following domains: transmission of data or information with the aid of electrical or optical signals, electronic regulation and measurement devices used to control industrial processes, telecommunications and data processing, office activities, transport, domestic appliances, and consumer electronics. In Fig. 3.33 the distribution of connectors on the users' market is shown.
3.5.1
Specifications profile
Fig. 3.34 Time behaviour of CuNi9Sn2 connectors (lifetime up to 1500 h)
3.5.2
Elements of a test plan
From the user's point of view, a test plan for the qualification of a connector must contain the following groups of characteristics: a) product characteristics; b) operation characteristics; c) processing characteristics; d) operational behaviour characteristics.
Each of these groups can be subdivided into well-defined elements; in this sense, a distinction between estimation characteristics and test characteristics can be made.
a) Product characteristics
This group gathers the characteristics delivered by the manufacturer on the basis of the selected materials and the technology applied to assure a correct behaviour of the product. Among these should be mentioned the materials used, the processing (optical examination of the package and of the contact parts), the surfaces (material junctions, contact-material porosity, impurities, contact smearing), and the layer construction (in contact and connection areas).
b) Operation characteristics
This group contains the important operation characteristics: mechanical operation characteristics (dimensions and tolerances, interchangeability, mating possibilities, the total connecting and traction forces, contact force, static axial load, etc.),
References
3.1 Doyle, E. A. Jr. (1981): How parts fail. IEEE Spectrum, October, pp. 36-43
3.2 MIL-HDBK-175, Microelectronic Device Data Handbook, U. S. Department of Defense, Washington, D.C.
3.3 Hnatek, E. R. (1975): The Economics of In-House Versus Outside Testing. Electronic Packaging and Production, August, p. T29
3.4 Johnson, G. M.: Evaluation of Microcircuits Accelerated Test Techniques, RADC-TR-76-218, Rome Air Development Centre, Griffiss Air Force Base, New York, 13441
3.5 Roberts, J. A.; Chabot, C. B. (1980): Application Engineering. In: Arsenault, J. E. and Roberts, J. A. (eds.) Reliability and Maintainability of Electronic Systems. Computer Science Press
3.6 Mader, R.; Meyer, K.-D. (1974): Zuverlässigkeit diskreter passiver Bauelemente. In: Schneider, H. G. (ed.) Zuverlässigkeit elektronischer Bauelemente, Leipzig: VEB Deutscher Verlag für Grundstoffindustrie, pp. 400-401
3.7 Nagel, O. (1970): Stabilität von Schichtwiderständen. Internationale Elektronische Rundschau, H. 12, pp. 315-318
3.8 Hofbauer, C. M.: Die Feuchtigkeits- und Klimabeständigkeit von Schichtwiderständen. Radio Mentor, vol. 27, no. 5, pp. 400-401
3.9 Tretter, J. (1974): Zum Driftverhalten von Bauelementen und Geräten. Qualität und Zuverlässigkeit, vol. 19, no. 4, pp. 73-79
3.10 Bajenescu, T. I. (1978): La fiabilité des résistances. La revue polytechnique, no. 9, pp. 993-997; Bajenescu, T. I. (1981): Zuverlässigkeit elektronischer Komponenten. Teil 1: Zuverlässigkeitskenngrössen und Ausfallmechanismen. Feinwerktechnik & Messtechnik, no. 5, pp. 232-239
3.11 Stanley, K. W. (1971): Reliability and Stability of Carbon Film Resistors. Microelectronics and Reliability, no. 10, pp. 359-374
3.12 Russel, R. F. (1971): Test on Thick Film Resistors. Microelectronics and Reliability, no. 10, p. 115
3.13 MIL-STD-199, Resistor, Selection and Use of, Supplemental Information, U. S. Department of Defense, Washington, D. C.
3.14 MIL-STD-198, Capacitor, Selection and Use of, Supplemental Information. U. S. Department of Defense, Washington, D. C.
3.15 Kormany, T.; Barna, H. (1982): Wege zur Beurteilung der natürlichen Lebensdauer von Elektrolytkondensatoren. Nachrichtentechnik, vol. 12, no. 10, pp. 391-392
3.72 Bavuso, S. J.; Martensen, A. L. (1988): A Fourth Generation Reliability Predictor. Proc. Ann. Rel. & Maint. Symp., pp. 11-16
3.73 Bazovsky, I. (1961): Reliability Theory and Practice. Prentice-Hall Inc., Englewood Cliffs, New Jersey
3.74 Bazovsky, I.; Benz, G. (1988): Interval Reliability of Spare Part Stocks. Qual. Reliab. Engng. Int., no. 4, pp. 235-246
3.75 Dürr, W.; Meyer, H. (1981): Wahrscheinlichkeitsrechnung und schliessende Statistik. Hanser-Verlag, München/Wien
3.76 Ellis, B. N. (1986): Cleaning and Contamination of Electronics Components and Assemblies. Electrochem. Publ., Ayr (Scotland)
3.77 Moore, E. F.; Shannon, C. E. (1956): Reliable Circuits Using Less Reliable Relays. J. Franklin Institute, pp. 191-208; 281-297
3.78 Münchow, E.; Erzberger, W. (1994): Wie zuverlässig ist zuverlässig? MegaLink
3.79 Munikoti, R.; Dhar, P. (1988): Low-Voltage Failures in Multilayer Ceramic Capacitors: A New Accelerated Stress Screen. IEEE Trans. on Components, Hybrids, and Manufacturing Technology, vol. 11, no. 4, pp. 346-350
3.80 Reinschke, K. (1973): Zuverlässigkeit von Systemen (Band I). VEB Verlag Technik, Berlin
3.81 Reinschke, K.; Usakov, I. (1987): Zuverlässigkeitsstrukturen. Verlag Technik, Berlin
3.82 Reiszmann, E. (1972): Messung und Bewertung mechanischer Umwelteinflüsse auf Geräte. Fernmeldetechnik, vol. 12, no. 3, p. 117
4 Reliability of diodes
4.1
Introduction
Diodes and rectifiers are bipolar components with non-linear characteristics whose behaviour depends on the polarity of the applied voltage [4.1]...[4.13]. Silicon is used almost exclusively as the semiconductor material. The main constructive forms are planar diodes and mesa diodes. Among the important ratings that must not be exceeded, the reverse voltage, the forward current and the maximum junction temperature (including data concerning the thermal resistance at high temperature) may be mentioned.
For rectifier diodes, four categories may be mentioned:
• Rectifier diodes for general purposes
• Avalanche rectifier diodes
• Fast rectifier diodes (with small reverse recovery time)
• Avalanche rectifier diodes with controlled reverse current decrease.
Rectifier diodes for general purposes (i.e. without avalanche breakdown) are unsuited to high transient reverse voltages. Small irregularities of the barrier layer may lead to a local breakdown, which may produce local overheating (hot spot) leading to chip deterioration. Despite the substantial costs of reliability assurance, new defects arise all the time, leading to perturbations in the operation of the equipment. Since the majority of stress tests do not supply information about the long-term behaviour (failure rates or other comparable reliability indicators cannot be calculated), the only possible way to obtain reliability data is trial operation under the conditions most appropriate for the foreseen utilisation. With this in view, the user only has the information that, in his equipment, under similar loading conditions and for parts coming from the same production period, the component will behave similarly to the test results. To complete the picture, one may say that the most trustworthy data can be obtained only from the operation of the equipment itself.
Completing laboratory reliability data with failure rate data obtained by testing at the user, under various and arbitrarily established conditions (such failure rates being only loosely related to the operational failure rates), can only be a temporary solution. Consequently it is necessary to concentrate the activity of a reliability laboratory on short-duration tests, and not on obtaining failure rates from long-duration tests (up to two years).
4.2
Semiconductor diodes
4.2.1
Structure and properties
The silicon wafer is subjected to a series of diffusion processes with the aim of performing the doping of the semiconductor barrier layer. Under practical operation
Fig. 4.1 Comparison between failure rates of silicon rectifier diodes from manufacturers A, B and C, for different stresses: d.c. loading on the barrier layer vs. operation under capacitive load
conditions, high reverse voltages are applied to this doped layer. During operation, the diode should not exceed a certain well-defined temperature of this barrier layer.
The possible influences which lead to parameter modifications are due to electrochemical phenomena caused, for example, by the residues of the necessary cleaning processes that take place on the crystal surface during manufacture. Since the barrier layer reaches the surface at the edge of the structure, these phenomena appear in this region. The electrochemical reactions are accelerated by potential differences and by the increase of temperature [4.8].
4.2.2
Reliability tests and results
another test series at capacitive load [4.8]. Important differences between the failure levels have been found (Fig. 4.1). At capacitive load, a higher failure percentage than at d.c. stress was obtained.
Fig. 4.2 gives a survey of failure causes, based on failure analysis, for silicon rectifier diodes. Various silicon rectifier diodes 1N4005 have been exposed to humidity tests for 74 hours at 85°C and RH 85%; after a 2-hour test, IR was
Fig. 4.2 The failure causes (in %) of the silicon rectifier diodes: breaking through: 0.5; burned silicon crystal; eccentric position: 7; other causes: 14; interruption of internal soldering: 1.2
measured (failure limit for IR is 5 µA). All samples were good, with one exception: IR increased from 10 nA to 10 µA. This test does not correspond to a real stress, but allows a useful comparison between technological variants. A supplementary protection has been foreseen for good samples. By embedding all connecting conductors together, an increased humidity leakage path and a supplementary mechanical anchoring of the conductors can be obtained.
Fig. 4.3 shows the effect of temperature on various diode types, in accordance with the physical failure model indicated in MIL-HDBK-217. The voltage has a considerable accelerating effect on the failure rate [4.7], more accentuated at low temperatures than at high temperatures (accelerating factor: 7 at 75°C, and 2 at 150°C).
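The temperature dependence in MIL-HDBK-217-style models is essentially Arrhenius-like. As a hedged sketch of that temperature part only (the 0.7 eV activation energy is an assumed illustrative value, not one taken from the handbook, and the voltage acceleration discussed above is not modelled here):

```python
import math

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann constant, eV/K

def arrhenius_af(ea_ev, t_use_c, t_stress_c):
    """Acceleration factor of a thermally activated failure mechanism
    between a use temperature and a stress temperature (Celsius)."""
    t_use = t_use_c + 273.15
    t_stress = t_stress_c + 273.15
    return math.exp(ea_ev / K_BOLTZMANN_EV * (1.0 / t_use - 1.0 / t_stress))

# Illustrative: Ea = 0.7 eV between 75 °C and 150 °C junction temperature.
print(f"AF = {arrhenius_af(0.7, 75.0, 150.0):.1f}")
```

Such a factor is what turns a few thousand hours of high-temperature testing into an estimate for much longer field operation, which is why junction temperature must be known accurately.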
4.2.3
Failure mechanisms
a1. Total failures
• Poor soldering → interruption.
• Insufficient mechanical pressure → intermittent failures / interruption.
• Inadequate expansion coefficient → interruption.
• Scratches on the structure surface → increase of thermal resistance and of breakdown voltage.
a2. Degradation failures
• Mechanical degradation (contact points or connection wires partially faulty → local overheating → hot spot → total failure).
b1. Total failures
• Too high voltages (or currents) → interruption (short-circuit).
• High temperature (for a relatively short time) → degradation of electrical parameters.
• Voltage peaks → breakdown of the pn barrier layer → short-circuit.
• Quick change of polarity: reversal from forward direction to blocking direction → breakdown of the barrier layer → short-circuit.
Fig. 4.3 Failure rate versus normalised temperature of the barrier layer, according to MIL-HDBK-217: 1 - silicon diode; 2 - germanium diode; 3 - Z diode
4.2.4
New technologies
To fulfil the latest, ever more severe specifications concerning these components, some manufacturers have elaborated new diode fabrication technologies. Thus, for example, the company Texas Instruments achieved a metallic contact between the end of the conductors and the semiconductor crystal, by adding a contact material at the anode and at the cathode of the crystal. The high reliability is guaranteed by a special glass passivation technique.
Fig. 4.4 "Superrectifier" technology with glass and plastic materials (General Instrument Corp.): 1 - brazed silicon structure; 2 - sinterglass passivation; 3 - non-inflammable plastic case
4.2.5
Correlation between technology and reliability:
the case of the signal diodes 1N4148 [4.9]
Fig. 4.5 The "double plug" technology: 1 - glass tube; 2 - structure; 3 - plug
In Fig. 4.5 the "double plug" technology is presented: a chip pressed between two plugs provided with connections and sealed into a glass tube. One must distinguish between the "standard" technology, in two variants (pressed contacts and welded contacts), and the technology "without cavity".
Standard technology. A planar chip (Fig. 4.6) has the form of a parallelepiped whose typical dimensions are 400 µm × 500 µm × 200 µm. The two plugs are made of an FeNi alloy (Fig. 4.7).
Fig. 4.6 Planar structure in the standard technology: 1 - silver excrescence assuring the anode contact; 2 - SiO₂ passivation assuring the protection of the pn junction at the surface; 3 - metallisation of the cathode contact
Fig. 4.7 Standard technology with the two plugs (FeNi alloy): 1 - connection; 2 - structure; 3 - hermetically closed glass body; 4 - plug; 5 - silver outgrowth assuring the anode contact; 6 - cavity about 200 µm wide; 7 - welding
Fig. 4.8 Technology "without cavity", with mesa structure: 1 - metallisation of the anode contact; 2 - metallisation of the cathode contact; 3 - SiO₂ passivation assuring the protection of the junction on the lateral parts of the structure
Fig. 4.9 Technology "without cavity", with the two silvered tungsten plugs: 1 - structure; 2 - welded contact; 3 - hermetically sealed glass body
Technology "without cavity". The "mesa" disk (Fig. 4.8) has the form of a parallelepiped with the typical dimensions 400 µm × 500 µm × 100 µm. The two plugs are made of tungsten, covered with silver. The two connections are made of FeNi (Dumet). The glass tube is made of a very resistant non-alkaline compound. Concerning the assembling (Fig. 4.9), the connection-plug joint is made by welding at 680°C, the plug-glass tube joint is assured by sealing at 700°C, and the chip-plug joint is realised by welding at 850°C (the welding is made with eutectic).
The three operations are performed in an oven in a single pass. The specific feature of this assembling technique (compared with the standard technology) is the absence of the inner cavity at the chip level (Fig. 4.8 and 4.9). In fact, a micro-cavity can exist, but the short-circuit risk is cancelled by the glassivation¹ of the chip edges.
Fig. 4.10 Intermediate technology between "standard" and "without cavity": a planar structure, but of bigger dimensions: 1 - (passivating) oxide; 2 - glassivation; 3 - cathode contact (metallisation)
Fig. 4.11 Intermediate technology: the glass body is in contact with the glassivation
¹ Glassivation: vitreous layer which covers the semiconductor chip, with the exception of the contact areas ("bonding pads"), intended to protect it completely against the aggression of contaminants (particles, humidity, etc.).
Passivation: insulating layer (SiO₂, Si₃N₄) deposited on the surface of a semiconductor pellet for the protection of the junctions against contaminants and to isolate the conductive parts from one another.
The two processes can be used together or separately on a single chip; in contrast to glassivation, passivation can be deposited even on a non-planar area.
4.2.6
Intermittent short-circuits
Conductive particles with typical dimensions of 10...50 µm may separate from the metallisation of the internal part of the plugs or from the chip (burr after cutting). It has been proved that the particles can originate from a lack of cleanliness during the assembling operations. The particles can then move in the cavity of 200 µm width (Fig. 4.12) and produce intermittent short-circuits between plugs or between chip and plugs when the diode is subjected to vibrations, shocks, and accelerations. This type of defect was not identified earlier because: 1) the defect has a hidden character, appearing randomly and only if the diode is operating under vibrations, shocks and accelerations; 2) it is difficult to correlate the material defect with the diode failure; in addition, the fragility and the small dimensions of these diodes make dismantling difficult, especially if the diode is encapsulated in Dobekan.
The detection and prevention methods are: (i) internal visual inspection, if the package is transparent, and (ii) electrical testing while the diode is simultaneously subjected to vibrations ("PIND test") or shocks ("tap test").
Depending on the specifications and on the manufacturer, these tests may or may not be foreseen in the quality assurance manual of the manufacturer. So, for example, in the case of "tap tests", the target is to detect, and to destroy (by burning with an electric arc), the stray particles. To do this, a machine places the diodes (one by one, automatically, for several seconds each) facing a multi-contact measuring head which exposes each diode, vertically mounted (as this position seems to be particularly favourable), simultaneously to the vibrations (10 cycles, reversing the mounting sense) and to a reverse voltage of 110 V, higher than the breakdown voltage (VBR) specified in the data sheet of the models 1N4148 and 1N4448 (whose breakdown voltage is 100 V).
Using this test, manufacturers successfully eliminated about 5% of the tested diodes. One may also note that a variant of this test exists: measuring VF or IR while the diode is exposed to microshocks.
Contact tears. The assembly of 1N4148 and 1N4448 diodes (not of the
1N4148-1 diode) is performed by pressure. In fact, this pressure is assured by the
difference between the thermal expansion coefficients of the materials (silicon,
copper, and glass), after sealing at a temperature of 600°C. For a certain manufacturer,
the typical dimension of this construction is 0.4 µm (Fig. 4.12).
Therefore, the smallest deviation from the manufacturing process can
produce inadequate contacts, which may be identified - particularly for an
environmental temperature greater than +70°C - by an increase of VF beyond
tolerances.
4.3
Z diodes
4.3.1
Characteristics
4.3.2
Reliability investigations and results
[Figure: UZ (V), from 5 to 7 V, versus time (10,000 hours)]
Fig. 4.14 Behaviour of different Z diodes during ageing after storage at +70°C. Beyond 20 000
hours, the 6.3V Z diode no longer operates reliably
Table 4.1 Results of a comparative reliability study on 400mW Z diodes, alloyed and diffused,
respectively
[Figure: (ΔU/U0) in %, versus t (h) from 10³ to 3·10⁴ h, for -ID = 1mA (top) and -ID = 20mA (bottom)]
Fig. 4.15 Behaviour at ageing of the breakdown voltages of Z diodes measured at -ID = 1mA and
20mA: A) Tj = 135°C; B) Tj = 90°C
well correlated with the variation of the Z voltage, after an operating time of 1000
hours.
Until now, no quantitative substantiation of the correlation between the low
frequency noise and the estimated lifetime of Z diodes has appeared in the
technical literature.
As one already knows, such tests take much time and do not always lead to firm
conclusions. If a long test time is not available, it is recommended to undertake
short-time investigations, under appropriate operating conditions, and to compare
the results obtained in this way with the existing data concerning the same type or
a related component type. The investigations that take into account simultaneously
one or more parameters (Fig. 4.13), and also - if possible - the comparative 1000
hours tests (Fig. 4.14 and 4.15), are conclusive.
The results of a comparative reliability study for alloyed and diffused Z diodes,
operating at 400mW, are presented in Table 4.1 (operating time: 1000 hours at
full load; ambient temperature 25°C). Except for manufacturer C, the
failure rate varies between 10⁻⁶/h and 7·10⁻⁶/h, for a confidence level of 60%.
For a series of tests performed on four Z diode manufacturers - and on
samples of minimum 100 items - the most significant failure mode was the
increase of losses for two of the manufacturers (Table 4.2).
Table 4.2 Compared reliability of Z diodes (% defects, after 168 hours operation, at Pmax)

Manufacturer | Alloy, 400mW (3...8V) | Diffused, 400mW (6...33V) | Diffused, 1W (6...33V)
A | 0 | 4.3 |
B | 3.3*) | 37.5*) | 1.8*)
C | 0 | 3.4*) | 7.1*)
D | 0 | 0 | 0

*) Drift of IR
As for the encapsulation, for the same four manufacturers the results presented
in Table 4.2 were obtained, with the following specifications:
a) For the 400mW diodes, DO-35 and DO-7 packages, the manufacturer C
offered higher voltages for glass DO-35, but with greater losses. The
manufacturers A and B have used the DO-7 package, which is exposed to internal
contamination during assembly, and therefore the life test results are poorer.
b) For the 1W diodes, manufacturers A and B supply the device in an epoxy
package; these diodes are also exposed to failures in a high humidity environment.
The utilisation of welded contacts leads to a drift of the losses. The manufacturers
C and D use the DO-14 glass package due to the small dimensions of the die. This
explains why for A and B leakage shifts were measured. Many manufacturers
experienced a higher level of breakage and intermittence with plastic packages as
a result of automatic insertion. This is not the case with glass packages.
Table 4.3 Mean temperature coefficient (in %/°C) of the Z diodes, between +25°C and +125°C
The prediction system for reliability (Tables 4.4 and 4.5) plays an important
role in the improvement of product reliability. On the one hand, it is a tool for
estimating the reliability level of the product during the design and development
phases, which makes it possible to optimise the selection of the components and
circuits, of the system structure and of the organisation of the logistic support. On
the other hand, it gives target values, which can be compared with the measurements
performed in operation.
4.3.3
Failure mechanisms
2 Mechanisms inherent to the semiconductor die itself are termed intrinsic. Such mechanisms
include crystal defects, dislocations, and processing defects. Processing defects,
for example, may take the form of flaws in the thermally grown oxide or the chemically
deposited epitaxial layer. Intrinsic failure mechanisms tend to be the result of wafer
fabrication.
The results confirm that no generalised acceleration factors can be used for Z
diodes. The variables are so many and so dissimilar that a factor must be determined
for each specific type and technology. Our choice to set up a component
reliability data bank was a successful strategic one, permitting integration
with the reliability prediction system as regards the correlation between field
data and prediction models, and the evaluation of component field data.
4.3.3.1
Failure mechanisms of Z diodes
The main parameter of the Z diodes is the diode capacitance at the limits of the
operating voltage range. It is also usual to measure the leakage current and the
series resistance. The temperature coefficient of the capacitance is usually
determined as a type test.
IR drift occurs after extended operating life3. Usually it is caused by
contamination near the semiconductor junction, or under, inside or on top of the
passivation layers.
Shorts usually result from thermal runaway due to excessive heat in the
semiconductor junction area and are caused by power dissipation defects.
IR drift or shorts occurring after the humid environment test are caused by a lack
of hermeticity allowing moisture to reach the semiconductor chip.
UZ drifts are usually caused by changes in the chip ohmic contacts.
The glass package has a good hermeticity, and a good metallisation system
which guarantees ohmic contact integrity on all Z diodes.
If the surface conductivity is increased by the impurities introduced by ionic
contaminants, a gradual increase of the leakage current is observed when the
devices are in the "off' state [4.14]. For diodes, a key parameter is the energy gap
between the valence and conduction bands of the semiconductor material. It has
been found [4.15] that mechanical stresses in silicon can reduce this energy gap,
and as a consequence it is possible to reduce the "on" voltage of the devices.
Thermal stress may cause degradation of device characteristics by impairing the
junctions. Migration of the dopant impurities can lead to short-circuits and
subsequently burn-out of the pn junction. Contact migration (another form of
aluminium migration; however, the physical process governing the movement of
aluminium atoms is different from that of electromigration) is a particular problem
of Schottky diodes. All diodes and thyristors must be designed within the
specifications required to prevent an electromigration of the metallisation.
Electrical overstress (EOS) is the major mechanism affecting these devices.
They are sensitive to static potentials, and can be destroyed by a permanent
breakdown across the reverse-biased junction.
3 The use of silicon nitride on planar junctions (as a barrier against ionic contamination)
virtually eliminates this type of drift.
4.3.3.2
Design for reliability
4.3.3.3
Some general remarks
Most generalised data relate the failure rate λ to the (ambient or junction)
normalised temperature of the device. It is important to note that temperature is
the only acceleration factor taken into account. Sometimes ambient and dissipation
effects are unrelated. In any case, the operating voltage is not used as an acceleration
factor.
In order to facilitate the comparison between various results, acceleration
factors based on the normalised junction temperature and the ratio of normalised
junction temperatures are given in Table 4.6.
From Table 4.7 one may see that the degradation failure rates found in practice are
considerably higher than those given by the generalised sources (which relate to
catastrophic failures: open- or short-circuits).
4.3.3.4
Catastrophic failures
4.3.3.5
Degradation failures
Mechanical degradation in the form of partially failed bonds or broken dice leads
to corresponding increases in the forward voltage drop, with electrical degradation
as a result. Moreover, such local degradations often lead to local thermal runaway
and total failure.
Misalignment may result in a very small insulation path, where moisture or ion
concentration may lead to high leakage. High and often unstable leakage currents
may occur as a result of the oxide passivation being bridged by effects such as
purple plague.
4.4
Trans-Zorb4 diodes [4.15]. .. [4.21]
4.4.1
Introduction
A protection transient diode is, in principle, a Zener diode for current peaks, with a
short response time. Since these diodes are frequently subject to strong overloads
(for example, inductive loads or switched capacitive loads, electrostatic discharges,
"flash" discharges), a long-time test at their introduction into circuits is needed to
obtain good reliability. Usually, such data concerning the lifetime do not appear in
the manufacturer's data sheet. For the user it is necessary to perform adequate tests
and to find the diode type that withstands the overload for the longest time. Such a
test, its evaluation and the obtained results are presented in [4.3].
4.4.2
Structure and characteristics
To satisfy the rapid response and strong breakdown current specifications, the
Trans-Zorb diodes are designed as avalanche diodes of large area, with
4 The denomination "Trans-Zorb" (transient Zener absorber) is a trade mark of the
American company General Semiconductors Ind., Inc.
avalanche breakdown. For very high limiting voltages there are two diode chips
connected in series in one package. The mechanical structure, the current distribution
on the chip surface area, the uniformity of the silicon material, and the protection of
the edges of the crystal are decisive factors for the lifetime of a diode. For
example, for the diode 1N5907, after approximately 400 pulses in the last load test
(125% IPP) an internal short-circuit appeared (due to a newly formed alloy channel
traversing the junction) [4.20]. At the pulse voltage test, no noticeable heating or
thermal fatigue of the Trans-Zorb diode was found. The data sheet values were met
or even exceeded.
4.5
Impatt (IMPact Avalanche and Transit-Time) diodes
This is a power microwave device (Fig. 4.16) [4.21] ... [4.25] whose negative char-
acteristic is produced by a combination of impact avalanche breakdown and
charge-carrier transit-time effects. Avalanche breakdown occurs when the electric
field across the diode is high enough for the charge carriers (holes or electrons) to
create electron-hole pairs. With the diode mounted in an appropriate cavity, the
field patterns and drift distance permit microwave oscillations or amplification.
Impatt diodes are used at higher junction temperatures and higher reverse bias
than other semiconductor devices. This has required the elimination of potential
failure mechanisms, which might not develop at lower temperatures. Surface con-
tamination can cause excess reverse leakage current. Devices with surface con-
tamination are eliminated during a high-temperature reverse-bias screen conducted
on all Impatt diodes. Process cleaning steps have also been developed to minimise
yield loss.
[Figure: diode chip with gold ribbon connection]
Fig. 4.16 Impatt diode chip in hermetically sealed package, with copper stud at bottom serving
as terminal and heatsink. The other terminal is at top
forming metallic spikes which extend into the GaAs; the metallic spikes so formed
short-circuit the junction either in the bulk or at the metal-GaAs interface. In
addition, since more than 90% of the DC input power can be dissipated in the
high field region [4.28], the attendant rise in junction temperature can result in a
concomitant increase in leakage current. Bonding and metallisation are generally
responsible for a high percentage of semiconductor failures. For Impatt diodes,
100% thermal resistance testing and 100% high temperature reverse bias testing
effectively screen the devices with weak die attach, metal contact or bonding.
Process controls developed through feedback from 100% testing have minimised
these fabrication defects. The result is a highly uniform and reliable product.
Small process changes are detrimental to high reliability Impatts with a required
MTBF greater than 10⁵ h for operation at 175°C, since the performance of these
high efficiency diodes depends critically on the exact doping profile of the
epitaxial layer. The degradation indicates time-temperature-dependent changes in
the PtGa-PtAs layers.
Diffusion of the contact metal into the semiconductor material is another cause
of failure. This failure mode is controlled by the choice of metals used in the
contacting system, the control exercised while applying those metals, and the
junction temperature. For any given metallisation system, the diffusion of the
contact metal into the semiconductor is an electrochemical process. The failure
rate due to this diffusion can be described by the Arrhenius equation:
λ = λ0 exp(-φ/kT) (4.1)
where λ = failure rate; λ0 = a constant; φ = activation energy (eV); T = tempe-
rature (K); and k = Boltzmann's constant (8.63 × 10⁻⁵ eV/K).
The Arrhenius equation has been widely used and its validity has been
demonstrated for many semiconductor failure mechanisms. The value of φ
depends on the specific failure mechanism and is about 1.8 eV for metal
diffusion into silicon.
For a known mechanism, the activation energy can be used to project the
failure rate at one temperature to a corresponding failure rate at another
temperature. The acceleration factor is the ratio of failure rates at each temperature
(Fig. 4.17):
λT1/λT2 = exp{-(1.8/k)[(1/T1) - (1/T2)]}. (4.2)
Failure rate due to surface leakage also follows the Arrhenius equation.
However, the associated activation energy is 1.0eV. Thus, if ionic contamination
is present, failure will result before metal diffusion occurs.
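Equations (4.1) and (4.2) can be evaluated directly. The short sketch below (function name ours) uses the Boltzmann constant value and the activation energies quoted in the text to compare the acceleration obtained for metal diffusion (about 1.8 eV) and for surface leakage (1.0 eV) between two junction temperatures:

```python
import math

K_BOLTZMANN = 8.63e-5  # eV/K, value used in the text

def acceleration_factor(t1_k, t2_k, ea_ev):
    """Ratio of failure rates lambda(T1)/lambda(T2) per Eq. (4.2):
    exp{-(Ea/k)[(1/T1) - (1/T2)]}."""
    return math.exp(-(ea_ev / K_BOLTZMANN) * (1.0 / t1_k - 1.0 / t2_k))

# Accelerating from Tj = 200 C (473.15 K) to Tj = 250 C (523.15 K):
af_metal = acceleration_factor(523.15, 473.15, 1.8)    # metal diffusion
af_surface = acceleration_factor(523.15, 473.15, 1.0)  # surface leakage
print(f"metal diffusion AF:  {af_metal:.0f}")
print(f"surface leakage AF: {af_surface:.0f}")
```

The comparison illustrates the point made above: the lower-activation-energy surface mechanism accelerates much less with temperature, so ionic contamination dominates the low-temperature failure picture.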
4.5.1
Reliability test results for HP silicon single drift Impatt diodes
All Hewlett Packard (HP) diodes of this type are burned-in for at least 48 hours at
a junction temperature Tj exceeding the maximum rating of 200°C. The following
tests were performed on HP standard production units [4.21], taken from inven-
tory:
Test 1. - Operating lifetest. Units were tested at the maximum recommended Tj.
(104 diodes tested at Tj = 200°C for a total of 344 000 device hours. Failures: two.
λ = 0.58 × 10⁻⁵ h⁻¹; MTBF = 172 000 h).
Test 2. - Storage lifetest. (54 diodes stored at a temperature of 150°C for a
total of 153 000 device hours. Failures: 0; λ ≤ 0.65 × 10⁻⁵ h⁻¹;
MTBF ≥ 153 000 h).
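The quoted λ and MTBF values follow from elementary arithmetic on the device-hours; the sketch below (helper name ours) adopts the text's convention of bounding a zero-failure result by at most one failure:

```python
def observed_failure_rate(failures, device_hours):
    """Point estimate lambda = failures / device-hours; with zero
    observed failures, the text quotes the bound lambda <= 1/device-hours."""
    return max(failures, 1) / device_hours

# Test 1: 104 diodes, 344 000 device hours, 2 failures
lam1 = observed_failure_rate(2, 344_000)
mtbf1 = 1.0 / lam1
# Test 2: zero failures in 153 000 device hours -> upper bound only
lam2 = observed_failure_rate(0, 153_000)

print(f"lambda1 = {lam1 * 1e5:.2f} x 10^-5 /h, MTBF = {mtbf1:.0f} h")
print(f"lambda2 <= {lam2 * 1e5:.2f} x 10^-5 /h")
```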
4.5.2
Reliability test results for HP silicon double drift Impatt diodes
These diodes are all burned-in for at least 48 hours at a junction temperature exceeding
the maximum rating of 250°C. The following tests were performed on HP
standard production units [4.21], taken from inventory:
Test 1. - Accelerated lifetest. Units were tested at a junction operating
temperature far exceeding the recommended maximum, in order to accelerate the
failure mechanism. [12 diodes tested at Tj = 350°C for a total of 77 000 device
hours. Failures: 3 (1 unit < 48 h; 1 unit < 96 h; 1 unit ≈ 6700 h).
λ1 = 3.9 × 10⁻⁵/h at Tj = 350°C (extrapolating this result to Tj = 250°C gives
λ2 ≤ 0.01 × 10⁻⁵/h); MTBF1 = 25 667 h; MTBF2 > 10⁷ h].
Test 2. - Operating lifetest. Units were tested at the maximum recommended
junction operating temperature. [29 diodes tested at 250°C junction operating
temperature for a total of 249 000 device hours. Failures: 0; λ ≤ 0.4 × 10⁻⁵/h;
MTBF ≥ 249 000 h].
Test 3. - Operating lifetest. [29 diodes tested at 225°C junction operating
temperature for a total of 246 500 device hours. Failures: 0; λ ≤ 0.41 × 10⁻⁵/h;
MTBF ≥ 246 500 h].
These diodes are relatively easy to stabilise against bias circuit instabilities.
Simple biasing schemes (such as those described in HP AN935 [4.22]) have been found
to result in reliable low noise operation under proper RF tuning conditions. These
4.5.3
Factors affecting the reliability and safe operation
In most cases, it is possible to avoid the failures by taking into consideration four
predominant failure mechanisms: (i) fabrication defects; (ii) excessive Tj ; (iii) bias
circuit related burnout; (iv) tuning-induced burnout.
Fabrication defects. Excessive surface leakage current or metallisation
overhang in a defective diode can lead to early failure, even under normally safe
operating conditions. Careful screening with a high Tj burn-in procedure is also
recommended. Where extremely reliable operation in harsh environments is
required, a screening and preconditioning program is recommended.
Excessive Tj. The long-term intrinsic operating lifetime is directly related to the
average Tj. For a given Tj, the failure rate is then critically dependent on the
particular metallisation scheme used to contact the silicon chip. The HP metallisation
system used on double drift Impatt diodes has been shown to result in
extremely high reliability under severe conditions. For example, the median MTTF
(defined as the time to failure of 50% of a population of devices) at an operating
Tj of 250°C has been calculated to be 2 × 10⁶ hours.
Bias circuit related burnout. The frequency band of small-signal negative
resistance in an Impatt diode is limited by transit-time effects to approximately
1.5 octaves at microwave frequencies. When operated as a free-running oscillator
or amplifier under large-signal conditions, however, an Impatt diode develops an
induced negative resistance at lower frequencies; (this effect is less serious in
silicon than in GaAs Impatt diodes). An improperly designed biasing network that
resonates with the diode can thus result in bias circuit oscillations and excessive
noise. In certain cases, the transient current that results from the discharging of
any bias circuit capacitance shunting the diode can lead to failure. Shunt
capacitance should therefore be kept to an absolute minimum.
Tuning-induced burnout. Tuning-induced burnout can be easily avoided after
understanding the circumstances that result in these failures.
(a) Load resistance and safe operation (Fig. 4.18). Oscillation does not occur
for load resistances greater than R0, the magnitude of the diode's small-signal
negative resistance. Output power increases as Rload is reduced below R0 until the
maximum obtainable power is achieved for Rload = R2. It has been experimentally
determined that the onset of power saturation in silicon double drift Impatt diodes
results from large-signal limiting of the RF chip voltage amplitude to a maximum
value of approximately 0.35 times the d.c. bias voltage. In general, R2 will be
between one-half and one-third of the small-signal negative resistance R0. For
Rload less than R2, the output power decreases sharply due to the saturation of the
RF voltage. Failure is likely to occur when Rload is significantly less than R2.
[Figure: output power versus Rload, with R0 and I0 marked]
Fig. 4.18 The influence of circuit load resistance on output power for either a pulsed or CW
Impatt in a circuit which resonates the diode at a single frequency f0. The pulsed or d.c.
operating current is kept fixed at I0
One possible mechanism, which might be responsible for diode burnout under
this condition, has been described in [4.24]; it is suggested that the low-frequency
negative resistance induced by large RF modulation, could lead to a transversely
non-uniform current density within the diode.
(b) Threshold current and optimum tuning. Tuning-induced failure can - in
general - be avoided by paying careful attention to the relationship between power
output and bias current for a particular diode. The three curves in Fig. 4.18b
illustrate output power versus bias current corresponding to the three values of
Rload indicated in Fig. 4.18a. For single frequency operation at f0 there is an
unambiguous one-to-one relationship between the threshold current where
oscillation begins and the value of the load resistance. Once the optimum load
resistance has been determined for a particular diode, the corresponding threshold
current can be used as an indicator of unsafe circuit loading. Figure 4.18b shows
that the threshold current ITH3 for a load resistance of R3 is considerably less than
ITH2, which corresponds to the optimum load resistance for the desired operating
current I0. The observation of a threshold current less than ITH2 would therefore
indicate that an unsafe overload condition would exist if the bias current were
increased to I0.
Although a load resistance of R3 would be unsafe for operation at a bias current
of I0, it would result in optimum performance at some lower bias current. A rough
but useful rule of thumb for double drift silicon Impatt diodes is that, for optimum
tuning, the threshold current will be approximately one third of the desired
operating current. The threshold current corresponding to maximum output power
at a particular bias current is also a weak function of the fixed frequency of
oscillation. In general, the optimum threshold current will increase slightly as the
operating frequency is increased within the useful frequency range of a diode.
For diodes of the same type it is important to realise that the optimum threshold
current for operation at a particular output power or operating current may vary as
much as ± 10% from diode to diode, due to differences in the packages or chip
negative resistances.
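The rule of thumb above lends itself to a simple tuning check; the helper name and the ±10% tolerance in this sketch are our illustrative assumptions, not from the source:

```python
def tuning_is_safe(i_threshold, i_operating, tolerance=0.10):
    """Rule of thumb: at optimum tuning the oscillation threshold current
    is about one third of the desired operating current; a threshold
    markedly below that signals an unsafe overload at i_operating."""
    optimum_threshold = i_operating / 3.0
    return i_threshold >= optimum_threshold * (1.0 - tolerance)

# Hypothetical readings for a diode intended to run at I0 = 100 mA:
print(tuning_is_safe(33.0, 100.0))  # threshold near I0/3 -> safe
print(tuning_is_safe(20.0, 100.0))  # threshold well below I0/3 -> unsafe
```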
(c) Coaxial and waveguide cavities. The curves in Fig. 4.18a and 4.18b are
useful for achieving safe operation of diodes which remain resonated at a single
frequency approximately independent of the bias current and the RF voltage
amplitude. For this reason, single-transformer coaxial cavities are recommended
for initial device characterisation because they are broadband, well behaved and
relatively easy to understand. Noise, stability or resistive power loss
considerations may, however, ultimately require the use of a higher Q waveguide
cavity. Great care should be taken in this case to ensure singly resonant operation
and avoid tuning-induced failures due to improper loading. Below the waveguide
cut-off frequency, the Impatt diode is decoupled from the external load and a
short-circuit may arise at the plane of the diode. The use of absorptive material in
the bias circuit can be an effective solution to this problem. The large harmonic
voltages that are easily generated in waveguide cavities can also play a part in
tuning-induced failures. It has been found that these failures can be eliminated if a
sliding load for the next higher frequency band replaces the commonly used
sliding short.
References
5.1
Introduction
realised. The technical problems raised by the circuits with power transistors are
simple and allow low manufacturing costs, small dimensions and low weight.
Moreover, high frequency operation produces fewer disturbances to the power
supply than classical devices [5.3].
Generally, these components work in supply circuits having a low source impedance.
Therefore, overvoltages with multiple causes can arise [5.4]. For
instance, voltage peaks arise in inductive circuits at turn-off or due to
disturbances transmitted on the line. These overvoltages represent a great danger for
the semiconductor device in the off state, because in this case the device acts like a
dielectric. This phenomenon occurs mainly for components without a "controlled
avalanche" characteristic and, particularly, for transistors sensitive to second
breakdown. One must realise that the energies involved are very high and
their suppression is difficult because of the low source impedance.
The protection device, preventing the circuit voltage from reaching a dangerous
value, must satisfy the following conditions:
• it must not cause losses in normal operation;
• it must be an effective limiter for the overvoltages, with a rapid turn-on characteristic;
• it must absorb, without being destroyed, the delivered energies.
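A rough feel for the energies such a protection device must absorb comes from the energy stored in the inductance at turn-off, E = ½LI²; the component values in this sketch are hypothetical:

```python
def stored_inductive_energy(l_henry, i_amp):
    """Energy E = 0.5 * L * I^2 stored in an inductance at turn-off;
    the protection device must absorb it without being destroyed."""
    return 0.5 * l_henry * i_amp ** 2

# Hypothetical example: a 10 mH relay coil switched off at 2 A
energy_j = stored_inductive_energy(10e-3, 2.0)
print(f"energy to absorb: {energy_j * 1e3:.0f} mJ")
```

Even these modest values, delivered in microseconds from a low source impedance, explain why suppression is difficult.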
Today, one knows that the difficulties encountered with power transistors arise
from improper conditions of use. This means that the most important technological
problems linked to component reliability have been solved. Research on the
operating conditions (detailed definition of the specifications on protection, correct
choice of components and the avoidance of errors in use) still has to be done.
Because their essential function is linked to high energy levels, the practical
conditions of use for power transistors are a decisive factor in the quality of the
systems using them. Experimentally, it has been shown that the lack of
this information always leads to failures. Consequently, the user must ask the
following questions:
• Does the transistor have complete and correct specifications? Is it correctly
mounted, complying with these specifications?
• If yes, can the inherent reliability for the specified mode of use be obtained?
5.2
Technologies and power limitations
There are two basic variants for power transistors: bipolar transistors (operating
with minority carriers) and unipolar transistors (operating with majority carriers).
5.2.1
Bipolar transistors
Three main technologies, summarised in Table 5.1, each one containing some
variants, are used for the manufacturing of bipolar transistors. Only one of them
(collector and base with mesa epitaxy) seems to be adequate for power transistors.
5 Reliability of silicon power transistors 173
The technological variants for power transistors are presented in Table 5.3.
Special attention must be given to the mounting operation and especially to the
wire bonding. A comparison between the main bonding techniques is presented in
Table 5.2.
Technique | Principle | Material | Advantages | Disadvantages
Thermocompression | High temperature and pressure | Gold wire; ribbon | Very small surfaces | Expensive for high surfaces
Needle | High temperature and pressure | Gold wire | More robust than thermocompression | A high contact surface is needed
Wire solder | Wire inserted in melted solder | Solderable wire | Moderate cost | A high contact surface is needed
Clip solder | Position the clip and solder | Bronze clip with phosphorus or nickel | Low cost | A high contact surface is needed
Variant | Advantages | Disadvantages
Mesa epitaxial with double diffusion | Rapid; small saturation resistance | Relatively expensive; less robust; medium losses
Mesa planar with double diffusion | Rapid; small losses; small saturation resistance | Less robust; expensive
Base with mesa epitaxy | Medium speed; small saturation resistance | Small voltage; medium losses
Collector and base with mesa epitaxy | Robust; medium speed; small saturation resistance; high voltage | Relatively expensive
5.2.2
Unipolar transistors
The Field Effect Transistors (FET) have some important advantages, such as:
linearity, high input impedance, negative temperature coefficient for the drain
current (preventing second breakdown and protecting against short-circuiting
when the FET is placed at the output of an amplifier).
The MOS (Metal Oxide Semiconductor) transistor is another unipolar device.
As Rossel et al. [5.4] note, since 1974 various MOS transistors have been realised.
More recently, more complex MOS and bipolar devices, allowing more diversified
electrical functions, have been created. These developments offer the
opportunity for integrating the complete control circuit and the power device on
the same chip. In MOS technology, two structural configurations are regularly
used: the vertical structures (the current flows vertically across the chip) and the
horizontal structures (the current always flows at the surface). There are also
integrated devices in which the current flow can be both vertical and horizontal.
A vertical device is the VMOS transistor, where V is the shape of the etched
silicon. A comparison between bipolar and VMOS transistors is presented in Ta-
ble 5.4.
Another vertical MOS device is the VDMOS transistor (Vertical Double diffused
MOS), which makes it possible to obtain higher voltages (above 400V) than the
VMOS ones.
The horizontal devices are basically double diffused lateral MOS transistors
(LDMOS) having a highly integrated gate and source geometry.
5.3
Electrical characteristics
5.3.1
Recommendations
• Current gain: To counterbalance the drift arising during the ageing of the
transistor, a safety margin of 15-20% must be taken.
• Leakage current: For a long time, the collector-emitter leakage current was
considered the only representative parameter for the reliability of a power
component. Today, the leakage currents can be made small and stable, so
their importance has decreased. On the other hand, the differences between
the reliability requirements and the circuit demands are decisive. An example
is the surface stability problem [5.5].
• Breakdown voltage: Usually, for modern transistors a well-defined breakdown
voltage is guaranteed and considered an absolute limit. For small energy
components, operation at values up to 70% of the maximum admissible
value is recommended. For power devices, values up to 90% of the
maximum admissible value are allowed.
• Residual current ICBO: The measurements made in operating conditions on
power transistors (but also on bipolar small signal transistors) proved, for up
to 1% of the measured transistors, the existence of a residual current producing
trouble. The residual current has three components: a space charge component,
an interface component and a leakage current component.
The first component arises from the space charge region of the CB junction.
Even a depleted surface layer can contribute. The second component is
produced by the generation/recombination states existing at the Si-SiO2
interface. This component increases considerably when it reaches the surface
of a depleted layer and can be divided into a volume component (generation/
recombination and diffusion) and a surface component (generation/recombination).
The third component has three main causes:
i) contamination and humidity, ii) a current flow in the depleted layer and iii)
conducting "paths" for electrons in the oxide layer.
For real transistors, these three components are superposed, with different
intensities. Measurements identifying the voltage and temperature dependence
allow an estimate of the dominant component. It should be noted that the
passivation plays an important role in keeping the residual currents under
control, but often an increase of the noise factor was observed.
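The derating recommendations above can be captured in a small helper; the function names and the example values in this sketch are ours, purely illustrative:

```python
def max_operating_voltage(v_br_max, is_power_device):
    """Derating per the recommendations above: small-energy components
    should stay below 70% of the maximum admissible breakdown voltage,
    power devices below 90%."""
    return v_br_max * (0.90 if is_power_device else 0.70)

def required_min_gain(circuit_gain_needed, ageing_margin=0.20):
    """Current gain to specify, with the 15-20% ageing safety margin."""
    return circuit_gain_needed * (1.0 + ageing_margin)

# Hypothetical 400V device used as a power switch, circuit needs hFE = 50
v_limit = max_operating_voltage(400.0, is_power_device=True)
h_fe_min = required_min_gain(50.0)
print(v_limit, h_fe_min)
```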
5.3.2
Safety Limits
Why must one set safety limits? Because it is important that unexpected cases
be taken into account, and because power transistors must sometimes operate
far beyond their physical limits.
To this end, the use of a safety factor for unexpected cases is not
sufficient without performing an analysis, because:
5.3.3
The du/dt phenomenon
5.4
Reliability characteristics
Power transistors convert or switch high energies. This may lead to very high
stresses that often produce component degradation. Consequently, the
component reliability is strongly linked to the operating conditions.
The component damage is due to an excessively high junction temperature. The
result is abnormal component operation (electron-hole pairs are created by
thermal generation).

Fig. 5.1 Failure rate vs. virtual junction temperature [5.10]
Peter [5.8] presented a diagram (Fig. 5.1) showing the failure rate dependence
on the virtual¹ junction temperature for a blocking test at high temperature
and at a voltage approaching the maximum value. When an overload current
occurs at a given junction temperature, the current flow leads to ageing by
electromigration.
¹ It is known that the junction temperature is a basic physical parameter that
is very difficult to measure. Therefore, manufacturers give temperature limit
values corresponding to the absolute operating limits. If such a limit value
is exceeded, even for a short time and even if these limits are never
exceeded again, there is a risk of progressive damage or of irreversible
modification of the characteristics.
An important part of the testing program uses the temperature as a stress
factor to predict the time behaviour of the transistor. Temperature may be
used because the degradation rate correlates with the exponential of 1/T
(Fig. 5.2, from [5.9]). The points A, B and C are calculated at three
different temperatures, for the same failure mechanism. If no new failure
mechanism (modifying the slope of the established characteristic) occurs, the
characteristic can be extrapolated and the result for another temperature
(point D) can be obtained.
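The A-B-C to D extrapolation described above can be sketched numerically. This is a minimal illustration assuming the Arrhenius form λ(T) = A₀·exp(−Ea/kT); the three measured failure rates below are invented, not data from the book.

```python
import math

# Hedged sketch of the A-B-C -> D extrapolation described above, assuming the
# Arrhenius law lambda(T) = A0 * exp(-Ea / (k * T)); the numeric failure-rate
# values below are invented for illustration only.

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K

def fit_arrhenius(points):
    """Least-squares fit of ln(lambda) vs 1/T for (T_kelvin, failure_rate) pairs."""
    xs = [1.0 / t for t, _ in points]
    ys = [math.log(lam) for _, lam in points]
    n = len(points)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) \
            / sum((x - x_mean) ** 2 for x in xs)
    intercept = y_mean - slope * x_mean
    return slope, intercept  # slope = -Ea/k, intercept = ln(A0)

def extrapolate(slope, intercept, t_kelvin):
    """Predict the failure rate at a new temperature (point D)."""
    return math.exp(intercept + slope / t_kelvin)

# Points A, B, C: the same failure mechanism measured at three temperatures.
points = [(398.0, 2.0e-5), (423.0, 8.0e-5), (448.0, 2.8e-4)]
slope, intercept = fit_arrhenius(points)
ea_ev = -slope * K_BOLTZMANN_EV  # apparent activation energy in eV
print(f"Ea ~ {ea_ev:.2f} eV, lambda(373 K) ~ {extrapolate(slope, intercept, 373.0):.1e}")
```

If a new failure mechanism appeared, the fitted slope would change and this extrapolation would no longer be valid, exactly as the text warns.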
Fig. 5.2 Correlation between the degradation rate, expressed by the failure
rate (λ, in 10^-5/h), and the reciprocal temperature 1/T (in 10^-3/K)
(VCEO). For very short power pulses, the value allowed for the equalising
currents can be exceeded, up to reaching the highest allowed temperature. For
a faster sequence of pulses, the system does not cool completely and,
therefore, the heating is maintained.
Generally, when factual data on the components are missing and only an
absolute, fixed limit given by the manufacturer is known, the failure rate of
the system must be minimised by all available means. The stresses
corresponding to normal operation are known, but those corresponding to
accidental operation are not. Therefore, the elements taken into account by
the system designer are the choice of the components and of the circuit, and
the definition of the protection means and of the safety limits. The circuit
designer must take into account the economical and technical requirements.
5.5
Thermal fatigue
Thermal fatigue is the slow degradation of components produced by temperature
variations. Generally, the phenomenon is linked to the mechanical stresses
generated by the different dilatation coefficients, which influence the
quality of the solder joints and of the metal/silicon and passivant/silicon
joints, respectively.
Thermal fatigue after thermal cycling is a common phenomenon for power
transistors encapsulated in metal cases. If a transistor is heated and cooled
alternately, mechanical stresses are produced, because the dilatation
coefficients of the silicon and of the metal used for chip mounting are
different.
The transistor heat sink plays an important role in heat dissipation and,
therefore, it is made from copper, steel or aluminium. The dilatation
coefficients of these metals are different from that of silicon (Table 5.5).
Table 5.5 Dilatation coefficients of chip and heat-sink materials

Material      Coefficient (10^-6/°C)
Silicon        3
Steel         10.5
Copper        17
Aluminium     23
It is obvious why, at the same temperature, different stresses arise at the
chip/heat-sink interface.
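As a rough illustration of why these stresses differ, the thermal-mismatch strain Δα·ΔT can be computed from the Table 5.5 coefficients. The simple formula below ignores the solder layer and the geometry, so it only gives orders of magnitude, not design values.

```python
# Hedged sketch: thermal-mismatch strain at the chip/heat-sink interface,
# using the dilatation coefficients from Table 5.5. The simple formula
# strain = (alpha_sink - alpha_si) * delta_T ignores the solder layer and
# the geometry, so it is only an order-of-magnitude illustration.

ALPHA = {  # 10^-6 / degC, from Table 5.5
    "silicon": 3.0,
    "steel": 10.5,
    "copper": 17.0,
    "aluminium": 23.0,
}

def mismatch_strain(sink_material: str, delta_t: float) -> float:
    """Differential strain between silicon and the heat-sink metal."""
    d_alpha = (ALPHA[sink_material] - ALPHA["silicon"]) * 1e-6
    return d_alpha * delta_t

for metal in ("steel", "copper", "aluminium"):
    print(f"{metal:9s}: {mismatch_strain(metal, 100.0):.2e} strain per 100 degC swing")
```

Aluminium, with the largest coefficient, produces the largest mismatch against silicon, which is consistent with the stress ranking implied by the table.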
The link between the silicon chip and the case is made by a "soft" solder
joint or by a "hard" solder joint. In the first case, the solder consists
mainly of lead, which can absorb the stresses between the chip and the case,
because lead yields by plastic deformation. After deformation,
recrystallisation restores the metal, acting better at higher soldering
temperatures and longer times. However, the formation of microscopic holes
cannot be avoided. These holes lead to stress concentration and, as soon as
the twisting limit is reached locally, a crack appears at this point,
limiting the heat dissipation and modifying the thermal resistance.
In the case of a "hard" solder, used for transistors with power greater than
150 W, the alloying of gold with silicon transmits the stress entirely to the
chip, which is more fragile and can break. To protect the chip, molybdenum
can be used: its dilatation coefficient is close to that of silicon, and it
can absorb the stress if the thickness of the molybdenum layer is well
chosen. Even if the melting temperature is not reached, the stress after many
thermal cycles is so large that the molybdenum-copper alloy is weakened and a
crack may appear. The produced heat is then dissipated with difficulty and
the thermal resistance increases.
A significant increase (by 25%) of the thermal resistance between the
junction and the heat sink indicates thermal fatigue. Usually, in all
practical circuits, the power transistors undergo thermal stresses. In many
applications, these stresses are very large and can therefore lead to the
physical destruction of the chip or of the intermediate layers. The tests
made by the manufacturers led to the following conclusions [5.10][5.11]:
• Short cycles produce reduced ageing phenomena.
• The number of cycles leading to a significant and measurable ageing is
inversely proportional to the maximum temperature and to the temperature
range of a cycle. The absolute limit values of the producer define the limits
that may not be exceeded.
The user may identify the ageing of the solder joints of a component in
operation by the abnormal heating of the junction, leading to the component
failure. The behaviour at second breakdown is a parameter sensitive to
thermal fatigue. As soon as a microcrack is formed, the thermal resistance
increases locally and the behaviour at second breakdown becomes worse. If a
transistor must operate close to second breakdown, it can be suddenly
destroyed, without any previous degradation of the connections by thermal
fatigue.
To improve component manufacturing, the producer may act in two ways [5.12]:
• Constant and careful surveillance of the manufacturing conditions, so that
optimal quality is reached.
• Improvements driven by failure analysis, by searching for the causes and by
developing a new, improved technology.
The experience accumulated so far shows that for small stress variations the
solder joints transmit the stresses integrally, without fatigue phenomena (in
the elastic domain), while for large stress variations a weakening of the
solder joint, increasing with the duration of the stress, is observed.
5.6
Causes of failures
The primary cause of failure is almost always (excepting overvoltage) an
abnormal increase of the temperature, often spatially limited ("hot spot").
5.6.1
Failure mechanisms
² If no irreversible physical transformations occur, the power transistor can
regain its original characteristics.
³ This modification corresponds to irreversible transformations: therefore,
the component is damaged or destroyed.
• Contamination (of the glass or of the protecting layer). Leakage currents
directly proportional to the operating voltages and ambient temperatures are
produced.
• Lack of adhesion of the aluminium to the glass. A non-uniform current
distribution in the silicon leads to the phenomenon called hot spot.
Usually, the chip failure is produced by defects in the semiconductor crystal
structure, by unwanted impurities or by diffusion-induced defects. Generally,
these defects can be discovered during the final electrical inspection.
Undiscovered defects lead in time to wear-out failures.
As for other semiconductor devices, volume defects (epitaxy or
diffusion-induced defects, microcracks) also arise in power transistors.
These defects may lead to hot spots in the CB junction. If the transistor is
not efficiently protected (in current and voltage), hot spot phenomena may
lead to total destruction by breakdown of the junction, based on a well-known
failure mechanism: a current increase, entry into second breakdown, then
entry into intrinsic conduction of the silicon.
5.6.2
Failure modes
The external indicators signalling the failures are called failure modes. For
failure analysis (and also for building screening methods and tests), basic
knowledge of the manufacturing methods and of the correlation between the
failure modes and the component design is essential.
Table 5.6 Failure sources (in %) for power transistors encapsulated in TO-3 and TO-220

Source                  TO-3   TO-220
Operator deftness        35      -
Metallic impurities      20     25
Internal connections     15     25
Moulding                  -     15
Series fabrication       10     10
Surface effects           -     10
Tightness                 5      -
Materials                 5      -
Tests                     5      5
Unidentified sources      5     10
• ICBO is the most sensitive indicator of a surface defect. A continuous
increase of this parameter, often accompanied by a decrease of the current
gain hFE, is a sure indication of a contaminated surface.
• Short-circuits (especially CE) may announce the presence of hot spot
phenomena, due to chip problems or to a circuit defect.
• An open circuit may indicate a bad solder joint or a conductor melted by an
excessive current.
• Combinations of short-circuit and open circuit may be the result of a
melted conductor linked with the upper conducting layer.
• An intermittent open circuit, especially at high temperature, must usually
be considered a sign of a bad-quality solder joint.
To establish the failure modes and mechanisms, life test results and
operational failures must be investigated. This information is useful for
establishing the error and failure sources. The manufacturer then uses it to
improve the fabrication process. If the failure modes and mechanisms are
known, accelerated tests must be performed for each of the considered
applications. Thus the failure sources can be established in the laboratory.
Based on this information, the producer will work out an improvement
programme for the technology. Failure analysis is also an important source of
information for eliminating manufacturing defects or, if any, utilisation
errors. According to the RCA statistics [5.6][5.12] ... [5.16], the failure
sources for power transistors encapsulated in TO-3 (metal package) and in
TO-220 (plastic package) are given in Table 5.6.
5.6.3
A check-up for the users
Because many failures result from improper utilisation of power transistors,
it is advisable to check first that suitable mechanical and electrical
procedures have been used. The following checks are recommended:
Mechanical problems
• Cooling elements correctly dimensioned and efficient.
• Mounting surface smooth and free of defects.
• Correct compression coupling.
• No excessive stresses.
• Correct use of the silicone paste.
• No contamination of the isolators or of the case (no leakage).
Electrical problems
• Does the power device work in the domain specified by the manufacturer?
• Are the limit values for current and voltage exceeded?
• Do the electrical tests leave the transistor undamaged?
• Is the power correctly measured?
• Are the components used correctly dimensioned to avoid overvoltages?
Other problems
• Purchasing date.
• The quantity purchased and the storage conditions.
• Manufacturing date.
• The number of transistors used in the equipment.
• The number of failed components.
• Analogous experience with previous component batches.
• Operating conditions at the moment of failure.
5.6.4
Bipolar transistor peripheries
The package, the chip connections and the chip-package connections are the
transistor peripheries. The weak points of these peripheries are [5.17]:
• Material migration on the chip.
• Insufficient package tightness.
• Silicon degradation in the connection area.
• Formation of gold-aluminium alloy.
• Aluminium reliability in the area close to the connection (identified by
ultrasound).
• Anodic decomposition of the aluminium (when moisture penetrates).
• Insufficient adherence of the aluminium to the silicon.
• Oxide residues in contact windows.
• Cracks and material residues on the conducting paths or along the wires
making the connection with the environment.
To reveal these structural weaknesses, accelerated tests (high current,
temperature and humidity) and vibration tests are performed.
5.7
The package problem
Since the plastic package was introduced (in 1962), important progress has
been made both in plastic materials and in packaging technology. To assure
high transistor reliability, the plastic material must adhere well to all
metallic parts, serving as a separating protection buffer during the
component lifetime. The dilatation coefficient of the plastic material must
be comparable with those of the other constitutive parts. This material must
permanently carry away the heat emitted during operation.
Large reliability study programmes, initiated by all important manufacturers,
allowed the reliability of power transistors to be evaluated and also
demonstrated important properties such as chip surface stability, package
suitability and parameter stability over long lifetimes [5.18].
For the TO-3 package, three materials may be used: copper, aluminium and
iron-nickel alloy (Fe-Ni). Copper is used only in special cases (high
reliability equipment, special programs), because it is very expensive. For
professional items, Fe-Ni is used, but for the majority of common
applications aluminium gives satisfactory results. Since the transistors in
aluminium packages are encapsulated at low temperature, tightness problems
may arise. Adams [5.19] states that the real average failure percentage due
to tightness deficiencies is smaller than 2.2% of the total number of
complaints.
Concerning plastic encapsulated transistors, it was known that high
temperature packaging might cause chip-epoxy material interactions. Specific
problems linked to the degradation, to the life duration and, especially, to
the phenomena occurring during encapsulation were carefully investigated, and
it seems that, today, plastic cases are as reliable as hermetic ones (see
Chap. 12).
5.8
Accelerated tests
• The failure rate of the transistors depends on the junction temperature,
but also on the way this temperature was obtained.
5.8.1
The Arrhenius model
A model for the relation between the failure rate and the junction
temperature of the devices was developed, based on the Arrhenius law. The
diagram in Fig. 5.2 allows estimation of the probable reliability of the
device for the different junction temperatures produced in practical
conditions.
Generally, if the transistor has a heat sink, one can write:

Tj = TA + PD (RjC + RCS + RSE)                                        (5.1)

where Tj is the junction temperature, PD the dissipated power, RjC the
thermal resistance junction-case (a technological characteristic specific to
a transistor and given by the manufacturer), RCS the thermal resistance
case-heat sink (referring to a conduction transfer; this resistance is
smaller if the contact between the case and the heat sink surface is good,
and this contact can be improved by using silicone oil), RSE the thermal
resistance heat sink-environment (depending not only on the size, form and
structure of the heat sink, but also on its orientation and on the air stream
flowing around it), and TA the ambient temperature. Since the three thermal
resistances are in series, they add up to the total junction-to-ambient
thermal resistance.
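Eq. (5.1) can be illustrated with a short computation; the thermal-resistance and power values below are invented, datasheet-style numbers, used only to show the series summation.

```python
# Hedged sketch of Eq. (5.1): T_j = T_A + P_D * (R_jc + R_cs + R_se),
# with the three series thermal resistances summed. The numbers are
# invented datasheet-style values, for illustration only.

def junction_temperature(t_ambient: float, p_dissipated: float,
                         r_jc: float, r_cs: float, r_se: float) -> float:
    """Junction temperature in degC; resistances in K/W, power in W."""
    return t_ambient + p_dissipated * (r_jc + r_cs + r_se)

# Example: 25 degC ambient, 20 W dissipated, R_jc = 1.5, R_cs = 0.5, R_se = 2.0 K/W
tj = junction_temperature(25.0, 20.0, 1.5, 0.5, 2.0)
print(tj)  # 105.0
```

Improving the case-heat sink contact (e.g. with silicone oil, as the text notes) lowers R_cs and thus directly lowers Tj for the same dissipated power.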
Fig. 5.3 Voltage dependence of the median time (lognormal distribution).
Experimental data were obtained from four samples withdrawn from the same
batch of bipolar transistors undergoing a life test at the same temperature
and the same dissipated power (Pmax), but at different combinations Ui, Ii
(where Ui × Ii = Pmax for all samples)
In Fig. 5.3, the variation of tm (the median time, for a lognormal
distribution) with the applied voltage is presented, for a single failure
mechanism (a field-induced junction). Since the junction temperature is the
same for all samples, tm should be constant. The voltage dependence observed
in Fig. 5.3 means that the Arrhenius model (described by the dotted line) is
no longer sufficient to describe the temperature acceleration: the way this
temperature is obtained (by electrical and/or thermal stress) is also
important. Consequently, Bâzu and Tazlauanu [5.23] proposed a new model,
suitable for many electrical and climatic stresses. The model can be used,
for instance, for building accelerated tests with humidity as a stress factor
(see Chap. 2).
Martinez and Miller [5.24] studied the reliability of power RF transistors
operating at temperatures above +150°C, the maximum junction temperature. The
following accelerated tests were performed:
• DC tests: 1000 h at 180°C and 240°C, constant stress,
• RF tests: 168 h, full power, step stress,
• Temperature increase up to 220°C, in 20°C steps, 200 h at each step.
The only failure mechanism found was electromigration. Consequently, it seems
that the transistors can operate successfully at such high temperatures and
that accelerated life tests at +220°C are feasible.
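As a sketch of how such test temperatures accelerate an Arrhenius-driven mechanism, the acceleration factor between the two DC test temperatures can be estimated. The 0.6 eV activation energy is an assumed, typical electromigration value, not one reported in [5.24].

```python
import math

# Hedged sketch: Arrhenius acceleration factor between the two DC test
# temperatures mentioned above (180 degC and 240 degC). The activation
# energy of 0.6 eV is an assumed, typical electromigration value, not a
# figure taken from the cited study.

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K

def acceleration_factor(ea_ev: float, t_low_c: float, t_high_c: float) -> float:
    """AF = exp(Ea/k * (1/T_low - 1/T_high)), temperatures in degC."""
    t_low = t_low_c + 273.15
    t_high = t_high_c + 273.15
    return math.exp(ea_ev / K_BOLTZMANN_EV * (1.0 / t_low - 1.0 / t_high))

af = acceleration_factor(0.6, 180.0, 240.0)
print(f"AF(180 -> 240 degC, Ea = 0.6 eV) ~ {af:.1f}")
```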
Table 5.7 Testing conditions for temperature cycling testing of cases TO-3 and TO-220
5.8.2
Thermal cycling
Thermal cycling proved to be a very good method in accelerated tests for
evaluating technological improvements. With this procedure, the quality of
the solder joints can be tested continuously.
In order to compare the reliability of the same transistor encapsulated in
the plastic package TO-220 and in the metal package TO-3, the number of
cycles to failure vs. the junction temperature is presented in Figs. 5.4 and
5.5 [5.6][5.15]. The testing conditions are summarised in Table 5.7.
Fig. 5.4 Temperature range ΔTj (°C) vs. number of cycles to failure (for
power transistors encapsulated in package TO-3; dissipated power 30 W)
Fig. 5.5 Temperature range ΔTj (°C) vs. number of cycles to failure (for
power transistors encapsulated in package TO-220; dissipated power 6.75 W)
Fig. 5.6 Correlation between failure rate (λ, in 10^-6 h^-1) and normalised
junction temperature. For transistors with dissipated power higher than 1 W
at an environmental temperature of 25°C, the values must be multiplied by 2
If the temperature is the main stress in operation, the curves given by the
standard MIL-S-19500 (Fig. 5.6) may be used to predict the reliability. For
instance, if thermal cycling produces the failure, the maximum number of
cycles for a transistor can be calculated.
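One common way to calculate such a maximum number of cycles is an inverse-power (Coffin-Manson-type) law fitted to curves like those of Figs. 5.4 and 5.5. The reference point and exponent below are assumptions for illustration, not values from MIL-S-19500.

```python
# Hedged sketch: a Coffin-Manson-type inverse-power law is a common way to
# model the "cycles to failure vs. temperature range" behaviour shown in
# Figs. 5.4 and 5.5; the reference point (1e4 cycles at dT = 100 degC) and
# the exponent 2.5 are assumptions, not values from the standard.

def cycles_to_failure(delta_t: float, n_ref: float = 1.0e4,
                      dt_ref: float = 100.0, exponent: float = 2.5) -> float:
    """N_f = N_ref * (dT_ref / dT)^m -- fewer cycles for wider swings."""
    return n_ref * (dt_ref / delta_t) ** exponent

print(round(cycles_to_failure(50.0)))   # milder swing: more cycles survive
print(round(cycles_to_failure(150.0)))  # harsher cycling: fewer cycles
```

This reproduces the qualitative conclusion of Sect. 5.5: the cycles to failure fall steeply as the temperature range of a cycle grows.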
In Fig. 5.7, the dependence of the failure rate on the junction temperature,
for different reliability levels, is shown for power transistors. Note that
the use of screening tests (such as JAN TX) can diminish the failure rate.
Fig. 5.7 Failure rate vs. junction temperature for plastic, hermetic and JAN
TX screened power transistors
5.9
How to improve the reliability
A perfect flatness of the part of the radiator in contact with the case is
indispensable both for good heat evacuation and to avoid case deformation. To
this end:
• The radiator thickness must be greater than 2 mm.
• The fixing holes must not be too large.
• The recommended value of the pressing coupling is 0.8 × the maximum value.
• The radiator holes must be perpendicular to its surface.
• The burr arising when the hole is made must be completely eliminated.
• Silicone oil must be used to improve the thermal contact.
Since the occurrence of inductive overvoltages is very dangerous for
decoupling circuits, a fast diode limiting the voltage applied to the
transistor terminals may be used. This diode is efficient only if the
connections are short enough. Moreover, the diode must not be placed too
close to the coil, but close enough to the power transistor and to the
irregular voltage source. Concerning soldering, note that at a soldering iron
temperature of about +350°C, the soldering duration may not exceed 10
seconds.
5.10
Some recommendations [5.26] ... [5.63]
• To place the operating point inside the SOA (Safe Operating Area) curve, so
that the point is far from the limit given by second breakdown.
• To foresee safety margins for switching losses, for the maximum voltage,
for the maximum junction temperature (especially for high voltage components)
and for second breakdown.
• To achieve the contact between the component and the heat sink so that the
lack of flatness is smaller than 0.1 mm and the roughness smaller than 15 μm.
The contact resistance cannot be avoided, regardless of the compression force
used.
• To use silicone oil, which is a good heat conductor, eliminating
supplementary corrosion risks.
• To paint the radiator black (approximating a black body), but with a thin
layer, to avoid a supplementary thermal resistance.
In the case of power transistors, the Arrhenius relation plays an important
role. The surface problems and those related to the layers involved are
complex, and the chemical processes limit the life duration. The solubility
of the materials increases with temperature and the stability decreases.
A single stress type cannot reveal all the failure types. This means that for
a semiconductor device the screening procedures have only a limited success.
For silicon power transistors that are metal packaged, a temperature range
between -65°C and +200°C is recommended. Inside this range, the transistor
reliability is considered satisfactory. Outside this range, the transistor is
unstable, it cannot be controlled and, eventually, it fails. For this reason,
the failure rate increases with temperature⁴.
The curves showing the time distribution of the failures are not reproducible
(except when a specific dominant failure mechanism exists). Early failures do
not always arise. Consequently, screening does not always bring a reliability
improvement.
The SOA test, used close to the intersection area (maximum dissipated power,
second breakdown) of the characteristic parameter plane Ic = f(VCE), is a
global test verifying the transistor's capability to withstand the operating
power. This test is applied for 0.25-1.5 seconds, at a given power (IE, VCE).
If the voltage VCE decreases, the transistor operates defectively and there
are hot spots, with a tendency towards short-circuits. This test allows the
detection of solder joint defects, some volume defects (microcracks, base
inhomogeneities) and some surface defects (adhesion losses of the aluminium
to the silicon). Therefore, the SOA test is an "all or nothing" test and does
not yield measurable values.
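The pass/fail nature of the SOA check can be sketched as a simple predicate over the (VCE, IC) plane; all limit values and the second-breakdown exponent below are invented for illustration, not taken from any datasheet.

```python
# Hedged sketch: checking that an operating point (V_CE, I_C) lies inside a
# simplified DC SOA, bounded by I_max, V_max, P_max and a second-breakdown
# line I = K * V^-s. All limit values below are invented for illustration.

def inside_soa(v_ce: float, i_c: float,
               i_max: float = 10.0, v_max: float = 100.0,
               p_max: float = 100.0, k_sb: float = 400.0,
               s_sb: float = 1.5) -> bool:
    if v_ce <= 0 or i_c <= 0:
        return False
    if i_c > i_max or v_ce > v_max:        # absolute current/voltage limits
        return False
    if v_ce * i_c > p_max:                 # maximum dissipated power hyperbola
        return False
    if i_c > k_sb * v_ce ** (-s_sb):       # second-breakdown limit
        return False
    return True

print(inside_soa(10.0, 5.0))   # True: 50 W, below both limits
print(inside_soa(80.0, 2.0))   # False: 160 W exceeds P_max
```

Like the SOA test itself, this predicate is "all or nothing": it yields a pass/fail verdict, not a measurable value.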
Despite all the precautions that can be taken (IE must be applied before VCE,
to avoid oscillations, and VCE must be interrupted within 1 s if the fixed
limit is exceeded), the SOA test can lead to the failure of the tested
components, especially of those with defects (e.g. EC junction breakdown).
References
5.23 Bazu, M.; Tazlauanu, M. (1991): Reliability testing of semiconductor devices in humid
environment. Proceedings of the Annual Reliability and Maintainability Symposium
(ARMS), Orlando, Florida, January 29-31, pp. 307-311
5.24 Martinez, E. C.; Miller, J. (1994): RF power transistor reliability. Proceedings of the An-
nual Reliability and Maintainability Symposium (ARMS), Anaheim, California, January
29-31, pp. 83-87
5.25 Deger, E.; Jobe, T. C. (1973): For the real cost of a design factor in reliability. Electronics,
August 30, pp. 83-89
5.26 Grange, J. M.; Dorleans, J. (1970): Failure rate distribution of electronic components.
Microelectronics and Reliability, vol. 9, pp. 510-513
5.27 Kemeny, A. P. (1971): Experiments concerning the life testing of transistors. Microelectronics
and Reliability, vol. 10, part I: pp. 75-93; part II: pp. 169-194
5.28 Lang, G. A., Fehnder, B. J., Williams, W. D. (1970): Thermal fatigue in silicon power
transistors. IEEE Trans. on Electron Devices, ED-17, pp. 787-793
5.29 Redoutey, J. (1977): Les parametres importants des transistors de puissance. Sescosem
Informations No.5, April, pp.3-15
5.30 Gallace, L. J., Vara, J. S. (1973): Evaluating the reliability of plastic-packaged power
transistors in consumer applications. IEEE Trans. on Broadcast and TV, BTR-19, No.3,
pp.194-204
5.31 Preuss, H. (1969): Der Einfluss der Parameterdrift auf die Ausfallrate von Schalttransistoren.
Fernmeldetechnik, vol. 9, pp. 263-267
5.32 Happ, W. J.; Vara, J. S.; Gaylord, J. (1970): Handling and mounting of RCA moulded-
plastic transistors and thyristors, RCA Technical Publication, AN-4124, February
5.33 Ward, A. L. (1977): Studies of second breakdown in silicon diodes. IEEE Trans. on Parts,
Hybrids and Packaging, PHP-13, No.4, December, pp. 361-365
5.34 La Combe, D. J.; Naster, R. J.; Carroll, J. F. (1977): A study on the reliability of microwave
transistors. IEEE Trans. on Parts, Hybrids and Packaging, PHP-13, No.4, December, pp.
242-245
5.35 Schultz, H.-G. (1977): Einige Bemerkungen zum Rauschverhalten des Feldeffekt-
transistoren. Nachrichtentechnik Elektronik, vol. 27, H. 6, pp. 242-245
5.36 Harper, C. A. (1978): Handbook of components for electronics. New York: McGraw-Hill
Book Company
5.37 Cavalier, C. (1974): Contribution à la modélisation des transistors bipolaires de puissance:
aspects thermiques. Thèse, Université de Toulouse
5.38 Davis, S. (1979): Switching-supply frequency to rise: power FETs challenge bipolars. Electron
Device News, January 10, pp. 44-50
5.39 Ginsbach, K.H.; Silber, D. (1977): Fortschritte und Entwicklungstendenzen auf dem Gebiet
Silizium-Leistungshalbleiter. Elektronik, H.11, pp. ELl-EL5
5.40 Stamberger, A. (1977): Tendenzen in der Leistungelektronik. Elektroniker, H.11, p. EL34
5.41 Grafham, D. H.; Hey, J. C. (1977): SCR-manual. Fifth edition. General Electric, Syracuse,
New York
5.42 Băjenesco, T. I. (1981): Problèmes de la fiabilité des composants électroniques actifs
actuels. Masson, Paris
5.43 Băjenescu, T. I. (1981): Zuverlässigkeitsproblemlösungen elektronischer Bauelemente.
INFORMIS-Informationsseminarien 81-8, Zürich, May 14 and October 20
5.44 Băjenescu, T. I. (1981): Ausfallraten und Zuverlässigkeit aktiver elektronischer
Bauelemente. Lehrgang an der Techn. Akademie Esslingen, February 17-18
5.45 Antognetti, P. (1986): Power integrated circuit
5 Reliability of silicon power transistors 195
5.46 Hower, P. L. (1980): A model for turn-off in bipolar transistors. Tech. Digest IEEE IEDM,
p.289
5.47 Sun, S. C. (1982): Physics and technology of power MOSFETs. Stanford electronics labs.
TR no. IDEZ696-1
5.48 Bertotti, F. et al. (1981): Video stage IC implemented with a new rugged isolation
technology. IEEE Trans. Consumer Electronics, vol. CE-27, no. 3, August
5.49 Sakurai, T. et al. (1983): A dielectrically isolated complementary bipolar technique for aid
compatible LSIs. IEEE Trans. Electron Devices, ED-30, p. 1278
5.50 Zarlingo, S.P.; Scott, R.1. (1981): Lead frame materials for packaging semiconductors. First
Ann. Int. Packaging Soc. Conf.
5.51 Dascalu, D. et al. (1988): Contactul metal-semiconductor. Ed. Academiei, Bucharest (Ro-
mania)
5.52 Kubat, M. (1984): Power semiconductors. Springer, Berlin Heidelberg New York
5.53 Regnault, J. (1976): Les défaillances des transistors de puissance dans les équipements.
Thomson-CSF, Semiconductor Division
5.54 Baugher, D. M. (1973): Cut down on power-transistor failures in inverters driving resistive
or capacitive loads. RCA Technical Publication no. ST-3624
5.55 Sagin, M. (1977): Power semiconductors. Wireless World, May, pp. 71-76
5.56 Lilen, H. (1976): Les nouvelles générations de composants de puissance dépendront des
technologies de bombardement neutronique et/ou électronique. Électronique et microélectronique
industrielle, no. 225, October, pp. 22-25
5.57 Gallace, L. J.; Lukach, V. J. (1974): Real-time controls of silicon power-transistor
reliability. RCA Technical Publication AN-6249, February
5.58 Turner, C. R. (1973): Interpretation of voltage ratings for transistors. RCA Technical Publi-
cation AN-6215, September
5.59 Tomasek, K. F. (1970): Surveying the results of transistor reliability tests. Tesla Electronics,
vol. 1, pp. 17-21
5.60 Walker, R. C.; Nicholls, D. B. (1977): Discrete semiconductor reliability transistor/diode
data. ITT Research Institute
5.61 Bodin, B. (1976): Reliability aspects of silicon power transistors. Motorola Application Note
5.62 Thomas, R. E. (1964): When is a life test truly accelerated? Electronic Design, January 6,
pp.64-70
5.63 Baudier, J.; Fraire, C. (1977): Mesure sur les transistors de commutation de forte puissance.
Sescosem Informations, no. 5, April, pp. 26-30
5.64 Bulucea, C. D. (1970): Investigation of deep depletion regime of MOS structures using
ramp-response method. Electron. Lett., vol. 6, pp. 479-481
5.65 Grove, A. S. (1967): Physics and technology of semiconductor devices. John Wiley, New
York
5.66 Grove, A. S.; Deal, B. E.; Snow, E. H.; Sah, C. T. (1965): Investigation of thermally oxi-
dized silicon surfaces using MOS structures. Solid-State Electron., vol. 8, pp. 145-165
5.67 Das, M. B. (1969): Physical limitations of MOS structures. Solid-State Electron., vol. 12,
pp.305-312
5.68 Hofstein, S. R. (1967): Stabilization of MOS devices. Solid-State Electron., vol. 10, pp.
657-665
5.69 Deal, B. E.; Snow, E. H. (1966): Barrier energies in metal-silicon dioxide-silicon
structures. J. Phys. Chem. Solids, vol. 27, pp. 1873-1879
5.70 Bulucea, C. D.; Antognetti, P. (1970): On the MOS structure in the avalanche regime. Alta
Frequenza, vol. 39, pp. 734-737
5.71 Sah, C. T. ; Pao, H. C. (1966): The effects of fixed bulk charge on the characteristics of
metal-oxide-semiconductor transistors. IEEE Trans. Electron Dev., vol. 13, pp. 393-397
6 Reliability of thyristors
6.1
Introduction
¹ Bidirectional thyristors are classified as pnpn devices that can conduct
current in either direction; commercially available bidirectional triode
thyristors are the triac (from triode AC switch) and the silicon bilateral
switch (SBS).
(Figure: pnpn thyristor structure, showing the anode, the n-gate and p-gate
regions, and the cathode)
Similarly, the collector of the pnp-transistor along with any p-gate current [IGrP)]
supplies the base drive for the npn-transistor:
(6.2)
Thus, a regenerative situation exists when the positive feedback gain exceeds
unity.
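The regenerative condition can be sketched numerically. The script below is an illustrative sketch of the classical two-transistor thyristor model; the gain values and leakage current are assumed for illustration, not taken from the text:

```python
# Illustrative two-transistor model of the pnpn thyristor.
# alpha1, alpha2: common-base current gains of the npn and pnp halves.
# The device latches (regenerates) as alpha1 + alpha2 approaches unity.

def anode_current(i_gate, alpha1, alpha2, i_co=1e-9):
    """Steady-state anode current, one common form of the two-transistor
    result: I_A = (alpha2 * I_G + I_CO) / (1 - alpha1 - alpha2)."""
    denom = 1.0 - alpha1 - alpha2
    if denom <= 0.0:
        raise ValueError("alpha1 + alpha2 >= 1: the thyristor has latched")
    return (alpha2 * i_gate + i_co) / denom

# Below the latching point the anode current stays finite and gate-controlled:
print(anode_current(1e-3, 0.40, 0.40))
```

As the gains rise with current, the denominator shrinks and the anode current grows without bound, which is the regenerative turn-on described above.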
The thyristor is a small power semiconductor switch with a short response time,
able to close an electric circuit but not to re-open it. To turn off, it must be brought,
for a short time, to zero direct voltage, a situation reproduced at each half-period in
alternating current circuits. The thyristor is used for the control of alternating
currents (regulated motors, regulated heating, lighting installations, etc.).
The complexity of equipment, on the one hand, and the development of new
components, on the other, have forced industry to invest considerable effort in
finding means of controlling and predicting reliability. In many cases, these efforts
were accelerated by the desire of military authorities to evaluate (and improve
where necessary) the reliability of new devices which offered the promise of
improvements in size, weight, performance, and reliability in aerospace and
weapons systems. One may note that after only two years, in 1960, the newly
invented General Electric C35 thyristor met all the requirements of the US Army
and was successfully qualified to the first SCR military specification.
6.2
Design and reliability
The design of a new component must ensure that its performance parameters, over
the entire lifetime, do not drift outside the specified tolerances. This concerns
particularly the mechanical and thermal design of the component. In the case of
thermal design, the stability of the thermal characteristics is important, because the
junction temperature represents the major limitation in applications. Deterioration
of the thermal path can lead to thermal runaway and to component destruction. To ensure
the compatibility of the thermal expansion coefficients and to reduce thermal fatigue,
the interfacing materials must be selected appropriately. Normally, thermal
fatigue is associated with the stresses that affect the quality of the die attach, the
metal-silicon connections, or the passivation-silicon interface. Thermal fatigue
can appear as a consequence of thermal cycling: if a thyristor is successively
heated and cooled, stresses are produced in it, since the expansion coefficients of the
silicon and of the metal to which the structure is attached are very different.
Fig. 6.2 Passivation and glassivation (National Semiconductor document). Passivation is a
process permitting protection against humidity and surface contaminants with a doped
vitreous silicon oxide film: 1 diffusion; 2 substrate; 3 glassivation; 4 conductive line; 5 metal; 6
passivation
6.2.1
Failure mechanisms
Failure mechanisms are chemical and physical processes leading eventually to
device failure. The kinds of mechanisms that have been observed in semiconductor
devices are shown in Table 6.1, together with the kinds of stresses to which each
mechanism is likely to respond. If some of these failure mechanisms arise, to any
significant degree, in a given device type obtained from a given process, it would
not be reasonable to expect a high-reliability device. The dominant mechanisms to
which a device type may be susceptible vary according to the peculiarities of the
design and fabrication process of that device. The failure mechanisms of discrete
semiconductors may be produced by three categories of defects:
Table 6.1 Failure mechanisms and the stresses to which each is likely to respond

Stress key: mechanical: 1 static force; 2 shock; 3 vibration; 4 pressure (fluid); thermal: 5 static; 6 shock; 7 cycling; electrical: 8 voltage; 9 current; 10 continuous power; 11 cycled power; miscellaneous: 12 corrosion; 13 abrasion; 14 humidity; 15 radiation

Failure mechanisms listed: structural flaws (weak parts; weak connections; loose particles; thermal fatigue); encapsulation flaws; internal contaminants (entrapped foreign gases; outgassing; entrapped ionisable contaminants; base minority-carrier trapping; ionic conduction; corrosion); material electrical flaws (junction imperfections); metal diffusion; susceptibility to radiation
Mechanical defects are sometimes very easily detectable and usually easy to
analyse. Among others, one may cite:
• inadequate soldering (thermocompression, ultrasonic, etc.); soldering is a
critical operation, requiring careful controls, well-organised tests and frequent
periodic inspections;
• defective die attach (which leads to an increase of the thermal resistance and to
overheating);
• the use, in the contact zone and for the connection wires, of dissimilar metals
(such as gold and aluminium) incompatible with the operating conditions of
the device. An example in this respect is the formation of a gold-aluminium
compound: if the gold wires bonded to the aluminium contact pads are
heated (thermally or electrically) to a temperature of +200°C...+300°C, this
leads to the phenomenon named purple plague;
• imperfect sealing, which permits the ingress of contaminants and humidity,
leading to surface problems (corrosion of the metallisation).
Surface defects are probably the predominant cause of the poor reliability of
thyristors. They can be produced by imperfections of the thyristor surface, by
external contaminants collected in the encapsulation or penetrating through an
encapsulation defect, or by a combination of these possibilities. Some of the
stresses to which the thyristor is exposed can lead to the following failure
mechanisms:
• gas emission (from the internal structure or from the encapsulation),
particularly at high temperatures;
• trapped humidity;
• package leakage during (or after) manufacturing.
Surface defects comprise:
• contaminants (of the glass and of the protection layer, from ionic residues of
the chemical products used in fabrication, or produced by external agents),
which cause high leakage currents, increasing with the applied voltage and
temperature;
• lack of aluminium adhesion to the silicon (hot spots due to an inadequate
distribution of the electrical current in the silicon).
Bulk defects are defects in the crystalline structure of the semiconductor,
undesirable impurities, and diffusion defects. Generally, they can be detected by the
final electrical test of the thyristors. Undetected defects will slowly contribute, in
time, to the appearance of wearout failures. Structural defects are considered to
result from weak parts, from manufacturing discrepancies, or from an inadequate
mechanical design. Various tests performed during the fabrication process are
effective means of identifying structural defects and eliminating the inadequate
thyristors.
Among the possible failure mechanisms, metal diffusion is the least significant.
Diffusion occurs over a long period of time when two metals are in intimate
contact at very high temperatures; at use temperatures the rate at which it progresses
is too slow to have tangible effects during the useful life. For example, many SCRs
are gold-diffused at a temperature exceeding +800°C for time periods reaching two
hours, in order to obtain the desired speed characteristics. Accomplishing the
equivalent gold diffusion at +150°C would require approximately 3 × 10⁸ h
(34 000 years).
Structural flaws are generally considered to be the result of weak parts,
discrepancies in fabrication, or inadequate mechanical design. Various in-process
tests performed on the device - such as forward voltage drop at high current density
levels and thermal resistance measurement - provide effective means for the
monitoring of control against such flaws. These tests also provide means for the
removal of the occasional discrepant device. The failure modes generally
associated with the mechanical flaw category are excessive on-state voltage drop,
failure to turn on when properly triggered, and open circuit between the anode and cathode
terminals. Because the corresponding types of failure mechanisms are relatively
rare, the incidence of these modes of failure is low.
Encapsulation flaws are deficiencies in the hermetic seal or passivation that allow
undesirable atmospheric impurities, such as oxygen and moisture, to react in such
a way as to permanently alter the silicon/metal interface characteristics. A change
in surface conductivity is evidenced by a gradual increase of the forward and
reverse blocking currents. Because the thyristor is a current-actuated device, it
will lose its capacity to block rated voltage if the blocking current degrades beyond
some critical point. This type of mechanism may eventually result in catastrophic
failure. The rate of degradation depends mostly on the size of the flaw and on the
level of the applied stress, particularly temperature.
Failure modes² associated with the category of mechanical defects of a thyristor are
excessive conduction voltage drop (which can be avoided if the thyristor is
correctly triggered) and open circuit between anode and cathode. As these defects
are rare, their incidence is low [6.1][6.3]...[6.10].
The reliability of thyristors depends on three main factors:
• design;
• manufacturing;
• application.
The five major stresses a thyristor can encounter in its life are:
• current;
• voltage;
• temperature;
• mechanical stresses;
• moisture.
From the reliability point of view, the thyristors used in systems can be the weakest
point, for two main reasons:
a) Although the dangers represented by current, voltage, and temperature are
widely recognised, the importance of high mechanical stresses and of moisture for
thyristors is often underestimated.
b) Thyristors are most exposed to the external environment, and their internal
impedance must be as low as possible. Any form of overload (overvoltage or low
impedance) is immediately converted into a heavy current flow that, in some cases,
can have catastrophic consequences.
6.2.2
Plastic and hermetic package problems
² Failure mode: the effect by which a failure is observed. In failure analysis (and in the
adjustment of screening tests and test methods), knowledge of the fabrication methods and of
the correlation between failure mode and device design is essential.
The following accelerated laboratory tests are normally used for this purpose³:
Pressure cooker: 121/100 (+121°C, 100% relative humidity RH) at 2.08 atm.
(This test can be carried out with or without bias.)
85/85 (+85°C, 85% RH). This test can also be carried out with or without bias; it
is named TH (temperature humidity) and THB (temperature humidity bias),
respectively. The present trend is toward testing with bias, even though it is more
costly and causes more complex interpretation problems.
In the case of hermetically sealed thyristors, a sequence of fine and gross leak
tests can eliminate the occasional discrepant device. The use of Radiflo and bubble
testing has been found very effective for the selection and elimination of inadequate
components.
The inclusion of a source of ionisable material inside a hermetically sealed
package - or under a passivation layer - can lead to failure. The failure
mechanisms are similar to those resulting from encapsulation flaws if the inclusion
is gross. If the inclusion is small - as compared with the junction area - the amount
of electrical change that occurs is limited. Thus the increase in blocking current is
not sufficient to degrade the blocking capacity of the device. This mechanism acts
even if a permanent change in the surface characteristics of the silicon does not
occur. The apparent surface conductivity of the silicon can be altered by build-up
and movement of the electrical charges carried by the inclusions. This condition is
often reversible, with recovery accomplished through the removal of electrical bias
and the employment of an elevated temperature. This category of failure
mechanism arises only if the forward blocking current can increase to the point
where forward blocking capability is impaired. The probability of occurrence is
extremely low, except in the possible case of small-junction-area, highly
sensitive devices. But this mechanism is often counteracted by a negative gate bias
or by a resistor biasing the circuit.
Removal of devices containing undesirable internal contaminants can effectively
be accomplished by means of a blocking-voltage burn-in screen. The ionisation of
the contaminants under these conditions takes place rapidly, permitting a relatively
short-term burn-in to be effective. Detection of discrepant devices is accomplished
both by tight end-point limits and by methods that detect turn-on during the screening.
Basically, this category of failure mechanism involves imperfections in junction
formation. Discrepancies of this nature are not generally experienced with SCRs,
because of their relatively thick base widths and because the blocking junctions are
formed by the diffusion process, which allows consistent control of both the depth
and the uniformity of the junction. Initial electrical classification effectively
removes any such discrepant device.
³ Since the new plastic devices are firmly encapsulated and have no internal cavity, conventional
methods of leak testing are obviously no longer applicable, and it has been necessary to develop
new methods. One of these is the pressure cooker test, which has been found very effective
in detecting devices with defective passivation.
6.2.3
Humidity problem
When environmental humidity reaches the die, after a certain time it can cause
corrosion of the aluminium. Corrosion, a very complex phenomenon, may be
galvanic or electrolytic.
Galvanic corrosion requires two metals and an electrolyte. The corrosion
processes are complicated by the fact that the metals are usually protected by oxide
films, which are themselves attacked by impurities, such as the Cl⁻ ion, which starts
the reaction.
On the other hand, electrolytic corrosion occurs when there is a cell consisting
of two metallisations (even of the same metal, here aluminium), but with an
externally applied bias. The presence of impurities sparks off the reactions [6.5]:

Al + 3Cl⁻ → Al³⁺ + 3Cl⁻ + 3e⁻ (anodic reaction). (6.3)

The ionised aluminium is transported to the cathode, where we have:

Al³⁺ + 3e⁻ → Al (cathodic reaction). (6.4)

But the aluminium cannot be deposited under these conditions, and in the presence
of humidity the following reaction occurs:
(6.5)
Corrosion appears as an interruption (open circuit) in the aluminium or in the
bonds, preceded at times by a degradation of the electrical characteristics of the
device (e.g. increased leakage current). Corrosion is therefore accelerated by the
impurities carried by the H₂O as it crosses the resin and laps against the metal
surfaces of the frame, and by the voltage applied to the device (electrolytic
corrosion). The phenomenon is delayed by passivating the die and by increasing the
thickness of the aluminium metallisation.
Humidity tests are therefore used to evaluate:
• plastic-frame adhesion and possible package cracks;
• permeability of the plastic to water and corrosive atmospheric pollutants;
• plastic, die attach, and frame-plating purity (ionic contamination);
• passivation quality (condensation occurs mainly in passivation cracks);
• design characteristics (i.e. aluminium thickness, quality, and morphology; internal
slug and frame geometric design; passivation type; phosphorus content).
6.2.4
Evaluating the reliability
⁴ Laboratory tests consider one stress (or a few simultaneous stresses), as opposed to the
large variety of stresses encountered by the device during operation in the field. These tests
(for example, the pressure cooker or 85/85 tests) have a dual aim: i) if the acceleration factors
between one test and another, and between laboratory and field conditions, are known, a
certain useful device life can be inferred when certain laboratory tests are passed; ii) laboratory
tests are also used to compare different construction solutions or products from different
suppliers, even if the acceleration factors of each test are not known exactly.
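The acceleration-factor reasoning in the footnote above can be illustrated with Peck's temperature-humidity model, a standard model for 85/85-type tests. The model itself is not given in the text, and the activation energy Ea and humidity exponent n below are assumed, technology-dependent values:

```python
import math

BOLTZMANN_EV = 8.617e-5  # Boltzmann constant, eV/K

def peck_af(t_use_c, rh_use, t_test_c, rh_test, ea=0.9, n=3.0):
    """Peck-model acceleration factor of a humidity test vs. use conditions:
    AF = (RH_test/RH_use)**n * exp((Ea/k) * (1/T_use - 1/T_test)).
    Ea (eV) and n are assumed, technology-dependent constants."""
    t_use = t_use_c + 273.15
    t_test = t_test_c + 273.15
    return (rh_test / rh_use) ** n * math.exp(
        ea / BOLTZMANN_EV * (1.0 / t_use - 1.0 / t_test))

# 85/85 THB test compared with assumed use conditions of 40 degC / 60% RH
print(f"AF = {peck_af(40, 60, 85, 85):.0f}")
```

With such an acceleration factor, a few hundred hours of 85/85 testing can stand in for years of field exposure, which is exactly the inference the footnote describes.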
(iii) Study of physical laws governing the various failure mechanisms, and
determination for each failure mechanism of its dependence on a particular stress.
(iv) Development of more and more sophisticated analysis techniques to find the
causes of failure of devices that fail during testing or while in use.
(v) Study and determination of screening techniques and preconditioning to
remove infant mortality before the product is used.
(vi) Theoretical studies of general laws governing reliability (reliability mo-
dels).
(vii) Study of the best systems for collecting and interpreting the data obtained
from laboratory test and from the field (data banks and statistical analysis for a
correct interpretation of the results).
(viii) Transfer to production people of the reliability knowledge acquired during
the design of devices and processes, designing at the same time suitable reliability
checks during the production process.
(ix) Transfer of acquired reliability knowledge to the designer of the thyristor
application in order to forecast and to optimise the reliability.
Since semiconductor technology is continuously evolving, the problem of studying
the reliability of these devices obviously becomes more and more complex.
6.2.5
Thyristor failure rates
An individual component part, such as a thyristor, does not lend itself to reliability
measurement in the same manner as does a system. For this reason, the statistical
approach to estimating device reliability is to extrapolate the performance observed
on a sample of devices to the probable performance of an infinite quantity of
similar devices operated under the same conditions for a given period of time.
The statistical measurement is based on unit hours of operation, using a sampling
procedure whose derivation takes into account the resolution with which the sample
represents the population from which it was withdrawn and the general pattern of
time behaviour of the devices.
Some practical observations:
(i) It would be extremely difficult to perform an accurate test demonstration to
verify even a failure rate of 1.0%/1000h, since the test equipment and
instrumentation must have a greater MTBF in order not to adversely affect the test
results. The problem becomes more complicated as the failure rate being tested
decreases: not only does the test equipment complexity increase, but its MTBF
must be increased at the same time!
(ii) The terminology failure rate is perhaps a poor choice of words. To the
reliability engineer it relates the performance of a limited number of observations to
the probable performance of an infinite population. To those not familiar with the
statistics used, unfortunately, the impression of an actual percent defective is
conveyed.
Graphical presentations (Fig. 6.3 ... 6.5) have been found very useful to electronic
device users as a guide for reliability predictions.
Example: A sample of 950 C35 devices was subjected to full-load, intermittent
operation of 1000h duration in formal lot-acceptance testing to MIL-S-19500/108.
Only one device was observed to fail the specification end-point limits. The
failure rate calculated from these results is no more than 0.41% per 1000h at a
90% confidence level.
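The 0.41%/1000h figure can be reproduced with the standard chi-square upper confidence bound for a constant failure rate; the statistical method is assumed here (the text only quotes the result), and scipy is required:

```python
from scipy.stats import chi2

def failure_rate_upper(failures, device_hours, confidence=0.90):
    """One-sided upper confidence bound on the failure rate (per hour),
    assuming an exponential (constant failure rate) life model."""
    return chi2.ppf(confidence, 2 * (failures + 1)) / (2.0 * device_hours)

# 950 devices x 1000 h = 950 000 device-hours, 1 failure, 90% confidence.
lam = failure_rate_upper(1, 950 * 1000)
print(f"{lam * 1e5:.2f} %/1000h")   # ~0.41 %/1000h, as quoted in the text
```

The factor 1e5 converts an hourly rate into percent per 1000 hours.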
6.3
Derating
The most probable thyristor failure mechanism is the degradation of blocking
capability, as a result either of an encapsulation flaw (or damage) or of internal
contaminants. The process can be chemical or electrochemical, and therefore
variable in rate according to the degree of temperature and/or electrical stress
applied. Thus it is possible, by means of derating (using the device at stress levels
below its maximum ratings), to retard the process by which the occasional
defective device fails. This slowdown of the degradation process results in a lower
failure rate and an increased MTBF.
Example: A sample of 778 devices is tested under maximum rated conditions for
1000h, with one failure observed. The calculated λ is 0.5%/1000h and the MTBF is
200 000h. If the failed device had remained within limits at the 1000h point
because of lower applied stresses, the calculated λ would become 0.3 × 10⁻⁵h⁻¹
(0.3%/1000h) and the MTBF would increase to 333 000h.
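The arithmetic of this example can be checked directly; this is a minimal sketch using the %/1000h rates quoted in the text:

```python
def mtbf_hours(lam_pct_per_1000h):
    """MTBF = 1/lambda for a constant failure rate given in %/1000h."""
    lam_per_hour = lam_pct_per_1000h / 100.0 / 1000.0
    return 1.0 / lam_per_hour

print(round(mtbf_hours(0.5)))   # 200000 h, as quoted
print(round(mtbf_hours(0.3)))   # 333333 h, i.e. the text's ~333 000 h
```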
The relationship of applied stress to the failure rate of General Electric SCR
devices is shown graphically in Figs. 6.3...6.5. The model that describes the
relationship of these stresses to λ is the Arrhenius model:

log λ = A + B/Tj (6.6)

where λ = failure rate expressed in %/1000h; Tj = junction temperature (in kelvin);
A and B = constants.
The Arrhenius model has been successfully applied by General Electric to
extensive life test data involving thousands of devices and millions of test hours.
The data were obtained from product design evaluations, military lot acceptance
testing, and several large-scale reliability contracts.
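A sketch of how such a model can be fitted from derating data: the form log10 λ = A + B/Tj follows the text, but the two (temperature, rate) calibration points below are assumed for illustration, not General Electric's values:

```python
import math

def fit_loglinear(t1, lam1, t2, lam2):
    """Solve log10(lambda) = A + B/Tj from two (Tj [K], lambda) points."""
    b = (math.log10(lam1) - math.log10(lam2)) / (1.0 / t1 - 1.0 / t2)
    a = math.log10(lam1) - b / t1
    return a, b

def lam_model(a, b, tj):
    """Failure rate (%/1000h) predicted by the fitted model."""
    return 10.0 ** (a + b / tj)

# Two assumed points: 0.5 %/1000h at +125 degC and 0.08 %/1000h at +75 degC.
a, b = fit_loglinear(125 + 273.15, 0.5, 75 + 273.15, 0.08)
print(f"{lam_model(a, b, 100 + 273.15):.3f} %/1000h at +100 degC")
```

Once A and B are fitted, the model interpolates the derating curves of Figs. 6.3...6.5 at any junction temperature.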
A thorough examination of the data on all General Electric SCRs revealed that
these three graphical presentations could describe the derated failure rate for the
entire family of SCRs with reasonable accuracy. The use of these graphical
presentations is quite straightforward. Suppose, for example, that one intends to
operate a C35D thyristor in a circuit under certain stress conditions (200 volts
peak and a junction temperature of +75°C). This circuit will become inoperative
when the electrical characteristics of the SCR change to values outside the
specification limits. This is a definition of failure, and it means that the solid lines
on the graphical presentations must be used. Since the rated junction temperature
of the C35D thyristor is +125°C, Fig. 6.4 must be used. Projecting a horizontal line
from the intersection of the +75°C junction temperature ordinate and the
applicable percent-of-rated-voltage curve (50% in this example), we obtain an
estimated λ of 0.08% per 1000h at the 90% confidence level. If, due to a change in the design of the
circuit, only devices which failed catastrophically (opens or shorts) would cause
the circuit to become inoperable, the dashed curves could be used. This would
result in an estimated λ of 0.008% per 1000h at 90% confidence.
Fig. 6.3 Estimated λ (in %/1000h, at the 90% confidence level) of a standard SCR depending on
junction temperature, percent of rated reverse and/or forward blocking voltage, and failure
definition, for a maximum rated junction temperature of +100°C
Fig. 6.4 Estimated λ (in %/1000h, at the 90% confidence level) of a standard SCR depending on
junction temperature, percent of rated reverse and/or forward blocking voltage, and failure
definition, for a maximum rated junction temperature of +125°C

Fig. 6.5 Estimated λ (in %/1000h, at the 90% confidence level) of a standard SCR depending on
junction temperature, reverse and/or forward voltage, and failure definition, for a maximum
rated junction temperature of +150°C
6.4
Reliability screens by General Electric
As more effective procedures are developed, the reliability screens [6.1] are
updated. In the following, an example of one of these reliability screen
specifications is given:
100% preconditioning tests
1. High-temperature bake at 150°C for 168h minimum.
2. Temperature cycle, MIL-STD-202C, method 107B, test condition F, excepting
that 10 cycles instead of 5 are performed.
3. Thermal resistance (junction to case) = 2°C/W.
4. Blocking burn-in: TA = +122°C ± 1.5°C, PRV = VBO = 400V, time = 100h
minimum.
5. Forward and reverse leakage: TA = +25°C, PRV = VBO = 400V,
IR = IS = 2mA maximum.
6. Gate trigger voltage: TA = +25°C, VGT = 3V maximum.
7. Gate trigger current: TA = +25°C, IGT = 40mA maximum.
8. Forward voltage drop: TA = +25°C, IF(peak) = 50A, VF = 2V maximum.
9. Forward and reverse leakage: TA = +125°C, PRV = VBO = 400V,
IR = IS = 5mA maximum.
10. Gate trigger voltage: TA = +125°C, VGT = 1.5V maximum, 0.25V minimum.
11. Gate trigger current: TA = +125°C, IGT = 30mA maximum, 0.5mA minimum.
6.5
New technology in preparation:
the static induction thyristor (SITH)
The region of interest for the analysis is divided by a discretisation grid, and
Poisson's equation is discretised in the form of a five-diagonal band matrix. A
minimum of the potential appears on the channel axis (Fig. 6.7).
Multiplying the potential by the electron charge, the resulting energy distribution
along the channel axis has an extremum (as shown in Fig. 6.8) which acts as a
barrier to the electrons. Obviously, no electrons can be injected from the cathode
region as long as the barrier is high enough [6.12][6.13]. As for the holes injected
from the p+ layer of the anode region (if any), they cannot find a way to take part
in conduction; it must therefore be concluded that there are essentially no injected
holes in the channel, and hence no significant current flows in the device in the
blocking state.
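The discretisation step can be sketched as follows: a 5-point finite-difference scheme for the 2-D Poisson equation on an n × n grid produces exactly the five-diagonal band matrix mentioned in the text. The grid size, boundary conditions, and the uniform right-hand side are assumed for illustration; numpy/scipy are required:

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import spsolve

n, h = 20, 1.0 / 21           # interior grid points per side, mesh step (assumed)
N = n * n
main = 4.0 * np.ones(N)       # centre of the 5-point stencil
off1 = -1.0 * np.ones(N - 1)  # east/west neighbours
off1[np.arange(1, N) % n == 0] = 0.0   # no coupling across grid-row boundaries
offn = -1.0 * np.ones(N - n)  # north/south neighbours

# Five-diagonal band matrix of the discretised Poisson equation
A = diags([offn, off1, main, off1, offn], [-n, -1, 0, 1, n], format="csc")

rho = np.ones(N)              # uniform normalised charge density (assumed)
phi = spsolve(A, h * h * rho) # potential with zero Dirichlet boundaries
print(phi.max())
```

Plotting `phi` along the middle grid column would show the single-extremum potential profile of Fig. 6.7.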
Fig. 6.7 Potential distribution in SITH along the channel axis (potential φ in V versus
channel-axis position x in μm)

Fig. 6.8 Electron energy distribution along the channel axis

Fig. 6.9 Barrier height versus gate bias
However, the barrier vanishes when the gate bias changes polarity, as shown in
Fig. 6.8. The device then transfers to the on-state, which allows larger currents.
Conversely, it is possible to turn off the device by changing the gate bias to a
sufficiently negative value. An important feature of the device is that the barrier
height can be controlled either by the gate bias or by the anode bias; Fig. 6.9
shows the manner in which the barrier height varies with the gate bias.
References
6.1 Grafham, D. R.; Golden, F. B. (eds) (1979): SCR Manual, sixth edition. General Electric,
Auburn, N.Y.
6.2 Motto, J. W. (1977): Introduction to Solid State Power Electronics. Westinghouse Electric
Corp., Pennsylvania
6.3 Locher, R. E. : Thermal Mounting Considerations for Plastic Power Semiconductor
Packages. Application Note 200.55, General Electric, Auburn, N. Y.
6.4 Antognetti, P. (Ed.) (1986): Power Integrated Circuits. McGraw-Hill, New York
6.5 Borri, F. R.; d'Espinosa, G. (1986): Power Integrated Circuit Reliability. In: Antognetti, P.
(ed.) Power Integrated Circuits. McGraw-Hill, New York
6.6 Băjenesco, T. I. (1981): Problèmes de la fiabilité des composants électroniques actifs
actuels. Masson, Paris / Arm, Suisse
Băjenescu, T. I. (1985): Zuverlässigkeit elektronischer Komponenten. EDV Verlag, Berlin
Băjenescu, T. I. (1996): Fiabilitatea componentelor electronice. Editura Tehnică, Bucharest
6.7 Cristoloveanu, S.; Li, S. S. (1995): Electrical Characterization of Silicon-on-Insulator
Materials and Devices. Kluwer Academic Publishers
6.8 AEG-Telefunken (1985): Gate Turn-Off Thyristors. Technical data
6.9 Lawson, R. W. (1974): The Accelerated Testing of Plastic Encapsulated Semiconductor
Components. Reliability Physics
6.10 Ajiki, T. et al. (1979): A New Cyclic Biased THB Test for Power Dissipating IC's.
Reliability Physics
6.11 Li, S. Y. et al. (1995): Theoretical Analysis of Static Induction Thyristor. Proceedings of
the Fourth International Conference on Solid-State and Integrated-Circuit Technology.
Beijing (China), October 24-28
6.12 Bulucea, C.; Rusu, A. (1987): A First-Order Theory of the Static Induction Transistor.
Solid-State Electron., vol. 30, pp. 1227-1242
6.13 Akira, Y. (1987): Investigation of Numerical Algorithms in Semiconductor Device
Simulation. Solid-State Electron., vol. 30, pp. 813-820
6.14 Băjenescu, T. I. (1984): Sur la fiabilité des thyristors. Electronique, vol. 4, pp. 26-31
6.15 Grafham, D. H.; Hey, J. C. (1977): SCR-manual. Fifth edition. General Electric, Syracuse,
New York
6.16 Bodea, M. (1989): Diode si tiristoare de putere (Power diodes and thyristors). Ed. Tehnica,
Bucharest (Romania)
6.17 Ackmann, W. (1976): Zuverlässigkeit elektronischer Bauelemente. Hüthig-Verlag,
Heidelberg
6.18 Anderson, R. T. (1976): Reliability Design Handbook. IIT Research Institute, Chicago
6.19 Bell Communications Research (1985): Reliability Prediction Procedure for Electronic
Equipment. (TR-TSY-000332), Bell, Morristown NJ
6.20 Dombrowski, E. (1970): Einführung in die Zuverlässigkeit elektronischer Geräte und
Systeme. AEG-Telefunken, Berlin
6.21 Doyle, E. A. Jr. (1981): How parts fail. IEEE Spectrum, October; pp. 36-43
6.22 Kao, J. H. K. (1960): A summary of some new techniques on failure analysis. Proc. Annual
Symp. Reliability, pp. 190-201
6.23 Kapur, K. C.; Lamberson, L. R. (1977): Reliability in Engineering Design. J. Wiley and
Sons, New York
6.24 Siemens, SN 29 500 (1986): Failure Rates of Components. Zürich, Siemens-Albis
6.25 Villemeur, A. (1988): Sûreté de fonctionnement des systèmes industriels. Eyrolles, Paris
7 Reliability of monolithic integrated circuits
7.1
Introduction
Even from the beginning, the semiconductor industry was characterised by a high
innovation rate. A spectacular moment was the appearance of integrated circuits
on the market, allowing large price cuts and performance growth. The first
integrated circuit (reported by Jack Kilby and Robert Noyce) was not a sudden
discovery, having been prepared by previous devices. Invented in 1958, the
solid-state circuit was developed in 1959, when the planar technique arose. This
was the milestone for the subsequent development of monolithic integrated
circuits, containing bipolar and unipolar (mostly MOS) transistors on a silicon
substrate. The global market for semiconductor devices has increased by 15% per
year over the last twenty years, reaching $140 billion in 1997.
The huge progress obtained in the integrated circuit field led to smaller dimen-
sion electronic equipment and reduced costs, but also to improvements in power
capability, reliability and maintainability. Predictions are that the strong worldwide
increase of the computer and communication markets will lead to an even higher
growth rate of the semiconductor industry in the next decade, 20% per year, with a
level of $300 billion immediately after the year 2000 [7.1].
The complexity of ICs increased every year. In fact, Gordon Moore, in the 1970s,
talked about a doubling of IC complexity every 18 months, with a corresponding
decrease in cost per function¹. This became the so-called Moore's Law, which
proved true for more than the subsequent twenty years. Many factors contributed to
keeping this model on course: the improvement of design tools and manufacturing
technologies, but also the permanent growth of the reliability level. The intrinsic
reliability of a transistor in an IC improved by two orders of magnitude (the
failure rate decreased from 10⁻⁶h⁻¹ in 1970 to 10⁻⁸h⁻¹ in 1997). But, in the same
period of time, the number of transistors per device increased by 9 orders of
magnitude! Therefore, IC reliability increased even faster than the prediction
given by Moore's Law. The model for reliability growth was called "Less's Law",
taking into account the known philosophy from architectural design: "Less is
, The cost per function is made up by two terms: alN (increasing with complexity [7.2] and
representing the chip cost) and C/N (representing assembly and testing costs), where N is the
number of functions and a, b, c are constants.
More" [7.1]. Actually, Less's Law means a tremendous increase of the require-
ments for the IC's failure rate: from 1000 failures in 109 devices x hours (or 1000
Fits), now only some Fits in a single digit are required. It is worthwhile to note the
change in the predictions made by the Semiconductor Industry Association (SIA)
in the editions 1994, 1995 and 1997 of the National Technology Roadmap for
Semiconductors [7.1][7.3]. From Table 7.1, one may see that the forecast was
overpassed by the reality: the performances previewed in 1994 and 1995 for 1998
were attained earlier, in 1997.
Table 7.1 Predictions for Si CMOS technology development: 1994, 1995 and 1997 editions of
the National Technology Roadmap for Semiconductors
[Figure: allowed current density (vertical axis, 0.1-0.7) vs. year (1980-2005), with curves for Al and Cu metallisation]
Fig. 7.1 Evolution of the metallisation technology and corresponding allowed current densities
7 Reliability of monolithic integrated circuits 217
Fig. 7.2 Main sequences of the planar process: a starting material; b deposition of an epitaxial n layer; c passivation (with an oxide layer); d photolithography; e diffusion of a p+ layer; f metallisation
The fabrication consists of a series of sequences (see Fig. 7.2). The starting material is a wafer of monocrystalline semiconductor (silicon, but also gallium arsenide), with a thickness of 400 µm and a diameter of 3-5 inches. First, an oxidation is performed by heating the wafer at high temperature (1000-1200°C) in an oxygen atmosphere. In this way, a uniform oxide layer with a thickness of 0.1-1.5 µm is formed over the whole wafer. The local removal of the oxide is done by photolithography: a photographic process using a photoresist layer, which allows very small windows (a few µm²) to be opened by etching the undesired areas with appropriate chemicals. Through these windows, doping impurities are diffused or implanted. Diffusion and ion implantation are accurate procedures for modifying the electrical properties of the silicon layers, the essential element of integrated circuit technology. One of the most important subsequent operations is the metallisation: the interconnection, by a deposited metallic layer (most often aluminium), of all the diffused elements. After the processed wafer is completed, the back-end fabrication begins. The wafer is tested, scribed and broken (the chips are separated). Then, each chip is soldered onto a header, an operation called die bonding. Next, each metallic area (called a pad) is connected by gold or aluminium wires (with a diameter of 25-35 µm) to the terminals, in an operation named wire bonding. Finally, a package made of metal, ceramic or plastic covers the whole assembly.
Basically, two types of integrated circuits have been developed until now: bipolar and MOS, according to the basic cell: bipolar transistors or MOS ones, respectively. In the beginning, the MOS ICs were n-channel or p-channel MOS ICs, but soon complementary MOS ICs (or CMOS ICs), including both types, were developed. The main characteristic of CMOS circuits is the small supply voltage. As the portable-electronics market grew, low-power and low-voltage technologies, such as CMOS, became the most used. Also, the technological improvements leading to the removal of sodium contamination in the Si-SiO2 system encourage the use of CMOS ICs, because a high reliability level becomes achievable. Recently [7.4], a reduced standard digital CMOS power supply voltage of 3.3 V was obtained, reducing the power consumption by 70%. These ULP (Ultra Low Power) CMOS ICs were deeply investigated [7.4] and proved to have a large potential.
The latest challenge in the IC family is the microsystem. Arising in the early 90's, the microsystem represents a superior integration step compared with common ICs: the "intelligent" element (the signal processing part) is integrated with microsensors and microactuators in a single component, basically still an IC [7.5]. In fact, the microsystem is a "smart" sensor, able also to actuate. This device drives the development of new microtechnologies. With many disciplines involved, hybrid terms such as mechatronics, chemtronics and bioinformatics were used [7.6], but the term microtechnology (technology of microfabrication) seems to be the most adequate. In Europe, the term microtechnology includes both microelectronics (the "classical" devices) and microsystem technology (MST) [7.76]. Other related terms are MEMS (Micro-Electro-Mechanical Systems), BIO-MEMS (BIOlogical MEMS) and MEOMS (Micro-Electro-Opto-Mechanical Systems). Silicon is still the basic material and CMOS technology can be used for the manufacturing.
Recently, a new term, nanotechnology, was proposed, because the structures now have characteristic features of a few nanometers. Accordingly, the tools used for manufacturing in these new technologies (micro- and nano-) are called micromachines and nanomachines, respectively.
7.2
Reliability evaluation
7.2.1
Some reliability problems
7.2.2
Evaluation of integrated circuit reliability
Generally, three main problems arise in the evaluation of integrated circuit reliability.
1. For modern devices, the failure rates decrease below a certain limit and the conventional methods become less usable. To overcome these difficulties, two solutions may be discussed:
a) To perform reliability tests on a very high number of integrated circuits in normal operational conditions, with a duration of a couple of years. Obviously, this solution is unacceptable. As an example, to verify a failure rate of 10⁻⁹h⁻¹ (also called 1 FIT), 1000 devices must be tested for 114 years, with only one device expected to fail.
b) To perform reliability tests on some integrated circuits in higher than normal conditions, the so-called accelerated tests. This method may be applied only if, in the accelerated tests, the failure mechanisms are the same as in normal operational conditions. And this fact must be indubitably demonstrated.
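The arithmetic of case a) can be checked with a short sketch (the variable names below are illustrative, not from the text):

```python
# Back-of-the-envelope check of the example in the text: to observe, on
# average, a single failure at a failure rate of 1 FIT (10^-9 failures per
# device-hour), the test must accumulate about 10^9 device-hours.

FIT = 1e-9           # failures per device-hour
n_devices = 1000
hours_per_year = 8760

device_hours = 1.0 / FIT                       # device-hours for ~1 expected failure
test_years = device_hours / n_devices / hours_per_year

print(f"{device_hours:.0e} device-hours -> {test_years:.0f} years with {n_devices} devices")
```

With 1000 devices, 10⁹ device-hours translate into roughly 114 calendar years, as stated above.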
Accelerated tests are used in order to obtain, quickly and with a minimum of expense, information about the reliability of the product. The stresses used are higher than in normal operational conditions; the results are extrapolated and the failure rate for normal conditions is obtained. Usually, the accelerated tests contain combinations of stresses such as temperature, bias, pressure, vibrations, etc. [7.32]. If the temperature is the only variable of the accelerated tests, the Arrhenius model may be used. To obtain reliable results, relatively short testing times must be used². So, using various levels of the same stress factor, one may follow the real behaviour³. The analysis of the physico-chemical process leading to failure allows obtaining the correlation between the speed of these phenomena and the stress and, as a result, the real dependence of time to failure on the stress levels.
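When temperature is the only accelerating stress, the Arrhenius model reduces to a simple acceleration factor between two temperatures. A minimal sketch (the activation energy used in the example is an assumed illustrative value):

```python
import math

K_BOLTZMANN = 8.617e-5  # Boltzmann constant, eV/K

def arrhenius_af(ea_ev, t_use_c, t_stress_c):
    """Arrhenius acceleration factor between a use and a stress temperature."""
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    return math.exp((ea_ev / K_BOLTZMANN) * (1.0 / t_use_k - 1.0 / t_stress_k))

# Example: Ea = 0.7 eV (assumed), 125 degC stress vs. 55 degC use
print(f"AF = {arrhenius_af(0.7, 55.0, 125.0):.0f}")
```

The factor tells how many hours of normal operation one hour at the stress temperature is worth, provided the failure mechanism is unchanged.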
2. The rapid development of the manufacturing technology for integrated circuits, driven by the aim of improving control and reducing costs, makes the reliability evaluation difficult. Usually, any modification in the technology or the materials used is followed by the appearance of a new failure mechanism. Consequently, any manufacturing modification must include a new reliability evaluation.
3. The last problem is linked to the increasing complexity and costs of integrated circuits vs. discrete devices. Although the cost of a certain electronic function decreases substantially by integration, the basic costs are always higher for an integrated circuit than for a discrete component fulfilling the same function.
The definition of the failure criteria is, unavoidably, very difficult because the complexity of ICs is ever higher. Even for a simple device, like a transistor, it is hard to define the failure limits. For an integrated circuit, the basic parameters are more complex and harder to measure, and the degradation of these parameters differs from one application to another.
For evaluating the various stresses able to be used in reliability accelerated tests,
the following aspects must be taken into account [7.7][7.8]:
• The stress must be encountered in the operational environment. In principle, one must note that the failure rate of integrated circuits is influenced by the thermal, electrical and mechanical conditions of the operational environment. But for common industrial use, mechanical shock and vibrations have little influence on integrated circuits encapsulated in epoxy packages, which assure the necessary mechanical stability and a good protection. For instance, the acceleration measured at a sudden stop of a running car reaches 40g, for airplane take-off and landing up to 5g, and for missiles up to 50g. Compare these values with the acceleration level used for periodic tests: 30,000g. Consequently, mechanical factors will be used only rarely for accelerated tests. On the contrary, the temperature is the stress most used for this kind of tests. The experimentally observed correlation between failure rate and temperature is based on the fact that the speed of the chemical reactions arising in the device is thermally increased.
• The failure mechanisms must always be those arising in the operational environment.
² Even if the purpose is to minimise the testing time, too strong a stress level must not be used, because new failure mechanisms may be induced.
³ If the time is the accelerated variable, this means that an hour of testing at high stress level produces the same effect on the component reliability as n hours at normal operating conditions.
• All samples of integrated circuits used in accelerated tests must behave in the same way at a stress modification: the same circuits should be the first to fail at any stress level.
7.2.3
Accelerated thermal test
The use of accelerated tests starts from the presumption that the possible failure mechanisms are well known. The systematic use of the temperature as an accelerating stress was introduced by Peck [7.9] at the beginning of the 60's. Received initially with hesitation and scepticism, the technique became useful for component producers and users. The experience in the utilisation of electronic components shows that the life duration is not infinite and that the operational conditions are important. For initially "good" integrated circuits, failures before the end of the normal life duration were found, such as:
• catastrophic failures, breaking the normal operation,
• drift failures, producing defective operation by an important time variation of the electrical characteristics.
One must understand that the appearance of a failure is not a proof that the life duration is smaller. The drift failure is hard to define, depending on the drift threshold stated as the failure criterion. In practice, the accelerated thermal tests are not sufficient for estimating the reliability of a product. Step stress tests must also be performed [7.10]. In this case, the initial hypotheses are that the stress has no memory and that wearout does not arise. Samples withdrawn from the batch undergo these tests at increasingly higher stress levels, such as temperature, bias, mechanical stress, etc. A careful analysis of the results allows an accurate estimation of the product reliability [7.33]. For plastic encapsulated integrated circuits, the leak test makes no sense because the package has no cavity. On the contrary, humidity penetration tests are recommended. Concerning the equivalence between the operating hours at the standard temperature of 55°C and the operating hours at a higher or lower temperature, the acceleration factors are presented in Table 7.3 [7.11], for various activation energies. (In Table 7.2, the acceleration factors for functioning at 125°C vs. the normal ambient temperature of 25°C, for various activation energies, are given.)
It is important to note that without knowing the value of the activation energy, no correct analysis of the data obtained from laboratory tests is possible. It is worth mentioning that the international standardisation bodies do not always take into account the importance of the activation energy for data processing. For instance, the standard MIL-HDBK-217D chose for the bipolar technology an overall activation energy of 0.4 eV, and MIL-STD-883C, method 1005.2, uses for all devices the value of 0.7 eV. The accelerated thermal stress has a big disadvantage: there is a high probability that new failure mechanisms occur at high stress levels. This disadvantage disappears for comparative evaluation of subsequent batches of integrated circuits. In any case, the thermal acceleration is not a panacea for saving time or money in estimating the life duration of integrated circuits.
Table 7.3 Acceleration factors for various activation energies and testing temperatures vs. a
testing temperature of 55°C
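Acceleration factors of the kind tabulated in Table 7.3 can be generated with a short sketch; the activation energies and test temperatures below are illustrative choices, not the table's actual entries:

```python
import math

K = 8.617e-5  # Boltzmann constant, eV/K

def af_vs_55(ea_ev, t_c, t_ref_c=55.0):
    """Arrhenius acceleration factor of temperature t_c vs. the 55 degC reference."""
    return math.exp((ea_ev / K) * (1.0 / (t_ref_c + 273.15) - 1.0 / (t_c + 273.15)))

for ea in (0.4, 0.7, 1.0):
    factors = [f"{af_vs_55(ea, t):10.2f}" for t in (25.0, 85.0, 125.0, 150.0)]
    print(f"Ea = {ea:.1f} eV:", *factors)
```

Note that for temperatures below the 55°C reference the factor falls below 1 (operation is decelerated), and that the spread between activation energies grows rapidly with temperature.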
7.2.4
Humidity environment
Compared with high temperature testing, the experience acquired in high humidity testing is much smaller. However, the failure law of integrated circuits under high humidity seems to be a log-normal one [7.32], and the average life time appears to be inversely proportional to the vapour pressure of the humid environment. In the early days, a temperature of 25°C and a relative humidity of 75% were recommended for tests in such an environment. A test performed in these conditions, with bias and a duration of 20 days, simulated an ageing of 20 years [7.12]. Lately, a more efficient test is used, the so-called "85/85 test" (85°C and 85% relative humidity).
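A standard engineering description of humidity acceleration, often applied to "85/85" testing, is Peck's model, which combines a power of relative humidity with an Arrhenius thermal term. The model is not given explicitly in the text, and the exponent n and activation energy Ea below are assumed typical values, so this is only a hedged sketch:

```python
import math

K = 8.617e-5  # Boltzmann constant, eV/K

def humidity_af(rh_stress, rh_use, t_stress_c, t_use_c, n=2.7, ea=0.79):
    """Peck-style acceleration factor: (RH_s/RH_u)^n times an Arrhenius term."""
    humidity_term = (rh_stress / rh_use) ** n
    thermal_term = math.exp((ea / K) * (1.0 / (t_use_c + 273.15) - 1.0 / (t_stress_c + 273.15)))
    return humidity_term * thermal_term

# 85 degC / 85 %RH stress vs. an assumed 25 degC / 60 %RH use environment
print(f"AF = {humidity_af(85.0, 60.0, 85.0, 25.0):.0f}")
```

With such parameters a few weeks of 85/85 testing can represent many years of humid field operation, consistent with the 20-days/20-years equivalence quoted above for the older 25°C/75%RH test.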
In the 70's, the plastic package had a high permeability, which could lead to catastrophic failures by corrosion of the aluminium metallisation. In any case, it was difficult to predict the behaviour of plastic encapsulated circuits, because the hermeticity was not guaranteed. For special utilisation conditions, metallic or ceramic packages were recommended. The latest developments in plastic packages (see Chap. 12) led to the achievement of high reliability plastic packages, usable in the most hostile environments.
7.2.5
Dynamic life testing
The failure of an IC in operational life is an unpleasant event not only because the owner of the equipment must replace it, but also because this failure may induce serious damage to the equipment, loss of important information or even of human life. Therefore, it is desirable to replace an IC before failure. For economic reasons, this replacement must take place shortly before the anticipated failure. This implies that the lifetime of the IC be accurately estimated. This operation may be done only if laboratory tests simulating as closely as possible the real operational life are performed. In this respect, in the laboratory, not only static, but also dynamic testing must be done. The purpose is to quantify the performance degradation during IC operation. An example of such testing is given by Son and Soma [7.13]. First, the IC parameters to be monitored during dynamic life testing are chosen, by two criteria: i) to be measurable at existing pin-outs, and ii) to predict progressive IC degradation. Then, the typical failure and degradation mechanisms must be studied. In fact, there are two major types of degradation mechanisms: electrical ones (such as latchup, ESD, hot-carrier effect, dielectric breakdown, electromigration, etc.) and environmental ones (produced by thermal and mechanical stress, humidity, etc.). By means of appropriately chosen electrical parameters (such as static / transient current level change, noise level in current, cut-off frequency, input offset voltage of a CMOS differential amplifier, etc.), these mechanisms are monitored during dynamic life testing.
Eventually, aging models for the various failure mechanisms must be elaborated. In [7.13] a model for the hot-carrier effect is given. Starting from a widely accepted empirical relationship between parameter deviation and elapsed stress time for the hot-carrier degradation mechanism, given in [7.14], an aging curve due to the hot-carrier effect under static or periodically repeated AC stress was obtained [7.13], defined by the equation:
ΔV/V₀ = k·t^a (7.1)
where k and a are model parameters depending on q - the electron charge (1.6 × 10⁻¹⁹C), P - the hot electron mean free path, and Ech - the channel electric field; ΔV/V₀ is the circuit aging and t the elapsed stress time. In a log(ΔV/V₀) vs. log t plot, a straight line with slope a and y-intercept log k is obtained (see Fig. 7.3).
From the case study given in [7.13], one may understand the procedure for replacing an IC before a failure by hot-carrier effect occurs. For a 31-stage inverter chain, designed according to MOSIS 0.8 µm HP technology rules, the device operation was simulated, the ageing being modeled by randomly changing device parameters. Based on this model, the probability of survival until the next inspection may be quantified at each inspection of dynamic life testing. Then, the optimal moment for replacement may be calculated with respect to the maintenance cost (the recovery cost of an unanticipated failure and the wasted cost of replacing an IC too early).
Fig. 7.3 A log(ΔV/V₀) vs. log t plot for the hot-carrier degradation mechanism
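The straight line of Fig. 7.3 suggests how a (slope) and k (intercept) can be extracted from measured aging data by a least-squares fit in log-log coordinates. A sketch on synthetic data, where the parameter values are assumed for illustration rather than measured:

```python
import math

# Synthetic aging data generated from assumed parameters a = 0.5, k = 1e-3
a_true, k_true = 0.5, 1e-3
times = [10.0, 100.0, 1000.0, 10000.0]           # elapsed stress time, hours
aging = [k_true * t ** a_true for t in times]     # Delta V / V0

# Ordinary least squares on log10-transformed data: log(dV/V0) = a*log t + log k
xs = [math.log10(t) for t in times]
ys = [math.log10(v) for v in aging]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx

print(f"a = {slope:.3f}, k = {10.0 ** intercept:.1e}")
```

On real dynamic-life-test data the same fit would recover the aging parameters needed for the replacement-time optimisation described above.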
7.3
Failure analysis
7.3.1.
Failure mechanisms
Any user of an integrated circuit wants to eliminate from the beginning the devices that will fail. To do that, the typical failure mechanisms must be known. In the following, some examples of typical failure mechanisms for integrated circuits will be given.
In general, the failure mechanisms of integrated circuits are divided into three categories, referring to: the wafer (chip), the die-package connections and the package, respectively.
In the 60's, the chip and wire solderings were the main critical problem, leading to 20-30% defects at these operations. The numerous technical and technological advances obtained lately have improved the situation substantially, without solving all the problems. The increase of the integration degree brings about many problems linked to chip and package reliability, respectively. With integrated circuit complexity, the failure potential increases too, because the external factors (static
7.3.1.1
Gate oxide breakdown
This is a typical mechanism for MOS ICs. Shorts through the thermal oxide between the metallisation and the silicon may arise. The thin gate oxide (several hundred angstroms) may be affected by this phenomenon, especially if defects or impurities are present in the oxide layer. As a screening test, voltages higher than the rated value are applied and the devices with a too-thin oxide layer (or with defects) are removed. Also, gate protection circuits (reverse-biased pn junctions with controlled breakdown characteristics) may be used to absorb large pulse energies.
This mechanism is time-dependent, being known as TDDB (Time Dependent Dielectric Breakdown), a major problem for MOS ICs. The SAG model (Shatzkes / Av-Ron / Gdula) [7.41] tried to explain TDDB for this gate oxide. It takes into account only a single defect, attributed to very small weak spots arising at the metal-oxide interface, where the barrier height for electron injection into the oxide (2.3eV) is lower than the barrier height of the defect-free area (3.2eV). Further, by taking into account the interaction (synergies) between applied stresses (temperature, electric field, etc.), other models were developed [7.42]. By using a proportional-hazards approach, with Weibull or lognormal distributions, an accurate model was obtained [7.43].
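As an illustration of the Weibull description mentioned above, the fraction of gate oxides failed by time t can be sketched as follows; the scale and shape parameters are assumed illustrative values, not taken from the cited models:

```python
import math

def weibull_fraction_failed(t, eta, beta):
    """Weibull CDF: fraction of the population failed by time t."""
    return 1.0 - math.exp(-((t / eta) ** beta))

eta, beta = 1.0e6, 1.2   # assumed scale (hours) and shape parameters
for t in (1.0e4, 1.0e5, 1.0e6):
    print(f"t = {t:.0e} h -> fraction failed = {weibull_fraction_failed(t, eta, beta):.4f}")
```

A shape parameter below 1 would indicate defect-dominated (infant-mortality) breakdown, above 1 wearout-dominated breakdown, which is why the fitted beta carries physical meaning in TDDB studies.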
7.3.1.2
Surface charges
One knows that for a MOS transistor the conductivity type and the resistivity of the semiconductor surface are modified by the presence of a charged zone situated in the close neighbourhood or separated by a thin dielectric layer. Such a charging phenomenon produces, on an unprotected silicon surface, an absorption of mobile ions, moved by the action of an electric field. This phenomenon leads to the displacement of the pn junction on the surface and may produce a charged area⁴. These areas may extend and can come into contact with grounded regions, producing a short circuit. The symptoms are smaller breakdown voltages and higher leakage currents. A phosphosilicate glass (PSG) passivation layer onto the thermal oxide is often used as a getter for sodium ions (they are fixed and do not migrate through the oxide).
7.3.1.3
Hot carrier effects
The electrons or holes from the channel of a MOS transistor can gain high energy, becoming able to penetrate the gate dielectric; by impact ionisation they produce a current multiplication, creating additional electron-hole pairs. Then, if they continue to gain energy, injection into the silicon dioxide, by surpassing the energy barrier, can occur. Consequently, the carriers become trapped in the oxide. The number of trapped carriers depends on the density of available traps in the silicon dioxide.
One may distinguish three types of hot carriers: channel ones (carriers traversing the channel and undergoing a low number of lattice collisions under the influence of the strong lateral electric field), substrate ones (thermally generated in the substrate and drifted by an electric field towards the interface) and avalanche ones (created in the avalanche plasma and undergoing, due to the strong lateral electric field, a high number of impact ionisations). The substrate current produced by impact ionisation can induce bipolar latch-up in CMOS structures, and the hot carriers injected into the gate oxide form interface states and trapped oxide charge. In time, this charge causes instabilities and parameter drift. These serious reliability problems increase with the shrinking of device geometries. A correction method is to limit the source-drain voltage to values below the threshold for the generation of hot carriers.
7.3.1.4
Metal diffusion
⁴ The higher the semiconductor resistivity, the smaller the charge value.
7.3.1.5
Electromigration
It is known that in any conductor transporting an electric current, only a few metallic atoms are activated. During the activation period, the atoms undergo the action of two contrary forces: an electrostatic force and an impact force (due to electron collisions). The action of these forces becomes manifest through the movement of the activated atoms along the conductor. Thus, for a current density of 10⁶ A/cm² in an aluminium conductor on a silicon chip (Tj = 150°C), after 3-4 days a migration of the aluminium occurs, creating hillocks and voids and increasing the current density in the rest of the conductor. Consequently, the migration is amplified and the voids in the aluminium eventually interrupt the conductor.
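The text quotes current-density and temperature thresholds but no lifetime law; the standard engineering model for electromigration lifetime is Black's equation, MTTF = A·j⁻ⁿ·exp(Ea/kT), which is not given in the text itself. A hedged sketch with assumed parameter values:

```python
import math

K = 8.617e-5  # Boltzmann constant, eV/K

def black_mttf(j_a_per_cm2, t_c, a_const=1.0e18, n=2.0, ea=0.6):
    """Black's equation for electromigration MTTF; A, n and Ea are assumed values."""
    return a_const * j_a_per_cm2 ** (-n) * math.exp(ea / (K * (t_c + 273.15)))

# With n = 2, doubling the current density divides the predicted MTTF by four
ratio = black_mttf(1.0e6, 150.0) / black_mttf(2.0e6, 150.0)
print(f"MTTF(j) / MTTF(2j) = {ratio:.1f}")
```

This strong sensitivity to current density is why the design measures listed below (copper or titanium additions, dielectric encapsulation) aim at raising the tolerable current density.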
One can distinguish two types of electromigration: solid-state and electrolytic, respectively [7.34]. Solid-state electromigration starts at local temperatures above 150°C and current densities above 10⁴ A/cm². It is a well-studied phenomenon and efficient measures to avoid it have been proposed. Some examples will be presented. The addition of copper [7.35] or titanium [7.36] allows higher current densities before electromigration arises. Other methods proposed are: to encapsulate conductors with dielectrics [7.37], to cover the aluminium conductor with a chemical-vapour deposited silicon dioxide layer, or to grow an anodic layer onto the aluminium conductor [7.39]. Wada et al. [7.40] suggested some surface treatments that proved to be very effective. An oxygen plasma treatment (in a barrel reactor), preceded or followed by an annealing at 450°C, for 30 minutes, in forming gas, led to significant improvements of the mean time to failure. Details are given in Table 7.4.
Table 7.4 Improvement of the mean time to failure after oxygen plasma treatment
Treatment duration   Sequence           MTTF improvement (times)
No treatment         -                  1
40 min               After annealing    3
80 min               After annealing    4
Obviously, from Table 7.4 one may conclude that the oxygen plasma treatment must be performed after annealing. Another method is a water dip treatment (after aluminium metallisation, the wafer is dipped in H₂O for 4 minutes, before or after resist strip, and annealed at 450°C, for 3 minutes, in forming gas). The result was an improvement of more than 3 times in the mean time to failure.
7.3.1.6
Fatigue
In a semiconductor device, the internal mechanical forces act in the areas of tight contact, where it is difficult to match the dilatation coefficients of, for instance, copper, kovar and steel. To improve the situation, intermediate layers of molybdenum or tungsten are used. After repeated temperature cycles, the structure of these materials is modified, the cohesion force between the granules decreases and cracks may occur.
7.3.1.7
Aluminium-gold system
For aluminium metallisation connected with gold wire, five main failure mechanisms are known [7.15]:
Purple plague is produced by metallic compounds formed at high temperatures between the aluminium metallisation and the gold wires. As a consequence of this phenomenon, an important degradation of the semiconductor reliability occurs, because the gold / aluminium solder point becomes brittle and any mechanical stress (even a weak one) may lead to an open contact.
Electrolytic corrosion is a permanent menace for aluminium metallisations, especially for plastic encapsulated chips functioning in a humid environment.
Electromigration (mentioned previously) occurs at high current densities (>10⁵ A/cm²) and high temperatures. As a consequence, in the initially uniform aluminium pad, thinner regions arise, leading to device destruction.
Aluminium / silicon interaction (at the ohmic contacts) may lead to the total failure of the device (by short circuit), especially at high current densities.
Protection layers of evaporated aluminium are often formed by too-thin metallic layers, leading to excessive contact resistances and producing regions with higher current densities.
7.3.1.8
Brittle fracture
The die-package connection may be affected by brittle fracture of the die. Initiated by cracks formed during previous wafer manufacturing processes (crystal growth, wafer scribing and slicing, die separation), this failure mechanism is produced by the thermal expansion mismatch of the different materials used for assembly. After die bonding, the cooling process induces excessive mechanical stress in the die. If the crack size exceeds the critical size for the induced stress, as calculated with the aid of appropriate models [7.44], pre-existing cracks can cause brittle overstress failures. Voids in the die attach can further exacerbate the failure, not only by increasing the thermal resistance, but also by acting as stress concentration sites [7.45]. It is interesting to note that because the wire bond is still connected, the device may pass a functional test without signaling a possible failure.
7.3.1.9
Electrostatic Discharge (ESD)
This failure mechanism appears in all types of ICs, generally during testing, assembly or handling. The phenomenon is produced by voltages higher than 1000 V. Protection circuits or other measures [7.40] can be used to avoid ESD.
Early failures are very annoying for component users [7.17]. For instance, if an equipment contains only 500 integrated circuits and in the first 30 days the failure proportion is 0.1%, it results that, on average, 50% of the equipments fail in the first month of operation.
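The 50% figure corresponds to the expected number of failed ICs per equipment (n·p). A short sketch also shows, under an added assumption of independent failures, the probability that a given equipment contains at least one failed IC:

```python
# n ICs per equipment, each failing with probability p in the first 30 days
n, p = 500, 0.001

expected_failures = n * p                      # the text's "50%": 0.5 failed ICs per unit
p_at_least_one = 1.0 - (1.0 - p) ** n          # independence assumed

print(f"expected failed ICs per equipment: {expected_failures:.2f}")
print(f"P(at least one failed IC)        : {p_at_least_one:.2%}")
```

The binomial view gives a slightly lower fraction of affected equipments (some units contain more than one failed IC), but either way a small component-level early-failure rate translates into a large equipment-level failure fraction.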
For the integrated circuits mounted with the beam lead technique, mechanical defects explain almost all the early failures (excepting complex MOS circuits, where oxide defects are the main failure cause). Data from various sources indicate completely different time periods of the component lifetime for SSI/MSI circuits vs. MOS LSI (especially for dynamic RAM), as one can see from Fig. 7.5. For both categories, the average failure activation energy is around 0.4 eV⁵.
Fig. 7.5 Comparison of data referring to early failures and long term failures: a) typical domain of long term failure mechanisms for commercial plastic encapsulated ICs; b) domain of early failures for bipolar commercial SSI/MSI; c) domain of early failures of commercial MOS LSI [7.21]
In fact, in this case, the term "early failures" covers manufacturing defects that become failures in a physical or electrical environment (scratches of the wafer, open or almost open connections, voids, passivation defects, etc.). These early failures differ from one another in nature. The early failure period proved to be important for solving many practical problems. In this period, one can estimate the failure rate of an equipment or define the conditions for a burn-in needed to reach a prescribed quality level for the equipment.
In Fig. 7.6, the replacement rates for MSI and SSI circuits are compared, during the infant mortality period of commercial plastic encapsulated TTL. The larger chip dimensions and the increased complexity of MSI circuits lead to a higher replacement rate than that of SSI circuits. The results of the failure analysis are synthesised
⁵ Goarin [7.18] has shown that the observed activation energies are below the 0.4 eV and 0.7 eV values estimated previously for bipolar and MOS circuits, respectively.
in Table 7.5. From this table, one may understand that the early failures are important and must be taken into account in reliability evaluation.
Fig. 7.6 Replacement rate of commercial TTL ICs in plastic packages (in FIT, during the infant mortality period) [7.21]
7.3.3
Modeling IC reliability
At first, only simulators for one or two subsystems or failure mechanisms appeared, such as RELIANT [7.20], only for predicting electromigration of the interconnects, and BERT EM [7.21]. Both use SPICE for the prediction of electromigration by deriving the currents. Other electromigration simulators were CREST [7.22], using switch-level simulation combined with Monte-Carlo simulation, adequate for the simulation of VLSI circuits, and SPIDER [7.23].
Other models were built for hot-carrier effects: CAS [7.24] and RELY [7.25], also based on SPICE. An important improvement was RELIC, built for three failure mechanisms: electromigration, hot-carrier effects and time-dependent dielectric breakdown [7.26].
A high-level reliability simulator for electromigration failures, named GRACE [7.27], assured a higher simulation speed for very large ICs. Compared with the previously developed simulators, GRACE has some advantages [7.27]:
• an orders-of-magnitude speedup allows the simulation of VLSI circuits with many input vectors;
• the generalised Eyring model [7.28] allows simulating the ageing and eventually the failure of physical elements due to electrical stress;
• the simulator learns how to simulate more accurately as the design progresses.
Table 7.5 Incidence of main failure mechanisms (in %) arising in the infant mortality period

Electrical overcharge   4    60   17   35   9
Oxide defects           2    1    51   -    53
Surface defects         18   -    24   -    -
Connections             37   5    7    29   27
Metallisation           30   34   -    4    2
Various                 9    -    1    22   9
[Figure: process defect distributions and the circuit layout yield failure distributions and defect probabilities, from which the failure probabilities are calculated]
If the typical failure mechanisms are known, models for the operational life of the devices can be elaborated by taking into account the degradation and failure phenomena. Such models, in contrast with the regular CAD tools, which determine only wearout phenomena, also predict the failures linked to the early-failure zone.
7.4
Screening and burn-in
7.4.1
The necessity of screening
Failure percentage (%)   Number of failures   Repair cost (SFr)   Repair cost (% of equipment cost)
0.1                      10                   6250                2.5
1                        100                  62500               25
2                        200                  125000              50
3                        500                  312000              125
One may notice that, by using efficient intermediate and final controls, for failure percentages higher than 0.1 % all the repair costs in the last column can be spared. By definition, AQL (the acceptable quality level) is the prescribed limit percentage of failed devices at which the batch is still acceptable to the buyer.
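The scaling in the table above can be reproduced with a short calculation. The lot size, the cost per repair and the equipment cost used below are assumed figures, chosen only so that the 0.1 % row (10 failures, 6250 SFr, 2.5 %) is matched; note that the last table row departs from strictly linear scaling.

```python
def repair_costs(failure_pct, units=10000, cost_per_repair=625.0,
                 equipment_cost=250000.0):
    """Repair bill for a given failure percentage (assumed figures:
    10 000 units, 625 SFr per repair, 250 000 SFr equipment cost)."""
    n_failed = units * failure_pct / 100.0
    total_sfr = n_failed * cost_per_repair
    return n_failed, total_sfr, 100.0 * total_sfr / equipment_cost
```

repair_costs(0.1) gives (10, 6250, 2.5), matching the first row of the table.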
For high reliability systems, the user does not accept a failed device. Therefore, in these cases, a 100% input control was introduced. Such a control costs less than the subsequent replacement of the equipped card. A problem to solve is to have a method for identifying the ICs which will fail subsequently. Usually, thermal tests are used. In fact, a screening sequence contains mechano-climatic and electrical tests. As an example, the stipulations of MIL-STD-883 for aerospace and defense applications are presented in Table 7.7.
Table 7.7 Screening tests for aerospace and defense applications (MIL-STD-883)

Test          Class S   Class B
Radiography   Yes       No
The costs are similar for bipolar and MOS circuits; but since MOS ICs have a higher density, for systems of equal complexity the screening tests are cheaper for MOS ICs. One must know that the sensitive parameters of MOS circuits (such as threshold voltage or residual current) may reveal future failures after only a few hours. The degradation of these parameters is a sure signal of some types of early failures. For other types of early failures, an appropriate burn-in may be used.
There is no method to guarantee the reliability of ICs. However, screening and high-stress tests are useful means, allowing the researcher to obtain sufficient confidence in the reliability evaluation. In Table 7.8, the efficiency of some screening tests is presented, together with some emphasised failure mechanisms. Generally,
the minimum cost is for SSI ("small scale integration") and the maximum cost is
for LSI ("large scale integration").
Table 7.8 A comparison between various reliability tests: efficiency, failure percentages, cost (MIL-STD-883, class B)

[Table: for each test (e.g. thermal cycles, efficiency rated good) the emphasised failure mechanisms are listed: metallisation, silicon processing, connections (wires), package, seal, header (surface), thermal coefficient mismatch, electrical instability, corrosion]
7.4.2
Efficiency and necessity of burn-in
As a treatment, the burn-in must select the early failures; only the "remainder" of the "bath-tub" curve will be delivered to the customer. In the opinion of many specialists [7.8][7.30], burn-in is the most efficient treatment for detecting and removing early failures, both for bipolar and for MOS circuits. Birolini says that burn-in removes about 80% of the chip-related failures and about 30% of the package-related failures [7.40].
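Birolini's removal efficiencies can be turned into a rough estimate of the early failures that still escape to the field. The 70 % chip / 30 % package split used in the example is an assumed illustrative input, not a figure from the text.

```python
def residual_early_failures(chip_frac, pkg_frac,
                            chip_removal=0.80, pkg_removal=0.30):
    """Fraction of the early-failure population escaping burn-in, using
    the ~80 % chip-related and ~30 % package-related removal rates [7.40].
    chip_frac and pkg_frac split the early-failure population."""
    return chip_frac * (1.0 - chip_removal) + pkg_frac * (1.0 - pkg_removal)
```

With the assumed 70/30 split, 35 % of the early failures would survive burn-in, which is why burn-in alone cannot replace good process control.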
Generally, four types of stress are used:
• High temperature and bias: a cheap, but less efficient method;
• HTRB (high temperature reverse bias: high temperature, supply voltage, all inputs reverse biased): a medium-cost, medium-efficiency method;
• High temperature, bias, dynamic inputs, maximum load for all inputs: an efficient, but expensive method;
• HTOT, a method combining the optimum bias with temperatures between 200°C and 300°C: a method inadequate for plastic packages.
In accordance with the standard MIL-STD-883C, the test is performed at 125°C, for 160 hours. For special metallisations and for ceramic packages, 16 hours at 300°C are used; to obtain the same results, 1 000 000 hours at 125°C would be needed. It seems that the efficiency of this test depends on temperature and time. Well-organised control activities of IC producers led to the conclusion that, on average, 5% of the total integrated circuits fail at burn-in [7.8]; this percentage varies between 0 and 20%. An efficient treatment may eliminate up to 90% of the future failed devices in high-reliability systems [7.31]. One may say that burn-in is an expensive method; in this respect, the repair cost for the system must be considered, since the equipped boards may have hidden defects. It is obvious that burn-in increases the delivery cost, but the replacement at the user may be much more expensive.
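The temperature dependence noted above follows the Arrhenius law. An activation energy of about 1.24 eV (a back-calculated, assumed value, not one given in the text) makes the two regimes quoted above roughly equivalent:

```python
import math

K_EV = 8.617e-5  # Boltzmann constant in eV/K

def arrhenius_af(ea_ev, t_use_c, t_stress_c):
    """Arrhenius acceleration factor between a use and a stress temperature:
    AF = exp(Ea/k * (1/T_use - 1/T_stress)), temperatures given in Celsius."""
    return math.exp(ea_ev / K_EV *
                    (1.0 / (t_use_c + 273.15) - 1.0 / (t_stress_c + 273.15)))
```

arrhenius_af(1.24, 125, 300) is about 6·10⁴, so 16 hours at 300°C correspond to roughly 1 000 000 hours at 125°C, as stated above.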
7.4.3
Failures at screening and burn-in
Generally, the failures arising at screening and burn-in are directly linked to wafer contamination and metallisation corrosion. These defects may result from insufficient control, a non-qualified manufacturing process, an improper design or insufficient knowledge of the material behaviour and, eventually, may lead to a short circuit or an open circuit. Many failure mechanisms have become "classical" ones, such as purple plague or aluminium migration. Other failure mechanisms are due to faults of the circuit designers or to insufficient control/testing (especially for microprocessors or memories).
In Table 7.9 a synthesis of the typical failures evidenced by the screening tests is presented. Also, a comparison between the failures of transistors and those of integrated circuits is given in Table 7.10.
The data from both tables (obtained in 1975 [7.31]) have not only a historical character, because some of the devices produced in that period are still operational somewhere in the world.
The analysis of the failed integrated circuits allows one to obtain the failure rate distribution. This distribution depends on the technology used and on the circuit complexity.
Table 7.10 A comparison between the failures (in %) of transistors and integrated circuits

Failure type        Data published by TI (%) [7.50]        Data published by RAC (%) [7.54]
                    Transistors  SSI   MSI   LSI   MOS/LSI   TTL   CMOS
Metallisation       6            10    18    26    7         50    25
Diffusion           10           8     12    25    13        2     9
Foreign particles   -            5     11    13    1         6     7
Various             6            5     12    13    21        -     -
Oxide               31           18    20    13    33        4     16
Bonds               38           14    7     4     5         13    15
Package             9            5     3     2     5         25    28
Incorrect use       -            35    17    4     15        -     -
Table 7.11 Distribution of failure causes (in %) for various utilisation fields

Component failures   25   64
External failures    58   20
Good circuits        17   15
One may also establish the failure cause distribution. In a comparative study [7.46], completely different distributions were obtained for transmission equipment used in various environmental conditions (regular microclimate, reduced external stress, etc.), as one can see from Table 7.11. These differences may be explained by the fact that the transmission equipment is more often exposed to the danger of overcharge than the switching elements of a telephone exchange.
The electrical failure statistics for the components may be used at the equipped card level and, then, at the equipment level, and optimum configurations for the circuit layout may be obtained. For SSI circuits, these statistical data are easy to obtain, but for more complex circuits it is difficult to obtain reliable statistics.
In a report elaborated by RADC (the Rome Air Development Center) in 1971 [7.47], the failed components represented 5% of the total quantity delivered by the microelectronic industry. Other sources from the early '70s [7.48][7.49] have shown a failure level of 1-2% for the integrated circuits used in equipped cards. These results are consistent with the failure rate, at that time, for electronic components: 10⁻⁵ h⁻¹. Afterwards, the spectacular improvements made in the microelectronic industry allowed failure rates of 10⁻⁷...10⁻⁸ h⁻¹ to be obtained.
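The consistency claimed above can be checked under a constant-failure-rate assumption: with λ = 10⁻⁵ h⁻¹, an operating interval of 1000-2000 hours (an assumed interval, chosen for illustration) indeed yields a 1-2 % failure level.

```python
import math

def failure_fraction(lam_per_hour, hours):
    """Expected fraction of components failing within t hours for a
    constant failure rate lambda: F(t) = 1 - exp(-lambda * t)."""
    return 1.0 - math.exp(-lam_per_hour * hours)
```

failure_fraction(1e-5, 1000) is just under 1 %, and failure_fraction(1e-5, 2000) just under 2 %.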
[Fig. 7.8 Failure distribution for bipolar circuits (percentages, 0-40 % scale): solders, metallisation, photolithography, tightness, surface, various]
[Fig. 7.9 Failure distribution for MOS circuits (percentages, 0-40 % scale): oxide, electrical overcharge, electrical failures, wires, mounting, photolithography, metallisation, various]
Also in the early '70s, RADC spent more than 1 million dollars on the systematic study of the reliability of integrated circuits, to obtain sure data. These studies, referring mostly to bipolar circuits, led to the failure distribution of Fig. 7.8. From similar studies, performed for MOS circuits, Peattie [7.50] obtained the results presented in Fig. 7.9.
One may note that the predominant failures for the MOS technique (such as imperfections of the oxide layer, electrical overcharge, drift of the electrical parameters, etc.) are completely different from the failures arising in bipolar circuits (metallisation or diffusion defects). About 50% of all failed MOS circuits have shown electrostatic damage, overcharge and/or utilisation problems. Gallace and Pujol [7.51] established the distribution of failure mechanisms presented in Fig. 7.10.
Some comments are needed. If the gate oxide is shorted, a residual current arises at the input, but also a decrease of the noise sensitivity for the functioning parameters and for the output parameters was observed. Regardless of the complexity of the integrated circuits, the basic failures take place inside the small cells formed almost exclusively by MOS transistors and MOS capacitors. The most frequently encountered failure mode is the open circuit (inside the MOS component or in the connection network leading to the component). Even if the component works at delivery, the failure may be produced by a high current density or by a thermal or mechanical shock. Most frequently, damage may be induced by the ultrasonic cleaning, a method used for removing the etching residues.
[Fig. 7.10 Distribution of failure mechanisms (percentages, 0-40 % scale): scratches, electrical overcharge, oxide, mechanical stress]
Among the failures of MOS circuits, the next cause is the short circuit, produced by various types of defects, such as:
• contamination of two semiconductor areas connected to different electrical potentials;
• metal deposition (photoresist defects, mask defects, etc.);
• insufficient cleaning;
• metallic particles on the surface of the wafer;
• over-alloying of the surface metal with silicon;
• oxide breakdown (short circuit between the surface metallisation and the substrate).
Finally, the degradation effects may be produced by the migration of ions (Na+, for instance) in silicon or by surface charges which may produce surface inversion. The electrostatic discharges are also a major cause of failure, and this type of failure arises not only in MOS, but also in bipolar circuits.
7.5
Comparison between the IC families TTL Standard and
TTL-LS
In the TTL standard technology, the circuit complexity is limited only by the thermal characteristics of the package. In this respect, a comparison between TTL Standard and TTL-LS ICs is presented in Table 7.12.
Table 7.12 A comparison between two bipolar IC families: LS vs. TTL Standard
7.6
Application Specific Integrated Circuits (ASIC)
by changing the metallisation layout. But this diversity of types, usually not found in a company catalogue, has a detrimental effect on the reliability of these devices: expensive reliability tests are seldom performed, because the required quantities are small. Consequently, other methods to evaluate the reliability of ASICs must be used. These methods refer to design and testing.
Design margins must be appropriately chosen in order to preclude operational failures produced by a wide range of causes: process variability, hostile environment (high temperatures, radiation, humidity), etc. Taking into account that
ASIC designers use Computer Aided Design (CAD), specific computer methods,
such as Worst Case Analysis (WCA), may be employed.
The design process of a digital ASIC has several steps [7.52]: i) partitioning of the system function; ii) CAD at primitive gate level⁶, based on the ASIC supplier's design library; iii) computer simulation of various operating conditions with various Design Rule Checks (DRC).
The basic timing parameter is the maximum operating speed (the maximum clock frequency for correct operation of the ASIC). As this parameter depends on the temperature, the design may be optimised by determining the actual operating temperature and calculating the resulting margin required for operation over the entire temperature range (for military applications: -55°C ... +125°C). Design margins of 10-15% are currently used. The effects of the environment and of the ageing phenomena are also checked by computer simulation.
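The margin computation itself reduces to a simple ratio. The 57.5 MHz worst-case limit and the 50 MHz system clock below are invented numbers, used only to illustrate a 15 % margin of the kind quoted above.

```python
def speed_margin(worst_case_max_mhz, required_clock_mhz):
    """Relative timing margin between the worst-case maximum operating
    speed (over the whole temperature range) and the required clock."""
    return (worst_case_max_mhz - required_clock_mhz) / required_clock_mhz
```

speed_margin(57.5, 50.0) returns 0.15, i.e. a 15 % design margin.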
The testing must solve the problem of fault coverage (the percentage of possible logic elements tested by the test vectors). The goal is to obtain 100% fault coverage, a result hard to achieve for complex ASICs. A mathematical model allows developing digital ASIC fault-coverage guidelines for complex ICs [7.53]. The model is based on an established probabilistic relationship between the fabrication yield of the IC, the fault coverage and the defect level of the finished device, combined with an estimated probability of using untested logic elements in operation:

DL = 1 - Y^(1-FC)   (7.5)

where DL (Defect Level) is the probability that any given ASIC has defective untested elements, Y is the yield and FC the fault coverage. The authors believe that by using the concept of design for testability and standard techniques for testability implementation, a fault coverage in excess of 99.9% may be reached.
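Equation (7.5) can be evaluated directly. The yield of 50 % used in the example is an assumed value, chosen only to show how strongly the defect level depends on fault coverage.

```python
def defect_level(yield_frac, fault_coverage):
    """DL = 1 - Y**(1 - FC): probability that a shipped device still
    contains defective untested logic elements (eq. 7.5)."""
    return 1.0 - yield_frac ** (1.0 - fault_coverage)
```

For Y = 0.5, raising FC from 90 % to 99.9 % drops DL from about 6.7 % to about 0.07 %, which is why the 99.9 % coverage target matters.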
References
7.1 Spicer England, J.; England, R. W. (1998): The reliability challenge: new materials in the new millennium, Moore's Law drives a discontinuity. International Reliability Physics Symp., Reno, Nevada, March 31-April 2, pp. 1-8
7.2 Noyce, R.N. (1977): Large-scale integration: what is yet to come? Science, vol. 195, March
18, pp. 1102-1106
6 The gate of ASIC may be: AND, OR, NAND, NOR, EXOR, D Flip-Flop (DFF), etc.
7.3 Drăgănescu, M. (1997): From solid state to quantum and molecular electronics, the deepening of information processing. Proceedings of the International Semiconductor
Conference CAS'97, Oct.7-11, Sinaia (Romania), pp. 5-21
7.4 Schrom, G.; Selberherr, S. (1996): Ultra-low-power CMOS technologies. International
Semiconductor Conference, Oct. 9-12, Sinaia (Romania), pp. 237-246
7.5 Dascălu, D. (1998): Microelectronics - an expensive field for the present period. In: Curentul Economic (the Economic Stream), vol. 1, September 9, p. 28
7.6 Fluitman, J.H. (1994): Micro systems technology: the new challenge. International
Semiconductor Conference, Oct. 11-16, Sinaia (Romania), pp. 37-46
7.7 Peck, D.S.; Zierdt Jr., C.H. (1974): The reliability of semiconductor devices in the Bell
System. Proceedings of the IEEE, vol. 62, no. 2, pp. 185-211
7.8 Colbourne, E.D. (1974): Reliability of MOS LSI circuits. Proceedings of the IEEE, vol. 62,
No.2, pp. 244-258
7.9 Peck D.S. (1971): The analysis of data from accelerated stress tests. Proc. Int'l Reliability
Physics Symp., March, pp. 69-78
7.10 Băjenescu, T.I. (1982): Look for cost / reliability optimisation of ICs by incoming inspection. Proc. of EUROCON'82, pp. 893-895
Băjenescu, T.I. (1983): Pourquoi les tests de déverminage des composants. Electronique, no. 4, pp. 8-11
7.11 Adams, J.; Workman, W. (1964): Semiconductor network reliability assessment. Proceedings of the IEEE, vol. 52, no. 12, pp. 1624-1635
7.12 Preston, P. F., (1972): An industrial atmosphere corrosion test. Trans. Ind. Metal finish
(Printed Circuit Suppl.), vol. 50, pp. 125-129
7.13 Son, K.I.; Soma, M. (1977): Dynamic life-estimation of CMOS ICs in real operating
environment: precise electrical method and MLE. IEEE Trans. on Reliability, vol. 46, no. 1,
March, pp. 31-37
7.14 Hu, C.; Tam, S.C.; Hsu, F.C. (1985): Hot-carrier induced MOSFET degradation: model,
monitor and improvement. IEEE Trans. on Electron Devices, vol. 32, Feb., pp. 375-385
7.15 Gallace, L. J. (1975): Reliability of TP A-metallized hermetic chips in plastic packages - the
gold chip system. Note ST-6367, February, RCA, Sommerville, USA
7.16 Băjenesco, T.I. (1975): Quelques aspects de la fiabilité des microcircuits avec enrobage plastique. Bulletin SEV, vol. 66, no. 16, pp. 880-884
7.17 Peck, D.S. (1978): New concerns about integrated circuit reliability. Proc. Int'l Reliability Physics Symp., April, pp. 1-6
7.18 Goarin, R. (1978): La banque et le recueil de données de fiabilité du CNET. Actes du Colloque International sur la Fiabilité et la Maintenabilité, Paris, pp. 340-348
7.19 Moosa, S.M.; Poole, K.F. (1995): Simulating IC reliability with emphasis on process-flaw
related early failures. IEEE Trans. on Reliability,vol. 44, no. 4, Dec., pp. 556-561
7.20 Frost, D.F.; Poole, K.F. (1989): RELIANT: a reliability analysis tool for VLSI intercon-
nects. IEEE J. Solid State Circuits, vol. 24, April, pp. 458-462
7.21 Liew, B.J.; Fang, B.; Cheng, N.W.; Hu, C. (1990): Reliability simulator for interconnect and intermetallic contact electromigration. Proc. Int'l Reliability Physics Symp., March, pp.
111-118
7.22 Najm, F.; Burch, R.; Yang, P.; Hajj, I. (1990): Probabilistic simulation for reliability
analysis of CMOS VLSI circuits. IEEE Trans. Computer-Aided Design, vol. 9, April, pp.
439-450
7.23 Hall, J.E.; Hocevar, D.E.; Yang, P.; McGraw, MJ. (1987): SPIDER - a CAD system for
modeling VLSI metallisation patterns. IEEE Trans. Computer-Aided Design, vol. 6,
November, pp. 1023-1030
7 Reliability of monolithic integrated circuits 243
7.24 Lee; Kuo; Sek; Ko; Hu (1988): Circuit aging simulator (CAS). IEDM Tech. Digest,
December, pp. 76-78
7.25 Sheu, B.J.; Hsu, W.-J.; Lee, B.W. (1989): An integrated circuit reliability simulator. IEEE J. Solid State Circuits, vol. 24, April, pp. 473-477
7.26 Hohol, T.S.; Glasser, L.A. (1986): RELIC - a reliability simulator for IC. Proc. Int'l Conf.
Computer-Aided Design, November, pp. 517-520
7.27 Kubiak, K.; Kent Fuchs, W. (1992): Rapid integrated-circuit reliability simulation and its application to testing. IEEE Trans. on Reliability, vol. 41, no. 3, Sept., pp. 458-465
7.28 McPherson, J.W. (1986): Stress-dependent activation energy. Proc. Int'l Reliability Physics
Symp., April, pp. 1-18
7.29 Schaefer, E. (1980): Burn-in, was ist das? Qualität und Zuverlässigkeit, no. 10, pp. 296-304
Jensen, F.; Petersen, N.E. (1982): Burn-in: an engineering approach to the design and analysis of burn-in procedures. J. Wiley and Sons, New York
7.30 Loranger Jr., J.A. (1973): Testing IC: Higher reliability can cost less. Microelectronics, no. 4, pp. 48-50
7.31 Loranger Jr., J.A. (1975): The case for component burn-in: the gain is well worth the price. Electronics, January 23, pp. 73-78
7.32 Bazu, M.; Tazlauanu, M. (1991): Reliability testing of semiconductor devices in humid
environment. Proceedings of the Annual Reliability and Maintainability Symp., January 29-
31, Orlando, Florida (USA), pp.237-240
7.33 Bâzu, M.; Bacivarof, I. (1991): A method of reliability evaluation of accelerated aged
electron components. Proceedings of the Conference on Probabilistic Safety Assessment
and Management (PSAM), February, 1991, Beverly Hills, California (USA), pp. 357-361
7.34 Krumbein, K. (1995): Tutorial: Electrolytic models for metallic electromigration failure
mechanisms. IEEE Trans. on Reliability, vol. 44, no. 4, December, pp. 539-549
7.35 Ghate, P.B. (1983): Electromigration induced failures in VLSI interconnects. Solid State
Technology, vol. 3, pp. 103-120
7.36 Fischer, F.; Neppl, F. (1984): Sputtered Ti-doped Al-Si for enhanced interconnect reliability. Proc. Int'l Reliability Physics Symp., pp. 190-193
7.37 Black, J.R. (1969): Electromigration - a brief survey and some recent results. IEEE Trans.
on Electron Devices, vol. ED-4, pp. 338-347
7.38 Wada, T. (1987): The influence of passivation and package on electromigration. Solid-State
Electronics, vol. 30, no. 5, pp. 493-496
7.39 Learn, A. J. (1973): Effect of structure and processing on electromigration-induced failures
in anodized aluminium. J. Applied Physics, vol. 12, pp. 518-522
7.40 Birolini, A. (1994): Reliability of technical systems. Springer Verlag, 1994
7.41 Shatzkes, M.; Av-Ron, M.; Gdula, R.A. (1980): Defect-related breakdown and conduction.
IBM J. Research & Development, vol. 24, pp. 469-479
7.42 McPherson, J.W.; Baglee, D.A. (1985): Acceleration factors for thin gate oxide stressing. Proc. 23rd Int'l Reliability Physics Symp., pp. 1-5
7.43 Elsayed, E.A.; Chan, C.K. (1990): Estimation of thin oxide reliability using proportional
hazard models. IEEE Trans. on Reliability, vol. 39, August, pp. 329-335
7.44 Dasgupta, A.; Hu, J.M. (1992): Failure mechanism models for brittle fracture. IEEE Trans.
Reliability vol. 41, no. 3, June, pp.328-335
7.45 Chiang, S.S.; Shukla, R.K. (1984): Failure mechanism of die cracking due to imperfect die attachment. Proc. Electronic Components Conf., pp. 195-202
7.46 Boulaire, J.Y.; Boulet, J.P. (1977): Les composants en exploitation. L'echo des recherches,
July, pp. 16-23
244 7 Reliability of monolithic integrated circuits
7.47 Dummer, G. (1971): How reliable is microelectronics? New Scientist and Science Journal,
July 8th, pp. 75-77
7.48 Arciszewski, H. (1975): Analyse de fiabilité des dispositifs à enrobage plastique. L'onde électrique, vol. 50, no. 3, pp. 230-240
7.49 Benbadis, H. (1972): Duree et efficacite du vieillissement accelere comme methode de
selection. Actes du congres national de fiabilite, Perros-Guirec, Sept. 20-22, pp. 91-99
7.50 Peattie, C.G. (1974): Elements of semiconductor reliability. Proceedings of the IEEE, vol. 62, no. 2, pp. 149-168
7.51 Gallace, T.; Pujol, A. (1976): Failure mechanism in COS/MOS integrated circuits.
Electronics Engineering, December, pp. 65-69
7.52 Wiling, W.E.; Helland, A.R. (1994): Implementing proper ASIC design margins: a must for
reliable operation. ARMS 94, pp. 504-511
7.53 Wiling, W.E.; Helland, A.R. (1998): Established ASIC fault-coverage guidelines for high-
reliability systems. ARMS 98, Anaheim, California, January 19-22, pp. 378-382
7.54 Signetics Integrated Circuits, Sunnyvale, California, 1976
7.55 Băjenesco, T.I. (1978): Microcircuits. Reliability, incoming inspection, screening and optimal efficiency. Int. Conf. on Reliability and Maintainability, Paris, June 19-23
7.56 Băjenesco, T. I. (1981): Problèmes de la fiabilité des composants électroniques actifs actuels. Masson, Paris
7.57 Băjenescu, T. I. (1982): Eingangskontrolle hilft Kosten senken. Schweizerische Technische Zeitschrift (Switzerland), vol. 22, pp. 24-27
7.58 Băjenescu, T. I. (1982): Look Out for Cost/Reliability Optimization of ICs by Incoming Inspection. Proceedings of EUROCON '82 (Holland), pp. 893-895
7.59 Băjenescu, T. I. (1983): Dem Fehlerteufel auf der Spur. Elektronikpraxis (West Germany), no. 2, pp. 36-43
7.60 Băjenescu, T. I. (1984): Zeitstandfestigkeit von Drahtbondverbindungen. Elektronik Produktion & Prüftechnik (West Germany), October, pp. 746-748
7.61 Băjenescu, T. I. (1989): A Pragmatic Approach to the Evaluation of Accelerated Test Data.
Proceedings of the Fifth lASTED International Conference on Reliability and Quality
Control, Lugano (Switzerland), June 20-22
7.62 Băjenescu, T. I. (1989): Evaluating Accelerated Test Data. Proceedings of the International Conference on Electrical Contacts and Electromechanical Components, Beijing (P. R. China), May 9-12, pp. 429-432
7.63 Băjenescu, T. I. (1989): Realistic Reliability Assessments in the Practice. Proceedings of the
International Conference on Electrical Contacts and Electromechanical Components,
Beijing (P. R. China), May 9-12, pp. 424--428
7.64 Băjenescu, T. I. (1991): A Pragmatic Approach to Reliability Growth. Proceedings of 8th Symposium on Reliability in Electronics RELECTRONIC '91, August 26-30, Budapest (Hungary), pp. 1023-1028
7.65 Băjenescu, T. I. (1991): The Challenge of the Coming Years. Proceedings of the First
Internat. Fibre Optics Conf., Leningrad, March 25-29
7.66 Băjenescu, T. I. (1991): The Challenge of the Future. Proc. of Int. Conf. on Computer and Communications ICCC '91, Beijing (P. R. China), October 30 to November 1
7.67 Băjenescu, T. I. (1996): Fiabilitatea componentelor electronice. Editura Tehnică, Bucharest (Romania)
7.68 Băjenescu, T. I. (1997): A personal view of some reliability merits of plastic encapsulated
microcircuits versus hermetically sealed ICs used in high-reliability systems. In:
Proceedings of the 8th European Symposium on Reliability of Electron Devices, Failure
Physics and Analysis (ESREF '97), Bordeaux (France), October 7-10,1997
7.69 Băjenescu, T. I. (1998): A particular view of some reliability merits, strengths and limitations of plastic-encapsulated microcircuits versus hermetically sealed microcircuits utilised in high-reliability systems. Proceedings of OPTIM '98, Brasov (Romania), 14-15 May, pp. 783-784
7.70 Hewlett, F. W.; Pedersen, R. A. (1976): The reliability of integrated logic circuits for the Bell System. Int. Reliability Physics Symp., Las Vegas, April, pp. 5-10
7.71 Kemeny, A. P. (1974): Life tests of SSI integrated circuits. Microelectronics and
Reliability, vol. 13, no. 2, pp. 119-142
7.72 Bazu, M. et al. (1983): Step-stress tests for semiconductor components. Proceedings of
Ann. Semicond. Conf. CAS 1983, October 6-8, pp. 119-122
7.73 Bazu, M.; Ilian, V. (1990): Accelerated testing of integrated circuits after storage.
Scandinavian Reliability Engineers Symp., Nykoping, Sweden, October
7.74 Bazu, M. (1990): A model for the electric field dependence of semiconductor device
reliability. 18th Conf. on Microelectronics (MIEL). Ljubljana, Slovenia, May
7.75 Bazu, M. (1995): A combined fuzzy logic & physics-of-failure approach to reliability
prediction. IEEE Trans. Reliab., vol. 44, no. 2 (June), pp. 237-242
7.76 Dascalu, D. (1998): From micro- to nano-technologies. Proceedings of the International
Semiconductor Conference, October 6-10, Sinaia (Romania), pp. 3-12
7.77 Dietrich, D. L.; Mazzuchi, T. A. (1996): An alternative method of analyzing multi-stress,
multi-level life and accelerated-life tests. Proceedings of the Annual Reliability and
Maintainability Symp., January 22-25, Las Vegas, Nevada (USA), pp. 90-96
7.78 Caruso, H. (1996): An overview of environmental reliability testing. Proceedings of the
Annual Reliability and Maintainability Symp., January 22-25, Las Vegas, Nevada (USA),
pp.102-107
7.79 Smith, W. M. (1996): Worst-case circuit analysis: an overview. Proceedings of the Annual
Reliability and Maintainability Symp., January 22-25, Las Vegas, Nevada (USA), pp. 326-
331
7.80 Tang, S. M. (1996): New burn-in methodology based on IC attributes, family IC burn-in data, and failure mechanism analysis. Proceedings of the Annual Reliability and Maintainability Symp., January 22-25, Las Vegas, Nevada (USA), pp. 185-190
7.81 Knowles, I.; Malhorta, A.; Stadterman, T. J.; Munamarty, R. (1995): Framework for a dual-
use standard for reliability programs. Proceedings of the Annual Reliability and
Maintainability Symp., January 16-19, Washington DC (USA), pp. 102-105
7.82 Pecht, M. G.; Nash, F. R.; Lory, J. H. (1995): Understanding and solving the real reliability assurance problems. Proceedings of the Annual Reliability and Maintainability Symp., January 16-19, Washington DC (USA), pp. 159-161
7.83 Peshes, L.; Bluvband, Z. M. (1996): Accelerated life testing for products without sequence
effect. Proceedings of the Annual Reliability and Maintainability Symp., January 22-25,
Las Vegas, Nevada (USA), pp. 341-347
7.84 Mok, Y. L.; Xie, M. (1996): Planning & optimizing environmental stress screening.
Proceedings of the Annual Reliability and Maintainability Symp., January 22-25, Las
Vegas, Nevada (USA), pp. 191-195
7.85 Johnston, G. (1996): Computational methods for reliability-data analysis. Proceedings of
the Annual Reliability and Maintainability Symp., January 22-25, Las Vegas, Nevada
(USA), pp. 287-290
7.86 Yates III, W. D.; Beaman, D. M. (1995): Design simulation tool to improve product
reliability. Proceedings of the Annual Reliability and Maintainability Symp., January 16-
19, Washington DC (USA), pp. 193-199
7.87 Mukherjee, D.; Mahadevan, S. (1995): Reliability-based structural design. Proceedings of
the Annual Reliability and Maintainability Symp., January 16-19, Washington DC (USA),
pp.207-212
7.88 Cole, E. I.; Tangyunyong, P.; Barton, D. L. (1998): Backside localization of open and shorted IC interconnections. IEEE International Reliability Physics Symp. Proceedings, Reno, Nevada (USA), March 31-April 2, pp. 129-136
7.89 Huh, Y. et al. (1998): A study of ESD-induced latent damage in CMOS integrated circuits. IEEE International Reliability Physics Symp. Proceedings, Reno, Nevada (USA), March 31-April 2, pp. 279-283
7.90 van der Pool, J. A.; Ooms, E. R.; van't Hof, T.; Kuper, F. G. (1998): Impact of screening of latent defects at electrical test on the yield-reliability relation and application to burn-in elimination. IEEE International Reliability Physics Symp. Proceedings, Reno, Nevada (USA), March 31-April 2, pp. 363-369
8 Reliability of hybrid integrated circuits
8.1
Introduction
The word hybrid means that this technique is placed between complete integration (monolithic integrated circuits) and a combination of discrete elements. In this way conductors, resistors and, to a certain degree, small capacitors and inductors are produced, integrated on a substrate. The passive elements (such as large-value capacitors and, if necessary, inductors) are incorporated in the integrated circuits [8.1].
[Diagram: microelectronics comprises microcomponents and integrated circuits]
Fig. 8.1 The place of hybrid circuits in the general framework of microelectronics
Several circuit elements are placed on the same isolator substrate. In the thick-
film technique this is done with the aid of the stencil process (the paste is pressed
on a ceramic substrate and then submitted to a baking process). In the thin-film
technique, the layers are obtained by evaporation or sputtering.
The hybrid integrated circuits can be much more reliable than the corresponding circuits formed by distinct components, due to the smaller number of soldering points, to the more stable substrate, to the greater resistance to mechanical stresses and to the replacement of several cases by one single case. In Fig. 8.1 the interdependence and the place of the hybrid integrated circuits in the general framework of microelectronics are shown.
It is often difficult for design engineers to decide between thick- and thin-film
technologies in the design and fabrication of electronic systems. (In the case of
thick-film, the deposited pattern of conductors, resistors, capacitors and inductors is
I The thick-film systems offer some advantages: simple processing, fast and inexpensive tooling
systems, economy - using wider tolerance active devices -, higher reliability and multilevel
circuit capabilities.
2 The initial enthusiasm and optimism concerning the immediate and wide-ranging applications for thin- and thick-film hybrid circuits has largely failed to be realised. However, today's forecasts suggest that the present world-wide production capability will be unable to cope with the demand over the next few years.
• smaller dimensions;
• better reliability of the wire connections (smaller number of connections);
• economy (for large series);
• easily interchangeable tested modules;
• very good reproducibility.
Compared with monolithic ICs, the hybrids have the following advantages:
• great design freedom (various resistors and capacitors, bipolar and unipolar semiconductors, analogue and digital functions, all in a single circuit);
• short research/development time;
• smaller development and commissioning costs;
• shorter times to obtain the models;
• higher currents, voltages and powers;
• resistance to higher shocks, vibrations and accelerations;
• higher working frequencies;
• greater flexibility of the active components (mixed technologies);
• economical possibility of replacing the circuits, even after a large series has started;
• the design of the circuits can be easily modified;
• small and medium series are profitable;
• the passive components, particularly the resistors, can be produced with high precision and over a large range of values.
But there are also some disadvantages. On the one hand, in comparison with printed-circuit technology, the costs are higher for small quantities and some problems may doubtless arise; on the other hand, in comparison with monolithic ICs, only a smaller packing density can be obtained, and the costs are higher for a large number of items.
The plastic materials used for encapsulation must fulfil the following conditions [8.2]:
• good dielectric characteristics;
• small dielectric constant (for high-frequency circuits);
• good compatibility with the thick-film resistors;
8.2
Thin-film hybrid circuits
These circuits are made on a ceramic substrate. On the whole surface of this substrate a NiCr layer is deposited by evaporation, then covered with a photoresist and exposed to light through a mask. After exposure, the photoresist is removed from the areas where the conductive lines will be placed, and copper or gold is electroplated onto these photoresist-free areas. Afterwards, the rest of the photoresist is removed and a new photoresist layer is deposited, again exposed through a mask. The photoresist remains in the areas where the resistors are to be placed and is removed everywhere else. The remaining photoresist and the already deposited copper or gold layer protect the underlying NiCr layer. Then the NiCr layer not protected by photoresist is etched away, and the photoresist residues are washed off.
With this method, the resistors are formed by a subtractive (take-away) process. The thin-film forming process is the same, independently of the circuit type; the sole difference lies in the mask used for the photoresist exposure. The eventual partitioning into elementary circuits (repeating modules) is made by scribing (chemical attack after masking, laser, or ultrasonic). The semiconductor chips and the capacitors are then introduced into the circuit and interconnected. Afterwards, the circuit is encapsulated.
During the manufacturing process, optical and electrical controls are performed. The final control is made after encapsulation and includes climatic, mechanical and hermeticity tests.
The advantage of gold conductive paths, in the case of thin films, is the possibility of connecting discrete components (for example, non-encapsulated chips) by means of gold conductors, assuring safe functioning.
Mounting and soldering of discrete components in hybrid circuits is highly automated and supervised by computers.
8.2.1
Reliability characteristics of resistors
[Plots: drift (%) versus time (h), and stability versus number of cycles (4, 10, 21, 56, 112, 224)]
Fig. 8.2 Drift of tantalum nitride resistors under load is smaller than 0.1% after 10³ working hours
Fig. 8.3 Stability of tantalum nitride resistors depending on the number of damp-heat cycles
[Plot: ΔR/R (%) from 0 to 0.4 versus storage time (hours), with curves at 20, 70, 125, 155 and 200°C]
Fig. 8.4 The results of high-temperature storage of tantalum nitride resistors, at various temperatures
8.2.2
Reliability of through-contacts
The factors that can influence the reliability of through-contacts are the temperature, the temperature changes and the current load. During a reliability study, 26 000 through-contacts were tested for more than 1000 hours at 125°C, loaded at 700 mA. Since no failure was observed, it results that:

λ_S < (1 / 2.6 × 10⁷) h⁻¹ = 3.85 × 10⁻⁸ h⁻¹ (8.1)

and:

MTTF_S = 1/λ_S > 2.6 × 10⁷ h (8.2)

Therefore, at a test current I_T = 700 mA, for a maximum load current I_M = 35 mA, the estimated value of the mean time to the first failure [8.4], with a confidence level of 90%, is:

MTTF(90%) = 0.43 MTTF_S (I_T/I_M)² > 0.43 × 2.6 × 10⁷ × (700/35)² h (8.3)
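The factor 0.43 in relation (8.3) is the standard zero-failure confidence bound: for no failures in T device-hours, MTTF ≥ T / −ln(1 − C), which at C = 90% gives T/ln 10 ≈ 0.43·T. A minimal sketch reproducing the estimate above (the quadratic current-acceleration law is taken from the text):

```python
import math

def mttf_lower_bound(device_hours, confidence=0.90):
    """Lower confidence bound on MTTF from a test with zero failures.

    For zero failures in T device-hours, the upper bound on the failure
    rate is -ln(1 - C) / T, hence MTTF >= T / -ln(1 - C).
    At C = 0.90 this is MTTF >= T / ln(10) ~= 0.43 * T.
    """
    return device_hours / -math.log(1.0 - confidence)

# The test above: 26 000 through-contacts, 1000 h, no failures,
# current acceleration (I_T / I_M)^2 = (700 mA / 35 mA)^2 = 400.
T = 26_000 * 1000                        # 2.6e7 device-hours
mttf_use = mttf_lower_bound(T) * (700 / 35) ** 2
print(f"MTTF(90%) > {mttf_use:.2e} h")   # about 4.5e9 h, as in (8.3)
```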
8.3
Thick-film hybrids
Thick-film hybrids [8.5]...[8.11] are fixed on ceramic substrates by soldering. To this end, pastes having the desired characteristics are applied by a stencil (screen-printing) process. For both conductive lines and resistors, pastes containing glass and noble metals are used. First, the conductive lines are printed on the substrate; after drying, they are fired. Then, in the same manner, the resistor bodies are deposited and fired. Under the denomination "resistors", the manufacturers offer pastes having different sheet-resistance values, indicated in Ω/□ (ohms per square).
At present, experience [8.12] indicates what dimensions the resistor bodies must have for the desired characteristics and resistance values.
After all the resistors are deposited on the substrate, the ensemble is fired and the various layers acquire their final characteristics. A computer is used to calculate the resistors' form and dimensions. Since this method yields too large a distribution of the resistance values, the resistors are laser-trimmed, so that they finally have a tolerance of ± 0.5%.
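The computer-aided sizing of a printed resistor follows from the sheet-resistance relation R = R_s · (L/W); a minimal sketch (the paste value and track width below are illustrative assumptions, not from the text):

```python
def resistor_geometry(target_ohms, sheet_ohms_per_sq, width_mm):
    """Number of squares and length for a printed resistor.

    R = R_s * (L / W), so squares = R / R_s and L = squares * W.
    """
    squares = target_ohms / sheet_ohms_per_sq
    return squares, squares * width_mm

# Illustrative: a 4.7 kOhm resistor from a 1 kOhm/sq paste, 1.5 mm wide
squares, length_mm = resistor_geometry(4700, 1000, 1.5)
print(squares, length_mm)   # 4.7 squares, 7.05 mm long
```

The printed body is deliberately sized a few percent below nominal, then laser-trimmed up to the target value.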
Today, special elements in miniature form are available, carefully encapsulated, measured and selected, whose terminals can be reflow-soldered. Not only transistors and integrated circuits are available, but also tantalum and ceramic capacitors, and high-frequency inductors, all of them insulated and having the desired form. Although all these component types are more expensive than the types having wire terminals, the financial effort is justified when their utilisation is correlated with the preferred mounting technique for hybrid integrated circuits, the reflow method [8.2]. In the reflow method, the substrate is first selectively tinned and coated with the flux agent. Afterwards, the insulated and already tinned components are positioned. The partitioned substrate is heated for a short time above the tinning temperature, until the solder becomes fluid (reflow). In this manner a very great number of reliable soldering points are made in the shortest time, and the fluid soldering surface provides a supplementary self-centering. By thinning certain substrate portions, it is possible to cover the desired soldering points with a tinning paste, which favours the catching of the components on the substrate before the proper tinning. Then the terminals are tinned by the same reflow method or by the normal soldering method, and the circuit is ready.
An interesting characteristic of the thick-film technique is that it allows crossing conductor lines to be obtained.

Pastes
Depending on their composition and destination, three paste types can be differentiated.
Pastes for conductive paths contain a noble metal powder; the most recommended combination is Pd-Ag.
The resistor pastes have the following characteristics: range of the resistance value, temperature coefficient of the resistor, dissipated power per cm² (the mean value is 5 W/cm²), electrical noise, temperature drift, loading drift, stability³, sensitivity to the microclimatic conditions, length/width ratio, print profile. The sheet resistance varies between 3 Ω/□ and 10 MΩ/□ and depends on the paste composition and on the thickness of the dry layer. The precision without compensation varies between 15 and 30%; with compensation, a precision of 1% or better can be obtained. Due to the semiconductor character of the thick-film resistor, the noise has a 1/f spectrum (f = frequency); it is expressed in dB (Fig. 8.5) and is proportional to the applied voltage. The noise voltage (in µV/V) corresponds to each frequency decade. The elements with a given specific sheet resistance (R_s) have values between 2 µV/V (about +5 dB) and 5 µV/V (about +15 dB). In general, the noise of a resistor layer depends on:
• specific sheet resistance (the pastes with high R_s have strong noise);
• composition (the complex pastes have higher noise than the simple ones);
• geometry and compensation.
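The quoted dB figures follow, approximately, from the usual noise-index conversion NI(dB) = 20·log₁₀(NI in µV/V); a quick check of the 2 µV/V and 5 µV/V values:

```python
import math

def noise_index_db(microvolt_per_volt):
    # Standard noise-index conversion: dB = 20 * log10(NI in uV/V)
    return 20.0 * math.log10(microvolt_per_volt)

print(round(noise_index_db(2.0)))   # about +6 dB (the text quotes +5 dB)
print(round(noise_index_db(5.0)))   # about +14 dB (the text quotes +15 dB)
```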
The dielectric pastes are used for crossing lines and protective coverings. Titanium dielectrics give very high dielectric constants, so that capacitors of up to 20 000 pF/cm² with breakdown voltages of 50...100 V are feasible.
The glass pastes have a lower printing temperature and can also be used as resistors.
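With the quoted 20 000 pF/cm², the printed area needed for a given capacitance follows directly; a small sketch:

```python
def printed_cap_area_cm2(capacitance_pF, density_pF_per_cm2=20_000):
    # Titanium dielectric pastes: up to ~20 000 pF/cm2 (figure from the text)
    return capacitance_pF / density_pF_per_cm2

print(printed_cap_area_cm2(1000))   # 1 nF needs 0.05 cm2
```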
8.3.1
Failure types
³ For Birox 1400 the mean tolerance is 0.24% (for the series 17 of Du Pont, even 0.1%, and, in general, the performance is maintained under 1%).
The quality and the reliability of these circuits depend on the different materials, components and manufacturing methods.
Table 8.2 presents the most frequent types and causes of failures for thick-film hybrid circuits [8.10]. One may notice that numerous and different types of failures depend directly on the manufacturing method and on the materials used.
[Plot: noise (dB), from −30 to +10, versus resistance surface (0.65, 1.3, 4.5, 13 mm²); typical spread −25 to +5 dB for 2...3 mm²]
Fig. 8.5 Noise characteristics of Birox 1400 pastes before and after laser trimming, depending on the resistor surface (for the Birox 1400, 178 and 17G pastes of Du Pont, better noise figures may be obtained)
8.3.2
Reliability of resistors and capacitors
A few reliability data concerning the thick-film hybrids are available. In accordance with the Sprague report [8.3], the following failure rates have been ascertained:
• resistors: after 1000 working hours at nominal load and +70°C, a failure rate λ = 1.2 × 10⁻⁶/h was obtained, with a maximum drift of 0.5...0.7% of the nominal value;
• capacitors: after 1000 working hours at +85°C and double the working voltage, a failure rate λ = 3.4 × 10⁻⁶/h was obtained, for a capacitance drift smaller than ± 20% and an insulation resistance greater than 10³ MΩ at the end of the cycle, compared with 10⁴ MΩ at the beginning of the test.
These failure rates indicate the order of magnitude of the reliability level obtained in the large-series manufacturing of hybrid integrated circuits.
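For a constant failure rate λ, the fraction of parts expected to fail within t hours is 1 − e^(−λt); applied to the capacitor figure above:

```python
import math

def fraction_failed(lambda_per_hour, hours):
    # Exponential model: F(t) = 1 - exp(-lambda * t)
    return 1.0 - math.exp(-lambda_per_hour * hours)

# Capacitors: lambda = 3.4e-6 / h over the 1000 h test
print(f"{fraction_failed(3.4e-6, 1000):.2%}")   # roughly 0.34 %
```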
8.3.3
Reliability of "beam-leads"
Table 8.2 (fragment) The most frequent failure types and their causes for thick-film hybrid circuits [8.10]

Element           Failure type                                  Cause
Microconnections  Intermittent interruption of the micro-      Purple plague (AuAl2); surface loads;
                  connections; fragility of the connections    bad soldering; bad positioning;
                  (thermocompression soldering)                excess of soldering material
Chip in case      Intermittent interruption of the circuit;
                  bad electrical contact; permanent or
                  intermittent short-circuit
Connections       Open circuit; breakdown between two          Bad soldering; insufficient quality
                  conductors
Output wires      Circuit interruption; appearance of a        Bad adherence in the stencil zone
                  parasitic resistance
Cases             Hermeticity defects (metallic cases;         Bad closing; porosities or gas
                  ceramic cases)                               occlusions in the closing materials;
                                                               fissures

In 1962, Bell Telephone (USA) developed the interconnection and mounting technology for semiconductor components named beam-leads. This technique has numerous advantages, but doubtless the most important is the higher reliability. In the case of beam-leads, the chip has strip connections going beyond the edges.
With the aid of a special machine, it is possible to obtain all the connections in a
single operation.
In accordance with the published data, the standard failure rate of beam-leads is of the order of λ ≈ 10⁻⁸/h. After screening tests, these circuits have a failure rate of λ = 5 × 10⁻¹⁰/h, a remarkable result. Queyssac (Motorola) explains this by reasons linked to the manufacturing technology:
• complete passivation of the active chip (silicon nitride);
• gold/gold soldering (no purple plague);
• no (or small) mechanical stress at mounting: practically all fissures or scratches are excluded, which leads to a better long-term reliability;
• chemical separation of the chips: no microfissures;
• no internal soldering of terminals (in this way, about 30% of the normal failure causes of conventional circuits are eliminated).
Comparison of four mounting techniques (the column headings are missing in the source):

RELIABILITY
Hermeticity                       No          Yes                  No             No
Surface protection                Fair        Excellent            Fair           Fair
Soldering reliability             Poor        Excellent            Fair           Fair
Possibility of soldering control  Yes         Yes                  No             Yes
Manufacturing                     Standard    Standard, until the  Standard,      Standard,
                                              emitter diffusion    until          excepting the
                                                                   metallisation  soldering
Thermal characteristics           Excellent   Excellent            Fair           Excellent

COSTS
Structure cost                    Small       High                 High           Fair
Repair facilities                 Yes         Yes                  Yes            No
Facilities for building a
multistructure in a single case   Very small  Fair/good            Excellent      Poor
System-level cost                 Very high   Fair/small           Small          High
The beam-lead circuits have particularly good mechanical characteristics and can successfully undergo the following tests:
8.4
Thick-film versus thin-film hybrids
An advantage of the thick-film hybrids is the possibility of obtaining, with the aid of various pastes, very different resistance values (in practice, from 10 Ω to 10 MΩ) in the same circuit. By trimming, resistors with tolerances of 0.5% may be obtained; however, the thick-film resistors are not as stable (2%) as the thin-film resistors. The latter are metal-film resistors, with well-known remarkable properties. If the specifications are demanding concerning the stability and the distribution, one should rather use the thin-film technique. The resistors of this type can be laser-trimmed to ± 0.1%; their stability is 0.3%, and their temperature coefficient, (40 ± 20) × 10⁻⁶ K⁻¹, is better than that of the thick-film resistors (250 × 10⁻⁶ K⁻¹). But the resistance range is smaller (200 Ω to 1 MΩ).
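The practical meaning of these temperature coefficients: the relative drift is ΔR/R ≈ TCR · ΔT. Comparing the two technologies over a 100 K rise (the temperature span is an illustrative assumption):

```python
def drift_percent(tcr_ppm_per_K, delta_T_K):
    # dR/R ~= TCR * dT, result expressed in percent
    return tcr_ppm_per_K * 1e-6 * delta_T_K * 100

print(drift_percent(40, 100))    # thin-film:  about 0.4 %
print(drift_percent(250, 100))   # thick-film: about 2.5 %
```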
An advantage of the thin-film circuits is the line resolution of the conductors and resistors, which is finer than that of the thick-film circuits. This leads, in principle, to a smaller volume. The dimensions are determined not only by the line resolution, but also by the size of the discrete components.
Another advantage is that thick-film circuits permit crossing lines to be obtained. On the other hand, the crossing lines can be avoided by a proper mask selection (the crossings are placed under the discrete components). Moreover, in the case of thick films, the manufacturing of crossings often requires two different printing stages, increasing the circuit costs.
Even if for both circuit types the starting point is ceramic substrates of the same thickness, their composition is nevertheless different. The purity of the ceramic for the thick-film circuits is 96%, and that for the thin-film circuits is 99.6% (which is why the latter are somewhat more expensive). This is because the ceramic surface for the thick-film circuit must be more rugged, to assure a good adhesion of the paste during the stencil process. On the contrary, the substrate of a thin-film circuit must be flat and smooth, to obtain reproducible metallic layers.
The thin-film circuits have better noise and high-frequency characteristics than the thick-film circuits. The other relative characteristics, such as the stability of the resistors and of their temperature coefficients, are better too.
Another difference is the size of the ceramic substrate that can be processed at once. In the thin-film technique, more circuits can be set on the same substrate. If unencapsulated structures must be used, the thin-film technique has the advantage that its conductive lines are coated with a gold layer, which makes possible their firm and sure connection with the gold terminals. For the introduction of the unencapsulated structures into the thick-film circuits, the contact points must first be made with the aid of a gold-containing paste, which is relatively expensive. Experience indicates that about 50% of all circuits are made in the thick-film technique and the rest in the thin-film technique.
Joly [8.13] gives an example (a telecommunications circuit for military applications) of a circuit realised in both technologies. After the performed mechanical and screening tests (2000 working hours at +125°C), the hybrid circuits still remained within the value range obtained at the initial measurements: no failures (for both technologies). Based on these results, the technical and economic consequences of the two technologies were studied. The comparison is valid for hermetic cases and unmounted chips, but different substrates.
Comments
• For the thin-film circuits, the integration density is greater (on the same substrate surface, 10 thin-film circuits can be integrated, versus 4 circuits for the thick-film technology).
• The necessary number of photo patterns is 6 for thick-film and 2 for thin-film circuits.
• For the thick-film circuits, thermocompression remains a very difficult manufacturing method.
• The cathodic sputtering technique (for the trimming of resistors) is an expensive, time-consuming method that is difficult to automate. The laser technique gives a good stability of the components (for both technologies), but the time consumption is 2-3 times greater for the thick-film circuits.
• The infrastructure is 2-3 times more expensive for the thick-film circuits.
• The noble-metal content of the thick-film circuits is 4 times greater than that of the thin-film circuits.
• The drifts of the temperature coefficients and of the resistor stability are roughly the same.
[Plot: relative costs versus complexity, for a given quantity; the thick-film and thin-film cost curves intersect]
Fig. 8.6 Evaluation of the relative costs for the thick- and thin-film integrated circuits
Fig. 8.6 [8.14] shows the costs of the two technologies; it follows that the thick films are more adequate for simple integrated circuits, while for complex circuits it is more advantageous to use the other technique. The intersection point depends, to a small extent, on the production volume and shifts towards the thick-film circuits as the number of manufactured ICs grows.
If several thousand items are manufactured monthly, the production costs are a little smaller. For a small number of items, the thin-film technique leads to greater production costs. The two technologies are not rivals, but complement each other.
8.5
Reliability of hybrid ICs
Although almost all electronic components are available in the form of chips usable in hybrid ICs, only capacitor chips and semiconductor chips are generally used. The general specifications are:
• small substrate surfaces, since the costs grow with the surface; the resistors with great ohmic resistance require a greater substrate surface, and the precision capacitors are very expensive and difficult to maintain;
• minimisation of the number of hybrid elements whose mounting requires intensive work, which increases the costs.
Besides the utilisation of expensive components, reliable circuits and basic tests, other approaches (such as tolerance analysis, drift analysis, testability and MTBF forecast) have also been included during the research work. For circuits with high dissipated power, or for circuits that must have high temperature stability, a thermal analysis is often undertaken with a triple aim:
1) to discover the hot spots;
2) to detect the temperature rise of critical components because of the microclimate (evaluation of the influence of the self-heating on the drift);
3) to determine the MTBF with the aid of MIL-HDBK-217. An important use of the MTBF is in the comparison of alternative manufacturing possibilities, with the aim of selecting the one leading to higher MTBF values. Another measure in this sense is derating.
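Aims (1) and (2) can be sketched as a simple thermal screen: each component's steady-state temperature is T = T_ambient + θ·P, and components above a limit are flagged as hot spots (the part names and θ values below are illustrative assumptions):

```python
def hot_spots(components, t_ambient_C, t_limit_C):
    """Flag components whose steady-state temperature exceeds a limit.

    components: list of (name, power_W, theta_C_per_W); T = Ta + theta * P.
    """
    flagged = []
    for name, power, theta in components:
        t = t_ambient_C + theta * power
        if t > t_limit_C:
            flagged.append((name, round(t, 1)))
    return flagged

# Hypothetical parts: (name, dissipated power in W, thermal resistance in C/W)
parts = [("R7", 0.8, 60.0), ("Q1", 0.3, 150.0), ("C3", 0.01, 40.0)]
print(hot_spots(parts, 50.0, 90.0))   # R7 (98 C) and Q1 (95 C) exceed 90 C
```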
During manufacturing, the principal measures are: input control of all materials and components; careful supervision of all manufacturing phases (visual control of the equipped and soldered substrates) to identify scratches on the semiconductor chips and areas of bad soldering on the capacitor chips; documentation of fabrication; and maintenance of the specified microclimate conditions (with clean rooms, for example).
A statistical evaluation of the parameters measured at testing often allows some conclusions about possible problems, especially if the measurements are made during a lifetime test.
To avoid early failures during the normal life, the finished products are usually exposed, before delivery, to extreme conditions, with the aim of detecting all the hidden failures. For each type of failure, the components are exposed to specific screens. A proper selection of tests can eliminate the components having weak points. The failures produced before the end of the normal period of life are due to the methods and materials used, and have a random character. If the testing of the materials is done with the greatest care, and if the fabrication process is 100% mastered and carefully supervised, the final test should identify only those components with defects not detectable during fabrication. The final test will find and eliminate these components.
In the ideal case, the methods and materials used determine the lifetime. An increase of the lifetime is possible only if better methods and/or materials are utilised.
Platz [8.16][8.17] has indicated that an IBM circuit has an MTBF of 10⁸ hours, the volume of tests being 3 × 10¹⁰ circuit-hours. In general, these tests are performed twice:
a) under normal working conditions (to calculate the predicted failure rate);
b) under higher stress (to emphasise the failure mechanisms).
Compared with classical circuits on small boards, the principal advantage of the hybrids is the smaller number of connections. For example [8.16][8.17], a resistor integrated in a hybrid circuit is far more reliable than a discrete resistor soldered on a board; in accordance with IBM data, the MTBF value is greater than 10⁶ years!
The reliability level of a hybrid circuit depends on the size of the series: the greater the series, the better the reliability. In accordance with MIL-HDBK-217, the predicted failure rate is:

λ_P = λ_b (π_T π_F π_Q π_E) failures / 10⁶ hours (8.5)

The following coefficients must be known:
π_T - temperature,
π_F - function,
π_Q - quality,
π_E - environment,
and the terms of the following relation:

λ_b = λ_S + λ_S A_C + Σ λ_R N_R + Σ λ_CA N_CA + λ_S π_S (failures / 10⁶ hours) (8.6)

represent the contributions of the different parts, as follows:
λ_S + λ_S A_C + Σ λ_R N_R - contribution of the substrate;
Σ λ_CA N_CA - contribution of the components included in the hybrid circuit;
λ_S π_S - contribution of the package.
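Relation (8.5) is a simple product of the base rate and the π-factors; a sketch with purely illustrative coefficient values (the real values come from the MIL-HDBK-217 tables):

```python
def predicted_failure_rate(lambda_b, pi_T, pi_F, pi_Q, pi_E):
    # Relation (8.5): lambda_p = lambda_b * pi_T * pi_F * pi_Q * pi_E
    # (expressed in failures per 1e6 hours)
    return lambda_b * pi_T * pi_F * pi_Q * pi_E

# Illustrative values only -- not taken from the handbook tables
lam_p = predicted_failure_rate(lambda_b=0.5, pi_T=1.6, pi_F=1.0,
                               pi_Q=2.0, pi_E=1.2)
print(lam_p)   # failures / 1e6 hours
```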
Fig. 8.7 [8.18] shows a comparison between λ_O, the observed failure rate, and λ_P, the predicted failure rate [relation (8.5)], for a hybrid circuit, based on the data obtained from users [long observation period; without burn-in data; confidence level 75%; exponential failure distribution].
The measured failure rates of a simple hybrid module, formed by two PNP transistors 2N2007 and some resistors, during the operational life [8.19] are:
λ₁ = 0.2 × 10⁻⁹ h⁻¹ for the resistors, and λ₂ = 12 × 10⁻⁹ h⁻¹ for the transistors.
[Scatter plot: observed failure rate λ_O versus predicted failure rate λ_P (failures/10⁶ hours), log-log scale from 0.01 to 100; points for the users A...L, multichip circuits marked]
Fig. 8.7 The experience of the users (A...L) versus the predicted failure rates
Fig. 8.9 shows the primary causes of failures of small-power hybrid circuits. The majority of failures are either breakdowns or soldering failures (especially for thermocompression).
8.6
Causes of failures
Himmel and Pratt [8.20] arrive at the conclusion that 60% of the failures are failures of the active components, 23% failures of the connections, and 9% failures of the integrated circuits.
Soldering 33.3%
Connections 32.4%
Active components adhesion 10.8%
Active components 10%
Contamination 6.36%
Other 7.2%
Fig. 8.9 The primary causes of the failures (power hybrid circuits)
LINEAR CIRCUITS
                                    Thin-film   Thick-film
Metallisation of interconnections   11.2%       11.5%
Resistive films                     11.1%       11.75%
Encapsulation                       11.1%       17.64%
Structure                           44.2%       11.77%
Foreign material                    -           17.6%
Miscellaneous                       -           17.98%

DIGITAL CIRCUITS
                                    Thin-film   Thick-film
Metallisation of interconnections   13.72%      -
Resistive films                     5.88%       -
Encapsulation                       17.64%      -
Structure                           25.48%      3.3%
Wire soldering                      21.58%      1.1%
Foreign material                    13.72%      28.8%
Substrate                           -           66.71%
Miscellaneous                       1.98%       -

Fig. 8.10 Statistical reliability data for hybrid circuits
[Plot: temperature rise T − T_u (°C), 0 to 140, versus power dissipation density (W/inch²), 50 to 200, with and without cooling radiator; curves: 1 - enamelled layer; 2 - aluminium oxide; 3 - beryllium oxide]
Fig. 8.11 Without a cooling radiator, the enamelled layer works at a lower temperature than an equivalent aluminium oxide chip. As a consequence, for aluminium oxide a cooling radiator gives a better power dissipation. 1 - enamelled layer; 2 - aluminium oxide; 3 - beryllium oxide
Table 8.4 The efficiency of screening tests (MIL-STD 883, method 5004, class B)
Table 8.5 Typical failure rates of components for hybrids (FIT), versus the working temperature (°C). [Recommended to be used only for cost evaluation and circuit classification, since the data are strongly process-dependent]
Component                                  25     50     75     100     125
Thick-film resistor                        5      10     15     20      25
Capacitor chip                             10     15     25     60      250
Wire contact (thermocompression):
  Au-Al                                    0.05   0.2    -      10      60
  Al-Au                                    0.1    0.1    0.1    0.1     0.5
  Al-Al                                    0.1    0.1    0.1    0.1     0.1
  Au-Au                                    0.04   0.04   0.04   0.04    0.04
Crossovers                                 0.05   0.05   0.06   0.08    0.1
Transistor chip (small power)              3      9      27     70      -
Power transistor chip                      50     100    300    900     1700
Diode chip                                 3      9      27     70      -
Integrated circuits:
  Four gates (or equivalent)               20     36     180    820     2400
  Dual flip-flop (or operational amplif.)  40     72     360    1640    4800
  SSI                                      125    225    1125   5120    15000
  MSI                                      250    459    2250   10200   30000
  LSI                                      500    900    4500   20400   60000
(Temperatures in °C; cells marked "-" are not legible in the source)
If the hybrids are classified only according to the layer thickness, one may find the situation published by RADC [8.18] and shown in Fig. 8.11.
Concerning the efficiency of the screening methods stipulated by MIL-STD-883 (method 5004, class B), Caldwell and Tichnell [8.26] published the data presented in Table 8.4.
Table 8.5 gives a survey of the typical failure rates of the components used in the manufacture of hybrids.
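Table 8.5 can be used to rough out a circuit's failure rate by summing the component FITs (1 FIT = 10⁻⁹ failures/h); a sketch at 75°C, with a hypothetical parts count (the counts are assumptions, the FIT values are from the table):

```python
# FIT values at 75 C taken from Table 8.5 (1 FIT = 1e-9 failures/h)
FIT_75C = {
    "thick_film_resistor": 15,
    "capacitor_chip": 25,
    "au_au_bond": 0.04,
    "transistor_chip_small": 27,
}
# Hypothetical parts count for a small hybrid (assumption, not from the text)
counts = {
    "thick_film_resistor": 8,
    "capacitor_chip": 3,
    "au_au_bond": 20,
    "transistor_chip_small": 2,
}
total_fit = sum(FIT_75C[k] * n for k, n in counts.items())
mtbf_hours = 1e9 / total_fit
print(total_fit, f"{mtbf_hours:.2e}")   # ~250 FIT, MTBF ~4e6 h
```

This is only the series-system approximation behind such tables: the circuit fails when any component fails, so the rates add.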
8.7
Influence of radiation
The integrated circuits used today in military projects must resist radiation. A number of users expect good stability and normal operation even if the circuits have been exposed to radiation for a long time. From this point of view, the thick-film resistors have a very good behaviour. The tests performed indicate that these integrated circuits resist even extreme conditions and work within the allowed power limits. The typical modifications of the resistance are minimal.
These advantages of the thick-film hybrids are possible only if the methods and the materials are according to the specifications. That is why careful research on materials and continuous controls, essential during the manufacturing process, are needed. Thus, for example, only pastes prepared by exactly observing the tolerances must be used. The main parameters of a fabrication batch must be completely specified, without neglecting the quality control with the aid of long-duration current tests.
8.8
Prospects of the hybrid technology
The enamelled metallic layers are important achievements of recent years (Fig. 8.12). Their advantages are good heat dissipation and the possibility of manufacturing substrates having the desired forms and a good mechanical resistance.
Another new development is the polymeric paste for thick films, with an expected cost reduction. This paste contains suspensions of conductive carbon particles in an organic medium. Plastic materials are used as substrates.
By using non-encapsulated semiconductors (especially in integrated circuits deposited onto the substrate with the chip-and-wire technology), more complex integrated circuits can be produced with the aid of an automated method named Tape Automated Bonding (TAB), thus enriching the product range.
Figs. 8.13-8.20 show the main manufacturing phases of a thick-film circuit for the transmitting band filter LOV-21 produced by Ascom Ltd., Berne.
Fig. 8.12 A good example of a thick-film circuit: a band filter (Ascom Ltd., Berne)
Fig. 8.13 Conductive lines printed on the ceramic substrate; drying at +150°C; firing of the conductive lines at +850°C
Fig. 8.15 Printing of the second resistor paste; drying at +150°C; firing of the pastes at +850°C
Fig. 8.16 Printing of the protection layer (glazing); drying at +150°C; firing of the glazing at +500°C
Fig. 8.17 Printing of the solder paste (which remains wet for component mounting); mounting of the capacitors; reflow soldering
Fig. 8.18 Measuring of all capacitors; calculation of the nominal values of the resistors (97% of the nominal value); ageing of the substrate (70 hours at +150°C)
Fig. 8.19 Bending of the IC pins
Some advantages of using hybrids [8.25], compared with discrete circuits, are the following:
• Electrical properties: (i) higher-frequency performance; (ii) higher density; (iii) predictability of design; (iv) long-term stability and reliability; (v) low temperature coefficient of resistance; (vi) small absolute and relative tolerances; (vii) ability to trim components for both passive and functional response; (viii) high thermal conductivity of the substrates.
• ... (v) lower warranty costs; (vi) easy serviceability and replaceability in the field; (vii) relatively simple processing and assembly techniques; (viii) low development cost.
8.9
Die attach and bonding techniques [8.31] ... [8.35]
8.9.1
Introduction
8.9.2
Hybrid package styles
Chips. The need for specialised equipment for die attach (connecting the base of the chip to the circuit) and wire bonding (connecting the top contacts of the chip to the circuit) limits the use of bare chips. The number of assembly operations is smaller for other hybrid package styles, so assembly costs are usually higher for chips. High-volume production can be an exception, because automatic equipment for die attach and bonding then becomes economically feasible.
Die attach. Chips may be mounted using eutectic solders, ranging from AuSi (370°C) to AuSn (280°C), as well as conductive epoxies. Eutectic die attach may be performed using either substrate heating or localised heating techniques. To ensure observable eutectic flow and/or filleting, a 0.005" border around the chip is generally suitable. The localised heating technique uses an accurately controlled stream of hot inert gas directed at the chip and its immediate area. It offers advantages in rework and lower substrate assembly temperatures.
GaAs FET chips. The FET chip can be die-attached manually, using a pair of tweezers, or automatically, using a collet. In either case, provide a flow of nitrogen over the workstage area. Start with a workstage temperature of 280°C and raise it as required. The chip should not be exposed, however, to a temperature higher than 320°C for more than 30 seconds. An 80/20 gold/tin preform 25 µm thick, with the same area as the chip, is recommended; a standard round preform with the same volume may also be used. When using tweezers, make sure that the chip is positioned so as to facilitate subsequent wire bonding.
GaAs material is more brittle than silicon and should be handled with care. When using a collet, it is important to have a flat die attach surface. By using a minimum of downward force, the chance of breaking the chip is reduced (Fig. 8.22).
Fig. 8.22 Die attach under a controlled atmosphere: force is applied to the chip
through a solder preform (or conductive epoxy) onto the film metallisation of the
substrate
Bipolar chips. The bipolar chip is die attached with gold-silicon eutectic under a
nitrogen ambient. The eutectic temperature is 370°C. Start with a workstage tem-
perature of 380°C and raise the temperature until eutectic flow takes place. The
chip should be lightly scrubbed using tweezers.
Diode chips. Table 8.8 shows the preform type and die attach conditions for
different types of diode chips. The die attach operation should be performed in a
reducing atmosphere, such as forming gas, or in an inert atmosphere, such as
nitrogen. When a single station is used, the operator holds the chip down for a few
seconds, until the preform melts and a fillet appears around the edge of the chip, or
until eutectic flow is observed. For higher volume operations, a belt furnace is used.
Weights are placed on the chips to assure good adhesion when the preform melts.
Temperature, weight, and time are adjusted experimentally to accommodate different
chip sizes, circuit configurations, and heating equipment.
Lead bond. The criteria for choosing a specific technique are generally the size
of the contact area on the chip, sensitivity to temperature, and the available equip-
ment. To avoid damage to the circuit, use the minimum values that provide an
adequate bond. Wire, ribbon, or mesh is used. When the bonding pad is small, the
wire diameter is usually 18 to 25 µm, in order to keep the wire inside the bonding
pad. Typical starting temperatures are 225°C for the work stage and 150°C for the
bonding tool. The bonding tool may be a wedge or a capillary. Pressure is applied to
deform the wire or ribbon by about 50%. A force of approximately 0.024 gf per
square µm (15 gf per square mil) is needed.
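The pressure figures quoted above can be sanity-checked numerically. The minimal sketch below (Python, not from the book) converts the quoted 15 gf per square mil into gf/µm² and estimates the force for a bond footprint; the footprint geometry is a hypothetical assumption used purely for illustration.

```python
# A minimal sketch (not from the book): checking the quoted bonding pressure
# and estimating a bond force. The footprint geometry is a hypothetical
# assumption for illustration.

MIL_UM = 25.4  # 1 mil = 25.4 um

def gf_per_um2(gf_per_mil2: float) -> float:
    """Convert a pressure from gf per square mil to gf per square um."""
    return gf_per_mil2 / (MIL_UM ** 2)

def bond_force_gf(footprint_um2: float, pressure_gf_um2: float = 0.024) -> float:
    """Force for a wedge bond using the ~0.024 gf/um^2 rule of thumb above."""
    return footprint_um2 * pressure_gf_um2

# The quoted 15 gf/mil^2 is indeed ~0.024 gf/um^2:
print(round(gf_per_um2(15.0), 4))   # -> 0.0233

# Hypothetical footprint: a 25 um wire deformed ~50% leaves a bond roughly
# 1.5 wire diameters wide; assume a 75 um long tool foot.
footprint = 1.5 * 25 * 75           # um^2 (assumed geometry)
print(round(bond_force_gf(footprint), 1))   # -> 67.5 gf
```

The two unit systems agree, confirming that the book's figures are the same rule of thumb expressed twice.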
Beam lead. The beam lead device is a silicon chip with co-planar plated gold
tabs that extend parallel to the top surface of the chip, approximately 10 mils beyond
the edge. If size is the major concern, beam lead diodes, not chips, are the correct
choice. Handling must be done with care, since the pressure of tweezers may distort
the leads. However, the diodes will stick to the tip of a tweezer point or to the
rough edge of a broken Q-tip. A vacuum pickup may be used, but the needle must
be small enough to prevent the diode from passing into it. Schottky barrier beam lead
diodes are easily damaged by static electricity, just as packaged diodes are. Contact
to the circuit should never be made with the free side of the diode, because this would
allow static electricity from the operator's hand to flow through the diode. Instead,
the side of the diode to be attached should be contacted first. If there is any chance
that the two circuit attachment points are at different potentials, they should be
brought together with a grounding lead before contacting the diode.
Fig. 8.23 Beam lead attachment requires thermocompression bonding or parallel gap welding to
the substrate metallisation
The ministrip may be soldered to the circuit on a hot plate, in a belt furnace, or
with a gap welder, or epoxy may be used. Thermocompression bonding is recommended
for attaching the leads. This package style is particularly well suited to shunt diodes,
but series applications are possible by soldering the ministrip to the conductor on the
substrate and bonding the lead across a gap in the conductor.
The microstrip post was developed for PIN switches and phase shifter circuits.
The accurate location of the chip centre makes this model useful for phase shift
circuits at frequencies as high as 20 GHz. The pedestal may be attached to the
substrate with conductive epoxy or low temperature solder. The temperature must be
kept below 280°C (the soldering temperature used to attach the chip to the pedestal).
The wires may then be thermocompression bonded to the substrate metallisation
pattern.
8.10
Failure mechanisms
Table 8.9 Comparative λ for various bonding techniques (in %/1000 h) [8.25]
The major failure mechanisms arise in the add-on components (chip resistors,
chip capacitors, transistors, diodes, ICs and wire bonds) - Table 8.9.
Although a single wire bond is very reliable, there may be more than 200 wire
bonds on a complex hybrid, and they may have a major contribution to the failure
rate.
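Under the usual series (sum-of-λ) model, that contribution scales linearly with the bond count. A minimal sketch with a hypothetical per-bond failure rate (the actual Table 8.9 values are not reproduced here):

```python
# A minimal sketch with a hypothetical per-bond failure rate; Table 8.9
# values are not reproduced here. Series model: the hybrid's bond-related
# failure rate is the sum of the individual bond rates.

def percent_per_1000h_to_fit(lam: float) -> float:
    """Convert a failure rate from %/1000 h to FIT (failures per 1e9 device-hours)."""
    return lam / 100.0 / 1000.0 * 1e9

lambda_bond = 1e-5   # %/1000 h per bond (assumed, illustrative)
n_bonds = 200        # a complex hybrid, as quoted above

lambda_total = n_bonds * lambda_bond                     # series (sum-of-lambda) model
print(round(lambda_total, 6))                            # -> 0.002 (%/1000 h)
print(round(percent_per_1000h_to_fit(lambda_total), 1))  # -> 20.0 FIT
```

Even with a very small per-bond rate, two hundred bonds in series multiply the contribution two-hundred-fold, which is why bond count dominates the hybrid failure budget.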
4 However, the films will drift with time (typically 0.25% for thick film and 0.1% for thin film).
Such drifts should be allowed for in any worst-case analysis.
References
8.24 Proceedings of the Custom Integrated Circuits Conference, Santa Clara, California
(USA), May 11-14, 1998
8.25 Jones, R. D. (1982): Hybrid Circuit Design and Manufacture. M. Dekker, Inc., New York
and Basel
8.26 Caldwell, G. L.; Tichnell, G. S. (1977): Guidelines for the custom microelectronics hybrid
use. Quality (February), pp. 16-19; (March), pp. 22-26
8.27 Schauer, P. et al. (1995): Low frequency noise and reliability prediction of thin film
resistors. Proc. of the ninth Symposium on Quality and Reliability in Electronics
RELECTRONIC '95, October 16-18, Budapest, Hungary, pp. 401-402
8.28 Loupis, M. I.; Avaritsiotis, J. N. (1995): Simulated tests of large samples indicate a
logarithmic extreme value distribution in electromigration induced failures of thin-film
interconnects. Proc. of the ninth Symposium on Quality and Reliability in Electronics
RELECTRONIC '95, October 16-18, Budapest, Hungary, pp. 353-358
8.29 David, L. et al. (1995): Reliability of multilayer metal-nGaAs interfaces. Proc. of the ninth
Symposium on Quality and Reliability in Electronics RELECTRONIC '95, October 16-18,
Budapest, Hungary, pp. 379-384
8.30 Xun, W. et al. (1995): Newly developed passivation of GaAs surfaces and devices. Proc.
of the fourth Internat. Conf. on Solid-State and Integrated-Circuit Technology, Beijing
(China), October 24-28, pp. 501-505
8.31 Hewlett-Packard Application Note 974
8.32 Howes, M. J.; Morgan, D. V. (1981): Reliability and Degradation. John Wiley & Sons,
Chichester
8.33 Kadereit, H. G. (1977): Adhesion measurements of metallizations of hybrid microcircuits.
Proc. Eur. Hybrid Microelectronic Conf. (ISHM), Bad Homburg, Germany, Session IX
8.34 Hieber, H. et al. (1977): Ageing tests on gold layers and bonded contacts. Proc. Eur.
Hybrid Microelectronic Conf. (ISHM), Bad Homburg, Germany, Session IX
8.35 Gosele, U. M.; Reiche, M. (1995): Wafer bonding: an overview. Proc. of the fourth
Internat. Conf. on Solid-State and Integrated-Circuit Technology, Beijing (China),
October 24-28, pp. 243-247
9 Reliability of memories and microprocessors
9.1
Introduction
Silicon technology was (and still is) the dominant technology of the semiconductor
industry; silicon devices hold more than 95% of the over $140 billion
semiconductor business at the present time. Greater integration, higher speed,
smarter functions, better reliability, and lower power and cost per silicon chip are
the permanent goals set by the increasing requirements of information technology.
The industry's progress has closely followed two laws. The first is Moore's law, the
1975 observation by Gordon Moore that the complexity of ICs had been growing
exponentially by a factor of two every year. He attributed this to a combination of
dimension reduction, die size increase, and an element which he called "circuit and
device cleverness": improved design and circuit techniques which allowed more
function per unit area at a given lithography. With a slowing down of the rate of
progress to a factor of two every 1.5 years, Moore's law continues to hold well
today. The second law is the law of π, a somewhat tongue-in-cheek statement that
memory chips, in a given generation, sell for about π dollars when they reach their
peak shipping volume, and eventually reach a selling price of π/2 dollars. The law
has not really held, though in constant dollars it is not too bad; but the point is that
the cost of a chip has only gradually increased from generation to generation, held
down by the ability of the industry to yield larger and larger chips while making
them smaller on increasingly larger wafers.

Device miniaturisation was the main trend (Fig. 9.1), and silicon device technology
progress followed the scaling-down principles and Moore's law for the last three
decades. In the past 40 years, the semiconductor business has continued to grow at a
high rate. Today there are two key technologies which play the role of drivers. In a
first stage, bipolar technology contributed to the rapid growth of the semiconductor
business. In a second stage, MOS technology much improved the performance of
logic arrays, memory devices and microprocessors. Both bipolar and MOS
technologies are based on silicon, and on the pn junction. Originally used in
microwave and radio-frequency applications because of its low susceptibility to
noise, a new semiconductor technology, based on GaAs, emerged as a contender for
use in advanced devices. GaAs is now regarded as a highly reliable, radiation-
resistant, ideal medium for use in ultra-fast switching circuits, wide-bandwidth
instrumentation and high-speed computers. Continuous improvements are being
made to the manufacturing process, ironing out the problems. Fabrication
techniques are the main area of concern, since mechanical stresses and impurities
introduced at this stage have a considerable influence on device performance.
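The doubling law quoted above compounds rapidly, as a short numerical sketch shows (the time horizons below are illustrative choices, not book data):

```python
# Illustrative sketch of the doubling law quoted above (complexity x2 every
# 1.5 years); the horizons chosen below are assumptions, not book data.

def complexity_factor(years: float, doubling_years: float = 1.5) -> float:
    """Growth factor implied by a fixed doubling period."""
    return 2.0 ** (years / doubling_years)

print(round(complexity_factor(10)))   # -> 102 (roughly 100x per decade)
print(complexity_factor(30))          # -> 1048576.0 (2^20 over three decades)
```

A 1.5-year doubling period therefore implies roughly a hundredfold complexity increase per decade, and a millionfold increase over the three decades of scaling the text describes.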
The gigabit generation will very likely require a new breakpoint if the trends are
to be continued. A few major areas of technology innovation have been the key to
meeting the requirements, such as lithography shrinkability (1.4x each generation),
the levels of metallisation and the fundamental limitation of device scaling to meet
performance goals (1.25x at chip level), the high dielectric-constant materials used
to meet cell capacitance in cells of reduced area, etc. As device and process
technology moves toward the 0.25...0.18 µm design rule regime, by the year 2000
semiconductor manufacturers will introduce development and production phases at
a scale of 1 Gb. Projections of 4, 16 and even 64 Gb DRAMs are not uncommon,
despite the requirement that an extrapolated 16 Gb DRAM needs not only ≤ 0.1 µm
lithography, but also the ability to fabricate devices and features at corresponding
dimensions. It is obvious that the industry is approaching some limits in its ability
to manufacture devices; but limits can be circumvented. (Optical lithography limits
were considered to be around 1 µm in the late 70's, but predictions are 0.1...0.2 µm
at present.) The National Technology Roadmap for Semiconductors [9.1]
confidently predicts continuing exponential progress with a generation every three
years, culminating in the year 2010 with 64 Gb DRAMs manufactured on 1400 mm²
chips with 0.07 µm lithography, and microprocessors having 800 million transistors
on 620 mm² chips. But some important limits [9.2] concern not only lithography,
but also the speed of light, tunneling, device fields, soft errors, power, cell size,
fabrication control, etc. The semiconductor industry will continue to progress, since
all these limits are more practical than fundamental. However, overcoming the
challenges will become increasingly difficult, and the industry will continue to
struggle against perhaps the most important limit of all to its growth: costs.
Fig. 9.2 Decrease of device dimensions and increase of complexity: electronics and optics
evolve in parallel, with microelectronics and microoptics converging towards nanoelectronics,
approaching molecular dimensions and the realm of quantum mechanics
In the future, optics, too, will play an essential role. Fig. 9.2 shows that, in
parallel with electronic devices, optical devices and components have also become
smaller over the years, and could lead to the use of molecules or atoms.
To overcome the technology difficulties and manufacturing costs, new materials
and processes as well as cost reduction methods will be introduced, such as high
dielectric-constant materials (BST or PZT), ferroelectrics, and new processes like
silicon on insulator (SOI).
Furthermore, fast new ultra large scale integration (ULSI) testing methods and
new yield-enhancing redundancy techniques, resulting in cost reduction, will be
increasingly needed to achieve high reliability for ULSI with 10⁹...10¹⁰ devices on a
single chip. Sophisticated microprocessors using 0.15 µm MOSFETs could possibly
appear at the beginning of the 21st century. Simultaneous achievement of high
performance, high packaging density, and high reliability will become increasingly
difficult. Therefore, there is an urgent need to reduce fabrication process costs
by developing new approaches such as single wafer processing [9.4] and tool
clustering, and increased automation of process and factory control.
For scaled MOSFETs, hot-carrier effects remain important even for supply
voltages below 3 V. ULSIs have always been developed with reliability in mind;
for each generation, device/memory structures, fabrication processes and materials
have so far been determined by the need to overcome reliability problems: soft-error
phenomena in ULSI memories, dielectric breakdown in the insulators, electro- and
stress migration in the interconnections, etc. Although this tendency will continue,
a new strategy for ULSI technology must be introduced to realise giga-scale and
nanometre LSIs.
The trend of the device parameters for each DRAM generation is shown in Fig.
9.3. It should be noted that the downscaling of capacitor size and capacitor
dielectric thickness is levelling off due to physical limits, in spite of the still
monotonic decrease in cell size. This trend demands complicated, three-dimensional
cell structures at least until 256 Mb DRAMs, resulting in increased bit-cost
(Fig. 9.4).
(Figure legend: wiring, capacitor, MOSFET, isolation, well)
It was found that a drastic decrease in the number of process steps occurs in the
case of ferroelectric DRAMs, and it is necessary and urgent to enhance the quality
of ferroelectric films up to the desired production level. It should be noted that the
dielectric constant decreases with decreasing film thickness, and this physical
mechanism is not yet clear. Elucidating this mechanism will also lead to ideal
ferroelectric non-volatile DRAMs, making good use of the polarisation of PZTs.
Recently, flash non-volatile memories have made good progress, with higher speeds
than DRAMs (Fig. 9.5), aiming at application to the personal digital assistant
(PDA). In the same way as for DRAMs, a key factor for flash memories is the high
quality oxide/insulator technology, which in particular must withstand 10⁵...10⁶
write/erase cycles, a condition close to the intrinsic oxide breakdown. Therefore,
new robust oxides, such as N₂O-oxynitrided oxides, are needed.

An important element for future PDA and multimedia applications is that the
flash memory cell can be scaled down more easily than the DRAM cell.
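What an endurance of 10⁵...10⁶ write/erase cycles means in service life depends entirely on the workload; a small sketch with assumed write frequencies (the workloads are hypothetical, not from the text):

```python
# Sketch (assumed workloads): translating a 1e5...1e6 write/erase endurance
# into service time for a block written a fixed number of times per day.

def endurance_years(cycles: float, writes_per_day: float) -> float:
    """Years until the endurance limit is reached at a constant write rate."""
    return cycles / writes_per_day / 365.0

print(round(endurance_years(1e5, 10), 1))     # -> 27.4 years at 10 writes/day
print(round(endurance_years(1e6, 1000), 1))   # -> 2.7 years at 1000 writes/day
```

The spread shows why the write/erase endurance, rather than raw speed, sets the reliability budget for flash in frequently updated applications.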
There are three approaches to reducing hot-carrier degradation in scaled MOS
devices: (i) hot-carrier resistant device structures, such as the double diffused drain
(DDD), the lightly doped drain (LDD), and the gate-drain overlapped device
(GOLD), i.e. drain/gate engineering; (ii) reduction of the power supply voltage
(1.5...3 V); and (iii) making good use of alternating current effects, including the
duty ratio. The GOLD structure provides a higher hot-carrier breakdown voltage
than the LDD structure and, moreover, a higher channel current without severe
down-scaling of the gate length.
Fig. 9.5 Record density trend in DRAM and other media [9.5]
The testing-in reliability (TIR) approach became more and more extensive as the
semiconductor technology developed, device dimensions decreased, circuits
became more complex, chips grew in size, and the need to realise products of ever-
higher quality increased. The concept of wafer level reliability (WLR) explored and
developed in the 80's proved to be no panacea. However, this last concept led to
the development of additional tools which helped to monitor the process changes
that might affect reliability, and explored the impact of specific failure mechanisms
on the product. These tools had to be used with care because, in extrapolating from
stress to use conditions, measurement results could reflect manifestations of a
failure mechanism not representative of product-use conditions. Test data from
tens of test structures must be extrapolated to circuits with millions of transistors;
and test data extending out to no more than the 1% tail of the failure distribution
must be extrapolated to orders of magnitude further out in the tail for the projected
levels of product reliability [9.6]. Realising the limitations of WLR for predicting
product life, the industry has continued to use product life testing, but now faces a
dilemma with product life testing that involves ever-more aggressive market-entry
demands, the measurement of ever-lower failure rates, the need to use ever-larger
sample sizes, and fundamental measurement uncertainties.
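The stress-to-use extrapolation mentioned above is commonly done with an Arrhenius acceleration factor. A minimal sketch, assuming a hypothetical activation energy and temperatures (these values are illustrative, not data from the text):

```python
# Sketch of the stress-to-use extrapolation problem, using the common
# Arrhenius model. Activation energy and temperatures are assumed values
# for illustration, not data from the text.
import math

K_B = 8.617e-5  # Boltzmann constant, eV/K

def arrhenius_af(ea_ev: float, t_stress_c: float, t_use_c: float) -> float:
    """Acceleration factor between a stress and a use temperature."""
    t_s = t_stress_c + 273.15
    t_u = t_use_c + 273.15
    return math.exp(ea_ev / K_B * (1.0 / t_u - 1.0 / t_s))

# Hypothetical: Ea = 0.7 eV, 125 degC life test vs. 55 degC use condition.
print(round(arrhenius_af(0.7, 125.0, 55.0)))   # -> 78 (i.e. ~78x acceleration)
```

The factor is extremely sensitive to the assumed activation energy, which is exactly why stress results "could reflect manifestations of a failure mechanism not representative of product-use conditions".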
A good example of the "paradigm shift" is the recent move from simple failure
analysis, by sampling the output of a manufacturing line, to the building-in
reliability (BIR) approach. This approach was introduced at the International
Reliability Physics Symposium (IRPS) in 1990, where Crook [9.9] stated that there
would be neither enough test time nor enough test parts to establish with confidence
the low failure rates projected for the end of the century. BIR is an ongoing,
comprehensive, and integrated approach to building reliability into the product; it
involves a continuing search for those factors that affect reliability and for ways to
improve control over them, making the product more resistant to process and use
variations, and to failure mechanisms, defects, and the degradation of material and
devices. To pursue this technique, greater importance will be attached to a deeper
physical understanding of the significant relationships between the input variables
and product reliability.
Such a comprehensive approach requires that all organisations involved in the
design, development, and manufacture of the product take responsibility for
product reliability. Reliability is therefore an integrated effort that also includes the
suppliers of the materials and equipment used in the manufacture of the product.
Table 9.1 shows the chronology of the X86 microprocessor implementations
since the µP 8080 in 1974. With the increased performance and complexity shown,
there has been a reduction in the time between successive generations from over 50
to 31 months, as a means to increase market share and company profits.
9.2
Process-related reliability aspects
1 Previously, bipolar technology reliability could not in any way be challenged. This slant
towards MOS happened because the sodium contamination in the Si-SiO₂ system was
controlled and removed.
2 The conclusions are based on minority carrier lifetime evaluations of 100mm diameter
Czochralski wafers commonly available for industrial production.
lifetime values are related to silicon defects. A study involved sampling silicon
wafers throughout the world and comparing them with the silicon wafers available
in most European countries. Simple MOS capacitance and minority carrier lifetime
mapping were used for the evaluation.
Computer simulation has shown that modelling with existing silicon material
and wafer processing techniques does not yield the degree of confidence necessary
for submicron technology. This limitation is due to the defects in silicon, and
improved silicon wafers are needed to sustain submicron technology³.
Fifteen to thirty samples from three different wafer lots were normally consi-
dered for each type of device. The sample preparation procedure consists mainly of:
• step-by-step chip removal through wet chemical or plasma etching;
• chip cross-sectioning delineation techniques;
• traditional chemical delineation techniques.
The examination techniques are optical and scanning electron microscopy and,
when needed, electron microprobe, Auger electron spectroscopy, and secondary
ion mass spectrometry. Conventional life tests are followed by failure analysis.
Burn-in failures, incoming inspection rejects, as well as field returns are also
analysed.
Process-related defects
• Manufacturing defects are often related to earlier process steps; e.g. the
passivation integrity strongly depends on the underlying metal quality, and the
aluminium step-coverage is related to the PSG softening, to the contact opening
profile, and to the mask alignment accuracy.
• As expected, newly implemented fabrication techniques may prove to be
troublesome, but, surprisingly enough, conventional and presumably well-
mastered operations still remain critical.
Passivation layer
Passivation defects seldom cause yield problems, but they are one of the major
causes of aluminium corrosion for plastic encapsulated devices exposed to a humid
atmosphere.
The intrinsic quality of the cover layer is obviously of prime importance.
Phosphorus-doped oxides proved to be poor protection against moisture: if the
phosphorus content is large enough, they are generally crack-free, but the
corrosion process is observed to occur anyway, owing to the chemical
transformation of phosphorus pentoxide into phosphoric acid in the presence of
humidity. On the other hand, plasma nitrides demonstrated much better efficiency
when defect-free; the basic problem is the cracking resistance, which is
critically related to their density and internal stress. So-called compressive nitrides
are in fact observed to be locally tensile (i.e. brittle) at topographical steps.
3 Previously, process techniques such as annealing, diffusion and back surface processes always
reduced and kept defects to a tolerable level.
Dry etching to replace liquid chemical etching. In this decade MOS devices will
have more than 10 million components on one single chip. Design rules will
demand submicron features, and new and improved technologies will be necessary
for generating and delineating the required patterns. Traditional lithographic
processes will be pushed to their limits. Wet chemical etching will have to be
complemented or replaced by dry etching.
All three types of plasma-based dry etching met the requirements of submicron
features. The choice depends on the applications of the process and the interactions
with the materials being processed. Plasma etching (PE) occurs predominantly by a
chemical reaction with little directionality between a reactive gas in the plasma and
the substrate. Reactive ion etching (RIE) adds a sputtering action with higher
directionality to the chemical reaction. Reactive ion beam milling (RIBM) also
combines physical and chemical action with even higher directionality.
RIE and RIBM will become the major dry processing techniques for the rest of
the decade because of their characteristics and suitable etching compatibility with
tantalum over polysilicon for feature sizes in the one micron range. The best
technique for high quality etching seems to be RIE. The process consists of a
chemical reaction enhanced by ions, bombarding the silicon wafer. The ions
remove non-volatile etching inhibitors from its surface and in that way permit
etching to continue anisotropically.
Automation. Automatic wafer processing, inspection and final test continue to be
the watchword and key to the semiconductor future. There is little doubt that
automation of wafer fabrication will improve the reliability of the devices, but the
automatic processing and testing will have to be economical for total
implementation. In wafer process automation, critical processing steps such as
wafer handling, lithography and dry etching are all being done by machines. Test
and inspection will cover not only highly automated systems, but also their use in
repairing processed wafers by laser adjustment, and automated radiation testing
of ICs at the wafer stage.
Automatic surface mounting of components directly on printed circuit boards is
now gathering momentum.
Yield and reliability. It is usual to consider that "the higher the yield, the better
the reliability". Of course, it all depends on the nature of the main yield detractors.
Design-related yield losses are not easily correlated with reliability failures.
However, when manufacturing defects are involved, it can make sense to look for
such correlations. In a restricted number of cases, no correlation at all exists
between yield and reliability. The most typical example concerns the final
passivation layer quality: large defect densities do not impact the chip performance,
but they definitely promote humid-test failures. The basic reason why no
correlation can be made between time-zero and long-term failures is that the
passivation layer does not play any active role in the chip functionality.
Some DRAM test results. Dynamic life-testing of 300 1M×1 DRAM devices
yielded only 6 (six) electromigration failures after 1000 hours. Contact mask mis-
alignment: 5 (five) failed parts out of 300 after 1000 hours in life test.
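The counts quoted above translate directly into a crude point estimate of the failure rate (failures per device-hour), sketched below; no confidence bounds are attempted:

```python
# Sketch: turning the life-test counts quoted above into a crude point
# estimate of the failure rate (failures per device-hour); no confidence
# bounds are attempted.

def lambda_point(failures: int, devices: int, hours: float) -> float:
    """Point estimate, assuming all devices ran for the full duration."""
    return failures / (devices * hours)

lam = lambda_point(6, 300, 1000.0)    # the electromigration failures above
print(lam)                            # -> 2e-05 per hour
print(round(lam * 1e9, 1))            # -> 20000.0 FIT
print(round(lam * 1000.0 * 100.0, 3)) # -> 2.0 (%/1000 h)
```

Such a high apparent rate is an artefact of the accelerated test conditions; deriving a field rate would additionally require an acceleration model.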
Surface Mounting Technology (SMT). Solid state equipment manufacturers
continually seek increased packaging density, packing more functions into a
given size enclosure while still maintaining the same functional capability. Advances
in integrated circuit design and fabrication will result in little practical benefit unless
4 Look for automatic surface mounting integrated circuits and other components on printed
circuits boards to greatly improve reliability, cut costs, and increase packaging density.
5 The main advantage of LCC over DIP is that no PCB holes are required for the assembly;
rather, the LCCs are soldered directly to the PCB solder pads. SMT eliminates the need to drill
holes in the PCB and saves valuable surface area, so LCCs are much smaller than DIPs.
6 a) An LCC has a reduction in area of 3 to 1 over a DIP; b) a reduction in volume of 8 to 1 over
a DIP; and c) a reduction in weight of 20 to 1 over a DIP.
7 The only problem with LCC was the well-publicised solder-joint cracking caused by
the thermal expansion difference between the PCB and the LCC; this problem can be solved by
the proper selection of the PCB material.
All this contributes to cost reduction, consistent product quality and high
reliability7.
Gallium arsenide technology. GaAs technologists have made steady progress in
the development of GaAs materials, processes and packaging technologies. Their
commitment to moving GaAs from the laboratory to the production line has made
GaAs a powerful and proven technology in microelectronics. The new GaAs
technology has recently excelled and made advances in integrated circuits.
GaAs ultra-high-speed ICs provide the ultimate in speed for supercomputers and
other high-speed signal processing applications which require clock rates in the
gigahertz region and above. Clock rates 2 to 5 times greater than those available
with the fastest silicon technology provide advantages in faster processing speeds,
increased throughput capability and reduced system complexity⁸. The relative
fragility of GaAs, and possible thermal instability at high temperatures, together
with the lack of a native oxide, impose constraints on the allowable processing
techniques.
Future trends. For many years to come, silicon wafers will remain the prime
candidate for many electronic devices. However, for certain applications, such as
ultra-high-speed computer elements and communications, other basic materials are
being considered. Thus, a GaAs universal shift register and a binary counter operate
five times faster than the silicon integrated circuits available today. The market for
GaAs digital integrated circuits is now poised for explosive growth; the standard-
product GaAs market reached $5 billion in 1998. Advances in material technology
and fabrication techniques, such as ion implantation and ion milling, direct-step-
on-wafer photolithography, and dry plasma etching, have made the
commercialisation of GaAs possible.
Surface-mount packages will dominate IC packaging. Their popularity is
attributed to ease of assembly and short leads or pads which enhance both speed
and performance. This package miniaturisation is much overdue. The encapsulation
of large-area dies in thin surface mount plastic packages results in much higher
compressive and shear stresses at internal interfaces than experienced previously in
conventional DIPs.
9.3
Possible memories classifications
Table 9.2 shows various types of semiconductor memories. In general, classi-
fications can be made as follows, depending on the:
• Form of signal: a) analogue; b) digital.
• Access type: a) stochastic; b) sequential; c) semi-sequential.
• Cell type: a) static; b) dynamic.
8 As an added benefit, the extended useful operating temperature range and radiation hardness of
GaAs ICs open new applications that are not possible with silicon technology. Consequently,
high-speed GaAs 16K SRAMs have an access time of 2 ns and integrate more than 10⁵ FETs on
a 7.2 × 6.2 mm chip.
Fig. 9.6 Another possible classification of semiconductor memories, by functional mode,
addressing and product family. (PLA: programmable logic array)
9.4
Silicon On Insulator (SOI) technologies
For more than 30 years, SOI technologies have primarily been dedicated to radia-
tion-hard applications. At present, a very serious opportunity is offered to SOI by
the aggressive development of ultradense, deep-submicron CMOS circuits operat-
ing at low voltage. The subsisting obstacle is the credibility of SOI when competing
with bulk Si, which is still extremely efficient. In SOI, the upper silicon layer, i.e.
the active device region, is fully isolated from the inactive substrate by a buried⁹
oxide. This configuration results in outstanding merits and substantial theoretical
advantages [9.10] over bulk silicon: improved speed and current driveability,
higher integration density, attenuated short-channel effects, lower power consump-
tion, and elimination of substrate-related parasitic effects. Another strong argument
in favour of SOI technology would be the inexpensive control of the interface deg-
radation induced by hot-carrier injection. This is indeed a key challenge for any
technology, including bulk silicon.
The most successful and mature SOI material so far is SIMOX, formed by deep
oxygen implantation into silicon and subsequent high temperature annealing.
Commercially available wafers are synthesised with a 1.8 × 10¹⁸ O/cm² dose and
160...200 keV energy, followed by annealing at 1320°C. The thicknesses of the
silicon overlay and the buried oxide are about 200 nm and 380 nm, respectively. The
Si film is a wafer-scale monocrystal with low residual doping (< 5 × 10¹⁵ cm⁻³), high
carrier mobility, and excellent in-depth homogeneity. Subsisting defects are
dislocations and stacking faults (10²...10⁶ cm⁻²).
The reliability of CMOS circuits depends on the capability of individual
transistors to withstand ageing effects. In short-channel transistors, carriers gain
enough energy to become hot, but the hot-carrier immunity does not appear to be a
critical limit for the operation of fully-depleted or partially-depleted SOI circuits.
Nevertheless, the hot-carrier-induced ageing of SIMOX transistors is a challenging
problem because not only the front gate oxide, but also the buried oxide may be
damaged. The gate-induced drain leakage (GIDL) current - measured at large VD
by scanning the gate bias from depletion to strong accumulation - is very sensitive
to ageing. Increasing the drain voltage or reducing the channel length leads to more
pronounced ageing, whereas the extension of the defective region depends
essentially on gate bias. Other sensitive monitors are: charge pumping¹⁰ current,
low frequency noise, photoluminescence, etc.
9 The buried oxide differs from a thermal oxide: it is Si-rich, which implies a high density of
electron traps and E′ centres (acting as hole traps). Its trap density is definitely larger than that
of the thermal oxide interface at the gate, but small enough not to adversely affect the circuit
performance. The buried oxide being more subject to degradation than the gate oxide, its
defects may jeopardise, via coupling effects, the performance of CMOS circuits.
10 Charge pumping reveals a more intensive build-up of interface traps due to the simultaneous
presence of electrons and holes [9.11].
9.4.1
Silicon on sapphire (SOS) technology
For years, in many research laboratories - such as General Electric, Hewlett Packard,
Hughes, RCA, Rockwell - important work has been accomplished to replace
the silicon support with another material with better properties: silicon on sapphire.
Its most important advantages are:
• absence of field inversion problems, and of parasitic circuit elements;
• better reliability;
• smaller dissipated power, especially smaller static power;
• simple failure causes;
• complementary SOS ICs are very resistant to perturbations, and the working
voltage can vary within wide limits.
9.5
Failure frequency of small geometry memories
Soft errors induced by alpha particles can be a reliability concern for microelec-
tronics, especially DRAMs packaged in ceramic. For example, in n-channel MOS
memories, the charge carriers are electrons, and the capacitors are potential wells in
the p-type silicon. Alpha particles emitted from trace levels of uranium U238 and
thorium Th232 in the packaging materials can penetrate the surface of the semicon-
ductor die. As the alpha particle passes through the semiconductor device, electrons
are dislocated from the crystal lattice along the track of the alpha particle [9.12]. If
the total number of generated electrons collected by an empty storage well exceeds
the number of electrons that differentiates a 1 from a 0, the collected electron
charge can flip a 1 to a 0 (Fig. 9.7), generating a soft error in the memory device.
Fig. 9.7 Alpha-particle-generated charge flipping a stored "1" to a "0" in a storage well
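The charge-collection mechanism just described can be put into numbers with a back-of-the-envelope sketch. The alpha energy, collection efficiency, and critical charge below are illustrative assumptions, not values from the text; only the ~3.6 eV per electron-hole pair in silicon is a standard material constant.

```python
# Back-of-the-envelope estimate of alpha-particle-induced charge versus
# the critical charge of a DRAM storage well. All numeric values are
# illustrative assumptions, not data from the text.
E_ALPHA_EV = 5.0e6      # assumed alpha energy, typical of U/Th decay chains (eV)
EV_PER_PAIR = 3.6       # energy needed per electron-hole pair in silicon (eV)
Q_ELECTRON = 1.602e-19  # elementary charge (C)
Q_CRIT = 50e-15         # assumed critical charge separating a "1" from a "0" (C)

def collected_charge(collection_efficiency):
    """Charge collected by the well for a given collection efficiency (0..1)."""
    pairs = E_ALPHA_EV / EV_PER_PAIR
    return pairs * Q_ELECTRON * collection_efficiency

# Even partial charge collection can exceed the critical charge and flip a "1":
q = collected_charge(0.3)
print(f"collected {q * 1e15:.0f} fC vs critical {Q_CRIT * 1e15:.0f} fC -> upset: {q > Q_CRIT}")
```

With these assumed numbers a 5 MeV alpha generates roughly 1.4 million pairs (some 220 fC), so even 30% collection comfortably exceeds a 50 fC critical charge.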
While the filler material is the primary source of alpha particles in plastic
encapsulated microcircuits and the package lid is the primary source in hermetic
packages, alpha particle contamination from the mineral acids
used in wafer processing must not be ignored. The susceptibility of DRAM to soft
errors is typically measured by accelerated tests or real-time soft-error rate tests.
Knowledge of the factors which lead to soft errors can be used to improve
reliability in DRAM by using a physics-of-failure approach to monitor variables in
the manufacturing. A trend toward decreasing alpha particle emission rates in fused
silica fillers used in the encapsulants for plastic encapsulated microcircuits was
observed [9.14][9.15][9.17].
Since 1985 there has been a shift in the semiconductor reliability community from
reliability prediction based on accelerated test measurements of the final product
towards designed-in reliability. The use of accelerated testing and real time testing
[9.16] to monitor soft-error rates during manufacturing of a 256K DRAM was
reported. A direct correlation was found between a batch of phosphoric acid
containing high levels¹¹ of Po-210 and an increase in soft-error rate. The alpha
particle emission rate of the hot phosphoric acid batch was 30 to 80 times higher
than that of the other materials.
9.6
Causes of hardware failures
These causes have varied over the last 20 ... 30 years [9.18]; generic causes for
failures have been associated with the following:
• part or (active and passive) device failures;
• interconnect failures;
• electrical and mechanical system design;
• excessive environment stresses (mechanical, moisture, chemicals, temperature);
• user handling;
• cannot duplicate (CND) or retest OK;
• miscellaneous.
Table 9.3 presents a Pareto ranking of device failure data in which some 22% of
failures fall into the CND and not verifiable category [9.19]. Often these failures
are considered as apparent or virtual failures, and only the remainder of failures are
perceived as actual or hard failures. In some cases, a failure is acknowledged but
the failure cause cannot be attributed to a specific failure site, failure mode, or
failure mechanism.
One reason for the increasing trend of non-attributable failures may be due to the
higher level of complexity, so that failure analysis methods or techniques are
incapable of isolating the associated defects. Traditional techniques of failure
analysis need to be further developed or radically changed to address new cate-
gories of failures. Fig. 9.8 [9.20] indicates that the average quality of JAN, MIL-
11 The Po-210 level in the hot phosphoric acid was 50…100 pCi/litre, which was 10 to 20 times
higher than the other lots tested. A quality control procedure has been established to monitor the
alpha particle emission rate of incoming phosphoric acid batches [9.16].
STD-883C-qualified, and military ICs qualified with source control drawings has
improved from 200 defective chips per million (in 1987), to 40 defects per million
(in 1991). This study covers electrical defects, mean density defects, and
hermeticity defect data for digital MOS and linear and digital bipolar technologies.
Table 9.3 Pareto ranking of failure causes in 3400 failed VLSI devices +) (fd) [9.19]
+) VLSI-class devices were from multiple sources, such as manufacturing fallout, qualifications,
reliability monitors, and customer returns. ESD: electrostatic discharge; * = possible
packaging/assembly-related failures.
Fig. 9.8 Defects in digital MOS and linear and digital bipolar technology ICs, quarterly 1986-1992 [9.20]
Table 9.4 Historical perspective of the dominant causes of failures in devices [9.18]
9.6.1
Read only memories (ROMs)
12 The resultant failure must not be confused with intrinsic charge loss associated with the
detrapping of electrons on the floating gate [9.8].
the trap level to oxide phonons results in virtual energy levels in the oxide which
allow for more effective transition paths. As a consequence of the electron-phonon
coupling, the emission occurs close to the oxide conduction-band edge at
temperatures between 250 and 350°C, producing a strong temperature dependence
of the mechanism.
Typical screening tests used to eliminate defective EPROMs are (i) burn-in, (ii)
high-temperature reverse bias (HTRB), (iii) high-temperature storage, and (iv) low-
temperature dynamic life test.
The main failure mechanisms affecting electrically erasable/programmable read-
only memories (E²PROM) are intrinsic charge trapping and defect-related charge loss¹³.
Failure rates for EPROM and E²PROM are almost the same up to 10 000 cycles
at 250°C, with an activation energy of 0.6 eV. For this type of failure it is relatively
simple to devise screens on a production basis (similar to those used on EPROMs),
since the mechanism is temperature activated.
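Because the mechanism is temperature activated, the acceleration of a production screen over field conditions can be estimated with the Arrhenius model. A minimal sketch: the 0.6 eV activation energy and the 250°C stress come from the text above, while the 55°C use temperature is an assumption chosen purely for illustration.

```python
import math

BOLTZMANN_EV = 8.617e-5  # Boltzmann constant (eV/K)

def arrhenius_af(ea_ev, t_use_c, t_stress_c):
    """Acceleration factor of a high-temperature screen over use conditions
    for a temperature-activated failure mechanism (Arrhenius model)."""
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV) * (1.0 / t_use_k - 1.0 / t_stress_k))

# Ea = 0.6 eV and the 250°C stress are quoted in the text; 55°C use is assumed.
print(f"AF = {arrhenius_af(0.6, 55.0, 250.0):.0f}")
```

With these figures the acceleration factor runs into the thousands, which is why a short high-temperature screen can expose charge-loss defects that would take years to appear in the field.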
Electrically alterable ROMs (EAROMs) are manufactured using MNOS
technology (a technology similar to NMOS, but with a modified gate insulating
layer). The device performance is affected by the degradation of the SiO₂ during
erase/write cycling. Changes in surface states and, consequently, alterations of VT
13 Its major cause is oxide breakdown; two types of breakdown have been identified: (a) tunnel
oxide breakdown (accounting for some 87.5%), without temperature acceleration, and (b) oxide
breakdown in the row select circuitry (some 10%).
and deterioration of charge mobility were noticed, hence the possibility of charge
loss by direct tunnelling from traps within the oxide to the silicon is increased. The
activation energy of this mechanism varies with the number of erase/write cycles,
decreasing from 0.65eV at 10000 cycles to 0.5eV, and then to 0.25eV at 10 cycles.
The retention time is logarithmically dependent on temperature [9.8]. Read cycling
is temperature dependent, but in no way does it influence retention time. The
mechanisms of charge loss are similar to those observed in EPROMs and
E2PROMs; therefore screening processes used for EPROMs are found to be
effective. ESD can be disastrous for the three ROM types discussed above, since
their thin gate oxides would be highly susceptible to breakdown as a result of static
potentials. Therefore, all precautions must be taken when handling them in the field.
9.6.2
Small geometry devices
As geometries get smaller, more devices can be built on the same area of silicon,
therefore reducing the cost of each individual circuit. The major cause of VMOS
failures was found to be ionic contamination (accounting for over 75% of the failed
devices). Proper process controls and screens result in a marked improvement in
device reliability, but ESD protection in VMOS devices cannot be easily accom-
plished using conventional electrostatic protection circuitry.
The major cause of HMOS device failure (infant mortality condition) seems to
be ionic contamination through defective passivation layers; it can be
screened using either a high-temperature life test or a storage bake. Accelerated
tests show that thinner gate oxides do not automatically result in higher failures,
because the provided screening removes devices with hazardous latent defects. Hot
electrons are a problem due to the high electric fields in these small devices; the high
E fields cause impact ionisation, and the generation of hole-electron pairs within the
conduction channel degrades the performance. Accelerated tests at low temperatures
(-10°C to -70°C) can detect defective devices. The acceptable soft error rate
(SER) level due to α-radiation, as specified by Intel, is 0.1%/1000 h or 1000 FIT.
The smaller the device, the higher the defect density. Device complexity has been
found to increase the defect density non-linearly.
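The equivalence between the two units quoted for the Intel SER limit can be checked directly, recalling that 1 FIT is one failure per 10⁹ device-hours:

```python
def percent_per_1000h_to_fit(rate_percent_per_1000h):
    """Convert a failure rate given in %/1000 h to FIT
    (failures per 1e9 device-hours)."""
    failures_per_hour = (rate_percent_per_1000h / 100.0) / 1000.0
    return failures_per_hour * 1e9

# The quoted SER limit: 0.1 %/1000 h is exactly 1000 FIT.
print(percent_per_1000h_to_fit(0.1))  # → 1000.0
```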
9.7
Characterisation testing
This is a key to successful screening and inspection testing and plays a dominant
role in the development of design margins and test specifications, as it may reveal
the sensitivities of the RAM. Characterisation is a parametric, experimental
analysis of the electrical properties of a given integrated circuit; its purpose is to
investigate the influence of different operating conditions (temperature, supply
voltage, logic levels, frequency, etc.) on the Ie's behaviour and to deliver a cost-
effective test programme for incoming inspection¹⁴. Normally a characterisation is
14 For quality cost optimisation at incoming inspection level, see chapter 8.4.2 in [9.24].
Table 9.7 Some typical characteristics of the two types of testing [9.23]
Incoming inspection                         Characterisation
• N, N^(3/2) patterns;                      • N² patterns;
• Worst case temperature;                   • Three temperatures (hot, cold, ambient);
• 1, 2 or 4 corner voltage supply;          • Four corner supply;
• Fixed sets of timing data;                • Variation of timing data in "may change" areas;
• Screening out false second sources;       • Defining non-specified characteristics.
9.7.1
Timing and its influence on characterisation and test
To characterise and test a dynamic RAM for sensitivities due to timing [9.23],
several timing set-ups must be included. The address latching must be considered
carefully. Row Addressed Strobe (RAS) initiates the cycle by going from high to
low. Prior to that it must remain a sufficient time in the high condition for internal
modes to be pre-charged to a known initial state. The parameter is tRP • Once RAS
goes low it must remain low enough (tRAS) for the selection of the accessed cells,
sense operation and restoration of the destroyed data (read out is destructive).
Similar requirements must be met when Column Address Strobe (CAS) goes low
and the column addresses are latched. Cycle time influences on power consump-
tion. The high impedance state of the output buffer must also be checked; there is
no reason to search for some test sequence or data pattern, which are the worst
cases for access time.
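The precharge and pulse-width constraints described above can be expressed as a simple check over a proposed timing set-up. The minimum values below are hypothetical placeholders, not taken from any datasheet:

```python
# Minimal sketch of validating a DRAM timing set-up against the constraints
# described in the text: RAS must stay high for at least t_RP before a cycle
# and low for at least t_RAS once the cycle starts. Both minima here are
# assumed values for illustration only.
T_RP_MIN_NS = 100.0   # minimum RAS precharge time (assumed)
T_RAS_MIN_NS = 150.0  # minimum RAS pulse width (assumed)

def timing_setup_ok(t_rp_ns, t_ras_ns):
    """Return True if the proposed RAS timing satisfies both minima."""
    return t_rp_ns >= T_RP_MIN_NS and t_ras_ns >= T_RAS_MIN_NS

print(timing_setup_ok(120.0, 200.0))  # → True
print(timing_setup_ok(80.0, 200.0))   # → False (insufficient precharge)
```

A characterisation programme effectively sweeps such parameters across their ranges to find where the device actually stops working, rather than merely checking the datasheet minima.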
9.7.2
Test and characterisation of refresh
Refresh tests may be roughly divided into two parts: block refresh and distributed
refresh. The normal way to do block refresh testing is to write some data (such as
checkerboard pattern) in the entire memory. Then the memory is tested and if no
failure, all clocks are stopped and paused for the specified time, 2ms. After 2ms the
memory is tested again for failure. Such a test ensures that, in addition to data being
retained for the refresh interval, the peripheral circuits are also functioning after the
pause.
Refresh time is not so critical at low or room temperatures, but becomes
significant at elevated temperatures; it can vary by as much as 30 times or more over
the temperature range 70°C to 25°C, depending on the internal construction of the
memory. One of the drawbacks of testing refresh time by this method is the thermal
changes within the chip when pausing between read and write [9.23]. Because of
this, the refresh time reading is not constant and it is difficult to decide what the
actual refresh time of the memory is. A way to solve this problem is to apply a
distributed refresh.
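If the roughly 30-fold variation between 25°C and 70°C quoted above is assumed to follow an exponential law — an assumption made purely for illustration, since the actual dependence varies with the internal construction of the memory — the retention margin at intermediate temperatures can be interpolated as:

```python
def retention_factor(temp_c, t_ref_c=25.0, t_hot_c=70.0, degradation=30.0):
    """Relative retention/refresh margin at temp_c, normalised to 1.0 at
    t_ref_c, assuming an exponential temperature dependence (an
    illustrative assumption, not a device model from the text)."""
    exponent = (temp_c - t_ref_c) / (t_hot_c - t_ref_c)
    return degradation ** (-exponent)

print(f"{retention_factor(70.0):.4f}")  # ≈ 1/30 of the 25°C margin
print(f"{retention_factor(50.0):.2f}")  # intermediate temperature
```

Such a curve makes clear why a refresh interval that is comfortable at room temperature can fail at the hot end of the operating range.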
9.7.2.1
Screening tests and test strategies
The newest memories on the market, produced in small series and manufactured with
insufficiently stable parameters, can exhibit early failures; these must be eliminated,
with the aid of well-skilled personnel, before the memories are mounted on a PCB.
The screening tests must activate failure mechanisms, but must not cause damage or
alteration of the tested memories. For memories in hermetic packages, and for high
reliability (or safety) applications, the following screening tests should be applied:
• Burn-in - statically or dynamically - (125°C for 160 h) produces some 80% of
the chip-related and 30% of the package-related early failures; memories should
be operated with the same electrical signals as in the field. Should surface, oxide
and metallisation problems be dominant, a static burn-in is better. A dynamic
burn-in activates practically all failure mechanisms. The choice will be made on
the basis of practical results.
• Constant acceleration (for memories in hermetic packages) to check the
mechanical stability of die-attach, bonding, and package. The memories are placed
in a centrifuge and exposed to an acceleration of 30 000 g for one minute (generally
z-axis only).
• ESD test (1 kV…3 kV) during handling, assembling and testing of memories or
ICs, using the human body model (HBM) and the charged device model (CDM).
• Glassivation (silicon dioxide and/or silicon nitride) test of the entire die surface.
Ideally - particularly for memories in plastic packages - it should be free from
cracks and pinholes. To check this, the chip is immersed (for 5 minutes) in a 50°C
warm mixture of nitric and phosphoric acid, and then inspected with an optical
microscope (MIL-STD-883, method 2021).
• High-temperature storage (150°C for 200 h) to stabilise the thermodynamic
equilibrium and to activate failure mechanisms related to surface problems (e.g.
charge induced failures, contamination, contacts, oxidation). Should solderability
be a problem, an N₂-protective atmosphere can be used.
• Hot carriers are a consequence of the high electric fields (10⁴…10⁵ V/cm) in
transistor channels. Effects: increase of switching times, possible data retention problems,
increase of noise. The test is performed under dynamic conditions, at 7…9 V and
at -20°C to -70°C.
• Humidity or damp heat test, 85/85 and pressure cooker - to investigate the in-
fluence of moisture (e. g. corrosion) on the chip surface [9.24].
• Latch-up tests simulate voltage overstresses on signal and power supply lines as
well as power-on/power-off sequences [9.24].
• Seal test (to check the seal integrity of the cavity) begins with the fine leak test
- 1 h at 0.5 mm Hg / storage (4 h at 5 atm) in a helium atmosphere / waiting 0.5 h
in open air / measurement with the help of a specially-calibrated mass spectrometer -
and continues with the gross leak test (1 h at 5 mm Hg / 2 h at 5 atm in
FC-72 / 2 minutes waiting in open air / immersion in an FC-40 bath at 125°C /
observation of a continuous stream of small bubbles from the same place
within 30 s to confirm a defect).
• Soft errors. At the chip level, an electron beam tester allows the measurement of
signals within the chip circuitry. (If logic circuits with different signal levels
are unshielded and arranged close to the border of a cell array, stray coupling
may destroy the information of cells located close to the circuit, leading to a chip
design problem.)
• Solderability of tinned pins, performed according to MIL-STD-883 or IEC 68-2
after the applicable conditioning, and using the solder bath or the meniscograph
method.
• Thermal cycles - to test the memory's ability to withstand rapid temperature
changes (at least 10 thermal cycles from -65°C to +150°C), air to air, in a two-
chamber oven using a lift. Dwell time at the temperature extremes should be
≥10 minutes (after the thermal equilibrium of the memories has been reached
within ±5°C), with a transition time of less than 1 minute. Should solderability be a
problem, an N₂-protective atmosphere can be used.
• Time-dependent dielectric breakdown (particularly sensitive for memories ≥4M)
as a result of electric fields of up to 10 MV/cm (Fowler-Nordheim effect, hot
carriers injected into the isolation layer).
The following listing gives a standard procedure for the screening tests (batch of 50
pieces):
1. Extreme temperatures, measuring the parameters at 25°C after each step:
• beginning with 70°C, in 10°C steps until failure; 1 h for each step, with
vitality test.
2. Electrical behaviour at various temperatures:
• -20°C, -40°C, -60°C, 1 h for each step, and measuring at 25°C after each
step;
• -20°C until +120°C in 10°C steps, 1 h per step, with vitality test;
• 100°C, 100 h with vitality test, continuous monitoring of various
parameters.
3. Thermal cycles (1000 cycles -20°C/+85°C):
• 20…40°C with vitality test;
• 60…80°C with vitality test;
• 60…80°C with power on only during the heating phase.
4. Humidity tests (intermediary and final measurements, in dry state):
• 85/85, 250 h with vitality test;
• 30/100 (hot water), 500 h without supply voltage;
• 95/95, cooling to -20°C in 3 h, heating to 95°C (95/95) in 2 h, 100 cycles
without supply voltage.
5. Vibrations (without supply voltage):
• 8 sine explorations (0…3000 Hz) to determine the resonance frequency;
• random 20…500 Hz; 1, 3, 6, eventually 10 and 15 g; three axes; 1 h.
6. ESD test.
9.7.3
Test-programmes and -categories
A test programme for a RAM memory consists of three items: DC parametric test,
AC parametric test and functional test (although they are often applied
simultaneously). A memory test programme comprises various tests such as:
continuity check, leakage tests, a variety of functional tests, dynamic or timing tests
and parametric tests. Functional tests are by far the most important tests for RAMs.
The DC tests usually have access only to the outskirts of a memory chip, whereas
the functional tests have logical access to all the embedded functions of the chip,
resulting in a much better test coverage.
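The write/read-back principle behind functional tests can be sketched against a simulated RAM, using the checkerboard pattern mentioned earlier for block refresh testing. This is only an illustration of the principle; real memory testers run far richer pattern sets (march, galloping, and others), and `SimulatedRAM` is a hypothetical stand-in for the device under test.

```python
# A simulated RAM and a checkerboard functional test: write a pattern,
# read it back, then repeat with the inverse pattern.
class SimulatedRAM:
    def __init__(self, size):
        self.cells = [0] * size

    def write(self, addr, bit):
        self.cells[addr] = bit

    def read(self, addr):
        return self.cells[addr]

def checkerboard_test(ram, size):
    """Return True if every cell stores and returns both checkerboard phases."""
    for phase in (0, 1):
        for addr in range(size):
            ram.write(addr, (addr ^ phase) & 1)
        for addr in range(size):
            if ram.read(addr) != (addr ^ phase) & 1:
                return False  # stuck cell, coupling fault, ...
    return True

print(checkerboard_test(SimulatedRAM(256), 256))  # → True for a fault-free RAM
```

Because both phases are written, a cell stuck at either logic value is guaranteed to fail one of the two read-back passes.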
9.7.3.1
Test categories
9.7.3.2
RAM failure modes
Over the years a number of failure modes have been reported for semiconductor
RAM memories. Traditionally, solutions to the problems were found through
an evolutionary trial-and-error approach. Very often the failures were reported by
end-users either as a result of effective electrical-characterisations or well-planned
incoming inspection or simply as the systems experienced field failure repair. Some
of the classic failure modes can be described as follows:
• Breakdown: Failure of a clamp or Zener diode; any other semiconductor or
junction breakdown.
• Decoder malfunction: Inability to address a substantial part of the array due to an
open decoder line internal to the device, or a defective decoder.
• Excessive write-recovery: Read access time lengthening, when the read cycle
immediately follows a write cycle. When using the same data line for both
reading and writing, the increased time is caused by a sense amplifier that is
saturated during the write and is unable to recover in time to detect the differen-
tial voltage of the cell being read. Recovery time may even be pattern sensitive.
• Input and output leakage: Excessive leakage currents above specified limits.
• Multiple writing: Data are written into other cell(s) than the one addressed, due
to capacitive coupling between cells or other defects like leaky input or short
circuit.
• Open and short circuits: Bonding failures or insufficient/excessive metallisation
in one of the last semiconductor manufacturing steps.
• Pattern sensitivity: The device response varies with the test pattern, reflecting
differences in address and/or data sequences; it may also reflect timing and
voltage specifications being too close to actual failure regions.
• Refresh sensitivity: Dynamic RAM fails to retain data reliably during the
specified minimum interval between refresh cycles. Failure is due to excessive
voltage or current leakage from the storage element or a fault in the rewrite
circuits.
• Sense amplifier recovery: Tendency of the output (sense) amplifier to favour one
logic state after reading a long string of a similar logic state. Alternate 1's and
0's may be read correctly, while a single bit of a given logic state in a long
string of opposite logic states may come out incorrectly. It is caused by improper
charge accumulation in the sense amplifier.
• Slow access time: Charge storage on the output driver circuits or long lines
causes excessive time to sink or source current, thereby increasing access time.
Each of the listed failure modes can effectively be screened for, even though it may
require several screening approaches, if all failure modes have to be dealt with. To
effectively test for the signal detection capabilities of the sense amplifier and its
pre-charge requirements, a worst case pattern string of identical data, including a
single bit of inverted data could be run in a fast read mode.
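Such a worst-case string — identical data with a single inverted bit — is straightforward to generate; a minimal sketch (the function name and parameters are illustrative, not from any test standard):

```python
# Generate the worst-case sense-amplifier pattern described in the text:
# a long run of identical data containing one inverted bit.
def sense_amp_pattern(length, flip_index, background=1):
    """Return a list of `background` bits with one inverted bit at flip_index."""
    bits = [background] * length
    bits[flip_index] = background ^ 1
    return bits

pattern = sense_amp_pattern(16, 8)
print(pattern)  # → [1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1]
```

Reading such a string in a fast read mode stresses exactly the recovery behaviour described above: the amplifier, biased by the long run, must still resolve the lone opposite bit.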
9.7.3.3
Radiation environment in space; hardening approaches
What would happen to standard electronics if they were launched into space? From
500 to 75 000 km above the surface of the earth, space can be a very hostile
environment for most electronics needed for satellite functions such as navigation,
communication, and data processing. The high-density RAMs, the microprocessors,
and other vital electronics would operate for only a few months up to a year or two
in many satellite systems before succumbing to the effects of radiation trapped in
the earth's magnetic field (bad data, spurious output signals, latch-up, or burn-out,
all caused by the bombardment of galactic cosmic particles, from hydrogen to
uranium, that permeate the space above the earth) [9.30]. Electronics designed and
built to operate effectively in a radiation environment (rad-hard) have been in pro-
duction for over 30 years. What is new is the need for rad-hard parts¹⁵ in quantities
of tens and hundreds of thousands for commercial satellite systems at costs close to
their unhardened commercial equivalents. UTMC Microelectronic Systems intro-
duced one alternative - the self-contained process module - at Colorado Springs, in
1997.
Radiation has two primary effects on electronics in space: the first is the total
dose (accumulation of radiation over time, which results in permanent degradation
of device performance, including shifts in turn-on voltages, increases in operating
and stand-by currents, and changes in signal propagation delays); trapped electrons
and protons are the bulk contribution to the total dose damage (Fig. 9.9) ; the other
15 Rad-hard ICs require special processing and - for the most part - have been manufactured on
dedicated wafer fabrication lines. Because of the high cost of maintaining such facilities
and the relatively small market for high-level rad-hard products, the cost of these components
could easily exceed the cost of their commercial equivalents by a factor of 10 to 100. Responding
to this dilemma, some rad-hard suppliers have come up with cost-saving innovations (such as
running rad-hard products on the same fabrication line as commercial products, or shielding
commercial components from radiation by placing them in special packages with enough mass
to reduce the radiation inside to tolerable levels).
effect is the displacement damage caused by the proton portion of the space
radiation (it degrades solar cells and bipolar devices, but has essentially no effect on
digital electronics). Because of the sensitivity of most commercial electronics to
ionising radiation, almost all satellite systems require some means of mitigating the
system degradation due to space radiation.
Fig. 9.9 Generation of electron-hole pairs in the gate and field oxides (PG = polysilicon gate)
The easiest way to minimise the trapped-hole density is to thin down the oxide; a
clean gate oxide less than 12.5nm thick can usually survive up to 100krad(Si) with
no process changes. It is also possible to entirely eliminate the field oxide through a
fully depleted technology such as SOI. When rad-hard products are run on commercial
lines using modified steps, it may be possible to reduce the hole traps, but in
absence of a dedicated rad-hard process, the hardness will rarely go much beyond
100krad(Si). Rad-hard components fabricated on dedicated lines (expensive
solution!) are frequently hard to 1Mrad(Si), and they can easily survive most
natural space radiation environments.
It is usual to consider that the higher the yield the better the reliability. Of
course, it all depends on the nature of the main yield detractors. Design-related
yield losses are not easily correlated with reliability failures. However, when
manufacturing defects are involved, it can make sense to look for such correla-
tions. In a restricted number of cases, no correlation at all exists between yield
and reliability. The most typical example regards the final passivation layer quality:
large defect densities do not impact the chip performance, but they definitely
promote humid test failures. The basic reason why no correlation can be made
between time-zero and long-term failures is that the passivation layer does not play
any active role in the chip functionality. Dynamic life testing of 300 1M×1 DRAM
devices yielded only 6 electromigration failures after 1000 h. Contact mask
misalignment: 5 failed parts out of 300 after 1000 h in life test.
A SER (soft error rate) predictive design tool is presented in [9.17], which has
resulted in a unique modelling tool called the Soft-Error Monte Carlo Model, or
SEMM. SEMM has been used in designing chips with performance/cost and soft-
fail reliability trade-offs for bipolar, CMOS, and bi-CMOS technologies. SEMM
can be extended to model SERs in chips used in aerospace environment, which
involves bombardment by protons, neutrons, and heavy ions. Also, as the critical
charges and device dimensions reach very low values, SER effects of secondary
spallation products - such as deuterons, tritons, and low-energy protons - and of
low-energy neutron recoils must be taken into account.
9.8
Design trends in microprocessor domain [9.29]
1) The device feature size will continuously decrease (0.2…0.1 µm) and the delay
time of each gate will be reduced below 0.2 ns. This allows integrating more than 80
million transistors in a single chip, accommodating more function units and
greatly improving the microprocessor speed. The parallel processing technique is
very efficient at exposing instruction-level parallelism, leading to a significant
reduction of communication and I/O interface time, power consumption, and
system design complexity. Consequences: (i) integrating the communication link
into the microprocessor chip; (ii) designing a synchronisation circuit in the
microprocessor chip to support multiprocessing; (iii) setting up a supporting
mechanism for the microprocessor chip to ensure data consistency [9.28].
2) Five to six metal routing layers will be possible, cutting down the routing area
and reducing connection wire resistance and capacitance. Ion implantation
technology will improve the microprocessor speed and reduce the parasitic
capacitance.
3) CMOS (high integration level, low power supply voltage, low power consumption,
low I/O swing) and GaAs (high speed, radiation hardness, temperature
insensitivity, low power consumption, tolerance of harsh environments)
technologies may be the preferable options for microprocessor design.
4) New types of semiconductor materials and optical interconnection will be
developed, permitting clock rates of 5 to 10 GHz. Therefore new types of wide-
bandgap materials for high speed device design must be developed.
5) The microprocessor design will continue to follow the line of the reduced
instruction set computer (RISC) architecture.
These approaches can tremendously cut down the cost, making parallel
processing systems more competitive, and the embedded microcontroller
market will grow substantially.
The microprocessor design is directed toward the green chip in the following
aspects: (a) low voltage power supply; (b) several operation modes for energy
saving, including variable operation frequencies and clock throttling; (c) static logic
design which may work at frequencies as low as zero Hertz; (d) power management
capability fully independent of the application environment to avoid possible
conflicts; (e) dedicated hardware to monitor the power supply status of peripheral
devices.
9.9
Failure mechanisms of microprocessors
16 As passivated Al(Cu) lines become narrower, the metal exhibits increasingly elastic behaviour
with higher stress levels, a combination of stress characteristics which favours void formation.
Stress relaxation in Al(Cu) films and lines has been measured by bending beam and X-ray
diffraction methods [9.31].
17 The term packaging is used to cover the forms of encapsulation available. However, the die
attachment system used in the package and the lead frame system - parts of the so-called
interconnects - are often involved when discussing the problems of packaging.
18 Such applications of models can be expensive, however. Programmes for finding the real faults
entail at least a thousand lines of code for relatively simple functions such as data transfer
or manipulation, whereas 10 000 lines of code may be needed to detect faults in more complex
instruction-decoding and control circuits.
19 A 64K RAM, for example, may require from 10^5 to 10^9 test vectors or patterns (each pattern is a
set of input signals for testing a given state or function); with a typical memory-cycle time of
475 ns, the corresponding test periods would range from 49 ms to 53.7 minutes. Test times under
a few minutes are still deemed very long, considering the production volume.
Tunneling: Tunneling through the gate insulator is a potential fundamental limit. In theory the
tunneling limit is around 3nm. In practice, oxide quality and defect density have resulted in thicker
oxides, but improved processing should allow insulators close to the tunneling limit.
Device fields: Another fundamental limit is the maximum allowable field in the depletion regions and
the gate insulator. If the fields go too high, hot-electron effects, punch-through, or breakdown result. As
dimensions are reduced, the supply voltage cannot be made arbitrarily low. Even with a well-designed
device with good characteristics, sub-threshold conduction will limit how low the threshold voltage can
be reduced. Thermal energy allows some fraction of the carriers in the silicon to surmount the barrier
which the gate electrode creates as the device is turned off. In good devices the current decreases by an order
of magnitude for every 80...90 mV reduction in gate voltage in the sub-threshold region. In practice,
operation substantially below 1 V results in performance deterioration which is unacceptable in many
applications, due to loss of overdrive to turn the device on [9.54]. This limit on the supply voltage does
have the advantage of keeping the energy in a logic transition well above the minimum necessary to
overcome thermal fluctuations; i.e. the minimum voltage must be > βkT/q, where β is 2 to 4 [9.36].
Another challenge with device fields is controlling where the field lines terminate. Device threshold
voltage can vary due to short channel effects, which arise when the device threshold voltage depends
upon the source-drain spacing and drain voltage. For very small devices, the field distribution must be
considered as a three-dimensional problem, since charge in the channel and both the source and drain can
terminate field lines from the gate. A double-sided gate structure, where there are gates both above and
below the channel region, can give the best control of channel fields and short-channel effects, but is
very difficult to fabricate at present. It can, however, result in the shortest channel device.
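The thermal-voltage numbers behind the sub-threshold limit above are easy to check. In the sketch below the physical constants are standard; the swing factor n = 1.4 and the 2-to-4 multiplier are illustrative assumptions chosen to match the 80...90 mV/decade and βkT/q ranges quoted in the text.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
Q = 1.602176634e-19     # elementary charge, C

def thermal_voltage(temp_k: float = 300.0) -> float:
    """kT/q in volts (~25.9 mV at room temperature)."""
    return K_B * temp_k / Q

def subthreshold_swing(n: float = 1.0, temp_k: float = 300.0) -> float:
    """Gate-voltage swing (V) per decade of sub-threshold current.
    n = 1 is the ideal limit (~59.5 mV/dec); real devices have n > 1."""
    return n * math.log(10.0) * thermal_voltage(temp_k)

ideal = subthreshold_swing()            # ~0.0595 V/decade, the ideal limit
real = subthreshold_swing(n=1.4)        # ~0.083 V/decade, in the quoted 80...90 mV range
v_floor = [b * thermal_voltage() for b in (2, 4)]   # ~52...103 mV thermal floor
```

The 80...90 mV/decade figure thus corresponds to a swing factor of roughly 1.35 to 1.5 over the ideal 60 mV/decade at room temperature.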
Soft errors: As device dimensions and supply voltage are reduced, the amount of charge involved in
a switching or retentive operation is correspondingly reduced. Soft errors can occur when minority
carriers cross a pn junction into a node in sufficient quantity to upset the state of the node. DRAM is
most sensitive because it involves storing small amounts of charge in the memory cell. Memory can
handle such errors through error correction codes, and logic can use parity to detect and retry. One
source of minority carriers is ionising radiation such as α-particles or cosmic rays.
DRAM cell size: One reason for the continuing progress in DRAM has been that the normalised cell
size, as expressed in minimum lithographic squares, has decreased by about a factor of 1.4 each
generation. The minimum cell size for the folded bit line cell configuration, used in all present DRAMs,
is eight squares (determined by the intersection of two word lines and a bit line). This is about the
number of squares for the 256Mb DRAM cell; to stay on the projection, the 1Gb cell should require 5...6
squares. The open bit line cell requires 4 squares (the intersection of one word line and one bit line), but
would have severe noise and sense-amplifier pitch-matching problems. If the cell area does not shrink
according to projection, the chip size will grow, impacting the economic viability of the next
DRAM generation.
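The bookkeeping in lithographic squares translates directly into cell area. A minimal sketch: the square counts (8 for the folded bit line cell, 4 for the open bit line cell) come from the text, while the 0.18 μm feature size is an illustrative assumption, not a value from the text.

```python
def cell_area_um2(squares: float, feature_um: float) -> float:
    """DRAM cell area when the cell occupies `squares` minimum
    lithographic squares at feature size F (area = squares * F^2)."""
    return squares * feature_um ** 2

# Folded vs open bit-line cells at an assumed 0.18 um feature size:
folded = cell_area_um2(8, 0.18)    # ~0.2592 um^2
open_bl = cell_area_um2(4, 0.18)   # ~0.1296 um^2, half the area
```

Halving the square count halves the cell area at a given feature size, which is exactly why the 5...6-square projection matters for chip size.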
Fabrication control: In practice, control of the device characteristics and yield is a major concern.
Each process has an associated variation, and as devices shrink, the variation must also shrink
correspondingly. Collectively, immeasurable effort has been spent by the industry learning to control and
refine processes. Controlling a 0.1 μm gate electrode to ±20% requires controlling each edge to a few
tens of atomic distances. While this may not be a fundamental limit, it does represent a formidable
challenge. An eventual limit comes from the statistics of the random distribution of impurities in depletion
regions [9.37]. As the active volume shrinks, the number of impurity atoms N in the depletion region
shrinks, while the standard deviation goes as N^1/2, and the probability that somewhere on the chip a
device will not have sufficient atoms to support the depletion region goes up.
Despite these potential limitations, devices with perfectly good characteristics and channel lengths
well below 0.1 μm have been made [9.38]. Specific technology improvements (such as SOI or a low-ε
interconnection dielectric) could further improve performance without pushing dimensions. It appears
that fundamental limits would not start kicking in any serious manner until somewhere after the 16Gb
generation, if then. Practical limits in lithography and control of fabrication processes will dominate.
The industry has a record of overcoming such challenges, but it is becoming increasingly more difficult
to do so, and at increasingly higher costs. The cost of a state-of-the-art DRAM fabrication facility is in the
vicinity of a billion US$, and is doubling each generation.
And processors would still get better. Microprocessor throughput has been increasing at about 2X
every 18 months. Half of this has been due to improvement in device performance, but the other half has
resulted from other sources, such as improved design, circuits, layout, architecture, compilers, and the
like, and this progress would continue. Further, there would be strong potential for improvement through
optimising design for a specific application. At present, microprocessors are designed for general usage
and then personalised through software for the particular application; processor design is very time
consuming and expensive, and the high design costs must be amortised over a large sales base. A goal
would be automated design tools which could take a high level description and produce a processor at
the "push of a button" which was reasonably optimised for the specific application, with improved
performance, and at a design cost which would be economically attractive for the smaller sales volume.
The current drive for low power electronics is a good example of optimising designs for a specific
end without pushing technology limits [9.36]. Since power in a CMOS circuit is given by the very
familiar formula P = CV²f, power supply reduction is the major first step. However, power is a system
characteristic, and optimisation results from considering a wide spectrum of disciplines, including the
system level and all levels below: processing technology, device design, circuits, chip design, CAD
tools, system architecture and organisation, system operation, logic partitioning and synthesis, algorithms
and software. A key aspect has been to incorporate low power considerations into the system design
from the beginning, and major progress in power reduction has been obtained in memory, logic, and
communications [9.39], [9.40].
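The P = CV²f formula quoted above can be exercised directly; the capacitance, supply voltages, and frequency below are illustrative values, not figures from the text.

```python
def dynamic_power(c_farads: float, v_volts: float, f_hz: float) -> float:
    """Dynamic CMOS power, P = C * V^2 * f, as in the text."""
    return c_farads * v_volts ** 2 * f_hz

# Supply reduction is quadratic: dropping from 5 V to 3.3 V at the same
# switched capacitance (100 pF assumed) and frequency (100 MHz assumed)
# cuts dynamic power by roughly 56%.
p5 = dynamic_power(100e-12, 5.0, 100e6)    # 0.25 W
p33 = dynamic_power(100e-12, 3.3, 100e6)   # ~0.109 W
```

The quadratic dependence on V is why supply reduction is called "the major first step" before any of the other system-level measures listed above.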
Reliability problems of Gigabit CMOS circuits. There is the widely shared opinion that the minimum
structure size of mainstream CMOS devices, currently at about 0.5 μm (corresponding to the
16M-DRAM CMOS generation), will be scaled down to about 0.07 μm for the 64 Gigabit DRAM level.
Simple CMOS circuits - such as ring oscillators - have already been realised with a channel length of
0.07 μm. This means that today's CMOS technology will prevail for at least a further decade or, very
likely, for two or even more decades.
DRAM processing will continue to play the role of the technology driver up to a memory cell density
of 64Gbit per chip. The pace of the past, which brought us laboratory versions of a new DRAM
generation at intervals of three years, will be maintained. Not so much technological problems as the
financial risk and the need for huge investments in more advanced fabrication capabilities may slow
down the speed of introducing higher integration densities on the market. But, nevertheless, 64Gbit
circuits with a minimum structure size of 0.07 μm will become a reality and will enter the market place
in the first quarter of the next century.
Conclusion. Finding the optimum trade-off between reliability and performance will be a challenge
for all the reliability scientists.
References
9.1 Ning, T. H. (1995): Second symposium on nano device technology, Hsinchu, Taiwan,
May 25-26
9.2 Terman, L. M. (1995): Limits - some are more fundamental than others. Proceedings of
the fourth international Conference on Solid-State and Integrated-Circuit Technology,
Beijing (P. R. China), October 24-28, pp. 7-12
9.3 Pilkuhn, M. H. (1995): Molecular electronics: new prospects for IT. Proceedings of the
fourth international Conference on Solid-State and Integrated-Circuit Technology, Beijing
(P. R. China), October 24-28, pp. 13-20
9.4 Doering, R. R. (1992): Trends in single-wafer processing. Symposium on VLSI Tech-
nologies. Digest of Technical Papers, pp. 2-5
9.5 Takeda, E. (1995): Reliability challenges for giga-scale integration. Proceedings of
RELECTRONIC '95, Budapest (Hungary), October 16-18, pp. 1-16
9.6 Kleppmann, W. G. (1989): WLR Final Report, pp. 125-135
9.7 Feibus, M.; Slater, M. (1993): Pentium Power. PC Magazine, vol. 12, no. 8, pp. 108-120
9.8 Amerasekera, E. A.; Campbell, D. S. (1987): Failure Mechanisms in Semiconductor
Devices. J. Wiley & Sons, Chichester
9.9 Crook, D. L. (1990): Proceedings of IRPS, pp. 2-11
9.10 Cristoloveanu, S.; Li, S. S. (1995): Electrical Characterization of Silicon-On-Insulator
Materials and Devices. Kluwer Acad. Publ., Boston
9.11 Guichard, E. et al. (1994): IEDM'94 Techn. Digest, p. 315
9.12 Lantz, L. II (1996): Tutorial: Soft errors induced by alpha particles. IEEE Trans. Reliabil-
ity, vol. 45, no. 2, pp. 174-179
9.13 Messenger, G. C.; Ash, M. S. (1986): The Effects of Radiation on Electronic Systems.
Van Nostrand Reinhold
9.14 Rauhut, H. W. (1991): Low alpha epoxy moulding compounds. SPE ANTEC Techn.
Papers, vol. 37, pp. 1260-1264
9.15 Pecht, M. G. et al. (1995): Plastic-Encapsulated Microelectronics. John Wiley & Sons,
New York
9.16 Hasnain, Z.; Ditali, A. (1992): Building-in reliability: soft errors - a case study. Proc. Int.
Reliab. Physics Symp., pp. 276-280
9.17 Srinivasan, G. R. (1996): Modeling the cosmic-ray-induced soft-error rate in integrated
circuits: An overview. IBM J. Res. Develop. vol. 40, no. 1, pp. 77-89
9.18 Pecht, M.; Ramappan, V. (1992): Are components still the major problem? IEEE Trans.
Comp., Hybrids, and Manuf. Technol. vol. 15, no. 6, pp. 1160-1164
9.19 Ghate, P. B. (1991): Industries Perspective on Reliability of VLSI Devices. Texas Instru-
ments
9.20 Semiconductor Industry Association, SIA (1992): SIA Report: Military IC quality rising
across the board. Military & Aerospace Electronics, p. 50
9.21 Westinghouse Electric Corp. (1989): Failure analysis memos.
9.22 Weber, W. et al. (1991): Dynamic degradation in MOSFET's. IEEE Trans. on EI. Dev.,
vol. 38, no. 8, pp. 1859-1867
9.23 Jensen, E.; Schneider, B. (1979): Characterization of RAMs.
9.24 Birolini, A. (1997): Quality and Reliability of Technical Systems. Springer, Berlin
9.25 Woods, M. H.; Rosenberg, S. (1980): EPROM Reliability. Electronics, pp. 133-141
9.26 Mielke, N. R. (1983): New EPROM data-loss mechanisms. 21st Ann. Proc. Int. ReI. Phys.
Symp., pp. 106-113
9.27 Băjenescu, T. I. (1978): Sur la fiabilité des mémoires bipolaires PROM. Bull. SEV/VSE
(Switzerland), no. 6, pp. 268-273
Băjenescu, T. I. (1982): Zuverlässigkeit und Systemzuverlässigkeit. Aktuelle Technik, no.
7/8, pp. 9-13
Băjenescu, T. I. (1982/1983): Zuverlässigkeit monolithisch integrierter Schaltungen. EPP-
Artikelserie, September 1982 / May 1983
Băjenescu, T. I. (1983): Fertigung bestimmt Qualität. Elektronikpraxis, November, pp.
178-184
10.1
Introduction
Visible light-emitting diodes, LEDs (red, green, yellow, and blue), became indispensable
as visual indicators. Combinations of LEDs - in a hybrid or monolithic form
- are among the competitors for the lucrative visible alphanumeric display market.
The reliability of such LEDs is now almost taken for granted; however, the main
emphasis of this chapter will be on understanding the degradation processes in LEDs
and optocouplers. In Fig. 10.1 a classification of optoelectronic semiconductor
components is given.
[Fig. 10.1: classification of optoelectronic semiconductor components, including devices with photodiode and devices with photoresistor]
IR semiconductor lasers and high radiance LEDs (HRLEDs) are still not readily
available and their current prices reflect the fact that today there is neither a large
market, nor real production capability. The potential market for such devices could
consist of various applications: semiconductor lasers for optical video and audio
disc reading and writing, and range finders and IR illuminators for surveillance
purposes. In addition, the original fibre optic (FO) communication concept is
enlarging to include not only the land based civil transmission network [10.3] but
undersea transmission and interconnections within telecommunication switching
centres. The HRLED and laser combination of characteristics includes: low
electrical input power, capability for efficient optical coupling into a particular FO,
correct transient response for the particular application and operation over the
required system temperature range. These characteristics must not deteriorate during
operation (neither catastrophically nor gradually), the device remaining within the
specification limits for a time sufficient to make system operation economically
viable.
For visible LEDs, GaAsxP1-x and GaP are the forerunners; commercially, GaAsP
- because of its band structure, which permits light emission via direct
recombination between holes and electrons - is the favoured material for red-
emitting LEDs (Fig. 10.2).
In the IR, GaAs and Ga1-xAlxAs have dominated until now. In order to grow alloys
such as InGaAsP in the form of high-quality epitaxial layers on a convenient
binary substrate material, it is usually necessary to ensure lattice matching; this
prevents mismatch dislocations and other defects being formed which could affect
radiative efficiency and reliability. The most common method of growth for
epitaxial layers is liquid phase epitaxy (LPE) although hydride (or chloride) vapour
phase epitaxy (VPE), metal organic chemical vapour deposition (MO-CVD) and
molecular beam epitaxy (MBE) have been tried with varying success for some of
the materials. Radiative emission in the materials system of interest is achieved by
electron or hole injection across a heavily forward-biased pn junction, or by injection
of both. The diode chip geometry depends on the type of LED or laser required.
The simple cleaved or sawn-sided dice with large area alloyed contacts (Fig. 10.3)
may be modified in different ways to maximise external radiative efficiency [10.4]
depending upon whether the light suffers or not from self-absorption in the
10 Reliability of optoelectronic components 315
particular semiconductor material in use. The light output increases with the
injected current, in a fairly linear fashion.
The main causes of degradation - a consequence of forward biasing - are the
inherent crystal defects; non-radiative recombination centres are formed at these
defect sites, thereby impairing the quantum efficiency of the devices.
Recombination enhanced defect reactions utilise the energy liberated during the
recombination process to induce defect dissociation, defect creation, and defect
migration. The main source of extrinsic failure mechanisms is the packaging;
encapsulants for optoelectronic devices must be transparent, and such materials
have high thermal expansion coefficients (60 ppm/°C) compared with IC encapsulants
doped with alumina and silica (20 ppm/°C). Hence, the thermal mismatch between the
rest of the package (4...15 ppm/°C) and the encapsulant is a severe problem¹. Gold
bonding wires are preferred to the poorer-quality aluminium wires. Failures may
occur as a result of breakage of the wire, kink formation caused by thermal
contraction of polymer encapsulants, and bond lift-offs. The bonding stress in
ceramic substrates must be controlled to prevent the occurrence of cracks;
temperature cycling is a major reliability hazard, and soft silicone is recommended
for use on ceramic substrates because of its stable thermal behaviour² [10.13].
[Fig. 10.3: LED chip cross-section on a header, showing the n contact wire, n and p metallisation, and solder]
1 The outcome of the generated mechanical stresses can be either delamination or separation of
the encapsulating epoxy from the substrate, or high bending stresses that cause bulk epoxy
cracking.
2 Defects in silicon-based optoelectronic devices are less affected by temperature variations than
in gallium-based devices; investigations into GaAlAs devices [10.14] have shown that it is
possible to obtain different defect types for devices of various structures. A dependence of
failure rates upon initial concentrations of defects and impurities makes these processes sensitive
to subtle differences in crystal growth techniques. Therefore, it is not cautious to assume that
failure rates will be universal across a particular technology.
The mechanisms behind degradation and failure are not fully understood; until a
better understanding of the causes and processes of failure is obtained,
reliability predictions cannot be made with a good degree of confidence.
10.2
LED reliability
Soon after the production of the first LEDs in the early 1960s, the reliability of
LEDs improved reasonably quickly; however this was because of a general im-
provement in materials and fabrication techniques, with little understanding of the
basic factors determining LED reliability. The most obvious manifestation of LED
degradation is the gradual decrease in power output when the device is operated at
a constant current, i.e. the spontaneous efficiency decreases with time. Early life
test data were difficult to interpret because of the erratic manner in which device
parameters deteriorated, and the considerable variability in the results from device
to device. A time to end of life³ has become the most favoured parameter for
measuring device reliability.
LED failure is a gradual process; the power output decreases with time, although
not necessarily in a well behaved manner. Although the failure of light emitting
devices became considerably less erratic over the past years, there is a variation in
reliability between LEDs within a specific batch. A common approach is to
consider the lifetest data as a statistical distribution, and to use its characteristic
parameters to describe the device population as a whole.
In the case of the failure of semiconductor components, it appears that there are
no sound physical reasons for the validity of the lognormal distribution⁴ for
characterising LEDs, although it has been suggested that it could occur as a fundamental
consequence of diffusion processes with an Arrhenius-type temperature dependence.
A common approach in characterising failure distributions is to find the mean time
to failure by assuming a failure distribution, and extrapolating from the first few
failures. The lognormal and Weibull distributions can be considerably different in
the tails of the distribution, so that large differences in predicted values of mean life
could result. An additional parameter - incorporating both mean and standard
deviation - is often used for components in a telecommunications system and is
known as the 2% reliability life; this is the time at which 2% of the population will
fail. (The requirement for high-reliability LEDs comes mainly from the
telecommunications industry, where these devices should carry digital information
at rates between 2 Mbit/s and several Gbit/s.) Pulsed operation is therefore a realistic
method of testing, as mechanisms associated only with pulsed operation can be
envisaged.
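The 2% reliability life defined above follows directly from the two lognormal parameters. A minimal sketch: the relation t_p = t50·exp(z_p·σ) is standard for a lognormal life distribution, while the median life and sigma used below are assumed values, not figures from the text.

```python
import math
from statistics import NormalDist

def percent_life(median_h: float, sigma: float, p: float = 0.02) -> float:
    """Time (hours) by which a fraction p of a lognormal population has
    failed: t_p = t50 * exp(z_p * sigma), with z_p the standard-normal
    quantile of p (z_0.02 is about -2.054)."""
    z_p = NormalDist().inv_cdf(p)
    return median_h * math.exp(z_p * sigma)

# Assumed population: median life 10^6 h, sigma = 1.5.
t2 = percent_life(1e6, 1.5)   # ~4.6e4 h, far below the median
```

The example shows why the 2% life is the useful figure for telecommunications: with a wide sigma, the first failures arrive more than an order of magnitude earlier than the median would suggest.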
3 However, this time clearly depends on what criteria are used to determine device failure. The end
of life of an LED is the time at which the power output has fallen to either 50%, or 1/e, of its
original value.
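The two end-of-life criteria in footnote 3 are easy to relate under an assumed degradation law. Real LED degradation need not be exponential (the text stresses it is often not well behaved); the decay model below is only a sketch, and the time constant is an assumed value.

```python
import math

def end_of_life(tau_h: float, criterion: float) -> float:
    """Time for an output P(t) = P0 * exp(-t/tau) to fall to `criterion`
    of its initial value.  Exponential decay is an assumed model."""
    return -tau_h * math.log(criterion)

tau = 1e5                            # assumed degradation time constant, hours
t_50 = end_of_life(tau, 0.5)         # ~6.93e4 h: 50% criterion fires first
t_1e = end_of_life(tau, 1 / math.e)  # ~1e5 h: the 1/e criterion equals tau
```

Under this model the 50% criterion always fires earlier than the 1/e criterion (ln 2 < 1), which is why the chosen failure criterion directly shifts any quoted "time to end of life".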
4 One method of understanding the implications of the lognormal distribution is to make comparisons
with the normal distribution. Whereas the normal distribution results from the additive effects of
random variables, the lognormal distribution should result when the random variables interact
multiplicatively.
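The multiplicative-interaction argument of footnote 4 can be demonstrated by simulation: the logarithm of a product of many independent positive factors is a sum of independent terms, so by the central limit theorem the product tends toward a lognormal. The factor distribution and counts below are arbitrary illustrative choices.

```python
import math
import random

random.seed(0)  # deterministic sketch

def multiplicative_sample(n_factors: int = 50) -> float:
    """Product of many independent positive random factors; the log of
    the product is a sum, hence approximately normal."""
    x = 1.0
    for _ in range(n_factors):
        x *= random.uniform(0.9, 1.1)
    return x

samples = [multiplicative_sample() for _ in range(10_000)]
logs = [math.log(s) for s in samples]
mean_log = sum(logs) / len(logs)  # the log of the sample is ~normal
```

This is exactly the contrast drawn in the footnote: additive random effects give a normal distribution, multiplicative ones give a lognormal.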
5 The lifetime (as usual, 50% degradation defines the end of lifetime) of commercial
optoelectronic components using LEDs - made by one and the same manufacturer - may differ
considerably from batch to batch.
10.3
Optocouplers [10.22]...[10.29]
10.3.1
Introduction
A crucial problem is that of the current transfer ratio, CTR, changing with time. The
resulting change in the optocoupler's gain with time, ΔCTR = CTRfinal − CTRinitial, is
referred to as CTR degradation⁶. This degradation must be accounted for if a long,
functional lifetime of a system is to be guaranteed [10.5][10.6][10.7].
10.3.2
The optocoupler ageing problem
The main cause of CTR degradation is the reduction in efficiency of the LED
within the optocoupler. Its quantum efficiency - defined as the total photons per
electron of input current - decreases with time at a constant current. The LED
current consists primarily of two components: a diffusion current component7 and a
space-charge recombination current:

IF(VF) = A exp(qV/kT) + B exp(qV/2kT)    (10.1)
6 Numerous studies have demonstrated that the predominant factor in degradation is the reduction of
the total photon flux emitted from the LED, which, in turn, reduces the device's CTR.
7 The diffusion current component is the important radiative current and the non-radiative current
is the space-charge recombination current.
Over time - at fixed VF - the total current increases through an increase in the
value of B. From another point of view, with fixed total current, if the space-charge
recombination current increases - due to an increase in the value of B - then the
diffusion current, the radiative component, will decrease. The reduction in light
output through an increase in the proportion of recombination current at a specific
IF is due to both the junction current density J and the junction temperature TJ. In any
particular optocoupler, the emitter current density will be a function not only of the
required current necessary to produce the desired output, but also of the junction
geometry and of the resistivity of both the p and n regions of the diodes. The junc-
tion temperature is a function of the coupler packaging, power dissipation and
ambient temperature. As with current density, a high TJ will promote a more rapid
increase in the proportion of recombination current⁸ [10.2][10.9].
[Fig. 10.4: optocoupler system model: transmission K of the optical interface, emitter quantum efficiency η, photodetector responsivity R, gain β of the output amplifier, input current IF, output current IO]
A useful model (Fig. 10.4) can be constructed to describe the basic opto-
coupler parameters which are able to influence the CTR. Any coupler can be mo-
delled in this fashion within its linear region. The same Fig. 10.4 shows the system
block diagram, which yields the relationship between input current IF and output current IO:

IO = K η R β IF    (10.2)

where K represents the total transmission factor of the optical path, generally con-
sidered a constant, as is R, the responsivity of the photodetector, defined in terms
of electrons of photocurrent per photon; η is the quantum efficiency of the emitter,
defined as the photons emitted per electron of input current, and depends upon the
level of input current IF and upon time. Finally, β is the gain of the output amplifier and
is dependent upon IF, the photocurrent, and time. Temperature variations would,
of course, cause changes in η and β as well.
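In the linear region the model of Fig. 10.4 reduces to a product of the four parameters, so CTR = IO/IF = K·η·R·β. A minimal sketch of that relationship; all parameter values below are assumed, for illustration only.

```python
def ctr(k: float, eta: float, r: float, beta: float) -> float:
    """CTR = IO/IF = K * eta * R * beta (linear-region model)."""
    return k * eta * r * beta

def output_current(i_f_ma: float, k: float, eta: float, r: float,
                   beta: float) -> float:
    """Output current IO = CTR * IF for the same model."""
    return ctr(k, eta, r, beta) * i_f_ma

# Assumed parameter values.  With K, R and beta constant, a 10% drop in
# emitter efficiency eta maps directly onto a 10% drop in CTR.
ctr_initial = ctr(k=0.3, eta=0.040, r=0.8, beta=100)   # 0.96
ctr_aged = ctr(k=0.3, eta=0.036, r=0.8, beta=100)      # 0.864
```

This is the structure behind the decomposition of ΔCTR into terms (I)-(III) discussed in the text: each factor contributes its own relative change to the relative change in CTR.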
From equation (10.2), a normalised change in CTR, at constant IF, can be ex-
pressed as in (10.3). The first term, (I), Δη/η, represents the major contribution to
ΔCTR, due to the relative emitter efficiency change; generally, over time, Δη is
negative. This change is strongly related to the input level IF. The second term (II)
represents a second-order effect of a shift, positive or negative, in the operating
8 For this reason, it is important not to operate a coupler at a current in excess of the manufactu-
rer's maximum ratings.
point of the output amplifier as the emitter efficiency changes. The third term (III)
is a generally negligible effect which represents a positive or negative change in the
output transistor gain over time. The parameters K and R are constants.
10.3.3
CTR degradation and its cause
It is an established fact that the total photon flux emitted by an optoelectronic de-
vice diminishes slightly over the operating lifetime of the device⁹. Barring cata-
strophic failures or over-stressing of the optoelectronic device, this change of pho-
ton emission is almost imperceptible for many tens of thousands of hours in visual
applications, but can be measured with a sensitive photodetector. At lower stress
currents, the change of light output versus time is reduced. CTR degradation is
important because an excessive amount of degradation or a badly designed system
can cause a reduction in performance and eventual system failure, unless an allo-
wance is made for it [10.7].
Potential causes of CTR degradation are a reduction in efficiency (η) of the
emitter, a decrease in the transmission of the optical path (K), a reduction in re-
sponsivity (R) of the photodetector, or a change in gain (β) of the output ampli-
fier. It is generally accepted that the overwhelming influence on ΔCTR is the
time-dependent reduction in the radiated output of the LED. The recorded ΔCTR
can be appreciably influenced by the choice of measurement conditions. Also, since
the gain of the output amplifier (β) is related to its input current, CTR degradation
may be compounded by the change in β due to a decrease in photocurrent (Ip) caused
by a reduction in η.
There are a number of factors which influence the amount of degradation asso-
ciated with the diode. In general, however, degradation is a result of electrical and
thermal stressing of the pn junction. Combinations of IFS (stress current in the
LED) and Tamb (ambient temperature) will produce a spectrum of ΔCTR va-
lues
9 This change is often referred to as a degradation of light output, although in some instances the
light output of an LED has actually increased over time. An optically coupled isolator is an opto-
electronic emitter/detector pair. Any degradation of the light output of the emitter will cause a
change in the apparent gain of the entire device. The change in gain of the isolator can be ex-
pressed as a change in CTR over time and is commonly called CTR degradation. This term is
now widely used to describe the phenomenon, and the study of factors influencing it has grown
considerably in recent years. Semiconductor manufacturers, for their part, are at pains to point
out that the term "degradation" in the above text does not imply that their product is either
poorly designed or of inferior quality, but rather that the process of "degradation" is an inherent
characteristic of junction electroluminescence.
[Fig. 10.5 Effect of varying the stress-to-monitor ratio (M) on CTR]
throughout the stress duration¹⁰. It is emphasised here that the overall degradation
cannot be totally accounted for by the monitor ratio M = IFS/IFM; the stress
level (IFS) contributes to the total picture too, making it impossible to completely
isolate the effect of varying M alone. The plots of Fig. 10.5 are intended to give
general trends in behaviour, to enable the designer to appreciate the approximate
effect of varying the monitor current. The M values chosen ranged from 1 to 100.
Some interesting conclusions may be drawn from the curves in Fig. 10.5. Note
how the degradation measured at high M values (typically with IFM = 2 mA) is
"relatively" independent of the time on test.
Assume that the degradation mechanism establishes a resistive path in parallel
with the active pn junction. Any current flowing in this resistive shunt will not
generate light. At low IFM values, this alternative path may have an appreciable im-
pact on the total device performance, as it offers a low-resistance path to substantial
amounts of current. As the current increases, however, the low-resistance forward-
biased pn junction draws the major proportion of the total current and the impact of
the secondary path is considerably reduced. Using this model, we can understand
how a reduced light output is seen at low IFM currents, when a sizeable percentage
of the LED drive is deflected in this way.
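The shunt model described above can be made quantitative with a textbook diode equation in parallel with a resistor. The saturation current, shunt resistance, and drive currents below are assumed values, not figures from the text; the point is only the trend with monitor current.

```python
import math

def junction_fraction(i_total_a: float, r_shunt_ohm: float,
                      i_s_a: float = 1e-12, v_t: float = 0.0259) -> float:
    """Fraction of the LED drive current flowing through the light-
    generating pn junction when a degradation-induced resistive shunt
    sits in parallel with it.  Solves
    I_total = I_s*(exp(V/v_t) - 1) + V/R_shunt for V by bisection."""
    lo, hi = 0.0, 2.0
    for _ in range(100):
        v = 0.5 * (lo + hi)
        i = i_s_a * (math.exp(v / v_t) - 1.0) + v / r_shunt_ohm
        if i < i_total_a:
            lo = v
        else:
            hi = v
    v = 0.5 * (lo + hi)
    i_junction = i_s_a * (math.exp(v / v_t) - 1.0)
    return i_junction / i_total_a

# At a low monitor current the shunt steals most of the drive; at a high
# current the exponential junction dominates (10 kOhm shunt assumed).
low = junction_fraction(50e-6, 10_000)    # 50 uA monitor: small fraction lights
high = junction_fraction(10e-3, 10_000)   # 10 mA: shunt loss is minor
```

The resistor current grows only linearly with voltage while the junction current grows exponentially, which reproduces the observation that degradation looks much worse when monitored at low IFM.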
10.3.4
Reliability of optocouplers
Reliability is something that must be "built in", not "tested in". Through proper
design and process control this can be accomplished, thereby reducing the task of
screening programmes which attempt to eliminate the lower tail of the distribution.
10 GaAs can display considerable lot-to-lot variations; the individual diode chips themselves
represent only a small fraction of a single wafer of GaAs, and each wafer may have a range of
physical/electrical characteristics across its surface. The choice of measuring conditions used to
monitor the amount of degradation incurred during a particular stress test has a considerable
impact on this work.
One of the major inspection points in the wafer processing area is the light output
test of each light-emitting diode; the major inspection points in the assembly area are
the die attach and the wire bond. For the forty-year life of telecommunications
products, the optimal reliability screen would generally consist of 20 temperature
cycles (-65°C to +150°C), followed by a variables data read and record, followed by
a 16-hour burn-in (at IF = 100 mA, VCE = 20 V, IC = 15 mA, Tambient = +25°C),
followed by a variables data read and record. Screening limits are based on both
[Fig. 10.6 IRED output versus time: slope prediction curves, assuming a virtual initial time of 50 hours; bias current effect]
parametric shift and value. In our experience, temperature cycle is a more effective
screen than stabilisation bake.
Our experience indicates two major problems that must be addressed in the de-
sign of optoelectronic devices utilising IREDs¹¹ and phototransistors. First, the tem-
perature coefficient of expansion of unfilled clear plastics is much greater than that
of the other components, and their glass transition temperature is low, requiring a
reduced temperature range of operation and stronger mechanical construction to
maintain reasonable device integrity. Second, some clear plastics build up mechanical
stress on the encapsulated parts during curing; this stress has been linked to rapid,
inconsistent degradation of IRED light output. Although a filled plastic would stop
these phenomena, the filler also spoils the light transmission properties of the plastic.
The "preconditioning" is usually understood to be a stress test (or a combination
of stress tests) applied to devices (i. e. high temperature storage, operating life,
storage life, blocking life, humidity life, HTRB, temperature cycles, mechanical
sequence - which includes solderability -, etc.), after which a screening criterion is
applied to separate good units from bad ones. This criterion may be any combination
of the absolute value and parameter shift levels agreed to by the involved parties.
Since the optocoupler is a hybrid circuit, it is normal that its MTBF is lower
than that of TTL circuits. It is extremely difficult to find an epoxy (between LED and detector)
11 Work has been done on performance degradation to improve GaAs performance and to match
that performance with GaAlAs, a newer, more difficult material (Fig. 10.6) [10.8][10.9].
which is transparent and at the same time matches perfectly with the bonding wires.
Most catastrophic failures are due to thermal stress between epoxy and
bonding wires.
The decrease in quantum efficiency of LEDs is the main reason for CTR degradation
of optocouplers. Other - less important - causes of CTR degradation are a
decrease in the transmission of the transparent epoxy, a change in sensitivity of the
photodetector and a change in gain of the output amplifier. It is now known that the
rate of CTR degradation is influenced by the materials and processing parameters
used to manufacture the LED, and the junction temperature of the LED in addition
to the current density through the LED. Several tests have been performed to find a
law of degradation. Some laboratories derived the following formula:
teff = K · C / (JF)^n · e^(E/kTJ)    (10.4)

where:
teff = operating time after which x percent of the optocouplers have a CTR of less
       than m times the initial CTR;
C = constant, depends on technology;
JF = current density in the diode (A/cm²);
E = activation energy of the degradation mechanism (eV);
k = Boltzmann constant (8.62 × 10⁻⁵ eV/K);
TJ = junction temperature of the diode (K);
K = correction factor; depends on the current at which CTR is measured (CTR
    degradation increases when this current decreases).
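As an illustration, Eq. (10.4) can be evaluated numerically. All parameter values below (C, n, E, the current density and the two junction temperatures) are hypothetical assumptions chosen only to show the Arrhenius-type temperature trend, not data from the text:

```python
import math

# Boltzmann constant in eV/K, as given in the text
K_BOLTZMANN = 8.62e-5

def t_eff(c, j_f, n, e_a, t_j, k_corr=1.0):
    """Effective degradation time per Eq. (10.4):
    t_eff = K * C / (J_F)^n * exp(E / (k * T_J))."""
    return k_corr * c / (j_f ** n) * math.exp(e_a / (K_BOLTZMANN * t_j))

# Illustrative values only: compare two junction temperatures.
t_hot = t_eff(c=1e-3, j_f=50.0, n=2.0, e_a=0.7, t_j=398.0)   # 125 degC
t_cool = t_eff(c=1e-3, j_f=50.0, n=2.0, e_a=0.7, t_j=348.0)  # 75 degC
print(t_cool / t_hot)  # cooler junction -> markedly longer time to degrade
```

The exponential term dominates: cooling the LED junction by 50 °C lengthens the effective degradation time by more than an order of magnitude under these assumed parameters.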
Another well-known problem is that of intermittently open-circuit devices (identi-
fied as thermal opens). In its simplest form, the thermal intermittent results from a
combination of an initially weak bond, acted upon by forces originating from the
thermal mismatch of the constituents of the encapsulating medium. That is why
many quality checks were introduced by manufacturers during the fabrication
process, as well as multiple screenings at elevated temperatures (i. e. 100°C for
thermal continuity, on a 100% basis) on the finished product. The data generated to
date indicate an outgoing quality better than 0.15% for intermittents (if all production
is temperature cycled during manufacture in order to remove weak mechanical
bonds).
The solderability (normally the lead frame is an Alloy 42, comprising 42% Ni
and 58% Fe) is checked several times daily during the production process, and - for
special customers - these tests are routinely performed. A change to a silver plated
lead frame affects only that part of the frame which is enclosed by the encapsulant.
10.3.5
Some basic rules for circuit designers
e) Design the circuit for a CTR below the minimum specified CTR.
f) Allow a ±30% drift of the coupling factor during operation.
Optocouplers are relatively reliable products when one is aware of CTR degra-
dation while designing a circuit. A well-designed circuit should allow for CTR
degradation, as well as consider the worst-case effects of temperature, component
tolerance, and power supply variations. On the whole, the mechanisms behind degra-
dation and failure of optoelectronic devices are not yet fully understood.
10.4
Liquid crystal displays
Liquid crystal displays (LCDs) differ from other types of displays in that they scatter
- rather than generate - light. Two basic types are available: reflective (which
require front illumination), and transmissive (which require rear illumination). A
third type - the transflective - combines the properties of the other two and operates
either by reflection of front-surface light or by illumination from the rear. All
of these types of LCDs use a cell filled with liquid crystal material¹².
Fig. 10.7 Optical response curve of a liquid crystal cell (optical response versus root mean
square voltage). Vth = threshold voltage (threshold at which response is 10% of maximum);
Vsat = saturation voltage (voltage at which response is 90% of maximum)
12 A liquid crystal material is an organic compound (containing carbon, hydrogen, oxygen, and
nitrogen) that has the optical properties of solids and the fluidity of liquids. In the liquid crystal
state - exhibited over a specific temperature range - the compound has a milky, yellow appea-
rance. At the high end of the temperature range, the milky appearance gives way to a clear liquid;
at the low end of the range, the compound turns into a crystalline solid. The molecules of a liquid
crystal compound are in the form of long, cigar-shaped rods. Because of the special grouping of
the atoms that form these molecules, the rods act as dipoles in the presence of an electrical field.
This field-effect characteristic enables the molecules to be aligned in the direction of the electri-
cal field, and provides the basis for the operation of an LCD.
The optical response of a liquid crystal cell is shown in Fig. 10.7. When a voltage
greater than Vsat is applied between a segment contact and the backplane contact,
molecules in the liquid crystal material twist to align themselves with the electric
field in regions of segment and backplane overlap, turning the segment on. The
optical response is the same whether the segment voltage is positive or negative
with respect to the backplane.
DC operation causes electrochemical reactions which reduce the life of an
LCD; it is therefore customary to drive the display with AC waveforms having
minimised DC components. Frequently, these are square waves in the range of
25 Hz to 1 kHz. The LCD responds to the rms value of the applied voltage.
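The DC-balance requirement can be illustrated with a short sketch (amplitude, period and sample counts are arbitrary assumptions): a symmetric square wave has zero mean, so it applies no net DC to the cell, while its rms value - the quantity the LCD responds to - equals the drive amplitude.

```python
import math

def square_wave(amplitude, n_periods, samples_per_period):
    """DC-balanced square wave such as those used to drive LCDs:
    equal time at +amplitude and -amplitude."""
    half = samples_per_period // 2
    one_period = [amplitude] * half + [-amplitude] * half
    return one_period * n_periods

def mean(xs):
    return sum(xs) / len(xs)

def rms(xs):
    return math.sqrt(sum(x * x for x in xs) / len(xs))

v = square_wave(amplitude=3.0, n_periods=50, samples_per_period=100)
print(mean(v))  # 0.0: no electrochemical DC stress on the cell
print(rms(v))   # 3.0: the value the LCD actually responds to
```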
10.4.1
Quality and reliability of LCDs
LCDs are rugged devices and will provide many years of service when operated
within their rated limits. The limiting factor in LCD life is the decomposition of the
organic liquid crystal material itself, either through exposure to moisture, prolonged
exposure to ultraviolet light or to chemical contaminants present within the cell.
The design of some LCD manufacturers eliminates these failure modes:
• by providing a hermetic cell incorporating glass to glass and metal to glass
seals;
• by using a liquid crystal that is relatively insensitive to UV light and by incor-
porating an UV screen in the front polariser;
• by specifying and maintaining a high degree of chemical purity during the
synthesis of the liquid crystal, and during subsequent display manufacturing
steps.
A high temperature humid environment will cause gradual loss of contrast over a
period of time, due to degradation of the polarisers. If displays are to be operated or
stored at temperatures >50°C and humidity higher than 60%RH for extended pe-
riods of time, the user should contact the LCD manufacturer for more specific in-
formation.
The price of LCDs bears little relation to the number of digits or the complexity of
the information displayed, but is more related to the glass area. It is to the customer's
advantage not only to reduce the glass area in his design, but - where possible - to
utilise standard display external glass sizes, thereby reducing custom display de-
velopment costs.
Today's reliability level (MTBF) of enhanced LCDs ranges from 50,000 h up
to values of 100,000 h or more (Fig. 10.8).
It should be recalled that one of the first LCD applications was the electronic watch,
marked by two essential characteristics: (i) The normally imposed LCD lifetime -
without maintenance intervention (except battery replacement) - is approxi-
mately 50,000 h (>8.5 years), an unusual value, otherwise demanded only of high-
performance industrial products. (ii) Expensive watches are considered as
jewels, for which the aesthetic aspect plays a primordial role. That is why very
small optical defects (i. e. small air bubbles, with no functional influence) are consi-
dered valid grounds for complaint, in other words as failure signs.
Fig. 10.8 LCD failure rate λ dependence on the time t (λ from about 10⁻⁷/h to 10⁻⁵/h, t from
0.25 to 100 × 1000 h); typical lifetime: 50,000 h, for Us = 5 V, Tamb = 25°C
From a reliability point of view, the principal question is how the technical
properties (especially the optical properties) of LCDs change depending on
the ambient conditions and over the lifetime. The specialised literature gives only
very few answers to this question, but recently new, more stable liquid crystal
materials have been synthesised, and the quality and reliability of LCDs have been
improved.
Generally, we distinguish two types of failure modes: sudden failures and long-term
degradation failures. The former are normally associated with the blackout of the
LCD (short-circuits, opens, mechanical failures concerning the tightness, etc.);
the latter induce an increased power consumption, loss of alignment, reduction
of the isotropic transition temperature, change of the response speed, and aesthetic
defects (loss of contrast, bubbles, etc.) [10.15].
To estimate the lifetime of LCDs, the following methods are utilised:
• lifetime test (+50°C at 85% RH);
• storage test at +25°C, +50°C, and -20°C, without controlling the humidity;
• thermal shock;
• high temperature test (+50°C), without controlling the humidity.
One of the arbitrary failure criteria utilised is a 100% increase of the absorbed
AC current. The results of such tests - performed beginning with the year 1972 - led
to the conclusion that the expected LCD lifetime is greater than 50,000 h (≈ 10
years), with a failure rate of ≈10⁻⁷/h (at 3 V / +25°C).
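Assuming a constant failure rate (an exponential reliability model, which is an assumption and not stated in the text), the quoted lifetime and failure-rate figures can be related in a few lines:

```python
import math

def survival(failure_rate_per_hour, hours):
    """Survival probability under a constant-failure-rate
    (exponential) model: R(t) = exp(-lambda * t)."""
    return math.exp(-failure_rate_per_hour * hours)

# Figures quoted in the text: lambda ~ 1e-7/h over ~50,000 h.
r = survival(1e-7, 50_000)
print(r)  # ~0.995, i.e. about 0.5% of displays expected to fail
```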
References
10.23 Bajenesco, T. I. (1982): Le C.N.E.T. et les tests de fiabilité des photocoupleurs. L'Indi-
cateur Industriel (Switzerland) no. 9 (1982), pp. 15-19
10.24 Bajenescu, T. I. (1984): Optokoppler und deren Zuverlässigkeitsprobleme. Aktuelle Tech-
nik (Switzerland), no. 3, pp. 17-21
10.25 Bajenescu, T. I. (1994): Ageing Problem of Optocouplers. Proc. of Mediterranean Electro-
tech. Conf. MELECON '94, Antalya (Turkey), April 12-14
10.26 Bajenescu, T. I. (1995): Particular Aspects of CTR Degradation of Optocouplers. Pro-
ceedings of RELECTRONIC '95, Budapest (Hungary)
10.27 Bazu, M. et al. (1997): MOVES - a method for monitoring and verifying the reliability
screening. Proc. of the 20th Int. Semicond. Conf. CAS '97, October 7-11, Sinaia, pp. 345-
348
10.28 Bajenescu, T. I.; Bazu, M. (1999): Semiconductor devices reliability: an overview. Proc.
of the European Conference on Safety and Reliability, Munich, Garching, Germany, 13-17
September, Paper 31
10.29 Ueda, Osamu (1996): Reliability and Degradation of III-V Optical Devices. Artech House,
Boston and London
11 Noise and reliability
11.1
Introduction
Much work has been carried out in the past to study the various types of (low-
frequency excess) noise sources as they commonly occur in silicon planar transis-
tors used in monolithic integrated circuits. Some examples of such noise sources
are presented in the following.
• Shot noise:
in metal-semiconductor diodes, pn junctions, and transistors at low injection;
in the leakage currents of FETs;
in light emission of luminescent diodes and lasers.
• Noise due to recombination and generation in the junction space-charge re-
gion, high-level injection effects (including noise in photodiodes, avalanche
diodes, and diode particle detectors).
• Thermal noise and induced gate noise in FETs.
• Generation-recombination noise in FETs and transistors at low temperatures.
• Noise due to recombination centres in the space-charge region(s) of FETs, and
noise in space-charge-limited solid-state diodes.
• 1/f - or flicker - noise in solid-state devices in terms of the fluctuating occu-
pancy of traps in the surface oxide.
• Contact or low frequency noise.
• Popcorn noise (also called burst noise) in junction diodes and transistors, and
kinetics of traps in surface oxide.
• Microplasma noise.
• Random noise.
• Flicker noise in junction diodes, transistors, Gunn diodes and FETs.
• High-injection noise.
• Excess low-frequency noise.
• Bistable noise in operational amplifiers.
• Pink noise.
The theory of the low-frequency noise of the bipolar junction transistor was developed
many years ago and has remained essentially unchanged since its inception.
Unlike the other noise sources, popcorn noise is due to a manufacturing
defect and can be eliminated by improving the manufacturing process (e.g. X-ray
examination of transistor wafers showed that the total number of defects increases
with the incident implantation energy). The noise consists typically of random
pulses of variable length and equal height, but sometimes the random pulses
seem to be superimposed upon each other (Fig. 11.1).
Fig. 11.1 Typical burst noise observed at the collector of a transistor [11.16]
11.2
Excess noise and reliability
Extensive studies on silicon bipolar transistors [11.1]...[11.4] have shown that noise
phenomena can be classified in two categories: normal and excess noise. The first
category includes the thermal and shot noises, the second the flicker (or 1/f), micro-
plasma, generation-recombination and burst noises. It is an old assumption
(partly verified [11.5]...[11.7]) that excess noise could give some information about
1 γ radiation is shown to increase the low-frequency noise level in linear bipolar devices, while it
tends to cause latch-up of CMOS ICs; X-rays are found to affect MOS devices to a greater
extent than bipolar ICs as a result of the development of positive charges in the oxide layer,
causing a threshold voltage shift. GaAs devices - because they are majority carrier devices - are
relatively radiation hard when compared to silicon devices [11.37].
11.3
Popcorn noise
Popcorn noise - also called burst noise - was first discovered in semiconductor
diodes and has recently reappeared in integrated circuits [11.8]...[11.11]. If burst
noise is amplified and fed into a loudspeaker, it sounds like corn popping; hence
the name popcorn noise. It is a curious and undesirable noise phenomenon that
can plague the normal operation of pn junction devices. Popcorn noise is charac-
terised by collector current fluctuations, generally having the aspect of a random
telegraph wave, but sometimes different levels of current pulses can be observed.
It may appear or disappear spontaneously or under particular stress conditions; it
does not occur on all devices manufactured from the same wafer, nor does it occur
on all wafers in a given production lot².
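The random-telegraph character of burst noise can be sketched with a simple two-level simulation; the switching probability, levels and seed below are illustrative assumptions, not measured device data:

```python
import random

def random_telegraph(n_samples, p_switch, low=0.0, high=1.0, seed=1):
    """Two-level random telegraph signal: at each sample the level
    switches with probability p_switch, mimicking the equal-height,
    variable-length pulses characteristic of burst (popcorn) noise."""
    rng = random.Random(seed)
    level, out = low, []
    for _ in range(n_samples):
        if rng.random() < p_switch:
            level = high if level == low else low
        out.append(level)
    return out

sig = random_telegraph(10_000, p_switch=0.01)
# Only two discrete levels ever appear, as in a telegraph wave.
print(sorted(set(sig)))
```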
Popcorn noise was first discovered in early 709-type operational amplifiers.
Essentially it is an abrupt, step-like shift in offset voltage (or current) lasting for several
milliseconds and having an amplitude from less than one microvolt to several
hundred microvolts. Occurrence of the pops is quite random - an amplifier can
exhibit several pops per second during one observation period and then remain
pop-free for several minutes. Worst-case conditions are usually at low temperatures
with high values of source resistance Rs. Some amplifier designs and the products
of some manufacturers are notoriously bad in this respect.
Several theories have been developed about the popcorn mechanism. In [11.2] and
[11.4] the authors arrived at the conclusion that the burst phenomenon is located
near the surface of the emitter-base junction. In 1969, Leonard and Jaskowlski
[11.23] postulated that the random appearance and disappearance of microplasmas
in the reverse-biased collector-base junctions of transistors would produce step-like
changes in the collector current. However, Knott [11.24] claimed in 1970 that burst
noise was the result of a mechanism arising in the emitter-base junction, and not in
the collector-base junction. In 1971, Oren [11.22] reported that it would be
premature, without further study, to rule out either of the aforementioned models. A
closer look indicates that different mechanisms are indeed at play (e. g. modulation
of leakage current flowing through defects located in the emitter-base space-charge
region; surface problems; metal precipitates; dislocations) and a unique answer is
not yet available. Roedel and Viswanathan [11.12] observed that in the 741
operational amplifier there was a very strong correlation between the intensity of the
burst noise and the density of dislocations at the emitter-base junction. Martin and
Blasquez [11.13]
2 We have checked the percentage of burst noise incidence in relation to the position of the units
on the wafer (central versus peripheral), and the results show a larger incidence rate for the
peripheral devices.
3 In [11.11] it was determined that - in order to reduce burst noise - one or more of the following
steps had to be accomplished: a) remove or neutralise the recombination-generation centres; b)
remove the metal atoms from the crystal, or at least prevent them from precipitating at the
junction; c) reduce or eliminate the surface junction dislocations. The first step was abandoned
because of the impossibility of removing all bulk and surface trapping centres.
11.4
Flicker noise
All solid-state devices show a noise component with a 1/f^n spectrum, where n ≈ 1.
This type of noise is known as flicker noise or 1/f noise. It has been demonstrated
that this 1/f noise spectrum holds down to extremely low frequencies; Firle and
Winston [11.15] have measured 1/f noise at 6 × 10⁻⁵ Hz. Experiments made by Plumb
and Chenette [11.21] indicated that flicker noise in transistors can be represented
by a current generator if in parallel with the emitter junction. Theoretically, a par-
tially correlated current generator in in parallel with the collector junction may be
used, but careful experiments have shown that its effect is so small that it can be
neglected.
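The statement that a 1/f spectrum "holds down to extremely low frequencies" is often restated as: each frequency decade carries the same noise power, since the integral of k/f over a band is k·ln(f_high/f_low). A minimal sketch (the noise constant k = 1 is an arbitrary assumption):

```python
import math

def band_power_one_over_f(k, f_low, f_high):
    """Noise power of a 1/f spectrum S(f) = k/f over a band:
    integral of k/f df = k * ln(f_high / f_low)."""
    return k * math.log(f_high / f_low)

# A decade near Firle and Winston's 6e-5 Hz measurement and a
# decade in the audio range carry identical power.
p1 = band_power_one_over_f(1.0, 6e-5, 6e-4)
p2 = band_power_one_over_f(1.0, 1.0, 10.0)
print(p1, p2)  # both equal k * ln(10)
```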
In normal operating conditions, the excess noise consists essentially (over the whole
low-frequency range) of flicker and burst noise; they may be represented by two
equivalent current generators connected between the input terminals of the
transistor (Fig. 11.2).
11.4.1
Measuring noise
Noise measurements are usually done at the output of a circuit or amplifier, for two
reasons: (i) the output noise is larger and therefore easier to read on the meter; (ii) it
avoids the possibility of the noise meter upsetting the shielding, grounding or bal-
ancing of the input circuit of the device being measured.
In order to make the excess noise comparatively predominant, we have utilised the
HTRB step-stress test (one week storage; starting temperature 150°C; 25°C/step)
followed by 24 h stabilisation at normal ambient temperature, with shorted
junctions⁴. This enables high-reliability transistors to be selected by a prior noise
measurement; the selection principles are: (a) acceptance of only those transistors
with a low flicker noise level; (b) rejection of entire lots having an important
4 The testing of a sample is stopped and a failure analysis made when 50% of the transistors
show a DC current gain higher than 50% of the initial gain. The transistor under test must be
biased across a large external base resistor and the measurement made at 30 Hz. For a valid
comparison, the emitter voltage must be kept at the same value and the noise must be measured
with a constant base current [11.15].
proportion of elements with burst noise; (c) rejection of the lots having a high
average value of the flicker noise spectral density (Fig. 11.3 and Table 11.1).
11.4.2
Low noise, long life
This is the conclusion of our reliability tests: by measuring the excess noise it is
possible to make a reasonable prediction about the life expectancy of the devices by
means of a non-destructive test. A large increase in excess noise occurs just prior to
failure; units with low initial values of noise current have a longer life under artifi-
cial ageing.
Some findings on perfect crystal device technology (PCT) for reducing flicker
noise in bipolar transistors [11.25]: (i) The flicker noise can be drastically reduced
by eliminating various crystal defects such as dislocations and precipitates, and by
achieving a low Si/SiO2 state density with the use of the P/As mixed doped oxide
diffusion technique. It is worth mentioning the disappearance of burst noise when
employing PCT. (ii) The degree of dislocation generation during the diffusion process
depends on the grown-in dislocation density; the smaller, the better. (iii) Diffusion-
induced dislocation density depends on the crystal orientation; (111) turned out to
be the best so far as dislocation is concerned.
11.5
Noise figure
The noise figure NF is the logarithm of the ratio of the input signal-to-noise ratio
to the output signal-to-noise ratio:

NF = 10 log[(S/N)in / (S/N)out]    (11.1)

where S and N are power or (voltage)² levels.
This is measured by determining the S/N at the input with no amplifier present,
and then dividing by the measured S/N at the output with the signal source present. The
values of Rgen and any Xgen, as well as the frequency, must be known to properly express
NF in meaningful terms.
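A minimal sketch of Eq. (11.1); the SNR power ratios below are illustrative assumptions, not measurements from the text:

```python
import math

def noise_figure_db(snr_in, snr_out):
    """Noise figure per Eq. (11.1): NF = 10*log10(SNR_in / SNR_out),
    with both SNRs expressed as power ratios."""
    return 10.0 * math.log10(snr_in / snr_out)

# Hypothetical amplifier halving the signal-to-noise power ratio.
nf = noise_figure_db(1000.0, 500.0)
print(nf)  # ~3.01 dB
```

Halving the SNR costs about 3 dB of noise figure, which is why 3 dB is often quoted as the point where the amplifier adds as much noise power as the source already carries.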
We desire a high signal-to-noise ratio S/N; it also happens that any noisy
channel or amplifier can be completely specified for noise in terms of two noise
generators en and in, as shown in Fig. 11.4. The main points in selecting low noise
amplifiers are:
(i) Don't pad the signal source; live with the existing Rgen.
(ii) Select on the basis of low values of en, and especially of in if Rgen is over about a
thousand ohms.
(iii) Don't select on the basis of NF. NF specifications are all right so long as
you know precisely how to use them and so long as they are valid over the
frequency band for the Rgen (or Zgen) with which you must work.
(iv) The higher frequencies are often the most important, unless there is low
frequency boost or high frequency attenuation in the system [11.26].
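Point (ii) can be illustrated with the two-generator model of Fig. 11.4: the total input-referred noise combines en, the in·Rgen term, and the thermal noise of Rgen itself. The en and in densities below are illustrative assumptions, not values for any real amplifier:

```python
import math

K_B = 1.38e-23  # Boltzmann constant, J/K

def input_noise_density(e_n, i_n, r_gen, temp_k=290.0):
    """Total input-referred noise voltage density (V/sqrt(Hz)) for the
    two-generator model: amplifier voltage noise e_n, amplifier current
    noise i_n flowing through R_gen, and R_gen's own thermal noise."""
    thermal = 4.0 * K_B * temp_k * r_gen
    return math.sqrt(e_n**2 + (i_n * r_gen)**2 + thermal)

# With a large source resistance the i_n * R_gen term dominates, which
# is why low i_n matters most when R_gen exceeds ~1 kOhm.
low_r = input_noise_density(e_n=4e-9, i_n=1e-12, r_gen=100.0)
high_r = input_noise_density(e_n=4e-9, i_n=1e-12, r_gen=1e6)
print(low_r, high_r)
```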
Fig. 11.4 Noise characterisation of an operational amplifier: signal source esig with the two
input noise generators en and in feeding a noiseless amplifier [11.26]
Avoid applications requiring a high gain (> 60 dB), because the amplified
noise (≈ 2 µV) can reach the audio domain. For high reliability systems, all
components exhibiting burst noise should be rejected; likewise, all batches with an
important proportion of components having 1/f noise or burst noise should be
rejected. Only components with a reduced noise level should be accepted.
Avoid using excessively large resistances in your circuits. Minimise the external
noise sources.
Noise spectroscopy [11.38]...[11.41] gives information on the parameters of traps
located in the pn junction depletion layer. The noise reliability indicator in the forward
direction is defined as the ratio between the maximum value of the noise spectral
density (measured on a load resistance) and its thermal noise spectral density. As a
noise reliability indicator for reverse-bias operation, the ratio of the breakdown voltage
of the ideal junction to the reverse voltage of soft breakdown was introduced [11.41].
Burst noise is used as a third reliability indicator.
11.6
Improvements in signal quality of digital networks
References
11.1 Bajenescu, T. I. (1985): Excess noise and reliability. Proceedings of RELECTRONIC '85,
Budapest (Hungary), pp. 260-266
11.2 Jaeger, R. C.; Brodersen, A. J. (1970): Low frequency noise sources in bipolar junction
transistors. IEEE Trans. on Electron Devices, ED-17, no. 2, p. 128
11.3 Martin, J. C. et al. (1966): Le bruit en créneaux des transistors plans au silicium. Elec-
tronics Letters, June, vol. 2, no. 6, pp. 228-230
(1971): Le bruit en créneaux des transistors bipolaires. Colloques Internationaux du
C.N.R.S. no. 204, pp. 59-75
(1972): Corrélation entre la fiabilité des transistors bipolaires au silicium et leur bruit de
fond en excès. Actes du Colloque Internat. sur les Composants Electroniques de Haute Fi-
abilité, Toulouse, pp. 105-119
(1972): L'effet des dislocations cristallines sur le bruit en créneaux des transistors bipo-
laires au silicium. Solid-State Electronics, vol. 15, pp. 739-744
11.4 Brodersen, A. J. et al. (1971): Low-frequency noise sources in integrated circuit transis-
tors. Actes du Colloque International du C.N.R.S., Paper II-4
11.5 Curtis, J. G. (1962): Current noise indicates resistor quality. International Electronics,
May 1962
11.6 Ziel, van der, A.; Tong, H. (1966): Low-frequency noise predicts when a transistor will
fail. Electronics, vol. 23, Nov. 28, pp. 95-97
11.7 Hoffmann, K. et al. (1976): Ein neues Verfahren der Zuverlassigkeitsanalyse fur Hal-
bleiter-Bauteile. Frequenz vol. 30, no. 1, pp. 19-22
11.8 Ott, H. W. (1976): Noise reduction in electronic systems. Wiley Interscience, New York,
1976
11.9 Noise in physical systems (1978). Proceedings of the Fifth Internat. Conf. on Noise, Bad
Nauheim, March 13-16, Springer Verlag, Berlin, 1978
11.10 Prakash, C. (1977): Analysis of non-catastrophic failures in electronic devices due to
random noise. Microelectronics and Reliability vol. 16, pp. 587-588
11.11 Knott, K. F. (1978): Characteristics of burst noise intermittency. Solid-State Electronics
vol. 21, pp. 1039-1043
11.12 Roedel, R; Viswanathan, C. R (1975): Reduction of popcorn noise in integrated circuits.
IEEE Trans. Electron Devices ED-22, Oct., pp. 962-964
11.13 Martin, J. C.; Blasquez, G. (1974): Reliability prediction of silicon bipolar transistors by
means of noise measurements. Proceedings of 12th International Reliability Physics
Symp.
11.14 Bajenesco, T. I. (1981): Problèmes de la fiabilité des composants électroniques actifs
actuels. Masson, Paris, pp. 163-169.
(1996): Fiabilitatea componentelor electronice. Editura Tehnică, Bucharest (Romania),
pp. 312-324
11.15 Firle, J. E.; Winston, H. (1955): Bull. Am. Phys. Society, tome 30, no. 2
11.16 Blasquez, G. (1973): Contribution à l'étude des bruits de fond des transistors à jonctions
et notamment des bruits en 1/f et en créneaux. Thèse doctorat no. 532, Univ. P. Sabatier,
Toulouse
11.15 Luque, A. et al. (1970): Proposed dislocation theory of burst noise in planar transistors.
Electronics Letters, vol. 6, no. 6, 19th March, pp. 176-178
11.16 Koji, T. (1974): Noise Characteristics in the Low Frequency Range of Ion-Implanted-
Base Transistors (NPN type). Trans. Inst. Electron. & Com. Eng. Jap. C, vol. 57, no. 1, pp.
29-30
11.17 Jaeger, R. C. et al. (1968): Record of the 1968 Region III IEEE Convention, pp. 58-191
11.18 Giralt, G. et al. (1965): Sur un phénomène de bruit dans les transistors, caractérisé par des
créneaux de courant d'amplitude constante. C. R. Acad. Sc. Paris, tome 261, groupe 5, pp.
5350-5353
11.19 Caminade, J. (1977): Analyse du bruit de fond des transistors bipolaires par un modèle
distribué. Thèse de doctorat, Université P. Sabatier, Toulouse, France
11.20 Le Gac, G. (1977): Contribution à l'étude du bruit de fond des transistors bipolaires:
influence de la défocalisation. Thèse de doctorat, Université P. Sabatier, Toulouse, France
11.21 Plumb, J. L.; Chenette, E. R. (1963): Flicker noise in transistors. IEEE Trans. Electron
Devices, vol. ED-I0, pp. 304-308
11.22 Oren, R. (1971): Discussion of Various Views on Popcorn Noise. IEEE Trans. on Elec-
tron Devices, vol. ED-18, pp. 1194-1195
11.23 Leonard, P. L.; Jaskowlski, L. V. (1969): An investigation into the origin and nature of
popcorn noise. Proc. IEEE (Lett.), vol. 57, pp. 1786-1788
11.24 Knott, K. F. (1970): Burst noise and microplasma noise in silicon planar transistors. Proc.
IEEE (Lett.), pp. 1368-1369
11.25 Yamamoto, S. et al. (1971): On perfect crystal device technology for reducing flicker
noise in bipolar transistors. Colloques internat. du CNRS no. 204, pp. 87-89
11.26 Sherwin, J. (1974): Noise specs confusing? National Semiconductor AN-104
11.27 Grivet, P.; Blaquiere, A. (1958): Le bruit de fond. Masson, Paris
11.28 Ziel, A. van der (1970): Noise: sources, characterization, measurement. Prentice Hall,
Englewood Cliffs
11.29 Motchenbacher, C. D.; Fitchen, F. C. (1973): Low-noise electronic design. John Wiley &
Sons, New York
11.30 Cook, K. B. (1970): Ph. D. Thesis, University of Florida
11.31 Soderquist, D. (1975): Minimization of noise in operational amplifier applications. AN-15
of Precision Monolithics Inc., Santa Clara, California
11.32 Bilger, H. R. et al. (1974): Excess noise measurements in ion-implanted silicon resistors.
Solid-State Electronics vol. 17, pp. 599-605
11.33 Bajenesco, T. I. (1977): Bruit de fond et fiabilite des transistors et circuits integres. La
Revue Polytechnique no. 1367, pp. 1243-1251
11.34 Wolf, D., editor (1978): Noise in physical systems. Proc. of Fifth Internat. Conf. on
Noise, Bad Nauheim, March 13-16, Springer Verlag, Berlin
11.35 Boxleitner, W. (1989): Electrostatic Discharge and Electronic Equipment. IEEE Press,
New York
11.36 Frey, O. (1991): Transiente Störphänomene. Bull. SEV/VSE, vol. 82, no. 1, pp. 43-48
11.37 Amerasekera, E. A.; Campbell, D. S. (1987): Failure mechanisms in semiconductor
devices. J. Wiley and Sons, Chichester
11.38 Kirtley, J. R. et al. (1987). Proc. of the Internat. Conf. on Noise in Physical Systems and
1/f Fluctuations, Montreal
11.39 Schultz, M.; Pappas, A. (1991): Telegraph noise of individual defects in the MOS inter-
face. Proc. of the Internat. Conf. on Noise in Physical Systems and 1/f Fluctuations,
Kyoto, Japan
11.40 Jones, B. K. (1995): The sources of excess noise. Proc. of the NODITO workshop, Brno,
CZ, July 18-20
11.41 Sikula, J. et al. (1995): Low frequency noise spectroscopy and reliability prediction of
semiconductor devices. Proc. of RELECTRONIC '95, Budapest (Hungary), October 16-
18, pp. 407-412
11.42 Ciofi, C. et al. (1995): Dependence of the electromigration noise on the deposition tem-
perature of metal. Proc. of RELECTRONIC '95, Budapest (Hungary), October 16-18, pp.
359-364
11.43 Schauer, P. et al. (1995): Low frequency noise and reliability prediction of thin film
resistors. Proceedings of RELECTRONIC '95, Budapest (Hungary), October 16-18, pp.
401-402
11.44 Koktavy, B. et al. (1995): Noise and reliability prediction of MIM capacitors. Proc. of
RELECTRONIC '95, Budapest (Hungary), October 16-18, pp. 403-406
11.45 Yiqi, Z.; Qing, S. (1995): Reliability evaluation for integrated operational amplifiers by
means of l/f noise measurement. Proc. of the Fourth Internat. Conf. on Solid-State and
Integrated-Circuit Technology, Beijing (China), October 24-28, pp. 428-430
11.46 Guoqing, X. et al. (1995): Improvement and synthesis techniques for low-noise current
steering logic (CSL). Proc. of the Fourth Internat. Conf. on Solid-State and Integrated-
Circuit Technology, Beijing (China), October 24-28, pp. 634-636
11.47 Merkelo, H. (1993): Advanced methods for noise cancellation in system packaging. 1993
High Speed Digital Symposium, University of Illinois, Urbana
12 Plastic package and reliability
12.1
Historical development
In the beginning, only metallic packages were used for transistor encapsulation.
This type of package seemed to be very reliable, both for military and civilian
applications. In 1962, General Electric used plastic packages for transistors for
the first time. The costs were thus significantly reduced, by as much as 90% in
some cases [12.1]. At first, plastic encapsulated transistors were developed for
mass consumption, without taking reliability or the environment into account. The
low cost of these new transistors therefore rapidly attracted the attention of
industry and the army. Consequently, starting from 1964, their market increased
appreciably. Almost immediately, reliability weaknesses were revealed, especially
under combined conditions of high temperature and moisture, when the failure rate
increases dramatically compared with that of metal encapsulated transistors. This
explains why, with rare exceptions, at the time, the plastic package was not
accepted by the army.
In the 60's, the manufacturers of semiconductor devices published results [12.2]
trying to prove that plastic encapsulated transistors fulfil the technical
requirements of the American military standards (referring to metal packages) and
can therefore successfully replace metal encapsulated transistors. The military
and industrial users asserted the opposite [12.3], especially for the combined
test of high temperature and humidity. In 1968, Flood [12.4], from Motorola,
performed reliability tests lasting thousands of hours, varying the temperature
and humidity conditions, and concluded that the vapour pressure is the most
appropriate stress for evaluating the effect of moisture on plastic encapsulated
transistors. The results showed that humidity has a significant effect on the
failure rate. Baird and Peattie [12.2], from Texas Instruments, asserted that they
obtained satisfactory results for the tests stipulated by method 106B of
MIL-STD-202C and that the failure rate doubles if the components undergo a
relative humidity of 70%, at 55°C, for 5000 hours. But this seemed to be a
deficiency of method 106B, and methods more appropriate for plastic encapsulated
transistors were needed. In the same work, by comparing the same transistor
encapsulated in metal and in plastic, respectively, the conclusion was that the
metal package offered better reliability. In 1968, Anixter [12.3], from Fairchild,
believed that there were some unsolved problems with plastic encapsulation and
recommended not using this type of package for military applications. Also in
1968, Diaz [12.5], from Burroughs, reached the same conclusion.
As a result of these contradictory reports, the US Army Electronics Command de-
cided to organise a complete programme of reliability tests on plastic encapsu-
lated devices. In a research report from 1971, Fick [12.6] summarised the main
results of these reliability tests, performed in Panama. Instead of combined
temperature and humidity cycles (as method 106 indicated), Fick used constant high
temperature and high humidity tests. He assumed that in this way the accelerated
failure rate was correlated with the operational conditions. As the most
detrimental conditions had to be tested, experiments in the Tropics, where high
temperature and humidity are naturally combined, were also performed. The
conclusions of this study may be summarised as follows:
• The transistors intended for commercial purposes were the weakest. Their
current gain increased from 100...200 to 1000...2000, without a plausible
explanation for this phenomenon being furnished.
• A study of the materials used for various plastic packages could offer
valuable information about the reliability of plastic encapsulated devices.
• Another aspect worth studying is the effect of mechanical shocks and
vibrations on plastic encapsulated devices.
• It is necessary to specify the test requirements for plastic encapsulated
transistors and the failure criteria (such as: ICBO max (VCB = 16 V) = 50 nA and
hFE (VCE = 1 V, IC = 2 mA) = 60...300).
After 1980, a significant improvement in the performance of semiconductor
devices was obtained. In a study from 1996, performed by the Reliability Analysis
Centre (RAC), field failure rates from one-year warranty data were analysed [12.7].
For both hermetic and nonhermetic devices, a decrease of more than 10 times in
the failure rate was found between 1978 and 1990. In another study, reported in
1993, a 50-fold decrease of the failure rate of PEMs (Plastic Encapsulated
Microcircuits) over the period 1979 to 1992 was found [12.7]. These results are
confirmed by many other industry studies. The reason is very simple: covering 97%
of worldwide market sales, plastic encapsulated semiconductor devices were the
most studied devices. Also, the absence of the severe controls of the Military
Standards allowed a continuous process improvement, leading to the mentioned
results. Eventually, a major cultural change arose in the procurement policies
for military systems. Known as the Acquisition Reform, this new approach
encourages the use of plastic encapsulated devices in DoD (the US Department of
Defense) equipment and, as a consequence, in the military systems of all
countries. The steps needed for implementing this new system will be detailed in
Sect. 12.8.
12.2
Package problems
From a reliability viewpoint, one of the most important parts of an electronic
component is the package. Experience indicates that the majority of failures arise
because the encapsulation cannot fulfil its role of protecting the die. Integrated
circuits encapsulated in plastic and in metal packages, respectively, behave
differently, depending on the environmental stress. Thus, a plastic package is
more resistant to vibrations and mechanical shocks because the wires are held by
the plastic mass. On the other hand, plastic encapsulated integrated circuits are
not hermetic and may show intermittent solder joints under temperature changes.
This thermal intermittence becomes manifest for all types of integrated circuits,
but especially for LSI memories. Generally, this is an effect depending on the
complexity of the circuit, and it can be reduced by an order of magnitude if the
manufacturing process is well monitored. One may note that plastic encapsulation
is a relatively simple technology with good resistance to mechanical shocks and
vibrations.
For plastic encapsulation of semiconductor devices, only thermosetting resins
are used (e.g. for series production, a combination of phenol and epoxy resins, or
silicone resins). The moulding material contains a basic resin, a curing agent, a
catalyst, an inert filler, a flame-retardant agent and a mould-release material
facilitating the detaching of the package after the moulding operation.
The English standards D3000, D4000 and 11219A stipulate three levels of reli-
ability for the plastic encapsulation of semiconductor devices, the first two
having cumulated failure rates of 2% and 10%, respectively, for an operational
life of 40 years. Generally, surface contamination may lead to various failure
modes, such as a decrease of the current gain of a transistor, an increase of the
leakage current, corrosion of the aluminium metallisation, etc., accelerated by
the ionic impurities from the moulding material, especially in a humid
environment. A failure mode specific to the plastic package is the mismatch
between the thermal expansion coefficients of the plastic material and of the
other constituent parts (frame, gold wires, and die), which may lead to open or
intermittent contacts.
About 90% of the electronic components used today are plastic encapsulated. A
hermetically encapsulated semiconductor die costs, on average, twice as much as
its plastic equivalent [12.8].
The majority of plastic encapsulated semiconductor devices have some inherent
failure mechanisms, such as ionic contamination and mechanical stress, which may
bring about open circuits. Moreover, ionic contamination may distort the elec-
trical parameters of a device (examples are the increase of the leakage current of
a reverse-biased pn junction or the change of the threshold voltage of a MOS
transistor).
The external sources of ionic contamination are salt mist, industrial atmosphere
and corrosive solder flux. The corrosion may be chemical, galvanic, or, with an
external bias, electrolytic. The time until the appearance of a short circuit
depends on temperature, relative humidity, presence of ionic contaminants, type,
plastic purity and mechanical design of the package, and geometry of the
aluminium interconnections. From this simple enumeration, it is obvious that
predicting the reliability of a certain plastic encapsulated semiconductor device
is not an easy task.
To outline the extreme importance of this problem, one must mention that at the
beginning of the microelectronic revolution the US Department of Defense, in
co-operation with NASA and the Federal Aviation Administration, created an ad-hoc
committee for plastic encapsulated semiconductor devices, with two working
groups: one for measuring methods and procedures, and another for research and
development on plastic materials.
12.2.1
Package functions
12.3
Some reliability aspects of the plastic encapsulation
Normally, one may consider that there are three main reliability aspects of the
plastic encapsulation of semiconductor devices.
1. The stability of the electrical characteristics of the die. One of the most
important degradation factors is the ionic contamination due to the moulding
material, which may lead to the formation of an inversion layer at the surface
of the die. This layer produces the degradation of the electrical
characteristics of the device. The test currently used for identifying this
degradation mode is ageing under high-temperature reverse bias.
2. The resistance of the internal connections. For devices in a plastic package,
it is much more important than for hermetic packages to have very good me-
chanical connections, because [12.10, 12.11]:
• at the moulding operation, the connection wires undergo a stress produced by
the injection of the moulding material;
• the expansion coefficients of the various materials are different, producing
a mechanical stress which cannot be neglected at extreme temperatures;
• the connection wires are embedded in plastic material over their whole length.
3. The resistance of the plastic package in a hostile environment. This is the
most important factor determining the reliability of plastic encapsulated
devices, because the degradation due to a lack of hermeticity begins with the
penetration of moisture into the package, reaching the die, especially along
the contact area between the moulded material and the metallic frame.
The main parameters characterising the moisture resistance of a package are:
• the relative hermeticity,
• the expansion coefficient of the moulding material,
• the quantity of hydrolysable contaminants in the moulding material,
• the die resistance to corrosion.
Experience showed that the most significant (but also the most controversial)
accelerated test for evaluating the resistance to humidity is operating ageing at
high temperature (+85°C) in a humid environment (relative humidity 85%, deionised
water). The bias must produce a minimum dissipation on the die, but a maximum
voltage gradient between neighbouring aluminium conductors. The penetration of
the moisture depends on the partial vapour pressure. However, one may emphasise
that in this kind of test the ions arise essentially from the plastic package
itself, while in an operational environment they are brought from the outside, by
the moisture [12.10].
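In industry practice, the acceleration provided by such an 85°C/85% r.h. test over milder field conditions is often estimated with a temperature-humidity model such as Peck's; this model and the constants below (exponent n, activation energy Ea) are commonly quoted literature values, not figures from this chapter — a minimal sketch:

```python
import math

def peck_acceleration(t_use_c, rh_use, t_test_c=85.0, rh_test=85.0,
                      n=2.7, ea_ev=0.79):
    """Peck temperature-humidity acceleration factor (assumed model):

    AF = (RH_test / RH_use)**n * exp(Ea/k * (1/T_use - 1/T_test))

    n = 2.7 and Ea = 0.79 eV are frequently cited fitted constants,
    not values taken from this book.
    """
    k = 8.617e-5                      # Boltzmann constant, eV/K
    t_use = t_use_c + 273.15          # convert to kelvin
    t_test = t_test_c + 273.15
    humidity_term = (rh_test / rh_use) ** n
    temp_term = math.exp(ea_ev / k * (1.0 / t_use - 1.0 / t_test))
    return humidity_term * temp_term

# Office-like field conditions (30 degC, 50% r.h.): 1000 hours of
# 85/85 testing then correspond to roughly af * 1000 field hours.
af = peck_acceleration(30.0, 50.0)
```

With these assumed constants the factor comes out in the hundreds, which is why a few thousand test hours are taken to represent many years of field life.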
12.4
Reliability tests
The first distinction that must be made is between discrete components and
integrated circuits. While plastic encapsulated discrete devices are used mainly
for mass consumption, plastic encapsulated integrated circuits constantly enlarge
their field of utilisation. This explains the user expectations for high
reliability performance, practically equivalent to that of hermetically (metal or
ceramic) encapsulated integrated circuits. But one must not forget that there are
specific failure modes, created or accelerated by plastic encapsulation. To
eliminate some of these specific failure modes, the manufacturers introduced the
following improvements [12.11]:
• die passivation (that is, the deposition over the whole surface of a protective
glass layer, in which the contact areas for bonding the connection wires between
die and metallic frame are etched);
• covering of the wires after bonding with a high purity protective resin;
• impregnation of the package, after moulding, with resins liable to fill the
holes or the microcracks which could exist at the frame / moulding material
interface.
[Plot: cumulated failures (%) vs. number of cycles (0 to 1200), curves 1-7]
Fig. 12.1 Results of destructive tests performed with thermal shocks (MIL-STD-883, method 1011,
level C, -65°C...+125°C) for various package types [12.12]: 1 - epoxy without die protection; 2 -
silicone with detrimental package protection; 3 - epoxy with die protection; 4 - silicone with
normal die protection; 5 - ceramic package; 6 - phenol package with die protection; 7 - flat pack
For integrated circuits, the wires must withstand a pulling force of 10 gf in the
bonding machine control, while for metallic packages the force level is 1-2 gf.
12.4.1
Passive tests
For a first, grosso modo study of the thermal cycling conditions for electronic
equipment, the company National Semiconductor [12.16] employed two automatic
chambers for evaluating the various types of plastic materials used for
semiconductor encapsulation. The tested devices were transported from a cold
room (0°C) to a warm one (100°C) and back, every 10 minutes, in a passive test
(without electrical biasing). The temperature of the junction is the same as the
ambient one. Fig. 12.2 summarises the results of these tests.
[Plot: cumulated failures (%), logarithmic scale from 0.01 to 10, vs. number of cycles (0...125°C), 40 to 2000 cycles, curves for the tested encapsulants]
Fig. 12.2 Results of temperature cycling tests for various types of plastic encapsulation [12.15]; to
be noted the good behaviour of encapsulant no. 6 (epoxy A, without die protection) and, especially,
the remarkable behaviour of encapsulant no. 5 (epoxy B, without die protection)
No screening test was performed. One must note the remarkable behaviour observed
for epoxies A and B, no failure being registered after 200 cycles. This proves
that the failure rate was smaller than one failure per 10⁶ device-cycles,
corresponding approximately to one failure for 500 devices functioning 5.5 years
in an equipment operated once per day, seven days per week.
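The equivalence quoted above is simple arithmetic over device-cycles; a short sketch checking it (the once-a-day cycling rate is the only assumption beyond the text):

```python
# Check the equivalence quoted above: a failure rate of about one
# failure per 10**6 device-cycles versus 500 devices cycled once a
# day, seven days a week, for 5.5 years.
devices = 500
cycles_per_year = 365            # one on/off cycle per day
years = 5.5
device_cycles = devices * cycles_per_year * years   # ~1.0e6

# Expected failures at 1 failure per 10**6 device-cycles:
expected_failures = device_cycles * 1.0e-6          # close to 1
```

The product is about 1.0 × 10⁶ device-cycles, so one failure in the whole fleet over 5.5 years matches the quoted rate.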
The failure analysis showed that the main failure mode after the first hundred
cycles was the breaking of the connection wires due to material fatigue, because
the connections were frequently stretched by repeated expansions and contractions
of the surrounding encapsulant. This suggests that the expansion coefficients of
epoxies A and B are close to those of the gold wires, up to about +115°C. Beyond
this temperature (called the glass transition temperature), the increase of the
mentioned coefficients is important, explaining the material fatigue. For the
other encapsulating materials, the failure was due to the large values of the
expansion coefficients of the moulding layer and/or to the combination of the
drift or expansion characteristics of the silicone resins. As a conclusion, the
results on passive temperature cycling show that, for epoxies A and B, a small
percentage (<0.03%) of intermittent failures depend on temperature ("hot
intermittent failures").
Valuable information about the results of passive tests was furnished by McCoog
[12.7]. In a series of tests performed in 1986 by Rockwell International, at
extended temperature cycling (-40/+80°C, 883 cycles), a higher failure rate (6.1%
- 2 failures observed per million hours) was found for ceramic devices than for
plastic ones (1.6% - 1 failure per million hours). In 1987, in a study performed
by Motorola, similar results were reported for plastic and ceramic encapsulated
devices undergoing temperature cycling (-65/+150°C, 1000 cycles): 0.083%/1000
cycles for plastic and 0.099%/1000 cycles for ceramic, respectively. In 1989,
Motorola repeated the experiment under the same conditions and reported higher
failure rate values, but again nearly equal values for plastic (0.44%/1000
cycles) and ceramic (0.38%/1000 cycles). In the 90's, similar conclusions were
reached: no reliability advantage between plastic encapsulated microcircuits
(PEMs) and ceramic encapsulated ICs. Condra et al. [12.17] reported such a result
for temperature cycling (-55/+85°C, 1000 cycles): more than 20 years of useful
life for both plastic and ceramic packages. Weil et al. [12.18] also used
temperature cycling (-65/+150°C) and obtained 1-2 device failures after 2...20
million cycles, for both plastic and ceramic packages.
12.4.2
Active tests
Another series of tests, the active ones, are performed under bias and with a
load (power cycle test), increasing the junction temperature up to at least
+100°C. The devices are connected for 2 minutes and 30 seconds and then
disconnected for another period of 2 minutes and 30 seconds. The thermo-
mechanical stresses generated by this test approximate well those appearing in
the real functioning of the devices in an equipment which is connected and
disconnected. Experience [12.19] showed that the observed failure rate is
smaller than 0.17 failures per 10⁶ device-cycles, which is equivalent with a
12.4.3
Life tests
As one already knows, the performance and the reliability of transistors and
integrated circuits (bipolar and MOS) may be degraded by surface problems, ther-
mally activated and associated with unwanted contaminants (mobile ions, polar
molecules, etc.). From this viewpoint, the organic encapsulants (such as epoxy
resins) are well known as having surface problems. As a result of numerous life
tests, the conclusion was drawn [12.5...12.22] that epoxy B is a "clean" system
assuring a high reliability under normal operating conditions.
The epoxy B encapsulants were not allowed for military applications, because
they let moisture penetrate to a certain extent, an unacceptable fault for
military requirements. It is true that the huge majority of industrial
applications do not have such demanding requirements as military applications.
One knows that, in the beginning, in the period 1965-1970, the silicone resin
packages demonstrated excellent properties and performance in accelerated
humidity tests, but they did not pass the military examination concerning
functioning in a saline atmosphere. This type of package offers, however, a
better reliability, especially in the extreme conditions of a humid environment.
In the 80's, the improvement of the epoxy resins and the excellent mechanical
reliability demonstrated in tests performed by National Semiconductor led to a
re-evaluation of the epoxy encapsulated integrated circuits, because epoxy B
proved to be much more resistant to humidity than the other epoxies (especially
type A). Even the tests performed in a saline atmosphere (according to
MIL-STD-883, method 1009, condition B: 48 hours, 271 ICs, bipolar and MOS,
digital and linear) did not produce supplementary failures, the epoxy B package
having an improved (about 10% smaller) thermal resistance compared with the
silicone package, which allows the junctions to operate at a lower temperature
and thus attain a better reliability.
Table 12.1 [12.21] summarises the comparative features of the most widely used
moulding compounds.
[Plot: cumulated failures (%), from 10 to 98, vs. average lifetime (h)]
Fig. 12.4 Average lifetime of a plastic encapsulated integrated circuit (DIL, 14 pins) vs. [r.h.]²
Life tests performed in the early 90's also demonstrated the same failure rate
value for both plastic and ceramic encapsulated ICs. Schultz et al. [12.25]
reported failure rate values of 1 FIT (10⁻⁹ h⁻¹) at +55°C for both plastic and
hermetic ICs. Another study, also cited in [12.7], demonstrates a higher increase
of the reliability for plastic encapsulated ICs than for ceramic encapsulated
ones (see Table 12.2).
Table 12.2 A comparison between the 1979-1992 decrease of failure rates (in FITs) for plastic
and ceramic packages, respectively

        Year    Plastic  Ceramic
        1992    0.2      2
Logic   1979    10       6
        1992    0.2      3
As one can see, in 1992, the plastic packages were more reliable than ceramic ones.
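Since the failure rates above are quoted in FITs (1 FIT = 10⁻⁹ failures per device-hour), a small conversion helper makes such figures tangible; the function names below are illustrative only:

```python
def fit_to_failures(fit, devices, hours):
    """Expected number of failures for a population, given a constant
    failure rate in FITs (1 FIT = 1e-9 failures per device-hour)."""
    return fit * 1e-9 * devices * hours

def fit_to_mtbf_years(fit):
    """Mean time between failures of a single device, in years,
    for a constant failure rate given in FITs."""
    hours_per_year = 24 * 365
    return 1.0 / (fit * 1e-9) / hours_per_year

# The 1 FIT figure reported by Schultz et al. corresponds to an MTBF
# of roughly 114 000 years for a single IC; equivalently, a fleet of
# 1000 such ICs running 10**6 hours would expect about one failure.
mtbf = fit_to_mtbf_years(1.0)
```

This is why single-digit FIT rates, as in Table 12.2 for 1992, are considered excellent for commercial components.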
12.4.4
Reliability of intermittent functioning plastic encapsulated ICs [12.26]
If, on the contrary, the power dissipated by the component is negligible (as is
the case for CMOS devices) and the over-temperature of 10°C cannot be obtained,
or if the component operates intermittently (instead of continuously), a detailed
analysis must be made before any extrapolation. Strohle [12.27] studied the most
detrimental testing cases for simulating intermittent functioning and also
investigated the extrapolation of the typical results of the 85°C/85% r.h. test
to the intermittent functioning case.
At the intermittent functioning of plastic encapsulated components, the typical
failures are produced by humidity and by mechanical tensions arising at rapid
changes of temperature (due to the different expansion coefficients of the die
and of the package). The typical failure modes are die detachment, die scratches
and drift of the electrical characteristics.
The humidity produces the following failure mechanisms:
• corrosion of aluminium pads,
• bit defects (for static and dynamic memories),
• drift of the threshold voltage, brought about by mobile ions.
All these failure mechanisms manifest themselves as high leakage currents at the
surface. To establish the size of these currents, a special chip with long pads
covered by a glass passivation (with 4% phosphorus by weight), subjected to
various temperature and relative humidity levels, was used. The leakage current
was measured for an element with two 4 µm wide pads, with a voltage of 5 V
(corresponding to an electric field of 1.25·10⁴ V/cm) applied between them. The
results of the measurements are shown in Table 12.3.
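The quoted field follows directly from the bias and the pad geometry; a one-line check, assuming the 4 µm dimension is the gap across which the 5 V is applied:

```python
# Electric field between the test-structure pads: E = V / d.
voltage_v = 5.0
gap_cm = 4e-4                          # 4 micrometres expressed in cm
field_v_per_cm = voltage_v / gap_cm    # matches the quoted 1.25e4 V/cm
```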
Table 12.3 Surface leakage current produced by humidity on a Si/Al test structure
Table 12.4 The effect of the humidity on the time until pad interruption (that is, 50% corrosion);
the pad has a width of 4 µm and a thickness of 1 µm
Table 12.4 contains the results of an experiment on the corrosion, up to 50%, of
an aluminium pad (width = 4 µm and thickness = 1 µm). One may notice that the
difference between 25°C/20% r.h. and 80°C/80% r.h. is of about 6 orders of
magnitude.
The parameters influencing the behaviour of plastic packages at intermittent
functioning may be sorted as follows [12.27][12.28]:
a) Parameters specific to the semiconductor:
- technology,
- glass passivation.
b) Parameters specific to the package:
- water penetration,
- water retention,
- contamination,
- pH value,
- conductivity,
- transmission time of the glass,
- fixation capacity (die, lead-frame, filler).
c) Parameters specific to the functioning:
- supply voltage (intensity of the electric field),
- conditions of the environment (temperature, relative humidity),
- "over-temperature" of the die or of the component.
Because of the large number of parameters, the lifetime of the plastic
encapsulated component is hard to calculate by taking each parameter into account
in a global model. Therefore, Strohle [12.27] studied only the parameters not
changing at intermittent functioning, by using an accelerated static lifetime
test (e.g. the 85°C/85% r.h. test) and by using the resulting lifetimes (between
1000 and 2000 hours) as a basic factor. Thus, the intermittent functioning may be
taken into account through an acceleration factor (F), with the formula:
F = lifetime for operational conditions / lifetime for testing conditions. (12.1)
Strohle intended to study the influence of the various parameters on the failures
produced by humidity and to use the results in a model simulating the
operational conditions. But this goal proved difficult to reach and only "worst
case" models were obtained [12.29].
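Equation (12.1) translates directly into code; the numeric values below are hypothetical, chosen only to illustrate how a field-to-test lifetime ratio yields F:

```python
def acceleration_factor(life_operational_h, life_test_h):
    """Acceleration factor F from Eq. (12.1):
    F = lifetime under operational conditions / lifetime under test
    conditions. Both lifetimes in the same unit (hours here)."""
    return life_operational_h / life_test_h

# Hypothetical illustration: if a part that survives 1500 h in the
# 85 degC / 85% r.h. test is credited with 15 years (15 * 8760 h) of
# field life, the implied acceleration factor is:
f = acceleration_factor(15 * 8760, 1500.0)
```

A large F means the static test compresses many field years into a short chamber run; Fig. 12.5 shows how intermittent operation modifies this factor.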
[Plot: acceleration factor vs. duty cycle]
Fig. 12.5 Dependence of the acceleration factor on the duty cycle, having as parameter the die
over-temperature [12.61]; test conditions: 85°C/85% r.h. (192 hours cycle)
The various acceleration factors, depending on the die temperature, for the
85°C/85% r.h. test are shown in Fig. 12.5. The main result is that, for a plastic
encapsulated component, even if the over-temperature is only 5°C, small duty
cycles are a harder stress than continuous functioning. The shape of the curves
(for all temperatures between 0 and 60°C) is almost independent of the duty cycle
value. For smaller values, the equilibrium state is reached (see Table 12.5).
Table 12.5 Relationship between the duty cycle and the equilibrium state (test conditions: over-
temperature of 20°C, duty cycle 0.15, 85°C and 85% r.h.)
12.5
Reliability predictions
The operational life results accumulated by CNET (France) on the reliability of
plastic encapsulated bipolar ICs with a low degree of integration have shown an
important difference between the plots of the failure rate vs. the number of
gates according to MIL-HDBK-217 and to the CNET data, respectively [12.7]. As one
knows, the American military organisations rejected the plastic package on the
basis of the experiments made under conditions of excessive humidity in Panama.
This explains the ratio of about 25 between the results from MIL-HDBK-217 and
those from CNET. The pessimistic predictions of MIL-HDBK-217 about plastic
encapsulation are not justified by the failure analysis of integrated circuits,
because the established nature of the defects is directly linked to mechanisms
which rarely refer to the plastic encapsulation.
The 1992 version of MIL-HDBK-217, called F [12.32], offers a reliability
improvement by a factor of 3.5, but the experimental results [12.7] demonstrate a
50-fold decrease of the failure rate. It seems that MIL-HDBK-217F underestimates
the reliability of PEMs, used mainly as commercial devices. For instance, after
12 264 hours of functioning, MIL-HDBK-217F predicts 46 failures for commercial
devices and 23 failures for military devices. Experimentally, 19 failures were
observed for both types.
12.6
Failure analysis
First, the attention of manufacturers was concentrated on the wire bonds and on
the die-frame connections, which produced the highest failure percentage. Then,
the evaluation of the package materials was made, as well as a detailed study of
the various building methods for moulded packages.
Table 12.6 A history of failure rate improvements (in FITs) for plastic encapsulated ICs
As the solder joints and the connection wires are completely embedded in plastic
material, the moulded components are extremely resistant to vibrations and
mechanical shocks; even if a fracture (or a discontinuity) arises in a connection
wire, the two discontinuous elements remain held together as long as the moulding
environment continues to exert the same compression force on these two parts.
This force has, however, the tendency to weaken the contact and, eventually, the
electrical connection is broken. But, as soon as the temperature changes, the
contact is restored: this is an intermittent contact. If the ambient temperature
does not restore the electrical contact, an open circuit arises. Such failures
are generated by
12.7
Technological improvements
An important step in the process of manufacturing better plastic packages was
made by the British Telecommunications (BT) Labs. The main evaluation tool was
obtained by the invention, in 1968, at BT Labs, of the technique called HAST
(Highly Accelerated Stress Test), in fact a non-saturating autoclave test. The
work done at BT Labs was synthesised by Sinnandurai [12.43] in 1996. Starting
from 1968, a first series of experiments was performed on bipolar transistors and
on specially designed moisture sensors (assembled onto ceramic substrates)
[12.44]. These devices were covered with 15 different plastic coatings. The
experiments, made on 500 test vehicles, lasted 4000 hours, meaning 2 × 10⁶
device-hours, and showed that 4 silicone plastic encapsulants attain a life
duration equivalent to 25 years in tropical climates, while the remaining 11
plastic encapsulants continued to be hazardous to device reliability.
Meanwhile, various laboratories developed improved plastic coatings. To passivate
and mechanically protect the die, a silica glass was used with great care,
because the phosphorus concentration must not exceed 2% by weight; otherwise, a
catastrophic increase of the aluminium corrosion appears [12.45]. RCA proposed a
technological improvement [12.4, 12.47], by replacing the aluminium layer with a
multilayer (titanium/platinum/gold) passivated with silicon nitride. In this
system, the silicon nitride assures the junction hermeticity, the titanium layer
improves the adherence of the dielectric, the platinum is a diffusion barrier
layer, and the gold constitutes the conducting layer. The electromigration speed
is 10 times smaller for a gold layer than for an aluminium layer.
In the 80's, another series of tests performed by BT Labs used improved coating
materials. In fact, two materials (a silicone resin mechanically protected by a
filled silicone, and a silicone resin protected by a filled phenolic)
demonstrated high reliability properties for a time period equivalent to 25 years
in tropical climates. This work was the basis for using plastic encapsulants in
high reliability applications. These materials were subsequently tested and the
results obtained confirmed the high reliability properties.
The high reliability plastic encapsulant allowed obtaining a low cost, high
performance plastic chip carrier, with the trade name EPIC, manufactured by
common printed wiring board techniques [12.51]. The reliability of these
components was assessed in tests made also on the same die encapsulated in
various commercially available packages. The results obtained from a damp heat
test are presented in Table 12.7.
It seems that one type of SOIC and two types of SLCC demonstrated higher
performance than the hermetic CerDIP package. These results also indicate that
humidity tests must be used for hermetic packages too.
In another series of tests, the reliability of hermetic packages was compared
with that of plastic ones, under temperate climate conditions (steady and cyclic
damp heat). The results showed that hermetic packages had no lifetime advantage
over plastic packages [12.17].
In the 1980s, the idea of using silicone gels as plastic encapsulants for high
reliability ICs seemed an appropriate one. The IEEE Computer Society formed a
"Gel Task Force", with representatives from 24 companies, to pursue this
opportunity, starting from the earlier initiative of BT Labs. This team evaluated
polymer gel coatings for ICs. A total of 1440 IC chips, with specific test patterns
and protected by various glassivations, were tested with five gel types and one
silicone RTV coating [12.52]. The tests were thermal shock, salt spray and HAST.
The results are summarised in Table 12.8.
Table 12.7 Results of a reliability test program: high humidity testing in a non-saturating
autoclave (108°C, 90%RH). SOIC = Small outline IC package, SLCC = Silicone junction coated
IC, CerDIP = Ceramic dual-in-line package (hermetic)
The identified failure mechanism was wire breakage; it seems that the thick
coating caused the wires to break. Consequently, only thin layers (about 25 µm
thick) must be applied. In fact, it is recommended to use three layers of thinly
applied gel (the "triple track" from Table 12.8).
In 1994, BT Labs evaluated the reliability of the better gels reported by the IEEE
Gel Task Force, applied to PIN and laser diodes and GaAs ICs. As an accelerated
test, they used damp heat (85°C, 85%RH). For PIN diodes, the gel seemed to assure
high reliability. Under the same test conditions, laser diodes behaved
inconsistently, with degradation of their performance. On the contrary,
gel-coated GaAs ICs showed remarkable stability up to 6000 hours.
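Acceleration factors for such 85°C/85%RH damp-heat tests relative to field conditions are commonly estimated with Peck's temperature-humidity model (cf. Peck [12.36]). The sketch below is illustrative only: the humidity exponent n and activation energy Ea are typical literature values, not parameters derived from the BT Labs tests described here.

```python
import math

def peck_af(t_use_c, rh_use, t_stress_c, rh_stress, n=2.7, ea_ev=0.79):
    """Peck temperature-humidity acceleration factor (illustrative parameters).

    AF = (RH_stress / RH_use)^n * exp((Ea/k) * (1/T_use - 1/T_stress)),
    with temperatures in kelvin and k the Boltzmann constant in eV/K.
    """
    k = 8.617e-5  # Boltzmann constant, eV/K
    t_use = t_use_c + 273.15
    t_stress = t_stress_c + 273.15
    return (rh_stress / rh_use) ** n * math.exp(
        ea_ev / k * (1.0 / t_use - 1.0 / t_stress))

# Acceleration of an 85 degC / 85 %RH test relative to a 30 degC / 60 %RH office
af = peck_af(30, 60, 85, 85)
```

With these assumed parameters the 85/85 test runs a few hundred times faster than the benign use condition, which is why a few thousand test hours can stand in for decades of tropical service.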
Table 12.8 Results of reliability tests performed by the IEEE Gel Task Force
12.7.1
Reliability testing of PCBs equipped with PEMs
So far, only tests performed at the component level have been taken into account. But a
necessary step towards the use of Plastic Encapsulated Microcircuits (PEMs) in
military systems is to test printed circuit boards (PCBs) containing PEMs. In July
1993, the US Air Force Electronics System Centre contracted DSD Laboratories Inc.
(Sudbury, Massachusetts, USA) to conduct an experiment on the possibility of
replacing the existing electronic hardware of DoD systems with commercial hardware
built using best commercial practice. This was called the "Commercial-Components
Initiative - Ground Benign Systems". An important chapter of this study was dedi-
cated to PEMs. The experiments were made on PCBs. A system operating in a
temperature-controlled ground shelter (but with a cold storage requirement, at
-40°C) was chosen. Hermetically encapsulated military ICs were replaced with
equivalent commercial PEMs. The results, reported by Kross and Sicuranza [12.53],
were impressive. By replacing the military components with commercial ones, a
cost saving of 88% was obtained. The experimental hardware ran for 2 years
(1994-1996, over 10^7 device hours) without a failure. Each PCB (8 × 10 inch²)
contained 75...100 ICs, mostly digital. This experiment proved that the
operational reliability of the commercial devices is high enough for use in
military systems.
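As a side note, a zero-failure result over roughly 10^7 device hours can be converted into an upper bound on a constant failure rate with the standard chi-squared estimator. The sketch below is a generic illustration, not a calculation reported in [12.53]; the 60% confidence level is our assumption.

```python
import math

def lambda_upper(device_hours, confidence=0.60, failures=0):
    """One-sided upper bound on a constant failure rate (exponential model).

    For zero failures, the chi-squared quantile with 2 degrees of freedom
    has the closed form chi2 = -2 * ln(1 - confidence), so
    lambda_upper = chi2 / (2 * T).
    """
    if failures != 0:
        raise NotImplementedError("closed form shown only for zero failures")
    chi2 = -2.0 * math.log(1.0 - confidence)
    return chi2 / (2.0 * device_hours)

lam = lambda_upper(1e7)   # failures per hour
fit = lam * 1e9           # FIT = failures per 10^9 device hours
```

Even with zero observed failures, the bound (here on the order of 100 FIT) shows what such a field experiment can and cannot demonstrate about the underlying failure rate.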
12.7.2
Chip-Scale packaging
The term chip-scale package (CSP) entered the industry's lexicon in 1994 [12.57];
rigorously defined, the perimeter of such a package is no more than 1.2 times the
perimeter of the die it contains. These packages combine the best features of bare
die assembly and traditional semiconductor packaging, and reduce overall system
size, something devoutly to be desired in portable electronic products. CSP is still
taking its first steps into the marketplace; unresolved issues include reliability,
thermal performance, design, materials, assembly, test, shipping, handling and the
CSP-system interaction. This reflects the newness of the technology and the fact
that few CSPs are as yet in production or use. J-STD-012, the first US standard,
was put into effect in 1996, and has settled on 0.5-, 0.75-, and 1.0-mm pitches
(pitch: separation between adjacent conductors). Materials for chip-scale packages
must meet at least three criteria: reliability, cost-effectiveness, and manufactu-
rability. Jet Propulsion Laboratory (Pasadena, California) and Institutet för
Verkstadsteknisk Forskning (Mölndal, Sweden) are currently evaluating the
reliability of CSPs from several international suppliers; data so far show improving
levels of reliability.
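The 1.2× perimeter criterion quoted above can be checked directly. The function below is a minimal sketch; the package and die dimensions are made up for the example.

```python
def is_chip_scale(pkg_w_mm, pkg_l_mm, die_w_mm, die_l_mm):
    """True if the package perimeter is at most 1.2 times the die
    perimeter - the rigorous CSP definition cited in the text."""
    pkg_perimeter = 2.0 * (pkg_w_mm + pkg_l_mm)
    die_perimeter = 2.0 * (die_w_mm + die_l_mm)
    return pkg_perimeter <= 1.2 * die_perimeter

# A hypothetical 10 x 10 mm die in an 11 x 11 mm package qualifies (ratio 1.1)
csp = is_chip_scale(11, 11, 10, 10)
```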
12.8
Can we use plastic encapsulated microcircuits (PEM) in
high reliability applications?
The improvement obtained in the reliability of commercial PEMs led to the idea of
using such components in military systems. This approach, called the Acquisition
Reform, was possible only after careful tests performed on components and PCBs
(see 12.7). But replacing military parts with commercial ones also implies a
series of operations, having as their main tool the physics-of-failure approach, such as
[12.54]:
• investigating the utilisation environment and identifying the operating stresses,
• identifying the failure modes and, subsequently, the failure mechanisms and the
acceleration models,
• using accelerated tests to precipitate relevant failures,
• analysing the failures, separating the populations affected by each failure
mechanism and evaluating the reliability level.
Only after performing such a cycle can PEMs be used in military systems. But the
work is not finished. ELDEC Corporation developed a system for implementing
PEMs in high reliability applications [12.54]. The system contains three modules,
operating at the level of design, procurement and assembly, respectively.
The design process control contains already-used operations (part selection and
derating, functional and structural design, thermal design), but also new ones,
based on the concurrent engineering approach. Even at the design phase,
manufacturability, quality and reliability, but also cost, are taken into
account. The most interesting point is design for reliability. Reliability prediction
methods must be used, because the reliability engineer has to evaluate the reliability
of the future device. Two main approaches are used: probabilistic and
deterministic reliability prediction, respectively.
Probabilistic reliability prediction uses models, such as MIL-HDBK-217,
based on field data from equipment failures; the acceleration factor is the same for
all components, and the failure rate is taken as a constant for all components (an
exponential model).
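Under the constant-failure-rate (exponential) model assumed by such handbooks, the failure rates of components in a series system simply add. The sketch below illustrates the arithmetic with hypothetical failure rates, not values taken from MIL-HDBK-217.

```python
import math

def series_reliability(failure_rates_per_hour, hours):
    """Reliability of a series system under the exponential model:
    R(t) = exp(-(lambda_1 + ... + lambda_n) * t)."""
    lam_total = sum(failure_rates_per_hour)
    return math.exp(-lam_total * hours)

# Three hypothetical parts at 200, 50 and 10 FIT (1 FIT = 1e-9 failures/hour)
rates = [200e-9, 50e-9, 10e-9]
r_10yr = series_reliability(rates, 10 * 8760)
```

The additivity of failure rates is exactly what makes handbook predictions easy to compute, and also what makes them insensitive to the dominant failure mechanism of any individual part.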
References
12.1 Hamill, A. T. (1968): Westinghouse Goldilox integrated circuits offer military meeting in
plastic packages. DOD/NASA Industry Meeting on Plastic Encapsulated Semiconductor
Devices, Washington D.C., May 15
12.2 Baird, S. S.; Peattie, C. G. (1968): Present reliability status of plastic encapsulated semi-
conductors and evaluation of their potential for use in military systems. DOD/NASA In-
dustry Meeting on Plastic Encapsulated Semiconductor Devices, Washington D.C., May
15
12.3 Anixter, B. (1968): Plastic semiconductors? DOD/NASA Industry Meeting on Plastic
Encapsulated Semiconductor Devices, Washington D.C., May 15
12.4 Flood, J. L. (1968): Reliability of plastic integrated circuits. DOD/NASA Industry Meeting
on Plastic Encapsulated Semiconductor Devices, Washington D.C., May 15
12.5 Diaz, R. P. (1968): Plastic-epoxy semiconductors. DOD/NASA Industry Meeting on
Plastic Encapsulated Semiconductor Devices, Washington D.C., May 15
12.6 Fick, S. R. (1971): Test of plastic encapsulated semiconductors. Research Report, Texas
University, May
12.7 McCoog, J. R. (1997): Commercial component integration plan for military equipment
programs: reliability predictions and part procurement. Proceedings of the Annual Reli-
ability and Maintainability Symp., Jan. 13-16, Philadelphia, Pennsylvania (USA), pp.
100-110
12.8 Taylor, C. H. (1976): Just how reliable are plastic encapsulated semiconductors for mili-
tary applications and how can the maximum reliability be obtained. Microelectronics and
Reliability, vol. 15, pp. 131-134
12.9 Fehr H. G. (1970): Microcircuit packaging and assembly - state of the art. Solid State
Technology. August, pp. 41-47
12.10 André, G. et al. (1972): Fiabilité des circuits intégrés à encapsulation plastique. Actes du
colloque international "Les composants de haute fiabilité", Toulouse, 6-10 Mars, pp. 143-
159
12.11 André, G.; Regnault, J. (1972): Problèmes de fiabilité liés à l'encapsulation plastique des
circuits intégrés. L'Onde électrique, vol. 2, no. 3, pp. 121-125
12.12 Brauer, J. B. et al. (1970): Can plastic encapsulated microcircuits provide reliability econ-
omy? Proceedings of the International Reliability Physics Symp., Las Vegas, pp. 61-72
12.13 Brauer, J. B. (1972): Military microcircuit packaging. The Electronic Engineer, July, pp.
30-31
12.14 *** (1973): Tests show epoxy IC packages to have reliability edge. Electronics, April 26,
p. 25
12.15 Hnatek, E. (1970): Plastic ICs entice military. Electron Device News, November 15, pp.
43-47
12.16 *** (1972): Epoxy B. National Semiconductor Corp.
12.17 Condra, L. et al. (1992): Comparison of plastic and hermetic microcircuits under tem-
perature cycling and temperature humidity bias. IEEE Trans. on Components, Hybrids and
Manufacturing Technology, vol. 15, no. 5, Oct. 1992, pp. 640-650
12.18 Weill, L. (1993): Reliability evaluation of plastic encapsulated parts. IEEE Trans. on
Reliability, vol. 42, no. 4. December. pp. 563-540
12.19 Hnatek, E. R. (1973): Epoxy package increases IC reliability at no extra cost. Electronic
Engineering, February, pp. 66-68
12.20 Feldt, E. 1.; Hnatek, E. R. (1972): High reliability consumer ICs. Proceedings of the IEEE
Reliability Physics Symp., pp. 78-81
360 12 Plastic package and reliability
12.21 Reich, B.; Hakim, E. B. (1972): Environmental factors governing field reliability of plastic
transistors and integrated circuits. Proceedings of the IEEE Reliability Physics Symp., pp.
82-87
12.22 Goarin, R. (1978): La banque et le recueil des données de fiabilité du CNET. Actes du
Colloque International sur la Fiabilité et la Maintenabilité, Paris, July, pp. 340-348
12.23 Lawson, R. W.; Harrison, 1. C. (1974): First Int. Conf. on Plastic in Telecommunications,
pp. 1-30
12.24 Hakim, E. B. (1978): US Army Panama field test of plastic encapsulated devices. Microe-
lectronics and Reliability, vol. 17, no. 3, pp. 387-392
12.25 Schultz W.; Gottesfeld S. (1994): Reliability considerations for using plastic-encapsulated
microcircuits in military applications. Advanced Microelectronics Qualification / Reli-
ability Workshop, August, pp. 1-12
12.26 Bajenesco, T. (1984): Microcircuits enrobés de plastique: fiabilité en fonctionnement
intermittent. La Revue Polytechnique, no. 1446, pp. 17-18
Bajenesco, T. (1979): Problèmes de la fiabilité des microcircuits bipolaires et MOS. Cycle
de conférences à l'École Polytechnique Fédérale de Lausanne, avril-mai
Bajenesco, T. (1976): Quelques aspects de la fiabilité des microcircuits avec enrobage
plastique. Bulletin ASE/UCS, vol. 66, pp. 880-884
Bajenesco, T. I. (1976): Microcircuits: fiabilité et contraintes. La Revue Polytechnique,
no. 11, pp. 1051-1095
Bajenescu, T. I. (1997): A personal view of some reliability merits of plastic-encapsulated
microcircuits versus hermetically sealed ICs used in high-reliability systems. Proceedings
of the 8th European Symposium on Reliability of Electron Devices, Failure Physics and
Analysis (ESREF '97), Bordeaux (France), October 7-10
12.27 Strohle, D. (1983): Zuverlässigkeit von plastikverkapselten LSIs bei intermittierendem
Betrieb. NTG-Fachberichte, Band 82, pp. 91-96
12.28 Bajenesco, T. (1981): Problemes de la fiabilite des composants electroniques actifs ac-
tuels. Masson
12.29 Strohle, D. (1981): Feuchtprobleme bei LSIs. NTG-Fachberichte, Band 77, p. 107
12.30 Sim, S. P.; Lawson, R. W. (1979): The influence of plastic encapsulants and passivation
layers on the corrosion of thin aluminium films subjected to humidity stress. Proc. of the
17th Annual Reliability Physics Symp., pp. 103-107
12.31 Gardner, D. S. (1985): Layered and homogeneous films of aluminium / silicon with tita-
nium and tungsten for multilevel interconnects. IEEE Trans. on Electron Devices, vol.
ED-32, no. 2, pp. 174-183
12.32 MIL-HDBK-217F, Notice 1, Reliability prediction of electronic equipment, 10 July 1992,
US Department of Defense
12.33 Khajezadeh, M.; Rose, A. S. (1977): Détermination de la fiabilité des puces hermétiques
de circuits intégrés en boîtiers plastique. L'Onde électrique, vol. 57, no. 3, pp. 206-212
12.34 Flood, J. L. (1972): Reliability aspects of plastic encapsulated integrated circuits. Proceed-
ings of the IEEE International Reliability Physics Symp., pp. 95-99
12.35 Fischer, F. (1970): Moisture resistance of plastic package for semiconductor devices.
Proceedings of the International Reliability Physics Symp., pp. 94-100
12.36 Peck, D. S.; Zierdt, C. H. Jr. (1973): Temperature-humidity acceleration of metal-elec-
trolysis failure in semiconductor devices. Proceedings of the International Reliability
Physics Symp., pp. 146-152
12.37 Kolesar, S. C. (1974): Principles of corrosion. Proceedings of the International Reliability
Physics Symp., pp. 155-159
12.38 Arciszewski, H. (1970): Analyse de fiabilité des dispositifs à enrobage plastique. L'Onde
électrique, vol. 50, no. 3, pp. 230-240
12.39 Gallace, L. J.; Khajezadeh, M.; Rose, A. S. (1978): Accelerated reliability evaluation of
trimetal circuit chips in plastic packages. Proceedings of the International Reliability
Physics Symp., pp. 224-228
12.40 Neighbour, F.; White, B. R. (1977): Factors governing aluminium interconnection corro-
sion in plastic encapsulated microelectronic devices. Microelectronics and Reliability, vol.
16, pp. 161-164
12.41 Paulson, W. M.; Kirk, R. W. (1974): The effect of phosphorus-doped passivation glass on
the corrosion of aluminium. Proceedings of the International Reliability Physics Symp.,
pp. 172-179
12.42 Parker, P.; Webb, C. (1992): A study of failures identified during board level environ-
mental stress testing. IEEE Trans. on Comp., Hybrid and Manuf. Tech., vol. 15, pp. 1086-
1092
12.43 Sinnandurai, N. (1996): Plastic package is highly reliable. IEEE Trans. on Reliability, vol.
45, no. 2, June, pp. 184-193
12.44 Sinnandurai, N. (1981): An evaluation of plastic coatings for high reliability microcircuits.
Microelectronics J., vol. 12, pp. 30-38
12.45 Licari, J. J. (1970): Plastic coatings for electronics. McGraw-Hill Book Company, New
York
12.46 Lepselter, M. P.; Waggener, H. A.; McDonald, R. W.; Davis, R. E. (1965): Beam-lead
devices and integrated circuits. Proceedings of the IEEE, vol. 53, no. 4, pp. 405-409
12.47 Burkitt, A. (1975): Solid-state progress in circuits and devices. Electronics Engineering,
October, pp. 50-52
12.48 Baxter, G. K.; Anslow, J. W. (1977): High temperature thermal characteristics of micro-
electronic packages. IEEE Trans. on Parts, Hybrids and Packaging, vol. PHP-13, no. 4, pp.
385-390
12.49 Curran, L. (1970): Plastic ICs get foot in military door. Electronics, May 11, pp. 127-132
12.50 Reich, B. (1970): Plastic semiconductor devices and integrated circuits for military appli-
cations. Solid State Technology, January, pp. 53-56
12.51 Sinnandurai, N. (1985): EPIC: a cost effective plastic chip carrier for VLSI packaging.
IEEE Trans. Components, Hybrids, Manufacturing Technology, vol. CHMT-8, Sept., pp.
386-390
12.52 Balde, J. W. (1991): The effectiveness of silicone gels. IEEE Trans. Components, Hybrids,
Manufacturing Technology, vol. 14, June, pp. 352-365
12.53 Kross, E. J.; Sicuranza, M. A. (1996): Commercial-Components Initiative: Ground
Benign Systems - Plastic Encapsulated Microcircuits. IEEE Trans. on Reliability, vol. 45,
no. 2, June, pp. 180-183
12.54 Condra, L. W. et al. (1994): Using plastic-encapsulated microcircuits in high reliability
applications. Proceedings of the Annual Reliability and Maintainability Symp., Jan. 24-
27, Anaheim, California (USA), pp. 481-493
12.55 Demko, E. (1996): Commercial-Off-The-Shelf (COTS): a challenge to military equipment
reliability. Proceedings of the Annual Reliability and Maintainability Symp., Jan. 22-25,
Las Vegas, Nevada (USA), pp. 7-12
12.56 Fox, W. M. (1978): Semiconductor devices and passive components. The Bell System
Technical Journal, vol. 57, no. 7, pp. 2405-2434
12.57 Thompson, P. (1997): Chip-scale packaging. IEEE Spectrum, August, pp. 36-43
12.58 Băjenescu, T. I. (1997): A personal view of some reliability merits of plastic-encapsulated
microcircuits versus hermetically sealed ICs used in high-reliability systems. Proceedings
of the 8th European Symposium on Reliability of Electron Devices, Failure Physics and
Analysis (ESREF '97), Bordeaux (France), October 7-10
12.59 Băjenescu, T. I. (1997): Zuverlässigkeit komplexer elektronischer Systeme. Sommerkurs,
Haus der Technik (München, Germany), July 14-16
12.60 Băjenescu, T. I. (1998): A particular view of some reliability merits, strengths and limita-
tions of plastic-encapsulated microcircuits versus hermetically sealed microcircuits util-
ised in high-reliability systems. Proceedings of OPTIM '98, Braşov (Romania), May 14-
16, pp. 783-784
12.61 Bazu, M. et al. (1984): Thermal and mechanical stress in rapid estimation of reliability.
Proceedings CAS 1984, pp. 257-260
12.62 Bazu, M. et al. (1985): SRER - a system for rapid estimation of the reliability. Proceed-
ings of 6th Symp. on Reliab. in Electronics RELECTRONIC '85, Budapest (Hungary), pp.
267-271
12.63 Bazu, M. et al. (1987): Failure mechanisms thermally and electrically accelerated. Pro-
ceedings CAS 1987, pp. 53-56
12.64 Bazu, M. et al. (1987): Reliability of semiconductor components in the first hours of
functioning at high temperature. Electrotechnics, Automatics and Electronics, no. 1, pp.
10-15
12.65 Bazu, M. et al. (1989): Behaviour of semiconductor components at temperature cycling.
Revue Roumaine des Sciences Techniques, no. 1, pp. 151-154
12.66 Bazu, M. et al. (1989): Rapid estimation of reliability changes for semiconductor devices.
Proceedings of Ann. Semicond. Conf. CAS 1989, October 7-10, pp. 399-402
12.67 Bazu, M. and Ilian, V. (1990): Accelerated testing of integrated circuits after storage.
Scandinavian Reliability Engineers Symp., Nyköping, Sweden, October
12.68 Bazu, M. et al. (1990): Rapid estimation of the reliability of a batch of semiconductor
components. Quality, Reliability and Metrology, no. 2-3, pp. 92-94
12.69 Bazu, M.; Tăzlăuanu, M. (1991): Reliability testing of semiconductor devices in a humid
environment. Proc. Ann. Reliab. and Maintain. Symp., Orlando, Florida (USA), pp. 237-
240
12.70 Bazu, M. et al. (1997): MOVES - a method for monitoring and verifying the reliability
screening. Proc. of the 20th Int. Semicond. Conf. CAS '97, October 7-11, Sinaia, pp. 345-
348
12.71 Băjenescu, T. I.; Bazu, M. (1999): Semiconductor devices reliability: an overview. Proc.
of the European Conference on Safety and Reliability, Munich, Garching, Germany, 13-17
September, Paper 31
13 Test and testability of logic ICs
13.1
Introduction
Nowadays, electronic components are used in all fields of activity. Technology
grows so rapidly that manufacturers and users must attach increased importance
to component control. To do this, it is absolutely necessary to have
computer-assisted equipment. The acquisition of such modern computerised
equipment requires high investment, not only for the machines, but also for
personnel training and for the development of the control and management
software.
The testing problem can be formulated as follows: determine the input sequences
(or their lengths) such that these sequences - applied to the circuit - allow a
decision to be taken from the outputs: whether or not the circuit is defective. The
test result establishes whether the circuit has failed (detection). Localisation
bounds the failure within the component. The combination of detection and
localisation is known as diagnosis.
The test of simple Logic Integrated Circuits (LICs), such as Small Scale
Integration (SSI) and Medium Scale Integration (MSI), can be performed without
difficulty using truth tables or patterns defined by an algorithm. For Large
Scale Integration (LSI) ICs, the patterns must contain special perturbation tests
that complicate the test programmes. As complexity grows, the testers must be
more powerful (testing more quickly, to simulate the operating speed; more precise
in timing; more versatile), so that, for example, for microprocessors an input pin
can be transformed very quickly into an output pin, and vice versa.
The defect types in the population to be tested have a direct impact on the
quality and effectiveness of the test, and also on the quality after the test. Defects do
not necessarily influence the item's functionality. They are caused by flaws during
design, production, or installation. Unlike failures - which always appear in time,
randomly distributed - defects are present at t = 0. Table 13.1 attempts a
classification of the defects based on their effects, with the aim of better estimating
the possibilities of the test methods.
A defect - in a component - is a physical imperfection that can cause
inadequate operation (outside specifications; nonconformity); a failure is a
defect symptom (stuck-at-0 or stuck-at-1, short-circuit, etc.) which can be permanent
or intermittent.
13.2
Test and test systems
Except in the case of custom designs, the users know neither the logic diagram nor the
electrical schematic of the LIC. Sometimes even the functional description is
incomplete. For LSI, VLSI or ULSI, the manufacturers do not wish to deliver the
schematic, for industrial property reasons. Finally, the user does not know a priori
the most probable defects. Under these conditions, the user often resorts to manual
generation of the test programme, exploiting the function and making the
necessary corrections to the programme until a satisfactory test effectiveness is
obtained.
13.2.1
Indirect tests
After manufacturing is finished, the logic board is tested. To develop the
functional test programme, it is necessary to have a model that simulates the LIC.
The main obstacle: the structural simulation models of LSI/VLSI/ULSI circuits are too
cumbersome to be used for automatic generation at the logic board level. In addition,
these models can vary from one LIC source to another. Consequently, one may use a
functional simulation model that fits the assisted manual generation methods.
Certain internal LIC defects are not detected by these programmes, and this is
more probable for the more complex LICs: an additional reason to prefer a very
severe test of LSI/VLSI/ULSI circuits before mounting.
A method that eliminates the previous gaps would be to precede the functional
test by a direct in-situ test. This means that:
• each complex LIC has good direct testability;
• the LIC to be tested - on the test board - is not influenced by the environment
(except for the supply), the inputs and outputs being logically switched off
from the rest of the board.
Note that marginal dynamic tests can be hampered by parasitic
connections (unlike the direct test).
Test systems
The choice of a test system depends on numerous parameters, such as the volume
of LICs to be tested daily, the variety of LICs, the variations of possible loads, the
variety of applications, the utilisation volume in each application, and the usage
(evaluation, qualification of production, application type).
Evaluation and qualification needs tend to call for a universal testing
system. At the user's incoming inspection, we distinguish the following cases:
• small quantity of each type, great variety of types, demand for dependable circuits - for
example in aeronautics (universal testing system);
• large quantity of a small variety of types: context simulator (a context
simulator is an adapted test system, capable of exclusively "exercising" the LIC as it
would be in operation, with marginal conditions);
• small quantity of a small variety of types (external test service).
13.3
Input control tests of electronic components
13.3.1
Electrical tests
The electrical tests verify the integrity of the component and permit comparison of
the real parameter limits with the limits specified in the technical conditions, at an
ambient temperature of +25°C. The cost of modern computerised test
equipment varies between $100 000 and $1 000 000, depending on the desired test
level, on LIC complexity, and on the user's needs. One may note that there is a direct
correlation between the confidence level of the required tests and the equipment
cost. No system can work with an effectiveness of 100%, because (a) the parameter
that causes the incorrect operation is not measured or specified; (b) the LIC can
degrade or deteriorate after this test, as a result of incorrect handling or of
soldering conditions. Each user must decide whether, in his case, the optimal solution is to
buy the equipment for all the electrical tests of the components used, or to use
the services of an independent test laboratory. It is important to notice
that the electrical test only detects and isolates the defective ICs induced by the
previous processes; it does not contribute to the reliability of good components.
1 The more complex the IC, the longer the period of infant mortality and, consequently, the
longer the burn-in duration must be. For example: SSI - 48 h; MSI - 96 h; LSI - 168 h; VLSI or
ULSI - 336 h. Configurations: A: high temperature, reverse bias; B: high temperature,
forward bias; C: high temperature, reverse bias, loaded inputs (SSI); D: parallel
excitation (LSI); E: oscillator; F: extended temperature. Stress temperature: +125°C for
ceramic packages, +100°C for plastic packages, +70°C for circuits having a high power
supply level.
13.3.2
Some economic considerations
Taking into account the continuous growth of the integration level of LICs, the user
must examine at least the following questions:
• why and how do I test?
• must I perform a 100% dynamic test?
• how does the relationship between the test cost and the LIC cost evolve as the
LIC complexity grows?
Generally, for resistors, capacitors, small-signal diodes and small-signal transistors
it is not economically desirable to perform 100% selection tests, burn-in tests
and/or electrical tests. On the contrary, practical experience demonstrates that
for LICs, memories, microprocessors, hybrids and power components it is
recommended to perform such tests at component level. There is no universal
answer for all types of users.
Although LICs are tested several times during the fabrication cycle, a certain
(small) percentage of the delivered devices will fail at the user. This small
percentage can cause serious difficulties, especially for large systems. We know
that the probability of operation of an equipped board is determined by the
probability of good operation of its components. For example, a board with 100
ICs, each IC having a 99% probability of good operation, will have a probability
of good operation of (0.99)^100 ≈ 37%. This means that if only 1% of the ICs used
fail, about a third of the equipped boards will be defective in operation. This
example illustrates the great importance of incoming inspection and of the control of
equipped boards, both controls being performed by the user.
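The arithmetic above can be sketched in a few lines; the 99% figure is the example from the text, and the independence of component failures is the usual simplifying assumption.

```python
def board_yield(p_component, n_components):
    """Probability that a board works, assuming independent components
    that must all work (a series system)."""
    return p_component ** n_components

# The example from the text: 100 ICs, each with 99% probability of good operation
y = board_yield(0.99, 100)   # roughly 0.37, i.e. about 37%
```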
To be persuaded of the need for a 100% incoming inspection at the user, one must
analyse: (a) how much the absence of an incoming inspection would cost, and
(b) what minimum quantity of ICs (bought by the user) justifies a profitable
100% incoming inspection. For the first question, we suppose that the incoming
inspection is made with a tester costing $25 000; we suppose, too, that the operator
costs $15 000/year and that a year contains 7·10^6 working seconds. We presuppose
that 5 s are needed to test an IC. Therefore, the average test cost for one IC will be
(25 000 + 15 000) × 5 × (7·10^6)^-1 = $0.0285. If we assume, for example, an average IC
defect rate of 3% at incoming inspection (only as a rough example!), we
spend $0.0285 × 100 = $2.85 to find 3 defective ICs among 100 ICs, that is
$2.85 / 3 = $0.95 to identify each defect. In other words, a user that normally
consumes 100 000 ICs each year loses - in the absence of a 100% incoming
control - $95 000. The second question was: what is the minimum quantity of ICs
(bought by the user) that justifies a profitable 100% incoming inspection? To obtain the
answer, we use the following equation:
Cu K = Pt + K M / Nh (13.1)
where Cu is the unit cost of a defective IC ($0.95), K is the annual quantity of tested
ICs, Pt is the tester price ($25 000), M is the labour cost per hour ($10), and
Nh is the number of ICs tested in an hour (say 500 ICs). With these values we reach
the conclusion that, starting from an annual quantity of about 27 000 ICs, the 100% incoming
inspection is justified.
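Solving equation (13.1) for K gives the break-even quantity directly, K = Pt / (Cu - M/Nh). The sketch below reproduces the numerical example from the text.

```python
def break_even_quantity(tester_price, unit_defect_cost, labour_per_hour,
                        ics_per_hour):
    """Annual IC quantity K above which 100% incoming inspection pays off,
    from Cu*K = Pt + K*M/Nh  =>  K = Pt / (Cu - M/Nh)."""
    per_ic_test_cost = labour_per_hour / ics_per_hour
    return tester_price / (unit_defect_cost - per_ic_test_cost)

# Values from the text: Pt = $25 000, Cu = $0.95, M = $10/h, Nh = 500 ICs/h
k = break_even_quantity(25_000, 0.95, 10.0, 500)   # about 27 000 ICs/year
```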
13.3.3
What is the cost of the absence of tests?
Table 13.2 Average indicative figures of the parameters A...F and the unit cost for discrete
components, linear and digital ICs [13.5]
13.4
LIC selection and connected problems
The selection of LICs strongly influences the final price of industrial products. That
is why the user wishes to know whether the operation of an IC is modified by the
presence of a physical defect. With the advent of LSIs/VLSIs/ULSIs having
non-repetitive structures (microprocessors, custom circuits), the problem became
more complicated, and the verification of the IC function needs more and more
computerised equipment. Irrespective of the test method, the test principle is
always the following: an input sequence is applied at the inputs and, at the outputs,
one observes either a function depending on these values, or all the successive values
obtained at all outputs.
Initially, the logic test was studied for combinational circuits. The patterns for
these LICs are essentially of two types [13.6]:
• probabilistic: an ensemble of random input vectors is applied simultaneously to
the circuit to be tested and to a reference model (either physical or simulated);
any difference in behaviour indicates an error;
• deterministic: the input vectors are determined by examining the circuit. This
second category regroups (with the exception of a manual² method) the
functional methods (which take into account only the circuit operation) and the
structural methods (which examine the structure of the network that realises the
logical function of the IC).
This last approach can be divided into algebraic methods (which manipulate
equations describing the various components of the circuit) and heuristic methods
(which try to propagate the effects of a defect to a primary output by sensitising a
path through the circuit), the path-sensitising algorithms. Between the probabilistic
and deterministic³ approaches we may place random generation: the input vectors are
randomly generated, and a simulation of the circuit makes it possible to know which
defects the sequence can detect.
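A probabilistic test of the kind described above can be sketched in a few lines: random vectors are applied to both the circuit under test and a reference model, and any difference reveals an error. The two-input circuit and the stuck-at fault below are hypothetical illustrations, not examples from [13.6].

```python
import random

def probabilistic_test(dut, reference, n_inputs, n_vectors=1000, seed=0):
    """Apply random input vectors to both the device under test and a
    reference model; any differing output reveals an error."""
    rng = random.Random(seed)
    for _ in range(n_vectors):
        vector = [rng.randint(0, 1) for _ in range(n_inputs)]
        if dut(vector) != reference(vector):
            return False, vector   # a failing vector was found
    return True, None

# Hypothetical 2-input circuit: the reference is an AND gate; the faulty
# DUT has its second input stuck-at-1, so its output simply follows input a
good = lambda v: v[0] & v[1]
stuck = lambda v: v[0] & 1
ok, vec = probabilistic_test(stuck, good, n_inputs=2)
```

The only vector that distinguishes the two circuits is a=1, b=0, which random generation finds quickly; this is exactly why the effectiveness of such a test depends on how many input patterns expose each potential defect.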
All these methods - with the exception of the functional methods, which detect
errors - are based on a more or less detailed knowledge of the circuit's potential
defects, but few defect types lend themselves to effective modelling.
Concerning control in the different phases of the LIC lifetime [13.7], we
distinguish production controls at the end of fabrication, performed by the
manufacturer; incoming inspections, made by the user at the reception of the circuits or
even just before mounting on the boards; and maintenance controls, intended to detect
the defects due to degradation that appears only after a period of operation.
The defect causes are numerous and various: marking errors, deregulated
measure instruments, wrong programming, etc. Other defect category results from
² Manual is not a method in itself; it is only a manner in which a method is applied.
³ We distinguish between pattern-generation methods and test methods. The difference between a probabilistic test and a random test does not exist at the level of pattern generation, but only at the level of output observation: for the random test we compare a reference with a defective circuit (simulated or real), while for the probabilistic test we make statistical measurements on the values that the output should have.
370 13 Test and testability of logic ICs
the imperfections that arise from the chemical processes used in circuit fabrication. Most of these defects are detected at the production control level; some of them will only reveal degradations after long storage or operation periods.
For the detection of the anomalies appearing during the life of a component, two types of tests are used: (a) parametric tests, static or dynamic, applied at the end of component fabrication; (b) logic tests of the component - within a defective system - at the end of fabrication or in maintenance; they verify whether the logic operation of the component is ensured in environmental conditions similar to the operating ones.
13.4.1
Operational tests of memories
The testing of memories [13.8] ... [13.15] - in other words, the detection and localisation of defects - is a problem of increasing importance due to the rapid evolution of memory technology over the last decade. The needs differ depending on the activities considered; we discern the following tests: (i) tests during fabrication; (ii) tests at the end of fabrication; (iii) formula tests; (iv) maintenance tests. Generally speaking, the first two require detection and localisation; the other two can be limited to detection.
Although the general LIC test principles and techniques can be applied to memories, the nature of memories gives their testing the following particularities:
• The test method is, in general, sequential, whereas the testing of other circuits is rather combinational; in other words, the order of the sequences depends on the results obtained during the test [13.16]. This is explained by the test structure.
• The sequence generation is made on line, while for the other LICs this generation is accomplished prior to the effective test, with the aid of a general programme.
• The sequence synthesis with the aid of general programmes is not economically desirable; a programme dedicated to a memory is preferable.
• The dynamic test (at real operating speed) is practically compulsory.
• The selection of test points is more restrictive, since the ideal test points are the memory points themselves. The absence of direct access to these points, to the addressing register outputs and to the decoder is one of the principal causes of the complexity of memory testing.
The aim of the functional testing of a memory is to verify that:
• all the information can be written and preserved;
• all the information read from the memory is correct;
• all the memory points operate correctly.
From the functional point of view, the defects of the memories have three categories of effects:
• single-bit data errors,
• multiple-bit data errors,
• addressing errors.
Most frequently, the programmes are oriented towards detecting defects by starting a series of write, read and compare operations, but do not allow (with rare exceptions) the localisation of the defects. A universal test method for memories does not exist, only an ensemble of instruments that tend to complement one another.
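The write/read/compare programmes mentioned above can be sketched as a minimal march-style test over a simulated memory (the memory size and the single stuck-at-0 cell are invented for illustration; real memory test suites such as March C- are considerably richer):

```python
def march_test(mem_read, mem_write, size):
    """Minimal march test (MATS-like): w0; r0,w1; r1. Returns failing addresses."""
    failing = set()
    for addr in range(size):          # ascending: write 0 everywhere
        mem_write(addr, 0)
    for addr in range(size):          # ascending: read 0, then write 1
        if mem_read(addr) != 0:
            failing.add(addr)
        mem_write(addr, 1)
    for addr in range(size):          # read back 1
        if mem_read(addr) != 1:
            failing.add(addr)
    return sorted(failing)

# Simulated 16-cell memory with cell 5 stuck at 0 (illustrative fault)
cells = [0] * 16
def write(addr, v):
    cells[addr] = 0 if addr == 5 else v   # stuck-at-0 defect in cell 5
def read(addr):
    return cells[addr]

print(march_test(read, write, 16))  # -> [5]: the defect is detected and localised
```

Note that, unlike a pure detection programme, this sketch also localises the defective address, illustrating the detection/localisation distinction made above.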
13.4.2
Microprocessor test methods
13.4.2.1
Selftesting
13.4.2.2
Comparison method
This method requires the use of a tester composed of two groups of drivers and detectors, two sockets, a memory, and a reference microprocessor. The reference microprocessor works in the same manner as in the selftesting method, but the information entering the reference microprocessor is also sent to the product to be controlled, and afterwards the information leaving the two microprocessors is compared.
This method has the same advantages as the selftesting method, with the difference that the first noticeable fault can be detected immediately, at each instruction cycle. Nevertheless, the test speed is determined by the response time
of the reference microprocessor; sometimes faults that do not actually exist appear if the two components have different speeds. Once again, the characterisation and analysis of the microprocessor are very difficult, the environment is very difficult to simulate, and the reference microprocessor must be of very high quality, so that it is not degraded during the tests (but this is not enough; this is the more general problem of depending on a microprocessor "known to be good").
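A minimal sketch of the comparison method, assuming a toy accumulator machine (invented for illustration, not any real microprocessor): the same instruction stream drives the reference unit and the unit under test in lockstep, and the outputs are compared at each instruction cycle.

```python
class ToyCPU:
    """Hypothetical accumulator machine used only to illustrate the method."""
    def __init__(self, add_bug=False):
        self.acc = 0
        self.add_bug = add_bug
    def step(self, op, arg):
        if op == "LOAD":
            self.acc = arg
        elif op == "ADD":
            self.acc += arg
            if self.add_bug and arg == 3:    # injected defect: ADD 3 misbehaves
                self.acc += 1
        elif op == "NEG":
            self.acc = -self.acc
        return self.acc                      # observable output each cycle

def comparison_test(program):
    """Run reference and DUT in lockstep; stop at the first mismatch."""
    ref, dut = ToyCPU(), ToyCPU(add_bug=True)
    for cycle, (op, arg) in enumerate(program):
        if ref.step(op, arg) != dut.step(op, arg):
            return cycle                     # fault detected at this instruction
    return None

program = [("LOAD", 5), ("ADD", 2), ("NEG", 0), ("ADD", 3)]
print(comparison_test(program))  # -> 3: mismatch detected at instruction cycle 3
```

As in the text, the first fault is flagged immediately at the instruction cycle where the two units diverge; the test is only as trustworthy as the reference unit.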
13.4.2.3
Real time algorithmic method
13.4.2.4
Registered patterns method
This method includes two independent stages. In the first one, the microprocessor is simulated with the aid of a minicomputer, a RAM or a PROM; each simulated response can be identified and associated with the corresponding instruction. The whole is controlled and sent to a buffer, at defined periods. The content of the buffer is then saved on disc or on magnetic tape. In the second phase, the patterns are loaded into the buffer, and then transferred to the microprocessor under control.
Advantages
• easy to implement;
• a fairly flexible test;
• possibility of parametric measurements;
• suited to the most sophisticated and universal testers.
Disadvantages
• requires a large buffer for pattern transfer;
• for each change, even a minor one, phase one must be repeated, and this requires a new simulation;
• the simulation of interrupts is not possible;
• requires an important software support.
13.4.2.5
Random test of microprocessors
The philosophy of this microprocessor test - made by users and often by the circuit designers and manufacturers - lies in the fact that the proposed test is a comparison test using a renowned "good" microprocessor as the reference. The microprocessor to be tested is dynamically compared to verify the parametric and functional coincidence with the reference circuit. At the beginning, the two circuits are initialised into a known state by a short programme. Afterwards they are supplied by a pseudo-random signals generator that - in fact - generates instructions for the "good" microprocessor. During this sequence of random tests, the functional integrity of the unit under test is verified in relation to the reference unit, supposed to be a "good" one. The input and output signals of the two units must coincide; if not, an error is indicated.
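In hardware, such a pseudo-random signals generator is typically realised as a linear-feedback shift register (LFSR); a software sketch of a maximal-length 4-bit LFSR (the width and tap positions here are illustrative):

```python
def lfsr4(seed=0b0001):
    """4-bit Fibonacci LFSR, feedback polynomial x^4 + x^3 + 1 (taps at bits 3, 2).

    A non-zero seed cycles through all 15 non-zero states (maximal length)."""
    state = seed
    while True:
        yield state
        bit = ((state >> 3) ^ (state >> 2)) & 1   # feedback bit from the taps
        state = ((state << 1) | bit) & 0xF        # shift left, keep 4 bits

gen = lfsr4()
seq = [next(gen) for _ in range(16)]
print(seq[0] == seq[15], len(set(seq[:15])))  # period 15, all states distinct
```

Each successive state can be decoded into an instruction or data word for the units under comparison; the sequence is deterministic for a given seed, so a failing run can be reproduced exactly.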
The basic philosophy is to execute random instructions on random data. The programme generated by the pseudo-random signals generator is correct from the syntax point of view, but has no semantic signification. It is written in the language of the unit to be tested. This approach is not very effective, since the execution of each instruction in a microprocessor greatly depends on the previous instructions. The internal elements (the counters) which demand a sequence of instructions to reach a state must be tested by the initialisation programme, which contains: toggling tests to "0" and "1", increments and decrements of the registers and stacks. Available specimens of microprocessors having known faults have been tested with this tester, and the faults were detected within one to five seconds with the aid of this method.
The method does not guarantee any degree of test effectiveness and quality.
In conclusion, the user can test microprocessors if he correctly divides his tests between random and sequential signal-generation schemes, and this depends on the specific architecture of the microprocessor under test. Nevertheless, a theoretical study should be made to better assess the test confidence.
13.5
Testability of LICs
In connection with all these problems, the notion of testability must be taken into account. Testability is defined as the ease of guaranteeing, by test, an objective quality. This notion concerns a population of LICs of a chosen type and is linked not only to manufacture, but also to maintenance. The objective quality is reached through a test whose efficiency may be written as

e = nl / nd    (13.3)

where nl is the number of elements eliminated during the test (elements which would have been defective in use), and nd is the total number of elements of the population which would be defective in use.
The objective quality corresponds to the minimal proportion of elements in the population which would be "good" in use, after the test. Testability can then be defined as being inversely proportional to the per-unit test cost that would permit reaching the objective quality, starting from a given upstream quality. This cost depends on the test duration and on the related investments.
13.5.1
Constraints
13.5.2
Testability of sequential circuits
The testing of combinational circuits is not a difficult problem; the sequential circuits, on the other hand, bring the major part of the difficulties. Fortunately, it is possible to preserve the testability of the latter by observing the following recommendations at the design level⁴:
⁴ These recommendations are often not acceptable to the designer, since the circuit would become too large or too expensive. In particular, the accessibility of internal states is rarely attainable.
The difficulty in generating tests for sequential circuits [13.17] ... [13.19] arises from the poor controllability and observability of the memory elements, which are directly affected by the number of feedback loops and by the sequential depth of the memory elements in the circuit. Some algorithms describe a sequential circuit as a directed graph; by properly selecting the scan flip-flops so as to eliminate as many cycles in the graph as possible, the testability of the circuit can be increased and the time spent on test generation can be shortened. A special algorithm [13.19] can efficiently mitigate the contradiction between high fault coverage and the required extra chip area.
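The scan-selection idea just described - model the circuit as a directed graph and pick flip-flops so as to break the feedback cycles - can be sketched with a crude greedy heuristic (an illustrative stand-in for the algorithms of [13.17] ... [13.19], not a reproduction of them):

```python
def has_cycle(graph):
    """Detect a cycle in a directed graph by DFS (0 = new, 1 = on stack, 2 = done)."""
    colour = {v: 0 for v in graph}
    def dfs(v):
        colour[v] = 1
        for w in graph.get(v, []):
            if colour.get(w, 2) == 1 or (colour.get(w) == 0 and dfs(w)):
                return True
        colour[v] = 2
        return False
    return any(colour[v] == 0 and dfs(v) for v in graph)

def select_scan_ffs(graph):
    """Greedily pick flip-flops (vertices) to scan until no feedback cycle remains."""
    g = {v: list(ws) for v, ws in graph.items()}
    scanned = []
    while has_cycle(g):
        # scan the vertex with the highest total degree (crude heuristic)
        v = max(g, key=lambda v: len(g[v]) + sum(v in ws for ws in g.values()))
        scanned.append(v)
        g[v] = []                     # a scanned FF cuts its outgoing arcs...
        for ws in g.values():
            while v in ws:
                ws.remove(v)          # ...and its incoming arcs
    return scanned

# Illustrative circuit graph: flip-flops A..D with two feedback loops through B
graph = {"A": ["B"], "B": ["C", "A"], "C": ["B", "D"], "D": []}
print(select_scan_ffs(graph))  # -> ['B']: scanning B alone breaks both cycles
```

Converting a flip-flop to a scan cell makes it directly controllable and observable, which is why removing its vertex (and arcs) from the graph models the elimination of the feedback cycles through it.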
13.5.3
Independent and neutral test laboratories
The American experience has shown that, for small and medium-sized enterprises, the quality problems can be better solved by a small group of specialists from the enterprise and/or by an external test laboratory, independent and neutral. Such co-operation normally concerns the characterisation studies of ULSI, VLSI and LSI ICs, the comparative reliability studies of a certain product realised by several manufacturers, and all the dynamic input controls for electronic components.
The required knowledge concerning the measuring techniques, the quality and the reliability of highly integrated active components, and the writing of the necessary software and test specifications is so complex that only a sizeable group of engineers and technicians, highly qualified in various fields and having long experience, can solve all the problems.
During the last years, the centre of gravity of input control has moved more and more towards reliability and screening tests, with statistical analysis of the data.
We can summarise the arguments that plead for independent and neutral test institutes as follows:
• These institutes are entirely dedicated to semiconductor testing; that is why they understand the respective problems better.
• The institutes offer some useful additional services (failure analysis, statistical processing of data, etc.).
• The costs are minimised.
But there are also some arguments against the use of these institutes, such as:
• How can the user be sure that the screening, the burn-in and the other tests have been performed on a 100% basis?
• How can the user determine - on the basis of the observed failures - the quality and the reliability of the delivered batch?
13.6
On the testability of electronic and telecommunications
systems, and on international standardisation
c) Visibility. The test sequences are generated by the subtest circuits, exposed to some stimuli, and then directed to the test circuits. The visibility points are the output points of the elementary functions.
As the possibilities of physical access tend to diminish, it is necessary to implement a testability bus in order to have access to the elementary functions and to perform a sequential access to a great number of points with a minimum of connection points. The testability bus permits access to 4096 control and visibility points, with 25 parallel addressing pins and 5 serial addressing pins. This permits a standard testability diagnosis plug:
The productivity gap between the expected chips and the design tools can be transferred on the chip (Fig. 13.1) only with a clever combination of intellectual property.
[Fig. 13.1 plots ASIC complexity (transistors) against time, '95 to '10; technology complexity increases by about 32% per year.]
Fig. 13.1 The productivity gap between the expected chips and the design tools can be transferred on the chip only with a clever combination of intellectual property. (Source: Sematech)
References
13.1 Robach, C. (1978): Le test en production et en exploitation (2eme partie). Research report RR no. 132, National Polytechnic Institute, Grenoble (France), August
13.2 Piel, G. (1978): La testabilite des circuits integres logiques vue par l'utilisateur. L'Onde electrique, vol. 58, no. 12, pp. 830-835
13.3 Robach, C. (1978): Le test en production. Session de perfectionnement, National Polytechnic Institute, Grenoble (France), February
13.4 Hnatek, E. R. (1975/76): Costing in-house vs outside testing. Electronic Production, Dec. 1975 / January 1976, pp. 9-11
13.5 Bajenescu, T. I. (1996): Fiabilitatea componentelor electronice (Reliability of Electronic Components). Editura Tehnica, Bucharest (Romania)
13.6 Caillat, J. (1976): Contribution au test des circuits integres logiques. Ph. D. Thesis, Institut Polytechnique de Grenoble (France), October 8
13.7 Thevenod-Fosse, P. (1978): Contribution a l'etude du test aleatoire des circuits sequentiels et des memoires. Ph. D. Thesis, National Polytechnic Institute, Grenoble (France), February 15
13.8 Girard, E. et al. (1974): Le test fonctionnel des memoires. Revue technique Thomson-CSF, vol. 6, no. 1, pp. 217-227
13.9 Dumitrescu, D.; Saucier, G. (1975): Test de memoire dynamique a technologie MOS. Proc. of internat. symp. on fault-tolerant computing, Paris, June
13.10 Marshall, M. (1976): Through the memory cells - further exploitation of ICs in Testingland. EDN, 20.2.1976, pp. 77-85
13.11 Bollen, H. (1978): Wichtiger denn je - der Test von LSI-Bauelementen. Elektronik, no. 13, pp. 77-80
13.12 Muehldorf, E. I. (1976): Designing LSI logic for testability. Proc. of IEEE Semicond. Test Symp., pp. 45-49
13.13 Robach, Ch.; Saucier, G. (1972): Le test logique des circuits integres. L'Onde electrique, vol. 54, no. 12, pp. 842-849
13.14 Davison, C. (1978): The testing of modern memories. L'Onde electrique, vol. 58, no. 5, pp. 39~00
13.15 Fosse, P.; David, R. (1977): Random testing of memories. Informatik-Fachberichte, Springer Verlag, vol. 10, pp. 139-153
13.16 Rault, J. C. et al. (1972): La detection et la localisation des defauts dans les circuits logiques: principes generaux. Revue technique Thomson-CSF, vol. 4, no. 1, pp. 49-88
13.17 Kwang-Ting Cheng et al. (1990): A partial scan method for sequential circuits with feedback. IEEE Trans. on Computers, vol. 39, no. 4, pp. 544-548
13.18 Gupta, R. et al. (1990): The BALLAST methodology for structured partial scan design. IEEE Trans. on Computers, vol. 39, no. 4, pp. 538-543
13.19 Bo, Y. et al. (1995): Testability design for sequential circuit with multiple feedback. Proc. of the fourth internat. conf. on solid-state and integrated-circuit technology, Beijing (China), Oct. 24-28, pp. 208-210
13.20 Bajenescu, T. I. (1993): Wann kommt der nachste Uberschlag? Schweizer Maschinenmarkt (Switzerland), no. 40, pp. 74-81
13.21 Bajenescu, T. I. (1998): On the spare parts problem. In: Proceedings of OPTIM '98, Brasov (Romania), May 14-15, pp. 797-800
13.22 Bajenescu, T. I. (1998): The Monte Carlo method and the solution of some reliability problems. Proceedings of the Symp. on Quality and Reliab. in Information and Communications Technologies RELINCOM '98, Budapest (Hungary), September 7-9
13.23 Bazovsky, I. (1961): Reliability Theory and Practice. Prentice-Hall Inc., Englewood Cliffs, New Jersey
13.24 Bazovsky, I.; Benz, G. (1988): Interval Reliability of Spare Part Stocks. Qual. Reliab. Engng. Int., no. 4, pp. 235-246
13.25 Bazu, M. et al. (1983): Reliability data bank for semiconductor components. Proceedings of Ann. Semicond. Conf. CAS 1983, October 6-8, pp. 35-38
13.26 Bazu, M. et al. (1983): Accelerated tests for evaluation of semiconductor component reliability. Electrotechnics, Automatics and Electronics, no. 1, pp. 19-25
13.27 Bazu, M.; Ilian, V. (1990): Accelerated testing of integrated circuits after storage. Scandinavian Reliability Engineers Symp., Nykoping (Sweden), October
13.28 Birolini, A. et al. (1989): Test and Screening Strategies for Large Memories. Proc. 1st European Test Conf. 1989, pp. 276-283
14 Failure analysis
14.1
Introduction [14.1] ... [14.25]
This delicate and, at the same time, interesting subject will be presented from the viewpoint of a simple user of components. The emphasis will be not on solid-state physics, but on the adequate design of electronic systems and on their manufacturing on an industrial scale. The scanning electron microscope, for instance, will be mentioned only incidentally, while the usual analysis means (such as electrical measurements, optical microscopy and chemical procedures) will be used throughout.
For failure analysis, the "bad" components have a higher value than the "good" ones. If a component was incorrectly used, the system designer must review his project or choose another component. If, on the contrary, a weakness of the component is found, a detailed analysis is required from the manufacturer, leading to corrective actions for eliminating the failure causes.
The failure analysis starts from the failure mode (the symptom by which one may observe a failure, such as: shorts, open circuits, excessively high leakage currents, changes in resistor values, degradation of the response time or of the frequency-dependent parameters) and leads to the identification of the failure cause and failure mechanisms (e.g. breaks or cracks of the die, intermetallic compounds, oxide defects, pinholes, contamination, metal migration, short-circuiting of the oxide or dielectric layer, "bad" solders, overcharges due to incorrect use, open circuits, misalignments, chemical reactions at the level of the metal/semiconductor contact area, metal corrosion inside the package, etc.).
The failure analysis must discover the root of the failure. Only if the failure causes are known and the failure mechanism is elucidated can the necessary measures be taken. One must understand that failure analysis is long and costly. For component malfunctions arising during tests, the following main causes may be identified: deficiencies in component manufacturing, incorrect use, incorrect mounting, overcharging, and testing errors.
By failure analysis, an early knowledge of the problems linked to the components is achieved and, at the same time, the efficiency of the measures for reliability improvement is verified. The results obtained are used for establishing the failure sources and for their avoidance, both by the component manufacturers and by the users. A user must clearly state whether the failure was produced by his own deficiencies or by component manufacturing errors. Insufficiently defined claims lead to incomplete answers (if any reaction is produced at all) [14.26].
The constitutive elements of a failure analysis are:
The discrete components, but especially the integrated circuits, contain a high number of connections. The connections make an important contribution to the failures due to their complexity and to the mechanical, climatic and electrical stress that they have to undergo.
The analysis shows that the contacts also account for a high percentage of the failure causes. Thus, a comparison between high-frequency transistors and bipolar integrated circuits leads to the conclusion that for high-frequency transistors the failure is produced by high currents and temperatures at the contacts, while for integrated circuits the main failure cause is the high number of internal connections. If we consider the well-known "bath-tub" curve, a high contribution of the contacts is found in the early failure period, but also in the wear-out region. In this last case, one may explain this by material fatigue, interdiffusion of neighbouring metallic layers, corrosion, electromigration or thermomigration.
The LSI integrated circuits (memories, microprocessors) raise important questions concerning defect examination and failure analysis. For microprocessors, the localisation of the defect on the chip is not always possible and sophisticated techniques must be used, such as: nematic liquid crystals, stereoradiography with X-rays, neutron radiography, Nomarski microscopy, instruments with electron and ion beams, mass spectrometry for residual gas analysis, etc. These powerful (and expensive) instruments of analysis demand highly qualified personnel.
14.2
The purpose of failure analysis
14.2.1
Where are the failures discovered?
The failures may be suppressed only if their causes are known. Consequently, the analysis of the failures is an important source of information for discovering the "weak points" and taking the necessary measures for eliminating them. Depending on the place (in the manufacturing stream of an electronic system) where they
were discovered for the first time, the component failures may be divided as follows [14.22]:
a) At the input control
Experience shows that at this place a characteristic failure rate is established for each component. The testing laboratory must be informed about any deviation from the normal behaviour, possibly by requiring a failure analysis. If this analysis reveals a serious failure, the whole batch will be returned to the supplier.
b) In the testing laboratory
The failures of the components are generally isolated, the causes being identified on the basis of previous experience. If a threshold of 1.5-2% failures is exceeded, an alarm is raised. The testing laboratory must give these failures priority in analysis, because manufacturing is interrupted. The possible failure causes are defects in component manufacturing, incorrect manipulation, and incorrect use.
c) In the development phase
During the development of the product, the components undergo high stresses (circuit and mounting changes, inversion of the bias, too low supply voltages). Such defects will be automatically excluded from the failure-analysis list. Taking into account the remaining failures, one may identify in due time the weak points of a component, and the necessary measures for their elimination can be taken.
d) At the user of the system
Here the differences in failure rate are assessed. In the case of an obvious deviation, a careful analysis of the failures must be made. But an analysis of field failures is difficult to make, because sometimes the related information is lost.
In most cases, as a consequence of the failure analysis and with the aid of the manufacturer, it is possible to find a solution convenient for both sides. For subsequent analyses and decisions, it has proved useful to classify the failures depending on the type of component.
It is recommended to perform tests simultaneously on components originating from various manufacturers, in order to compare the behaviour of the components directly. Usually, a sample of 15-30 items undergoes electrical, mechanical and/or thermal stress, the input electrical characteristics of each component being measured at given time intervals. Based on these results, and taking into account other parameters and the comparison between the various producers, the necessary decision is taken. A trend towards reducing the number of component types must be promoted in any company.
14.2.2
Types of failures
The failures encountered at the user are relatively few in number [14.1] and arise from:
• utilisation problems (circuit overcharge, voltage, power),
• mechanical problems (assembly, and those referring to the aluminium),
• crystallographic structure problems (crystallographic defects, parasitic diffusions),
[Fig. 14.1 (flow chart): X-ray inspection; package opening (plastic package); internal visual inspection; if the failure mechanism is not identified, advanced investigations (SEM, TEM, etc.) follow.]
14.3
Methods of analysis
Since the reliability of integrated circuits has lately improved significantly, new methods of analysis were developed for the assessment of the "good" circuits from the new technologies and for establishing the weak points of the manufacturing. The results are used for establishing the sources of failures both by the manufacturer and by the user.
First, full electrical analyses are performed to establish whether the limits foreseen in the data sheet are exceeded. With the aid of this electrical analysis, the failure rate is calculated by correlating the identified and the original failures. In Table 14.1 the possible steps of a failure analysis are presented, and in Figure 14.1 an overview of how the analysis methods are used is given.
Table 14.1 Working plan for a failure analysis for semiconductor components
14.3.1
Electrical analysis
in the inputs of IC, the quality deviations of the components, the critical areas of
the electrical parameters, etc.
For SSI integrated circuits, it is possible to determine the areas with defects on the die by combining the different characteristics and relying on previous experience, possibly with the aid of thermal tests and, if necessary, mechanical ones.
By using other electrical tools, I-V and C-V curves at various temperatures may be obtained, and from these curves relevant information about the device is acquired. Experiments made by Papaioannou [14.41] on an ultra-fast diode led to an activation energy of 0.66 eV (for the main failure mechanism), extracted from a plot of reverse saturation current vs. temperature. From C-V curves, the carrier profile may be obtained.
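The activation-energy extraction mentioned above can be sketched as follows (the data are synthetic, generated here with Ea = 0.66 eV; they are not Papaioannou's measurements): since the reverse saturation current follows I ∝ exp(-Ea/kT), a least-squares fit of ln I against 1/(kT) yields -Ea as the slope.

```python
import math

K_BOLTZMANN = 8.617e-5  # Boltzmann constant, eV/K

def activation_energy(temps_K, currents):
    """Least-squares slope of ln(I) vs 1/(kT); returns Ea in eV."""
    xs = [1.0 / (K_BOLTZMANN * t) for t in temps_K]
    ys = [math.log(i) for i in currents]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    return -slope  # I ~ exp(-Ea/kT)  =>  slope = -Ea

# Synthetic diode data generated with Ea = 0.66 eV (illustrative, not measured)
temps = [300.0, 325.0, 350.0, 375.0, 400.0]
i0 = 1e-3
currents = [i0 * math.exp(-0.66 / (K_BOLTZMANN * t)) for t in temps]
print(round(activation_energy(temps, currents), 2))  # -> 0.66
```

With real measurements, the scatter of the points around the fitted line gives a first indication of whether a single thermally activated mechanism dominates.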
Another technique currently used is Deep Level Transient Spectroscopy (DLTS), which allows the deep levels in the forbidden band to be characterised. From DLTS spectra, data about the traps can be obtained. For a case study (irradiated ultra-fast diodes), performed by Papaioannou for the Phare/TTQM project R09602-02-02-L001/28.10.1997, DLTS data are presented in Table 14.2 [14.41].
14.3.2
X-ray analysis
X-rays are used for locating foreign material (in any package), broken wires and die-attach soldering defects prior to decapsulation. The most frequently used tool is the Faxitron 43804, produced by Hewlett-Packard.
14.3.3
Hermeticity testing methods
14.3.4
Conditioning tests
Various tests may be used to reveal the failures. Mechanical shock, vibration or high-temperature storage allows an intermittent failure to be re-created. High-temperature storage (16 h at 150°C) mobilises the electrical charge produced by ionic contamination at the oxide/semiconductor interface, often curing the device.
14.3.5
Chemical means
corrosion head (with sulphuric acid). The pressure force is controlled so as to release, with the aid of hot sulphuric acid (290°C), the solder and contact points. After opening, the package is processed with acetone, deionised water, ultrasound and dry air [14.40].
One must know that one cause of semiconductor device degradation, both in storage and in operation, is chlorine contamination of the package atmosphere. This may be produced during die manufacturing or mounting. In many cases, the moulding press is the cause. It has been shown [14.35] that contamination of the package atmosphere with chlorine may arise when the semiconductor devices are treated with certain chlorinated solvents.
14.3.6
Mechanical means
14.3.7
Microscope analysis
After enough information has been extracted by electrical analyses, the packages are opened and the examination continues with the aid of the metallographic microscope. Usually, x50 or x100 pictures are taken. Only if the position of the defect in the functioning scheme of the integrated circuit has been clearly identified may one draw conclusions about the external failure causes.
The hot spots (areas where a dielectric breakdown produced a short between the gate and the source of a power FET, for instance) may be detected with liquid crystals (nematic or cholesteric).
14.3.8
Plasma etcher
14.3.9
Electron microscope
The first commercial Scanning Electron Microscope (SEM) was built in the mid-1960s [14.31] and was immediately used for semiconductor technology. In the field of quality control and failure analysis, the SEM is an appropriate tool (Figs. 14.3 ... 14.6). The interest in developing non-destructive methods imposes the use of the SEM in industry. Compared to optical microscopy, at a magnification of 1000 times, the
resolution is 100 times better and the depth of field is better by an order of magnitude. In Table 14.3 [14.29], some of the possibilities of the SEM are presented. As one can see, apart from surface analyses, investigations of the failure causes and failure mechanisms of integrated circuits (e.g. masking defects, microbreaks at the contact windows or the forming of alloys at the contacts) can also be made. As examples of the most usual failure mechanisms discovered with the aid of the SEM, the following can be mentioned: formation of intermetallic phases at the contact between different materials, corrosion of the metallic leads, and migration of material due to the high current densities on the conducting paths. For instance, if integrated circuits for semiconductor memories must be reliably measured, one must know their organisation and topology. A manufacturer rarely offers this kind of information and, therefore, the SEM may be used.
If the failure mechanism is not discovered with the aforementioned means, some other tools must be used. Examples of such tools, allowing more to be learned about the failure cause, are [14.42]:
• Methods using secondary electrons, back-scattered and Auger electrons (Auger electron spectroscopy, Scanning Auger microprobe, Transmission electron microscope, Transmission electron energy-loss microscopy, Low-energy electron diffraction)
• Methods using electron-induced photon emission (Electron probe microanalysis with X-rays, Appearance potential spectroscopy)
• Methods using photo- and Auger electron emission (Electron spectroscopy for chemical analysis, X-ray induced Auger electron spectroscopy, Ultraviolet photoelectron spectrometry)
• Methods using sampling by laser-induced emission (Atomic absorption spectroscopy, Optical emission spectroscopy)
• Methods using fluorescence and reflection (X-ray fluorescence spectrometric analysis, Light microscopy, IR, UV and Raman scattering, Laser optical spectrometry)
• Methods using ion scattering (Ion scattering spectrometry, Rutherford back-scattering spectrometry, Neutron activation analysis).
It is important to note that these special means (and others) must be used only if the failure mechanism has not yet been identified. In other words, the use of the special means must be required by the logic of the failure analysis. A tendency to embellish failure reports by adding "beautiful" results (impressive pictures or diagrams obtained with sophisticated tools) is encountered all over the world. Such a technique may be used for making a scientific paper more convincing, but it is also the most powerful argument for increasing the price required for a failure report. The customer must be aware that sometimes the expensive reports are not the best ones.
14 Failure analysis 391
Fig. 14.4 Detail from Fig. 14.4, at a Fig. 14.5 Contact of a connection wire
higher magnification
Table 14.3 Examples of the use of a Scanning Electron Microscope (SEM)

Field                                              Usage
Semiconductors: failure analysis                   • The formation of alloys at the solder joints (e.g. the Au/Al/Si phases).
Displays with liquid crystals and optical fibres   • The surface of the glass and of the electrode layers after chemical treatments.
                                                   • Surface contamination.
                                                   • Resistive paste.
                                                   • Mechanical modifications.
14.4
Failure causes
The main causes of failures are:
• process errors,
• process variations,
• mounting and handling errors (user / manufacturer),
• misuse.
Failures may also arise from insufficiencies of the testing technique, such
as:
• weak feedback from the user to the manufacturer,
• outdated testing techniques,
• an inflexible testing system,
• renunciation of expensive testing methods,
• failure to adapt the testing system to the user's requirements.
To avoid wrong conclusions, the failed components must be analysed
separately for each type of test, because it is important to know [14.28] whether a
failure in a humidity test is due to penetration of moisture inside the package or
to other failure mechanisms.
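The bookkeeping implied by this rule can be sketched in a few lines of code; the records below are hypothetical and serve only to show the per-test-type binning, not any real data from this chapter:

```python
from collections import Counter

# Hypothetical failure records (test type, observed failure mechanism);
# illustrative only, not data from this chapter.
failures = [
    ("humidity", "moisture penetration"),
    ("humidity", "corrosion of metallisation"),
    ("temperature cycling", "die crack"),
    ("humidity", "moisture penetration"),
    ("temperature cycling", "wire bond lift"),
]

# Tally the mechanisms separately for each test type, so that a failure
# in a humidity test is never lumped together with thermo-mechanical ones.
by_test = {}
for test, mechanism in failures:
    by_test.setdefault(test, Counter())[mechanism] += 1

for test, counts in sorted(by_test.items()):
    print(test, counts.most_common())
```

Ranking the mechanisms within each test type then points directly at the dominant failure mechanism per stress.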
To familiarise the reader with some current problems of failure analysis, many
cases of failed components, richly illustrated with examples, are analysed in the
following. Studying these cases will be useful especially for young engineers and
technicians. Experience shows that, of the total failures, those caused by
utilisation carry a high weight. If these defects are excluded, the distribution
given in Fig. 14.7 has been found for the remaining defects.
Fig. 14.7 Distribution of defects: crystallographic defects, die defects, oxide
defects, other defects, connection defects, metallisation defects (horizontal axis:
0...45 %)
As one can see, crystallographic defects are relatively rare. Generally,
components with such defects are removed by the incoming inspection, at the
mounting stage or, possibly, at the first testing of the circuit.
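The percentages of such a distribution are obtained by normalising raw defect counts per category; a minimal sketch with hypothetical counts (the actual values must be read from the bar chart of Fig. 14.7):

```python
# Hypothetical raw defect counts for one analysed lot; the actual
# percentages must be read from the bar chart of Fig. 14.7.
defects = {
    "metallisation defects": 42,
    "connection defects": 25,
    "other defects": 14,
    "oxide defects": 10,
    "die defects": 6,
    "crystallographic defects": 3,
}

total = sum(defects.values())
# Normalise to percentages, as plotted in the figure.
distribution = {name: round(100.0 * n / total, 1) for name, n in defects.items()}
```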
14.10.1
Electrical overstress
Case 1: Fig. 14.8 TTL integrated circuit 944. Overstress of an extender input.
Case 2: Fig. 14.9 DTL integrated circuit 9936, good at the incoming inspection,
but failed at the control of assembled boards (pin 13 interrupted). On opening the
package, the path was found to be melted and the input diode shorted.
Case 3: Fig. 14.10 Integrated circuit 936. Electrical overstress: the pads of the
output transistors are melted.
Case 4: Fig. 14.11 DTL integrated circuit 9946, defective at the electrical control
of assembled boards (inputs 1 and 2 overstressed).
Case 5:
Fig. 14.11 Optocoupler: the failure mode
is an open circuit of the phototransistor;
the emitter bonds are interrupted.
Because the optocouplers passed a
100% electrical inspection, it seems that
no mechanical defect occurred. To reach
the aluminium pad (leading to the emitter
windows), the glass passivation layer was
removed and the failure mechanism was
discovered: the metallisation surrounding
the emitter area was burned by an
overload current produced by a scratch
of the pad during the manufacturing
process. Only a small portion of the pad
remained good, allowing the part to pass
the electrical inspection. When the
optocoupler was used, the pad burned
and the failure occurred
14.10.2
Mechanical defects
Case 6:
Fig. 14.12 Aluminium and oxide removal
during ultrasonic bonding
Case 7:
Fig. 14.13 Local damage of the
protection layer during ultrasonic bonding
Case 8:
Fig. 14.14 TTL IC 7410: two inputs
were found defective at the electrical
functional control of assembled boards.
The silicon was broken under the contact
zone (a rare defect, produced by
incorrect handling during the
manufacturing process)
Case 9:
Fig. 14.15 Local removal of aluminium
at testing, below a thermocompression
bond area
Case 10:
Fig. 14.16 Break of an aluminium wire
(ultrasonic bond)
Case 11:
Fig. 14.17 Crack in a crystal
Case 12:
Fig.14.18 Break of a die
14.10.4
Badly centred solder joints
Case 13:
Fig. 14.19 TTL IC 7400 (X 170): output
8 is defective at the electrical control of
assembled boards. One may notice the
short circuit between the contact wires
bonded at pins 8 and 7, respectively
Case 14:
Fig. 14.20 Failures of diodes after a
temperature cycling test [14.34]. Causes:
wrongly centred dies and wrong align-
ment at diode mounting
14.10.5
Contact windows insufficiently open
Case 15:
TTL IC 7475 (flip-flop with complementary outputs). Normal operation was observed
only for temperatures between 25 and 40 °C. At temperatures higher than 40 °C, the output
level is unstable. The phenomenon is produced by insufficiently opened contact windows
at the open-collector output transistors. (Fig. 14.21...14.23 Metallised dies. Fig. 14.24 Dies
with the metallisation removed.)
14.10.6
Electrostatic discharges
Case 16:
Bipolar LSI IC type HA1-4602-2:
electrostatic discharges. There are no
differences between the handling
precautions for bipolar and MOS ICs,
because both categories are sensitive to
electrostatic discharges. SEM pictures
show the areas affected by electrostatic
discharge (Fig. 14.25... 14.27)
Fig. 14.25
Case 17:
Partial view of the metallisation layer of a ROM die, longitudinal section
(Fig. 14.28... 14.31)
14.10.8
Catalogue sheets with incomplete data
Case 22:
Hybrid-circuit voltage regulator with a power transistor at the output. Melted connection at
the emitter of the power transistor. This failure mechanism may be avoided if the manufacturer
does not forget to specify in the catalogue sheet that a capacitor with good high-frequency
characteristics must be mounted at the regulator input (Fig. 14.36...14.38)
14.10.9
Bad quality solder joints
Case 23:
Small signal transistors with wire bonding defects
14.10.10
Open circuits
Case 24:
Fig. 14.42 Electrical opens of a metallic
pad (RAM chip), produced by
electromigration
14.10.11
Popcorn noise (Burst noise)
Case 25:
Fig. 14.43 Typical example of popcorn
noise in an operational amplifier
14.10.12
Holes in silicon
Case 26:
Fig. 14.44 Silicon dissolution in
aluminium (X 11000)
Case 27:
Fig. 14.45 Dissolution of silicon in
aluminium. Note the change of
orientation in the horizontal plane (100)
(X 1700)
Case 28:
Fig. 14.46 Hole in a gate oxide, leading
to a short circuit between metallisation and
substrate (X 5000)
14.10.13
Oxide defects
Case 29:
Fig. 14.47 Hole in a gate oxide,
leading to a short circuit between
metallisation and substrate (X 5000)
Case 30:
Fig. 14.48 Crystallisation of a point
defect in thermally grown SiO2 (X
4400)
Case 31:
Fig. 14.49 Surface separation of an
aluminium metallisation covering an
oxide step (X 16000)
14.10.14
Advantages of the potential contrast method
Case 32:
Fig. 14.50 Image of a biased transistor, evidenced by the
potential contrast method (X 1000)
Case 33:
Fig. 14.51 Discontinuity of a metallisation pad, evidenced
by the potential contrast method (X 500)
14.10.15
Package opening
Case 34:
Metal or ceramic packages may be opened by polishing, cutting, soldering or carefully
hitting them at a certain point, so as not to damage the die. The pictures show the opened
metal packages of two hybrid circuits with multiple dies. The solder joints are the weak
points of the system (Fig. 14.52-14.53)
Fig. 14.52
Fig. 14.53
Case 35:
Fig. 14.54 For plastic packages, the
opening is difficult. If input short
circuits or opens have been found in
previous investigations, one may establish
with X-ray radiography, before opening
the package, whether the defect is at the
connection between the pin and the die [14.26]
References
14.1 Vissiere, M. (1972): L'analyse des defaillances: les moyens, les methodes d'analyse,
principaux mecanismes de defaillances chez l'utilisateur. Actes du Congres National de
Fiabilite, Sept. 20-22, Perros-Guirec, France, pp. 147-153
14.2 Schwartz, S. (1976): Postmortems prevent future failures. Electronics, Jan. 23, pp. 92-106
14.3 Mann, J. E. (1978): Failure analysis of passive devices. Proceedings of the 16th Annual
Reliability Physics Symp., pp. 89-92
14.4 Wunsch, D. C. (1978): The application of electrical overstress models to gate protective
networks. Proceedings of the 16th Annual Reliability Physics Symp., pp. 47-55
14.5 *** (1973): Parts, material and process experience summary. NASA SP-6507, vol. 2,
Washington D. C.
14.6 Smith, J. S. (1978): Electrical overstress failure analysis in microcircuits. Proceedings
of the 16th Annual Reliability Physics Symp., pp. 41-46
14.7 *** MIL-STD-883, Method T 5003
14.8 Parker, S. L., Lawson, L. E. (1976): Comparison of destruct physical analysis results on
electronic components. Proceedings of the 14th Annual Reliability Physics Symp., Jan.
20-22, Las Vegas, pp. 456-460
14.9 Takaide, A., Manabe, N. (1977): RA system using process failure analysis for ICs.
Proceedings of the 15th Annual Reliability Physics Symp., pp. 1-6
14.10 Tretter, J. (1976): Fehleruntersuchung, Fehlerklassifikation und Fehlerphysik bei
Bauelementen der Nachrichtentechnik. Fernmeldepraxis, Bd. 46, H. 6, pp. 197-216
14.11 Bonnaud, R., Guezou, P. (1978): Essais des composants mecaniques et electriques.
L'echo des Recherches, Jan., pp. 26-31
14.12 Behera, S. K., Speer, D. P. (1972): A procedure for the evaluation and failure analysis of
MOS memory circuits using the SEM in potential contrast mode. Proceedings of the
10th Annual Reliability Physics Symp., pp. 5-11
14.13 Kranzer, D. (1978): Correlation of crystal defects and bipolar device behaviour. Revue
de Physique Appliquee, vol. 13, Dec., pp. 803-807
14.14 Ebel, G. H., Engelke, H. A. (1973): Failure analysis of oxide defects. Proceedings of the
11th Annual Reliability Physics Symp., pp. 108-116
14.15 Piwczik, B., Siu, W. (1974): Specialized SEM voltage contrast techniques for LSI
failure analysis. Proceedings of the 11th Annual Reliability Physics Symp., pp. 49-53
14.16 Zick, G. L., Sheffer, T. T. (1977): Remote failure analysis of microbased
instrumentation. Computer, Sept., pp.30-35
14.17 Alter, M. J., McDonald, B. A. (1971): The SEM as a defect analysis tool for
semiconductor memories. Proceedings of the 10th Annual Reliability Physics Symp., pp.
149-159
14.18 Patterson, J. M. (1978): Developing an approach to semiconductor failure analysis and
curve tracer interpretation. Proceedings of the 16th Annual Reliability Physics Symp.,
pp.93-100
14.19 Bums, D. J. (1978): Microcircuit analysis techniques using field effect liquid crystals.
Proceedings of the 16th Annual Reliability Physics Symp., pp. 101-105
14.20 *** (1972): Qualitätsprüfung und Fehleranalyse an Bauelementen. Sonderheft der
Firma Wandel und Goltermann, Reutlingen
14.21 Boulaire, J.-Y.; Boulet, J.-P. (1977): Les composants en exploitation. Analyse des
composants défectueux. L'écho des recherches, July, pp. 16-23
14.22 Hagenbusch, E. (1973): Auftrag, Aufgaben, Arbeitsweise Qualitätsprüflabors für
Bauelemente. Qualitätsprüfung und Fehleranalyse an Bauelementen. Sonderheft der
Firma Wandel und Goltermann, Reutlingen
14.23 Boulaire, J.-Y.; Boulet, J.-P. (1978): Analyse des composants défectueux en
exploitation: méthodes et résultats. Actes du Colloque International sur la Fiabilité et la
Maintenabilité, Paris, June 19-23, pp. 401-407
14.24 Belbeoch, J.-Y.; Boulet, J.-P. (1978): SADE - système d'analyse des défaillances en
exploitation. L'écho des recherches, July, pp. 12-19
14.25 Doyle, R. Jr. (1979): Military microcircuits: failure analysis at RADC. Military
Electronics / Countermeasures, vol. 5, no. 2, pp. 75-79
14.26 Becker, P. (1982): Ausfallanalyse als wesentlicher Bestandteil der Qualitäts- und
Zuverlässigkeitssicherung. Qualität und Zuverlässigkeit, H. 8, Sonderdruck
14.27 Sebald, N. (1982): Qualitätssicherung integrierter Schaltkreise. IEE Productronic, vol.
27, no. 4, pp. 20-22
14.28 Angerer, R. et al. (1982): Beispiel aus der Tätigkeit der Komponenten-Evaluation. Neue
Technik, H. 11/12, pp. 42-47
14.29 Schäfer, W.; Niederauer, K. (1982): Rasterelektronenmikroskopie - ein Verfahren zur
Untersuchung fester Oberflächen. Messen+Prüfen/Automatik, H. 11, pp. 744-749
14.30 Hersener, J. (1982): Rasterelektronenmikroskopie und Halbleiterbauelemente-
Entwicklung. Messen+Prüfen/Automatik, H. 11, pp. 750-753
14.31 Oatley, C. W. (1982): The early history of SEM. J. Appl. Phys., pp. R11-R13
14.32 Băjenescu, T. I. (1984): Fehleranalyse an Halbleiterbauelementen. Elektronik Produktion
& Prüftechnik (West Germany), May, pp. 245-250
14.33 Weygang, A. H. (1979): Fehleranalyse an integrierten Halbleiterschaltungen.
Elektronik, H.12, pp. 55-61
14.34 Doyle, E. A. Jr. (1981): How parts fail. IEEE Spectrum, no. 10, pp. 36-43
14.35 Nenyei, Zs.; Kalmar, G. (1982): Einfluss verschiedener chlorierter Losemittel auf die
Zuverlassigkeit von Halbleiterbauelementen. Metalloberflache, H. 8, pp. 372-379
14.36 Schaffer, E. (1979): Zuverlassigkeit, Verfiigbarkeit und Sicherheit in der Elektronik.
Vogel-Verlag
14.37 Hosoya, N. (1981): "Pressurecooker" using steam pressure raises semiconductor
reliability. JEE, March, pp. 78-81
14.38 Dawes, C. J. (1976): An evaluation of techniques for bonding beam-lead devices to
gold thick films. Solid State Technology, March
14.39 Burgess, D. (1980): Physics of failure. In: Grant Ireson, W.; Coombs Jr., C. W. (eds.)
Handbook of reliability engineering and management. Mc Graw-Hill Book Comp.,
New York
14.40 Jaques, M. (1979): The chemistry of failure analysis. Proceedings of the 17th Annual
Reliability Physics Symp., pp. 197-208
14.41 Papaioannou, G. (1998): Report on Schottky diode assessment. Phare/TTQM project
RO 9602-02, IMT-Bucharest (Romania)
14.42 Werner, H. W.; Garten, R. P. H. (1984): A comparative study of methods for thin-film
surface analysis. Rep. Prog. Phys., vol. 47, pp. 221-344
14.43 Leroux, C.; Blachier, D.; Briere, O.; Reimbold, G. (1997): Light emission microscopy
for thin oxide reliability analysis. Microelectronic Engineering, vol. 36, p. 297
14.44 Nafria, M.; Sune, J.; Aymerich, X. (1993): Exploratory observations of post-breakdown
conduction in polycrystalline-silicon and metal-gate thin-oxide metal-oxide-
semiconductor capacitors. J. Appl. Phys., vol. 74, pp. 205-209
14.45 Wu, E.Y.; Lo, S.-H.; Abadeer, W.W.; Acovic, A.; Buchanan, D.; Furukawa, T.;
Brochu, D.; Dufresne, R. (1997): Determination of ultra-thin oxide voltages and
thickness and the impact on reliability projection. Proceedings of the IEEE International
Reliability Physics Symp., pp. 184-191
14.46 Kim, Q.; Stark, B.; Kayali, S. (1998): A novel, high resolution, non-contact channel
temperature measurement technique. Proceedings of the IEEE International Reliability
Physics Symp., pp. 108-112
14.47 De Wolf, I.; Howard, D.J.; Rasras, M.; Lauwers, A.; Maex, K.; Groeseneken, G.; Maes,
H.E. (1998): A reliability study of titanium silicide lines using micro-Raman
15.1
Software package RAMTOOL++ [15.1]
15.1.1
Core and basic module R3 Trecker
The R3 Trecker is a utility for reliability prediction, ensuring failure rate prediction
and assessment of the Mean Time Between Failures (MTBF). R3 stands for the
three reliability prediction methods implied:
• Parts count technique and component stress analysis method, both according
to the reference handbooks (MIL-HDBK-217F2 and Nortel TR-322, Issue 6).
• An integrated calculation scheme for the non-operating failure rate, adapted to
the last version of the RADC publication TR-85, as attached to the BETA
version of MIL-HDBK-217E.
This basic module of RAMTOOL++ may perform analyses for any equipment
during all life phases, covering a large range of requirements, such as:
• Parts count method for the operating / active state.
• Parts count method for the operating steady state, including limited stress analysis.
• Parts in-circuit stress analysis method for the operating / active state.
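The parts count technique mentioned above reduces, in its simplest form, to summing the products of part quantity, generic failure rate and quality factor over all part categories: λ_equip = Σ N_i·λ_g,i·π_Q,i. A minimal sketch with illustrative placeholder values (not figures taken from MIL-HDBK-217F):

```python
# Parts-count sketch in the spirit of MIL-HDBK-217F:
#   lambda_equip = sum over part categories of N_i * lambda_g_i * pi_Q_i.
# The generic rates and quality factors below are illustrative
# placeholders, not values taken from the handbook.
parts = [
    # (category, quantity N, generic rate lambda_g [1/1e6 h], quality factor pi_Q)
    ("bipolar digital IC", 12, 0.010, 1.0),
    ("film resistor", 80, 0.0012, 3.0),
    ("ceramic capacitor", 40, 0.0036, 3.0),
]

lambda_equip = sum(n * lg * pq for _, n, lg, pq in parts)  # failures / 1e6 h
mtbf_hours = 1e6 / lambda_equip
```

The in-circuit stress analysis method refines this by replacing the generic rates with rates computed from the actual electrical and thermal stress of each part.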
15.1.2
RM analyst
15.1.3
Mechanicus (Maintainability analysis)
15.1.4
Logistics
The RM Logistics enables concurrent provisioning for the right logistics (just in
time) and at optimised costs. It contains the provisioning submodule and the RCM
Optimator. The provisioning submodule uses the concurrent engineering idea that
reliability data are already available. The user can make a setup establishing ground
rules for what is repairable and what is not. The RCM Optimator (reliability cost
modelling optimisation module) uses RAM data to calculate the related costs of
ownership, up to a life cycle cost, at any time during the project.
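The cost-of-ownership calculation performed by such a module can be illustrated by a deliberately simplified sketch: the expected number of repairs over the service life follows from the failure rate, and is priced into the life cycle cost. All figures are hypothetical:

```python
# Deliberately simplified cost-of-ownership sketch; all figures are
# hypothetical and serve only to show the structure of the calculation.
failure_rate = 2e-6          # failures per hour and per unit
units = 200                  # fielded population
service_life_h = 10 * 8760   # ten years of continuous operation
cost_per_repair = 350.0      # currency units per repair action
acquisition_cost = 90_000.0  # currency units for the whole population

# Expected number of repair actions over the service life ...
expected_failures = failure_rate * service_life_h * units
# ... priced into the life cycle cost.
life_cycle_cost = acquisition_cost + expected_failures * cost_per_repair
```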
15 Appendix 415
15.1.5
RM FFT-module
15.1.6
PPoF-module
15.2
Failure rates for components used in telecommunications
RESISTORS
Resistor families                                 L     TA (°C)    λ
Carbon resistors                                  0.3   40...70    1.6...5.2
Metal film high-stability small-power resistors   0.1   40...70    1.2...1.7
Metal film isolated small-power resistors         0.5   40...70    6.0...8.0
Metal film high-power resistors                   0.5   40...70    930...1020
Wirewound high-power resistors                    0.5   40...70    54...72
Wirewound precision small-power resistors         0.1   40...70    42...82
Wirewound precision high-power resistors          0.8   40...70    210...325
POTENTIOMETERS
Potentiometer families                            L     TA (°C)    λ
Film potentiometers with adjustable resistors     0.3   40...70    100...150
Film precision potentiometers                     0.1   40...70    1580...1880
Wirewound small-power potentiometers              0.1   40...70    410...600
Wirewound high-power potentiometers               0.5   40...70    725...1050
Wirewound precision potentiometers                0.1   40...70    685...930
Wirewound fine-tuning potentiometers              0.5   40...70    54...78
CAPACITORS
Capacitor families                                                    L     TA (°C)    λ
Polyester / foil capacitors (70 °C)                                   0.1   40...70    1.8...13.4
Polyester / foil capacitors (85 °C)                                   0.1   40...70    1.4...2.0
Polyester / foil capacitors (125 °C)                                  0.1   40...70    1.2...1.4
Glass capacitors                                                      0.3   40...70    24...88
Ceramic capacitors with nondefined temperature coefficient (85 °C)    0.5   40...70    24...26
Ceramic capacitors with nondefined temperature coefficient (125 °C)   0.5   40...70    22
Ceramic capacitors with defined temperature coefficient               0.1   40...70    2...4
Tantalum capacitors with liquid electrolyte                           0.5   40...70    150...200
Tantalum capacitors with solid electrolyte                            0.5   40...70    11...17
Aluminium capacitors with liquid electrolyte                          0.5   40...70    60...150
Aluminium capacitors with solid electrolyte                           0.5   40...70    94...350
Mica humid capacitors                                                 0.1   40...70    2...6
Mica button capacitors                                                0.1   40...70    8.8...9.6
TRANSISTORS
Transistor families                L      TA (°C)    λ
NPN silicon transistors            0.03   40...70    37...50
PNP silicon transistors            0.03   40...70    55...80
NPN germanium transistors          0.3    40...70    160...475
PNP germanium transistors          0.3    40...70    60...175
Silicon field-effect transistors   0.1    40...70    75...105
Silicon unijunction transistors    0.1    40...70    65...110
VARIOUS COMPONENTS
Types                                                       λ (10⁻⁹ h⁻¹)
Thermistors                                                 12
Quartz devices                                              200
Solder joints (wave / manual)                               0.2...1.0
Components with ferrites                                    1000
Connectors (25 pins)                                        312
Equipped boards (double-sided / multilayer) with N holes    (0.01...1)·N
INTEGRATED CIRCUITS
IC families                                               TA (°C)    λ
Digital ICs with less than 400 transistors (100 gates):
• TTL+DTL                                                 40...70    50...70
15.3
Failure types for electronic components [15.2]
Components                                                                         fo     fk     Drift
Resistors                                                                          0.99   0.01
Film potentiometers                                                                0.7    0.1    0.2
Wirewound potentiometers, small power                                              0.9    0.1
Ceramic capacitors with nondefined temperature coefficient (type I)                0.4    0.4
Ceramic capacitors with nondefined temperature coefficient (85 °C, type II)        0.1    0.9
Ceramic capacitors with nondefined temperature coefficient (125-150 °C, type II)   0.5    0.5
Tantalum capacitors with solid or liquid electrolyte                               0.2    0.8
Aluminium capacitors with liquid electrolyte, U<63V                                0.1    0.9
Components                                                  fo     fk     Drift
Aluminium capacitors with liquid electrolyte, 63V<U<350V    0.1    0.1    0.8
Aluminium capacitors with liquid electrolyte, U>350V        0.5    0.5
Aluminium capacitors with solid electrolyte                 0.3    0.7
Mica humid capacitors                                       0.1    0.7    0.2
Mica button capacitors                                      0.2    0.8
Paper or plastic capacitors                                 0.2    0.8
Glass capacitors                                            0.8    0.2
Coils and transformers                                      0.8    0.2
Silicon signal and rectifier diodes                         0.2    0.8
Z diodes                                                    0.3    0.6    0.1
Thyristors                                                  0.2    0.2    0.6
Field-effect and unijunction transistors                    0.3    0.3    0.4
Optocouplers                                                0.1    0.1    0.8
Bipolar integrated circuits                                 0.4    0.6
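The fractions fo (open), fk (short) and drift of the table can be used to split a component's overall failure rate into failure-mode rates; a minimal sketch for the Z-diode row (the overall rate of 10 FIT is an illustrative placeholder):

```python
# Splitting an overall failure rate into failure-mode rates, using the
# fractions of the Z-diode row above (fo = 0.3, fk = 0.6, drift = 0.1).
# The overall rate of 10 FIT is an illustrative placeholder.
lambda_total_fit = 10.0
fractions = {"open": 0.3, "short": 0.6, "drift": 0.1}

lambda_by_mode = {mode: f * lambda_total_fit for mode, f in fractions.items()}
```

Such a split is what an FMEA needs: each failure mode of a part enters the analysis with its own rate, not with the undifferentiated overall rate.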
15.4
Detailed failure modes for some components
Components                                   Failure probability (%)
Silicon npn and pnp transistors
  Breakdown                                  25
  Open circuit: EB / BC / EBC                15 / 5 / 10
  Short circuit: EB / BC / EBC               5 / 10 / 20
  Small current gain                         5
  High leakage currents                      5
Bipolar integrated circuits
  Open input / output                        10 / 20
  Short circuit input / output               20 / 20
  Degraded input / output                    5 / 5
  Too high / null supply current             10 / 5
  Logic function not respected               5
MOS integrated circuits
  Short circuit input / output / supply      10 / 10 / 5
  Internal defect                            75
Linear integrated circuits
  Short circuits                             30
  Open circuits                              10
  Blocking                                   60
15.5
Storage reliability data [15.3]
It has been shown that storage has an important influence on the reliability of a
product. In the following, American and European data on the reliability of
components stored under given environmental conditions, collected by the
"Reliability" group of AFCIQ, are presented.
• Environment: ground, fixed.
• Failure rate: in FITs, at a confidence level of 60%.
Components                                  λ (FIT)
Bipolar analog SSI/MSI ICs (<100 gates)     958
Tantalum capacitors                         229
LEDs                                        152
Transistors (all types)                     87
Bipolar digital SSI/MSI ICs (<100 gates)    68
Diodes (all types)                          32
Inductive components (all types)            23
Potentiometers                              18
Resistors (all types)                       3.2
Capacitors (all types)                      0.44
Solder joints                               0.32
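Since the failure rates above are given in FIT (failures per 10⁹ device-hours), the corresponding MTBF follows directly as 10⁹/λ hours; a small sketch:

```python
# A FIT is one failure per 1e9 device-hours, so MTBF = 1e9 / lambda(FIT).
def fit_to_mtbf_hours(fit: float) -> float:
    return 1e9 / fit

# Examples with values from the table above:
mtbf_solder = fit_to_mtbf_hours(0.32)  # solder joints: ~3.1e9 h
mtbf_led = fit_to_mtbf_hours(152)      # LEDs: ~6.6e6 h
```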
15.6
Typical costs for the screening of plastic encapsulated ICs
(in Swiss francs) [15.4]
15.7
Failure criteria. Some examples
15.8
Results of 1000 h HTB life tests for 8 bit CMOS
microprocessors encapsulated in ceramics, type NSC 800
[15.5]
15.9
Results of 1000 h HTB life tests for linear circuits
encapsulated in plastic [15.5]
15.10
Average values of the failure rates for some IC families
IC families    Number of items    Mean failure rate (%)    Number of batches    Number of batches with pre-treatment
TTL            288595             1.0                      258                  70
CMOS           257385             1.6                      268                  83
PMOS           384326             1.5                      77                   98
µP             116123             2.2                      131                  2
Peripheries    92318              4.0                      227                  6
RAM            92503              2.0                      100                  12
EPROM          105924             3.2                      144                  98
Total:         1337174            1.8                      1205                 369
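The "Total" row can be cross-checked: the overall mean failure rate of 1.8 % is the item-weighted average of the per-family rates:

```python
# Cross-check of the "Total" row: the overall mean failure rate should be
# the item-weighted average of the per-family rates.
families = [
    # (family, number of items, mean failure rate in %)
    ("TTL", 288595, 1.0),
    ("CMOS", 257385, 1.6),
    ("PMOS", 384326, 1.5),
    ("uP", 116123, 2.2),
    ("Peripheries", 92318, 4.0),
    ("RAM", 92503, 2.0),
    ("EPROM", 105924, 3.2),
]

total_items = sum(n for _, n, _ in families)
weighted_mean = sum(n * r for _, n, r in families) / total_items  # in %
```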
15.11
Activation energy values for various technologies
• Surface charge accumulation; mobile ions: Ea = 1.0-1.35 eV
• Charge injection, slow trapping at the Si-SiO2 interface: Ea = 1.0-1.3 eV
Integrated circuits, MIL-STD-883B:
• Burn-in test (method 1005.2): Ea = 0.44 eV
• High-temperature storage (method 1008.1): Ea = 1.0 eV
• Steady-state life (method 1015.2): Ea = 1.0 eV
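The activation energies listed here enter the Arrhenius model, in which the acceleration factor between a stress temperature and a use temperature is AF = exp[(Ea/k)(1/T_use - 1/T_stress)]; a minimal sketch (the 55 °C / 125 °C temperature pair is an illustrative assumption):

```python
import math

# Arrhenius acceleration factor between a stress and a use temperature:
#   AF = exp((Ea / k) * (1/T_use - 1/T_stress)),  k = 8.617e-5 eV/K.
K_BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K

def acceleration_factor(ea_ev: float, t_use_c: float, t_stress_c: float) -> float:
    t_use = t_use_c + 273.15       # convert to kelvin
    t_stress = t_stress_c + 273.15
    return math.exp((ea_ev / K_BOLTZMANN_EV) * (1.0 / t_use - 1.0 / t_stress))

# E.g. Ea = 1.0 eV (high-temperature storage), 125 degC stress vs 55 degC use:
af = acceleration_factor(1.0, 55.0, 125.0)  # on the order of several hundred
```

The strong dependence on Ea is why the tabulated values matter: at these temperatures, an Ea of 0.44 eV yields an acceleration factor an order of magnitude smaller than an Ea of 1.0 eV.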
References
*** (1998): Proceedings of the Custom Integrated Circuits Conference, Santa Clara,
California (USA), May 11-14, 1998
Abdel-Ghaly, A. A. (1986): Ph. D. thesis. City University of London
Abel, D. (1990): Petri-Netze für Ingenieure. Springer-Verlag, Berlin
Abramowitz, M.; Stegun, I. A., eds. (1965): Handbook of Mathematical Functions. Dover,
New York
Ackermann, W.-G. (1955): Einführung in die Wahrscheinlichkeitsrechnung. S. Hirzel
Verlag, Leipzig
Ackmann, W. (1961): Alterungskriterien bei Elektrolytkondensatoren. NTF 24, vol. I, pp.
115-126
Ackmann, W. (1973): Reliability and Failure of Capacitors. Proceedings of the 3rd
Symposium on Reliability, Budapest, pp. 3-12
Ackmann, W. (1976): Zuverlässigkeit elektronischer Bauelemente. Hüthig-Verlag,
Heidelberg
Ackmann, W.: Neuere Ergebnisse zur Zuverlässigkeit des Ta-Kondensators. SEL-
Nachrichten, vol. 12, no. 1, pp. 38-41
Adams, E. N. (1984): Optimizing Preventive Service of Software Products. IBM Journal of
Research and Development, vol. 28, no. 1
AFCIQ (1981): Guide d'evaluation de fiabilite en mecanique. Paris
Aitchison, J.; Dunsmore, 1. R. (1975): Statistical Prediction Analysis, Cambridge University
Press, Cambridge
Akaike H. (1982): Prediction and Entropy. MRC Technical Summary Report, Mathematics
Research Center, University of Wisconsin-Madison
Amerasekera, A.; Campbell, D. S. (1987): Failure Mechanisms in Semiconductor Devices. J.
Wiley and Sons, Chichester
Amerasekera, A.; Verwey, J. (1992): ESD in Integrated Circuits. Quality and Reliability
Engineering International, vol. 8, pp. 259-272
Amman, P. E.; Knight, J. C. (1987): Data Diversity: An Approach to Software Fault
Tolerance. Digest FTCS-17, 17th Internat. Symposium on Fault-Tolerant Computing,
pp. 122-126
Anderson, R. J. (1985): AV-8B Design for Maintainability. Proc. Ann. Rel. & Maint.
Symp., pp. 28-33
Anderson, R. T. (1976): Reliability Design Handbook. IIT Research Institute, Chicago
Andre, G.; Regnault, J. (1972): Problèmes de la fiabilité liés à l'encapsulation plastique.
L'Onde électrique, vol. 2, fasc. 3, mars, pp. 121-125
Ankenbrandt, F. J. (1960): Electronic Maintainability. vol. 3, London
Arrow, K. J.; Karlin, S.; Scarf, H. (1962): Studies in Applied Probability and Management
Science. Stanford University Press
Arsenault, J. E.; Roberts, J. A. (1980): Reliability and Maintainability of Electronic Systems.
Computer Science Press, Rockville, Maryland
Ascher, H.; Feingold, H. (1984): Repairable Systems Reliability. Dekker, New York
426 General bibliography
Bâzu, M. (1996): Are, really, needed components for military use? Reliability and Quality
Assurance Symp., November
Bâzu, M. (1996): Fuzzy-logic based reliability prediction for the building-in reliability
approach. In: Negoita, M.; Zimmermann, H.-J.; Dascalu, D. (eds.) Real world
applications of intelligent technologies, Romanian Academy, July, pp. 124-128,
Bucharest
Bâzu, M. (1997): The quality of quality researches. Reliability and Quality Assurance
Symp., November
Bâzu, M. et al. (1997): MOVES - a method for monitoring and verifying the reliability
screening. Proc. of the 20th Int. Semicond. Conf. CAS '97, October 7-11, Sinaia, pp.
345-348
Bâzu, M. (1998): The reliability of semiconductor devices: an overview. Proc. of the 6th
Internat. Conf. on Optimization of Electrical and Electronic Equipments OPTIM '98,
Brasov (Romania), May 14-15, pp. 785-788
Bâzu, M. (1999): Reliability assessment based on fuzzy logic. International Conf. on
Computational Intelligence for Modelling, Control and Automation, CIMCA'99, Vienna,
Austria, February 17-19
Bednarz, S. M.; Mariott, D. L. (1988): Efficient Analysis for FMEA. Proc. Ann. Rel. &
Maint. Symp., pp. 416-421
Bednarz, S. (1988): Efficient Analysis for FMEA. Proc. Annual Reliability and
Maintainability Symposium, pp. 416-421
Beichelt, F. (1970): Zuverlässigkeit und Erneuerung. VEB Verlag Technik, Berlin
Beichelt, F. (1993): Zuverlässigkeits- und Instandhaltungstheorie. Teubner Verlag, Stuttgart
Beichelt, F.; Franken, P. (1983): Zuverlässigkeit und Instandhaltung - Mathematische
Methoden. VEB Verlag Technik, Berlin
Bell Communications Research (1985): Reliability Prediction Procedure for Electronic
Equipment (TR-TSY-000332). Bell, Morristown NJ
Bellut, S. (1990): La compétitivité par la maîtrise des coûts, conception à coût objectif et
analyse de la valeur. AFNOR gestion, Paris
Bennets, R. G. (1996): Built-In Self Test Backgrounder. LogicVision
Benson, K. E. et al. (1990): Reaching the Limits in Silicon Processing. AT&T Technical
Journal, November/December
Benz, G. E.; Bazovsky, I. (1990): Adapting Mechanical Models to Fit Electronics. In Proc.
Annual Reliab. Maintainability Symp. 1990. IEEE Reliability Society, Los Angeles,
January 23-25, New York, pp. 153-156
Berman, A. (1981): Time-Zero Dielectric Reliability Test by a Ramp Method. International
Reliability Physics Symposium, pp. 204-208
Bernasconi, J. et al.: Investigation of Various Models for Metal Oxide Varistors. Journal of
Electron Materials, vol. 5, no. 5, pp. 473-495
Bernet, R. (1988): CARP - a Program System to Calculate the Predicted Reliability. 6th
Internat. Conference on Reliability and Maintainability, Strasbourg, pp. 306-310
Bertsche, B.; Lechner, G. (1990): Zuverlässigkeit im Maschinenbau. Springer-Verlag, Berlin
Biancomano, V. (1983): Screening method points to causes of low-voltage failure in MLC
capacitors. Electronic Design, 23rd June, pp. 47-48
Bickley, J. (1981): Zuverlässigkeit von Halbleiterbauelementen. Elektronik (West Germany),
no. 14, pp. 51-58
Billinton, R.; Allan, R. N. (1983): Reliability Evaluation of Engineering Systems. Pitman,
Boston
Braun, H.; Paine, J. M. (1977): A Comparative Study of Models for Reliability Growth.
Technical Report No. 126, series 2, Depart. of Statistics, Princeton University
Brender, D. M. (1968): The prediction and measurement of system availability: A Bayesian
treatment. IEEE Trans. Reliability, vol. R-17, pp. 127-147
Brinkmann, R. (1993): Modellierung des Zuverlässigkeitswachstums komplexer,
reparierbarer Systeme. Dissertationsarbeit an der ETHZ
British Telecom. Handbook of Reliability Data (HRD4), British Telecom, Birmingham,
1987
Brocklehurst, S. (1987): On the Effectiveness of Adaptive Software Reliability Modelling.
CSR Technical Report, City University, London
Buckley, F. J.; Poston, R. (1984): Software Quality Assurance. IEEE Trans. Soft. Eng., vol.
10, no. 1, pp. 36-41
Bulucea, C. D. (1970): Investigation of deep depletion regime of MOS structures using
ramp-response method. Electron. Lett., vol. 6, pp. 479-481
Bulucea, C. D.; Antognetti, P. (1970): On the MOS structure in the avalanche regime. Alta
Frequenza, vol. 39,pp. 734-737
Sah, C. T. ; Pao, H. C. (1966): The effects of fixed bulk charge on the characteristics of
metal-oxide-semiconductor transistors. IEEE Trans. Electron Dev., vol. 13, pp. 393-
397
Bunday, B. D. et al. (1990): Likelihood and Bayesian Estimation Methods for Poisson
Process Models in Software Reliability. Internat. J. of Quality and Reliability
Management, vol. 7, no. 5, pp. 9-18
Calabro, S. R. (1962): Reliability Principles and Practice. McGraw-Hill, New York
Campbell, D. S. et al. (1991): Reliability Behaviour of Electronic Components as a Function
of Time. Proceedings ESREF'91, pp. 41-48, Bordeaux
Carada, E. (1975): L'affidabilità per l'elettronica. La Goliardica, Roma
Carlson et al. (1986): A Procedure for Estimating Life Time of Gapless Oxide Surge Arresters
for an Application. IEEE Trans. Power Applic. and Systems, PAS-01
Caroll, J. M. (1962): Reliability (mathematics of reliability - life testing - designing reliable
circuits - component reliability - system design - physics of failure). Electronics,
November, pp. 53-76
Carter, A. D. S. (1986): Mechanical Reliability. Macmillan, London, 2nd Edition
Catuneanu, V. M., Mihalache, A. N. (1989): Reliability Fundamentals. Elsevier, Amsterdam
Catuneanu, V. M., Popentiu Fl. (1987): Optimum Spare Allocation Policy for Preventive
Maintenance. Microel. & Reliability vol. 27, no.1, pp. 45-48
CECC 42000, CECC 42200: Harmonisiertes Gerätebestätigungssystem für Bauelemente der
Elektronik. Fachgrundspezifikationen und Rahmenspezifikationen, Varistoren.
Deutsche Elektrotechnische Kommission, Frankfurt
Chan, P. Y. et al. (1985): Parametric Spline Approach to Adaptive Reliability Modelling.
CSR Technical Report, City University, London
Chandramouli, R., Pateras, S. (1996): Testing Systems on a Chip. IEEE Spectrum,
November, pp. 42-47
Chapouille, P.; de Pazzis, R. (1968): Fiabilité des systèmes. Masson, Paris
Chen, J. C. et al. (1985): A Quantitative Physical Model for Time-Dependent Breakdown in
SiO2. International Reliability Physics Symposium, pp. 22-31
Chiang, C. J.; Hurley, D. T. (1998): Dynamics of backside wafer level microprobing.
Proceedings of the IEEE International Reliability Physics Symp., pp. 137-149
Chiu, T. J.; Sah, C. T. (1968): Correlation of experiments with a two-section-model theory
of the saturation drain conductance of MOS transistors. Solid-St. Electron. vol. 11, pp.
1149-1157
436 General bibliography
Christou, A. (1992): Reliability of Gallium Arsenide MMICs. J. Wiley and Sons, Chichester
Chwastek, E. J.; Shaw, R. N. (1987): A Rapid Technique for Assessing the Moisture Ingress
Susceptibility of Plastic-Encapsulated Integrated Circuits. Quality and Reliability
Engineering International vol. 3, pp. 185-193
Ciappa, M. (1990): Ausfallmechanismen Integrierter Schaltungen. Bericht F1I31.1.1990,
ETH Zürich
Ciappa, M. (1994): Package Reliability in Microelectronics: an Overview. Proc. of
WELDEC, Lausanne, October 4 - 7
Cluley, J. C. (1974): Electronic Equipment Reliability. J. Wiley & Sons, New York
Cole Jr., E. I.; Soden, J. M.; Rife, J. L.; Baron, D. L.; Henderson, C. L. (1994): Novel failure
analysis techniques using photon probing in a scanning optical microscope.
Proceedings of the IEEE International Reliability Physics Symp., pp. 388-398
Conwell, E. M. (1967): High field transport in semiconductors. Academic Press, New York
Cooke, R. M. (1987): A Theory of Weights for Combining Expert Opinion. Dept. of
Mathematics, Delft University of Technology
Cooke, R. M. (1991): Experts in Uncertainty: Expert Opinion and Subjective Probability in
Science. Oxford University Press
Cooke, R. M. et al. (1988): Calibration and Information in Expert Resolution: A Classical
Approach. Automatica vol. 24, pp. 87-94
Cooke, R. M. et al. (1988): Expert Opinion in Safety Studies: Case Report 4 - DSM Case.
Depart. of Mathematics, Delft University of Technology
Coppola, A. (1984): Reliability Engineering of Electronic Equipment - a Historical
Perspective. IEEE Transactions on Reliability vol. 33, pp. 29-35
Costes, A. et al. (1978): Reliability and Availability Models for Maintained Systems
Featuring Hardware Failures and Design Faults. IEEE Trans. Comp., vol. 27, no. 6, pp.
548-560
Coulbourne, E. D. et al. (1974): Reliability of MOS LSI Circuits. Proc. IEEE, vol. 62, no. 2,
pp. 244-259
Cox, D. R. (1962): Renewal Theory. Methuen, London
Cox, D. R. (1965): Erneuerungstheorie. R. Oldenbourg Verlag, München
Cox, D. R., Lewis, P. A. W. (1978): The Statistical Analysis of Series of Events. Chapman
and Hall, London
Cox, D. R., Smith, W. L. (1953): On the Superposition of Renewal Processes; In Biometrics,
vol. 40, pp. 1-11
Crook, D. L. (1991): Evolution of VLSI Reliability Engineering. Quality and Reliability
Engineering International, vol. 7, pp. 221-233
Crosby, P. B. (1971): Qualität kostet weniger. Verlag A. Holz
Crow, L. H. (1977): Confidence Interval Procedures for Reliability Growth Analysis.
Technical Report No. 197, US Army Material System Analysis Activity, Aberdeen, Md.
Csenki, A. (1991): Some Renewal-Theoretic Investigations in the Theory of Sojourn Times
in Finite Semi-Markov Processes. J. Appl. Prob., vol. 28, pp. 822-832
Csenki, A. (1992): Sojourn Times in Markov Processes for Power Transmission
Dependability Assessment with MATLAB. Microelec. and Reliability vol. 32, pp. 945-
960
Csenki, A. (1993): Occupation Frequencies for Irreducible Finite Semi-Markov Processes
with Reliability Applications. Computers & Ops. Res., vol. 20, pp. 249-259
Eda, Iga, Matsuoka (1980): Degradation Mechanism of Nonohmic Zinc Oxide Ceramics.
J. Appl. Phys., vol. 51, no. 5
Eda (1984): Destruction Mechanism of ZnO Varistors due to High Currents. J. Appl. Phys.,
vol. 56, p. 810
Einzinger: Nichtlineare elektrische Leitfähigkeit von dotierter Zink-Oxid-Keramik.
Dissertation, Fakultät für Physik der TU München
Ellis, B. N. (1986): Cleaning and Contamination of Electronics Components and
Assemblies. Electrochem. Publ., Ayr (Scotland)
Engelmaier, W., Attarwala, A. I. (1989): Surface-mount attachment reliability of clip-
leaded ceramic chip carriers on FR-4 circuit boards. IEEE Trans. Comp., Hybrids, and
Manuf. Technol., vol. 12, no. 2, pp. 284-296
Epstein, D. (1982): Application and use of acceleration factors in microelectronics testing.
Solid State Technology, November, pp. 116-122
ESA (1988) PSS-01-60 Issue 2, November
Etzrodt, A. (Ed.): Zuverlässigkeit in Einzeldarstellungen. Oldenbourg Verlag,
München / Wien
European Safety and Reliability Research and Development Association (1990): Expert
Judgement in Risk and Reliability Analysis: Experience and Perspective. ESSRDA
Report no. 2
Fagan, 1. (1987): Achieving Reliability in the Real World. Proceedings of the Annual
Reliability and Maintainability Symposium, pp. 152-158
Fauchier, E. et al. (1996): Impact of the VHDL Description on the Testability of Integrated
Systems. Quality Engineering vol. 8, no. 4, pp. 623-633
Faul, R., Bartosz, R. (1984): Ausfallmechanismen bei integrierten Halbleiterbauelementen.
Elektronik, no. 10, pp. 73-79
Feller, W. (1969): An introduction to probability theory and its applications. John Wiley,
New York
Fischer, K. (1984): Zuverlässigkeits- und Instandhaltungstheorie. Transpress VEB, Berlin
Fischer, K. D. et al. (1996): PRML Detection Boosts Hard-Disk Drive Capacity. IEEE
Spectrum, Nov., pp. 70-76
Fisz, M. (1980): Wahrscheinlichkeitsrechnung und mathematische Statistik. VEB Deutscher
Verlag der Wissenschaften, Berlin
Footner, P. K. et al. (1987): Purple Plague: Eliminated or Just Forgotten? Quality and
Reliability Engineering International, vol. 3, pp. 177-184
Forman, E.H., Singpurwalla, N. D. (1977): An Empirical Stopping Rule for Debugging and
Testing Computer Software. J. of the American Statistical Assoc., vol. 72, pp. 750-757
Fougerousse, S., Germain, J. (1991): Pratique de la maintenance industrielle par le coût
global. AFNOR gestion
Fox, R. W.: Six Ways to Control Transients. Electronic Design, vol. 22, no. 11, pp. 52-57
Freiberger W. (Ed.) (1972): Statistical Computer Performance Evaluation. Academic Press,
New York, pp. 465-484
French, S. (1985): Group Consensus Probability Distributions: a Critical Survey. In:
Bernardo, J. M. et al. (Eds.) Bayesian Statistics, North Holland, vol. 2, pp. 183-201
Freudenthal, A. M., Gumbel, E. J. (1953): On the statistical interpretation of fatigue tests.
Proc. Roy. Soc., London, vol. 216, pp. 309-332
Frey, H. (1973): Computerorientierte Methodik der Systemzuverlässigkeits- und
Sicherheitsanalyse. Dissertation Nr. 5244, ETH Zürich
Frey, H. H. (1974): Safety Evaluation of Mass Transit Systems by Reliability Analysis. IEEE
Trans. on Reliability vol. R-23, no. 3, pp. 161-169
Jacob, P. et al. (1995): IGBT Power Semiconductor Reliability Analysis for Traction
Application. Proceedings IPFA, Singapore
Jahn, R. (1973): Methoden der Zuverlässigkeitsarbeit - ein wichtiger Faktor der
Effektivitätserhöhung und Intensivierung im Industriebereich Elektrotechnik /
Elektronik. Qualität und Zuverlässigkeit no. 12, p. 305
Jankovic, G., Black, J. (1996): Engineering a WEB Site. IEEE Spectrum, November, pp. 62-69
Jarl, R. B. (1976): Radiation Effects on Power Transistors. L'Onde électrique, vol. 56, no. 3,
pp. 119-125
Jelinski, Z., Moranda, P. B. (1972): Software Reliability Research. In: Freiberger, W. (ed.)
Statistical Computer Performance Evaluation. Academic Press, New York, pp. 465-
484
Jensen, F., Petersen, N. E. (1982): Burn-In. Wiley, New York
Jeuland, F. et al. (1991): An Extension of the Rapid Wafer-Level Wijet Method and its
Comparison with Conventional Electromigration Testing. Proceedings ESREF '91, pp.
187-192, Bordeaux
Jiang, S., Kececioglu, D. (1992): Graphical Representation of Two Mixed Weibull
Distributions. IEEE Trans. on Reliability, vol. 41, no. 2, pp. 241-247
Joe, H., Reid, N. (1995): Estimating the Number of Faults in a System. J. of the American
Statistical Assoc., vol. 80, pp. 222-226
Johnson, A. M., Malek, M. (1988): Survey of Software Tools for Evaluating Reliability,
Availability, and Serviceability. ACM Comp. Surveys, vol. 20
Johnson, G.M.: Evaluation of Microcircuits Accelerated Test Techniques. RADC-TR-76-
218, Rome Air Development Centre, Griffiss Air Force Base, New York, 3441
Johnson, L. G. (1964): The Statistical Treatment of Fatigue Experiments. Elsevier,
Amsterdam
Jones, R. D. (1982): Hybrid Circuit Design and Manufacture. Marcel Dekker, New York and
Basel
Jones, R. E., Smith, L. D. (1987): A New Wafer-Level Isothermal Joule-Heated
Electromigration Test for Rapid Testing of Integrated Circuit Interconnect. Journal of
Applied Physics, vol. 61, pp. 4670-4678
Jordan, W. E. (1972): Failure Modes, Effects and Criticality Analysis. Nat. Symposium, pp.
30-37
Jowett, C. E. (1976): Electrostatics in the Electronics Environment. The Macmillan Press
Ltd., London and Basingstoke
Jubisch, H. (1976): Möglichkeiten und Grenzen der Anwendung von Umgebungsprüfverfahren
zur Ermittlung der Zuverlässigkeit der Elektrotechnik / Elektronik.
Elektrie, vol. 30, no. 10, pp. 511-512
Kao, J. H. K. (1956): A new life-quality measure for electron tubes. IRE Trans. Rel. and
Qual. Control, April, pp. 1-11
Kao, J. H. K. (1960): A summary of some new techniques on failure analysis. Proc. Annual
Symp. Reliability, pp. 190-201
Kapur, K. C., Lamberson, L. R. (1977): Reliability in Engineering Design. J. Wiley and
Sons, New York
Kapur, P. K., Kapur, K. R. (1983): Interval Reliability of a Two-Unit Stand-By Redundant
System. Microelec. and Reliability vol. 23, pp. 167-168
Karjalainen, J. et al. (1996): Practical Process Improvement for Embedded Real-Time
Software. Quality Engineering vol. 8, no. 4, pp. 565-573
Kas, G. (1983): Qualität und Zuverlässigkeit elektronischer Bauelemente und Systeme. R.
Oldenbourg Verlag, München / Wien
Nelson, J. J. et al. (1989): Reliability Models for Mechanical Equipment. Proc. Ann. Rel. &
Maint., pp. 146-153
Nelson, W. (1982): Applied Life Data Analysis. J. Wiley and Sons, New York
Nelson, W. (1990): Accelerated Testing. J. Wiley and Sons, New York
Neumann, J. von (1956): Probabilistic Logics and the Synthesis of Reliable Organisms from
Unreliable Components. Annals of Math. Studies, Princeton University Press, no. 34,
pp.43-98
Newby, M. (1991): Reliability Modelling and Estimation. In: Sander, P. Badoux, R. (eds.)
Bayesian Methods in Reliability, Kluwer Academic Publishers, Dordrecht
Nicollian, E. H.; Goetzberger, A. (1967): The Si-SiO2 interface - electrical properties as
determined by the metal-insulator-silicon conduction technique. Bell System Technical
Journal, vol. 46, pp. 1055-1063
Noyce, R. N.; Bohn, R. E.; Chua, H. T.(1969): Schottky diodes make IC scene. Electronics,
July 21, pp. 74-77
O'Connor, P. D. T. (1991): Practical Reliability Engineering. J. Wiley and Sons, Chichester
Olbrich, T. et al. (1996): Built-In Self-Test in Intelligent Microsystems as a Contributor to
System Quality and Performance. Quality Engineering, vol. 8, no. 4, pp. 60-613
Olson, C. (1989): Reliability of Plastic-Encapsulated Logic Circuits. Quality and Reliability
Engineering International, vol. 5, pp. 53-72
Osaki, S. (1985): Stochastic System Reliability Modeling. World Scientific, Singapore, pp.
11-18, pp. 35-39, pp. 388-402
Osaki, S. (1992): Applied Stochastic System Modeling. Springer-Verlag, Berlin
Pasco, R. W., Schwarz, J. A. (1983): The Application of Dynamic Technique to the Study
of Electromigration Kinetics. Internat. Reliability Physics Symposium, pp. 10-23
Pate-Cornell, M. E., Fischbeck, P. S. (1995): Probabilistic Interpretation of Command and
Control Signals. Reliability Engineering and System Safety no. 47, pp. 27-36
Pau, L. F. (1981): Failure Diagnosis and Performance Monitoring. Marcel Dekker, Inc., New
York
Pecht, M. G., Palmer, M., Naft, J. (1987): Thermal Reliability Management in PCB Design.
Proc. Ann. Rel. & Maint. Symposium, pp. 312-315
Pecht, M., Ramappan, V. (1992): Are components still the major problem: A review of
electronic system and device field failure returns. IEEE Trans. Comp, Hybrids, and
Manuf. Technol., vol. 15, no. 6, pp. 1160-1164
Peck, D. S., Trapp, O. D. (1987): Accelerated Testing Handbook. Technology Associates,
Portola Valley (CA)
Petrick, P.: Das Dauerverhalten von Kondensatoren. Elektronikpraxis vol. 3, no. 2, pp. 7-17;
no. 3/4, pp. 9-16
Pfannschmidt, G. (1992): Ultrasonic Microscope Investigations of Die Attach Quality and
Correlations with Thermal Resistance. Quality and Reliability Engineering
International, vol. 8, pp. 243-246
Philipp, Levinson (1983): Degradation Phenomena in ZnO, a Review. Advances in
Ceramics, no. 7
Picart, B.; Deboy, G. (1992): Failure analysis on VLSI circuits using emission microscopy
for backside observation. Proceedings of ESREF, pp. 515-520
Pierret, R. F.; Sah, C. T. (1968): An MOS-oriented investigation of effective mobility
theory. Solid-State Eelectron., vol. 11, pp. 279-285
Pieruschka, E. (1963): Principles of Reliability. Prentice-Hall, Englewood Cliffs
Platz, G. (1983): Methoden der Software-Entwicklung. Hanser-Verlag, Miinchen
Pollard, A., Rivoire, C. (1971): Fiabilité et statistique prévisionnelles. La méthode de
Weibull. Eyrolles, Paris
Serra, A., Barlow, R. E. (1986): Theory of Reliability. Course XCIV at the E. Fermi School,
Amsterdam, North-Holland
Sethy, A. (1981): Die praktische Arbeit zur Qualitätssicherung elektronischer Bauelemente
und Einrichtungen. E und M, vol. 98, no. 10, pp. 399-406
Shaw, L. et al. (1973): Time Dependent Stress-Strength Models for Non-Electrical and
Electrical Systems. Proc. Annu. Symp. Reliability, pp. 186-197
Shiomi, H. (1968): Application of cumulative degradation model to acceleration life test.
IEEE Trans. on Reliability, vol. 17, no. 1, March, pp. 27-33
Shockley, W. (1949): The theory of pn junctions in semiconductors and pn junction
transistors. Bell Syst. Techn. Journal, vol. 28, pp. 435-467
Shockley, W. (1952): A unipolar field-effect transistor. Proc. IRE, vol. 40, pp. 1365-1371
Shockley, W.; Prim, R. C. (1953): Space-charge limited emission in semiconductors. Phys.
Rev., vol. 90, pp. 753-762
Shockley, W. (1954): Negative resistance arising from transit time in semiconductor diodes.
Bell Syst. Techn. Journal, vol. 33, pp. 799-809
Shockley, W. (1957): High-frequency negative resistance device. U.S. Patent 2794917, June
4
Shooman, M. (1973): Operational Testing and Software Reliability During Program
Development. Record 1973 Symp. on Computer Software Reliability, New York, 1973,
April 30 - May 2, pp. 51-57
Sichart, K. V., Vollersten, R.-P. (1991): Bimodal Lifetime Distributions of Dielectrics for
Integrated Circuits. Quality and Reliability Engineering International, vol. 7, pp. 299-
306
Siemens, SN 29 500 (1986): Failure Rates of Components. Zürich, Siemens-Albis
Siewiorek, D.P., Swarz, R. S. (1982): The Theory and Practice of Reliable System Design.
Digital Press, Bedford, MA
Singh, C., Billinton, R. (1977): System Reliability Modelling and Evaluation. Hutchinson,
London
Sinnadurai, N. (1980): Accelerated ageing of IMPATT diodes. Microelectronics and
Reliability, vol. 21, no. 2, pp. 209-219
Sinnadurai, N. (1991): Environmental Testing and Component Reliability Observations in
Telecommunications Equipment Operated in Indian Climatic Conditions. Proceedings
ESREF'91, pp. 55-63, Bordeaux
Smith, A. F. M., Skene, J. E. H., Naylor, J. C. (1987): Progress with Numerical and Graphical
Methods for Practical Bayesian Statistics. Statistician, no. 36, pp. 75-82
Smith, D. J., Babb, A. H. (1973): Maintainability Engineering. Pitman Publishing, Bath
Smith, W. L. (1958): Renewal theory and its ramifications. J. Roy. Stat. Soc. Ser. B, vol. 20,
pp. 243-302
Solid Aluminium Capacitors - Reliability and Stability. Philips Technical Information
057/12.6.79
Solovyev, A. D. (1970): Standby with rapid renewal. Eng. Cybernetics, vol. 8, pp. 49-62
Soom, E. (1970): Einführung in die mathematische Statistik und in die
Wahrscheinlichkeitsrechnung. Hallwag, Bern
Srinivasan, G. R. (1996): Modeling the cosmic-ray-induced soft-error rate in integrated
circuits: An overview. IBM J. Res. Develop., vol. 40, no. 1, pp. 77-89
Srinivasan, S. K., Gopalan, M. N. (1973): Probabilistic analysis of a two-unit system with a
warm standby and a single repair facility. Oper. Res. vol. 21, pp. 748-754; IEEE Trans.
Rel., vol. R-22, pp. 250-254
Srinivasan, V. S. (1966): The Effect of Standby Redundancy in System's Failures with
Repair Maintenance; Operations Research, vol. 14, no. 6, pp. 1024-1036
Van der Ziel, A. (1967): Normalized characteristics of nun devices. Solid-St. Electron. vol.
10, pp. 267-172
Van der Ziel, A. (1968): Solid-state physical electronics. Prentice-Hall, New Jersey
Vanhecke, B. et al. (1991): Electromigration at Gold-Aluminium Interfaces and in Thin
Aluminium Tracks. Proceedings ESREF '91, pp. 193-199, Bordeaux
Vaucher, C. L. et al. (1996): The ppm Myth in Board Assembly. Quality Engineering, vol. 8,
no. 4, pp. 615-621
VDI 2221 (1987): Systematic Approach to the Design of Technical Systems and Products
VDI 4008: Handbuch Zuverlässigkeitstechnik
VDI 4009 Bl. 8 (1985): Zuverlässigkeitswachstum bei Systemen
Vetter, H. (1979): Zuverlässigkeit trotz steigender Komplexität der Anforderungen. NTG-
Tagung "Technische Zuverlässigkeit", Nürnberg, pp. 47-86
Viertl, R. (1988): Statistical Methods in Accelerated Life Testing. Vandenhoeck &
Ruprecht, Göttingen
Villemeur, A. (1988): Sûreté de fonctionnement des systèmes industriels. Eyrolles, Paris
Vliet, H. (1993): Software Engineering. Principles and Practice. J. Wiley & Sons, New York
Wada, Y. et al. (1981): Electrical testing for process evaluation. Microelectronics and
Reliability, vol. 21, no. 2, pp. 159-163
Wagner, G. R., Mischke, C. R. (1973): Cycles-to-Failure and Stress-to-Failure Weibull
Distributions in Steel Wire Fatigue. Proc. Annu. Symp. Reliability, pp. 445-451
Wallace, W. E. (1981): Progress in Electronic Systems Reliability. Proceedings of Annual
Reliability and Maintainability Symposium, pp. 272-274
Wallmark, J. T.; Johnson, H. (1966): Field effect transistors - Physics, technology and
applications. Prentice-Hall, New Jersey
Warner, R. M. (1965): Integrated circuits, design principles and fabrication. McGraw-Hill,
New York
Wasserman, G. S., Reddy, I. S. (1992): Practical Alternatives for Estimating the Failure
Probabilities of Censored Life Data. Quality and Reliability Engineering International,
vol. 8, pp. 61-67
Weber, G. G. (1974): State of Reliability in Europe. IEEE Trans. on Reliab. R-23
Weber, W. et al. (1991): Dynamic degradation in MOSFET's - part II: Application in the
circuit environment. IEEE Trans. El. Devices, vol. 38, no. 8, pp. 1859-1867
Webinger, R.: Aluminium-Elektrolytkondensatoren für den Einsatz in Stromversorgungen.
Bauteile Report vol. 17, no. 2, pp. 37-41
Weibull, W. (1951): A Statistical Distribution Function of Wide Applicability. J. Appl.
Mech., vol. 18, pp. 293-297
Weick, W. W. (1980): Acceleration factors for IC leakage current in a steam environment.
IEEE Trans. on Reliability, vol. 29, no. 2, June, pp. 109-115
Westinghouse Summary Chart of 1984 to 1987 for Failure Analysing Memos (1988):
Westinghouse Electric Corporation. In: Pecht, M. et al. (1990): Temperature
Dependence of Microelectronic Device Fails. QRE International, vol. 6, no. 4, pp. 275-
284
Whitehead, A. P., Prince, M. D. H. (1991): Reliability Performance of Electronic
Components. Proc. of Reliability '91 (London), pp. 284-296
Whorf, B. L. (1984): Sprache-Denken-Wirklichkeit. Rowohlt-Verlag, Hamburg
Wiesen, J. M. (1960): Mathematics of Reliability. Proc. 6th Nat. Symp. on Reliability and
Quality Control in Electr., pp. 110-120
Wilcox, R. H., Mann, W. C. (Ed.) (1962): Redundancy Techniques for Computing Systems.
Spartan Books
Wiper, M. P. (1990): Calibration and Use of Expert Probability Judgements. Ph. D. Thesis,
School of Computer Studies, University of Leeds
Wong, K. L. (1982): A New Direction for Electronic Reliability Engineering in the 80's.
Proceedings of Eurocon '82, Copenhagen, pp. 3-10
Wong, K. L. (1990): Reliability Prediction Models for Military Avionics. Technical Report
Project no. AF89-158, April
Woods, M. H. (1985): VLSI Reliability. NATO Seminar, Helsingør
Woods, M. H. (1986): MOS VLSI Reliability and Yield Trends. Internat. Reliab. Physics
Symp., pp. 1715-1729
Wright, G. T. (1964): Theory of space-charge-limited surface-channel dielectric triode.
Solid-St. Electron., vol. 7, pp. 167-173
Wu, E. Y.; Lo, S.-H.; Abadeer, W. W.; Acovic, A.; Buchanan, D.; Furukawa, T.; Brochu, D.;
Dufresne, R. (1997): Determination of ultrathin oxide voltages and thickness and the
impact on reliability projection. Proceedings of the IEEE International Reliability
Physics Symp., pp. 184-191
Wurnik, F. M. (1981): Quality Assurance System and Reliability Testing of LSI Circuits.
Microelectronics & Reliability, vol. 23, no. 4, pp. 709-715
Zaininger, K. H.; Wang, C. C. (1970): MOS and vertical junction device characteristics of
epitaxial silicon on low aluminium-rich spinel. Solid-State Electronics, vol. 13, pp.
943-947
Zehnder, C. A. (1986): Informatik-Projektentwicklung. Verlag der Fachvereine, Zürich
Zerbst, M. (Ed.) (1986): Mess- und Prüftechnik. Springer-Verlag, Berlin
Zio, E. (1995): Biasing the Transition Probabilities in Direct Monte-Carlo. Reliability
Engineering and System Safety vol. 47, pp. 59-63
Glossary of microelectronics and reliability terms
Abrasive trimming: Trimming a film resistor to its nominal value by notching the resistor
with a finely adjusted stream of an abrasive material (for example aluminium oxide)
directed against the resistor surface.
Accelerated lifetest: Test conditions used to bring about - in a short time - the deteriorating
effect obtained under normal service conditions.
Accelerated test: A test in which the applied-stress level is chosen to exceed that stated in
the reference conditions, in order to shorten the time required to observe the stress
responses of the item, or magnify the response in a given time. To be valid, an accelerated
test shall not alter the basic modes and/or mechanisms of failure, or their relative
prevalence.
Acceleration factor: The major failure mechanisms of a component stem from electrical
ageing and both electrical and mechanical wear. Electrical ageing is a chemical process
generally following the Arrhenius equation:
F = A exp(-Ea/kT), where F = failure rate; A = a constant; Ea = activation energy (eV); k =
Boltzmann's constant (8.6 × 10⁻⁵ eV/K); T = absolute temperature (K). Since electrical
ageing is accelerated at increased temperatures, we can define a time acceleration factor
AF = exp{(Ea/k)[(1/T1) - (1/T2)]}, where T1 = reference temperature (K), T2 = acceleration
temperature (K).
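The acceleration-factor formula can be sketched in a few lines of Python; the 0.7 eV activation energy and the two temperatures below are illustrative assumptions, not values from this glossary:

```python
import math

K_EV = 8.6e-5  # Boltzmann's constant in eV/K, as quoted in the glossary entry

def acceleration_factor(ea_ev, t1_k, t2_k):
    """Arrhenius time acceleration factor AF = exp{(Ea/k)[(1/T1) - (1/T2)]},
    with T1 the reference and T2 the accelerated temperature, both in kelvin."""
    return math.exp((ea_ev / K_EV) * (1.0 / t1_k - 1.0 / t2_k))

# Illustrative case: Ea = 0.7 eV, 55 degC (328.15 K) reference, 125 degC (398.15 K) stress
af = acceleration_factor(0.7, 328.15, 398.15)  # roughly 78x faster ageing
```

Under these assumed values, one hour at 125 °C corresponds to roughly 78 hours at 55 °C; note how sensitive the factor is to the chosen activation energy.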
Acceptance test: 1) A test to demonstrate the degree of compliance of a device with
purchaser's requirements. 2) A conformance test to demonstrate the quality of the units of
a consignment, without implication of contractual relations between buyer and seller.
Active components: Electronic components (transistors, thyristors, etc.) which can operate
on an applied electrical signal so as to change its basic character; i. e. amplification,
switching, rectification, etc.
Active element: An element of a circuit in which an electrical input signal is converted into
an output signal by the non-linear voltage/current relationships of a semiconductor
device.
Active maintenance time: The time during which maintenance actions are performed on an
item either manually or automatically.
Active substrate: A substrate in which active and passive circuit elements may be formed to
provide discrete or integrated devices.
Add-on component: Discrete or integrated pre-packaged or chip components that are
attached to a film circuit to complete the circuit functions.
Adhesion: The property of one material to remain attached to another; a measure of the
bonding strength of the interface between, for example, film deposit and the surface
which receive the deposit; the surface receiving the deposit may be another film or
substrate.
Alloy: A solid-state solution of two or more metals.
Alumina: Al2O3; alumina substrates are made of formulations that are primarily alumina.
Bonding pad: A metallised area at the end of a thin metallic strip to which a connection is to
be made.
Bonding, stitch: A bonding technique where wire is fed through a capillary tube. A bent
section of the wire is bonded to the contact area by the capillary. The capillary is removed
and a cutter severs the wire, forming a new bend for the next bonding operation.
Bonding, thermal compression: Diffusion bonding where two carefully prepared surfaces
are brought into intimate contact under carefully controlled conditions of temperature,
time, and clamping pressure. Plastic deformation is induced by the combined effects of
pressure and temperature, which in turn results in atom movement causing the
development of a crystal lattice bridging the gap between the facing surfaces and results
in bonding. Generally, the process is performed under a protective atmosphere of inert
gas to keep the surfaces to be bonded clean while they are being heated.
Bonding, wedge: 1) A type of thermocompression bonding used in integrated-circuit
manufacturing where a wedge-shaped tool is used to press a small section of the lead wire
onto the bonding pad. 2) A bond formed when a heated wedge is brought down on a wire
prepositioned on a heated contact. The wedge's heat and pressure in combination with
heat applied to the mounting contact form the bond.
Bonding, wire: 1) A lead-covered tie used for connecting two cable sheaths until a splice is
closed and covered permanently. 2) Fine gold or aluminium wire for making electrical
connections between various bonding pads on the semiconductor device substrate and
device terminals or substrate lands.
Bond lift-off: The failure mode whereby the bonded lead separates from the surface to
which it was bonded.
Brazing: Similar to soldering. The joining of metals with a non-ferrous filler metal at
temperatures above 425°C. Also called hard soldering.
Breakdown: Failure of a clamp or Zener diode.
Breakdown voltage: The voltage threshold beyond which there is a marked (almost infinite
rate) increase in electrical current conduction.
Burn-in: 1) The operation of items prior to their ultimate application intended to stabilise
their characteristics and to identify early failures. 2) The process of electrically stressing a
device (usually at an elevated temperature environment) for an adequate period of time to
cause failure of marginal devices. 3) (For nonrepairable items): Type of screening test
while an item is in operation. 4) (For repairable items): Operation of an item in a
prescribed environment with successive corrective maintenance at every failure during
the early failure period.
Burn-in - statically or dynamically - (125 °C for 160 h) provokes some 80% of the chip-
related and 30% of the package-related early failures; memories should be operated with
the same electrical signals as in the field. Should surface, oxide and metallisation
problems be dominant, a static burn-in is better. A dynamic burn-in activates practically
all failure mechanisms. The choice will be made on the basis of practical results.
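As a rough illustration of how these percentages combine, the overall share of early failures provoked by burn-in depends on the split between chip-related and package-related failures; the 70% chip-related share below is an illustrative assumption:

```python
def burn_in_catch(chip_share, chip_eff=0.80, pkg_eff=0.30):
    """Overall fraction of early failures provoked by burn-in, using the
    glossary's figures (~80% of chip-related, ~30% of package-related).
    chip_share is the assumed fraction of early failures that are chip-related."""
    return chip_share * chip_eff + (1.0 - chip_share) * pkg_eff

# If, say, 70% of early failures are chip-related (illustrative assumption):
overall = burn_in_catch(0.70)  # 0.7*0.8 + 0.3*0.3 = 0.65
```

Only about two thirds of the early-failure population is weeded out even in this favourable case, which is why the choice between static and dynamic burn-in is made on practical results.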
Burn-out: Destruction of the junctions of a transistor due to extremely large currents caused
by latch-up.
Camber: A term that describes the amount of overall warpage present in a substrate.
Capability: Ability of an item to meet a service demand of stated quantitative characte-
ristics under given conditions.
Capillary: A hollow bonding tool used to guide the bonding wire and to apply pressure to
the wire during the bonding cycle.
Capillary tool: A tool used in bonding where the wire is fed to the bonding surface of the
tool through a bore located along the long axis of the tool.
Centrifuge: Testing the integrity of bonds in a circuit by spinning the circuit at a high rate of
speed, thereby imparting a high g loading on the interconnecting wire bonds and bonded
elements.
Cermet: A solid homogeneous material usually consisting of a finely divided admixture of a
metal and ceramic in intimate contact.
Characterisation: A parametric, experimental analysis of the electrical properties of a given
IC; it investigates the influence of different operating conditions (supply voltage,
frequency, temperature, logic levels, etc.) on the IC's behaviour and delivers a cost-
effective test programme for incoming inspection. Characterisation testing is a key to
successful screening and incoming inspection testing.
Chip: 1) A single substrate on which all the active and passive circuit elements have been
fabricated using one or all of the semiconductor techniques of diffusion, passivation,
masking, photoresist, and epitaxial growth. A chip is not ready for use until packaged and
provided with external connectors. 2) A tiny piece of semiconductor material scribed or
etched from a semiconductor slice on which one or more electronic components are
formed. The percentage of usable chips obtained from a wafer is the yield.
Chip-scale package (CSP): A package - introduced in 1994 - having a perimeter no more
than 1.2 times the perimeter of the die it contains. CSP combines the best features of bare
die assembly and traditional semiconductor packaging; it reduces overall system size,
something devoutly to be desired in portable electronic products. Unresolved issues
include reliability, thermal performance, design, materials, assembly test, shipping,
handling, and the CSP-system interaction. The length of the list reflects the newness of
the technology and the fact that few CSPs are as yet in production or use.
Clinch: A method of mechanically securing components prior to soldering, by bending that
portion of the component lead that extends beyond the lip of the mounting hole, against a
pad area.
Coefficient of thermal expansion: The ratio of the change in length to the change in
temperature.
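The definition translates into the linear-expansion relation delta_L = alpha * L * delta_T; a minimal sketch, in which the alumina CTE value is a typical figure assumed for illustration, not taken from this glossary:

```python
def length_change(length_mm, alpha_per_k, delta_t_k):
    """Linear thermal expansion: delta_L = alpha * L * delta_T."""
    return length_mm * alpha_per_k * delta_t_k

# Assumed values: a 10 mm alumina substrate (alpha ~ 6.5e-6 per K) heated by 100 K
dl = length_change(10.0, 6.5e-6, 100.0)  # about 0.0065 mm of expansion
```

Mismatched CTEs between a chip, its attach, and the substrate are what turn such micrometre-scale length changes into the thermomechanical stresses discussed under solder-joint and die-attach reliability.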
Cold solder connection: A soldered connection where the surfaces being bonded moved
relative to one another while the solder was solidifying, causing an uneven solidification
structure which may contain microcracks. Such cold joints are usually dull and grainy in
appearance.
Component: 1) A piece of equipment, a line, a section of line, or a group of items that is
viewed as an entity for purposes of reporting, analysing, and predicting outages. 2) An
essential functional part of a subsystem or equipment; it may be any self-contained
element with a specific function, or it may consist of a combination of parts, assemblies,
accessories, and attachments.
Component hazard (reliability data): The instantaneous failure rate of a component or its
conditional probability of failure versus time.
Compound (chemical): A substance consisting of two or more elements chemically united
in definite proportions by weight.
Conductive epoxy: An epoxy material (polymer resin) that has been made conductive by the
addition of a metal powder (usually gold or silver).
Conductivity: The ability of a material to conduct electricity. (The reciprocal of resistivity).
Confidence level: The probability (expressed as a percentage) that a given assertion is true
or that it lies within certain limits calculated from the data.
Confidence limits: Extremes of a confidence interval within which there is a designated
chance that the true value is included.
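One concrete way of obtaining such limits is a normal-approximation interval for a failure probability estimated from pass/fail data; the sketch below uses only the Python standard library, and the 8-failures-in-1000 sample is an illustrative assumption:

```python
import math
from statistics import NormalDist

def failure_prob_limits(failures, n, confidence=0.95):
    """Two-sided normal-approximation confidence limits for a failure
    probability estimated as failures/n from n tested devices."""
    p = failures / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # e.g. ~1.96 for 95%
    half = z * math.sqrt(p * (1.0 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)

lo, hi = failure_prob_limits(8, 1000)  # 8 failures observed in 1000 devices
```

For the small failure counts typical of reliability testing, chi-square (Poisson) limits are usually preferred; the normal approximation here serves only to illustrate the definition.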
Confidence test: A test primarily performed to provide a high degree of certainty that the
unit under test is operating acceptably.
Contact resistance: The apparent resistance between the terminating electrode and the body
of the device (the case of resistors or capacitors, for example).
Controllability: The possibility to set internal signals to desired values via the inputs.
Corrective maintenance: 1) The maintenance carried out after a failure has occurred and
intended to restore an item to a state in which it can perform its required function. 2)
Maintenance carried out after recognition of a fault, intended to put an item back into a
state in which it can again perform its required function.
Critical charge: The amount of charge required to change the value stored in a memory cell.
Crosstalk: Signals from one line leaking into another nearby conductor because of
capacitive or inductive coupling or both.
Curie point: Above a critical temperature, ferromagnetic materials lose their permanent
spontaneous magnetisation and ferroelectric materials lose their spontaneous polarisation.
This critical temperature is the Curie point; there ferroelectric ceramic capacitors reach a
peak in capacitance.
Custom circuits: Circuits designed to satisfy a single application requirement.
Debug: To examine or test a procedure, routine, or equipment for the purpose of detecting
and correcting errors.
Debugging: The operation of an equipment or complex item prior to use to detect and
replace parts that are defective or expected to fail, and to correct errors in fabrication or
assembly.
Decoder malfunction: Inability to address a substantial part of the array due to an open
decoder line internal to the device, or a defective decoder.
Defect: 1) Any non-conformance of an item to specified requirements and that adversely
affects - or potentially affects - the quality of a device. 2) Nonfulfilment of an intended
usage requirement or reasonable expectation, essentially present at t = 0.
Degradation: Change for the worse in the characteristics of an electric element because of
heat, high voltage, etc.
Dependability: Collective term used to describe the availability performance and its
influencing factors.
Depletion-mode transistor: An MOS transistor with a physically implanted channel that
conducts current at zero gate voltage.
Degradation: A gradual deterioration in performance as a function of time.
Derating: 1) The intentional reduction of stress-to-strength ratio in the application of an
item, usually for the purpose of reducing the occurrence of stress-related failures. 2) Non-
utilisation of the full load capability of an item with the intent to reduce the failure rate.
Design review: A formal documented, comprehensive, and systematic examination of a
design to evaluate the capability of the design to meet the requirements, to identify
problems, and propose solutions.
Dewetting: The condition in a soldered area in which liquid solder has not adhered
intimately and has pulled back from the conductor area.
Die (sometimes called chip): 1) A tiny piece of semiconductor material, broken from a
semiconductor slice, on which one or more active electronic components are formed.
(Plural: dice). 2) A portion of a wafer bearing an individual circuit or device cut or broken
from a wafer containing an array of such circuits or devices.
Dielectric breakdown: The breakdown of the insulation resistance in a medium under high
voltage.
Dielectric loss: The power dissipated by a dielectric as the friction of its molecules opposes
the molecular motion produced by an alternating electric field.
Diffusion: The phenomenon of movement of matter at the atomic level from regions of high
concentration to regions of low concentration.
DIN (Deutscher Industrie-Normenausschuss): The abbreviation for the association in
Germany that determines the standards for electrical and other equipment in that country.
Similar to the American USAS.
Diode (semiconductor): 1) A semiconductor device having two terminals and exhibiting a
non-linear voltage-current characteristic. 2) A semiconductor device that has the
asymmetrical voltage-current characteristic exemplified by a single pn junction.
Direct chip attach: A method of forming the electrical connection from a die to a substrate
(supporting material) without the use of a package; it can be done either with wire bonds
or with flip-chip attach.
Discrete components: Individual components such as resistors, capacitors, and transistors.
Dissipation factor: Tangent of the dielectric loss angle; the ratio of the resistive component (R_s) of a capacitor to its capacitive reactance (X_c).
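The relation DF = tan δ = R_s/X_c can be sketched numerically. The capacitor is modelled as an ideal capacitance in series with a resistance R_s; the component values below (1 µF, 0.1 Ω at 100 kHz) are illustrative assumptions.

```python
import math

# Dissipation factor sketch: DF = R_s / X_c, with X_c = 1 / (2*pi*f*C).
# Values are assumed for illustration, not taken from this glossary.
def dissipation_factor(r_s_ohm, c_farad, f_hz):
    x_c = 1.0 / (2.0 * math.pi * f_hz * c_farad)  # capacitive reactance
    return r_s_ohm / x_c

df = dissipation_factor(0.1, 1e-6, 100e3)
print(round(df, 4))  # 0.0628
```

The quality factor Q of the capacitor is the reciprocal of this value.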
Doping: The addition of an impurity to a semiconductor to alter its conductivity.
Downtime: The period of time during which an item is not in a condition to perform its
intended function.
Dual-in-line pack (DIP): A package having two rows of leads extending at right angles
from the base and having standard spacings between leads and between rows of leads.
Dual in-line (DIL) package: A type of housing for integrated circuits. The standard form is
a moulded plastic container about 3/4 inch long and 1/3 inch wide, with two rows of pins
spaced 0.1 inch between centres.
Dynamic testing: Testing a hybrid circuit where reactions to ac (especially high frequency)
are evaluated.
Early failures: Often due to randomly distributed weaknesses in the material or in the item's production process (assembling, soldering, etc.), the early failures should be distinguished from systematic failures (which are deterministic and are caused by an error or a mistake, and whose elimination requires a change in the design, production process, operational procedure, documentation, or other). The length of the early failure period varies between a few days and a few thousand hours.
Effectiveness: The capability of the system or device to perform its function.
Engineering, reliability: The science of including those factors in the basic design that will
ensure the required degree of reliability.
Enhancement-mode transistor: An MOS transistor that creates a channel for minority
carriers by applying a gate voltage to drive out the majority carriers.
Environment: 1) The universe within which the system must operate. All the elements over
which the designer has no control and that affect the system or its inputs and outputs. 2)
The physical conditions which a component may be exposed to during storage or
operation. Environment usually covers climatic, mechanical, and electrical conditions.
Environmental stress screening (ESS): Test (or set of tests) intended to remove defective
items, or those likely to exhibit early failures.
Environmental test: A test (or series of tests) used to determine the sum of external
influences affecting the structural, mechanical, and functional integrity of any given
package or assembly.
Equipment: A general term including material, fittings, devices, appliances, fixtures,
apparatus, machines, etc. used as a part of - or in connection with - an electrical
installation.
Exponential failure distribution: This is the failure distribution of a group of parts that
have a constant failure rate. After one fails, the probability is the same that the remaining
parts will survive the same length of time. The exponential curve results because of the
diminishing quantity remaining in the given group of parts.
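The constant-failure-rate case described above corresponds to the survival function R(t) = e^(-λt). A minimal sketch, with an assumed failure rate, also shows the memoryless property stated in the entry: a surviving part has the same chance of lasting a further interval as a new one.

```python
import math

# Exponential (constant failure rate) model; lam is in failures/hour.
# The rate value is an assumption for illustration.
def reliability(lam, t):
    """Probability that a part survives to time t: R(t) = exp(-lam * t)."""
    return math.exp(-lam * t)

lam = 2e-5                          # 2 failures per 100,000 hours (assumed)
r_1000 = reliability(lam, 1000.0)   # chance a new part survives 1000 h

# Memoryless property: survival of a further 1000 h, given 5000 h already
# survived, equals the survival probability of a new part over 1000 h.
cond = reliability(lam, 6000.0) / reliability(lam, 5000.0)
print(abs(cond - r_1000) < 1e-12)  # True
```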
Extrinsic failures: In essence, all non-intrinsic failures.
Extrinsic failure mechanisms: Mechanisms resulting from the device packaging and
interconnection process (the "back-end") of semiconductor manufacturing. As techno-
logies mature and problems in the manufacturers' fabrication lines are ironed out,
intrinsic failures are reduced, thereby making extrinsic failures all the more important to
device reliability.
Failure: 1) The termination of the ability of an item to perform a required function. 2) A part
that no longer meets its performance criteria. Failures include devices that have
drastically failed as well as components that function, but are out of specification.
Failure analysis: 1) The logical, systematic examination of an item or its diagram(s) to
identify and analyse the probability, causes, and consequences of potential and real
failures. 2) The analysis of a circuit to locate the reason for the failure of the circuit to
perform to the specified level.
Failure, catastrophic: Failure that is both sudden and complete.
Failure criteria: Limiting conditions, relating to the admissibility of the deviation from the
characteristic value due to changes after the beginning of stress.
Failure, complete: Failure resulting from deviations in characteristic(s) beyond specified
limits such as to cause complete lack of the required function. The limits referred to in
this category are specified for this purpose.
Failure, critical: Failure which is likely to cause injury to persons or significant damage to
material.
Failure, degradation: Failure which is both gradual and partial. Note. In time, such a failure
may develop into a complete failure.
Failure, dependent: A failure which is caused by the failure of an associated item,
distinguished from independent failure.
Failure distribution: The distribution of failures plotted as a function of time. This is
usually plotted for a particular group of parts operating in a particular environment.
Failure, gradual: Failures that could be anticipated by prior examination or monitoring.
Failure in time (FIT): One failure in 10^9 device operating hours.
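The FIT unit lends itself to a quick conversion sketch. The fleet size and mission time below are invented for illustration.

```python
# FIT conversion sketch: 1 FIT = one failure per 10^9 device-hours.
def fit_to_failure_rate(fit):
    """Convert FIT to failures per device-hour."""
    return fit / 1e9

def expected_failures(fit, devices, hours):
    """Expected failure count for a fleet over a mission time."""
    return fit_to_failure_rate(fit) * devices * hours

# A component rated at 50 FIT (assumed), 10,000 units run one year (8760 h):
print(round(expected_failures(50, 10_000, 8760), 2))  # 4.38
```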
Failure, independent: A failure which occurs without being related to the failure of
associated items, distinguished from dependent failure.
Failure, inherent weakness: Failure attributable to weakness inherent in the item when
subjected to stresses within the stated capabilities of the item.
Failure, intermittent: Failure of an item for a limited period of time, following which the
item recovers its ability to perform its required function without being subjected to any
external corrective action. Note: Such a failure is often recurrent.
Failure, major: Failure - other than a critical failure - which is likely to reduce the ability
of a more complex item to perform its required function.
Failure mechanism: The basic chemical, physical or other process that results in failure (a
catastrophic, degradation, or intermittent failure).
Failure, minor: Failure - other than a critical failure - which does not reduce the ability of a
more complex item to perform its required function.
Failure, misuse: Failure attributable to the application of stresses beyond the stated
capabilities of the item.
Failure mode: The effect (the symptom) - the local effect - by which a failure is observed
(for example a catastrophic, degradation, or intermittent failure, usually in the form of
opens, shorts, functional faults, or parameters out of specification - for electronic
components - and brittle rupture, creep, cracking, etc. - for mechanical components).
Failure, nonrelevant: Failure to be excluded in interpreting test results or in calculating the
value of a reliability characteristic. Note: The criteria for the exclusion should be stated.
Failure, partial: Failure resulting from deviation in characteristic(s) beyond specified limits,
but not such as to cause complete lack of the required function. Note: The limits referred
to in this category are special limits specified for this purpose.
Failure, primary: Failure of an item, not caused either directly or indirectly by the failure of
another item.
Failure, random: Any failure whose cause and/or mechanism make its time of occurrence
unpredictable, but which is predictable only in a probabilistic or statistical sense.
Failure rate: The rate at which devices from a given population can be expected (or were
found) to fail as a function of time.
Failure rate (λ): 1) The number of failures of an item per unit measure of life (cycles, time, etc.); during the useful life period, the failure rate λ is considered constant. 2) Limit, for δt → 0, of the probability that an item will fail in the time interval (t, t + δt], given that the item was new at t = 0 and did not fail in the interval (0, t], divided by δt.
Failure rate, constant: After infant failures have been removed from a group of parts,
failures that occur in a completely random fashion will result in a constant failure rate. If
the events are random, one failure does not influence the probability of future failures.
Failure rate, observed (for a stated period in the life of an item): The ratio of the total
number of failures in a sample to the cumulative observed time on that sample. The
observed failure rate is to be associated with particular and stated time intervals (or
summation of intervals) in the life of the items, and with stated conditions.
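The observed failure rate is a simple ratio, which the sketch below computes; the sample data are invented for illustration.

```python
# Observed failure rate sketch: total failures in a sample divided by the
# cumulative observed device-hours on that sample.
def observed_failure_rate(n_failures, device_hours):
    return n_failures / device_hours

# Assumed data: 4 failures in a 500-unit sample, each observed for 2000 h.
lam_obs = observed_failure_rate(4, 500 * 2000)
print(round(lam_obs * 1e9, 1))  # expressed in FIT: 4000.0
```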
Failure, relevant: Failure to be included in interpreting test result or in calculating the value
of a reliability characteristic. Note: The criteria for the inclusion should be stated.
Failure, secondary: Failure of an item caused either directly or indirectly by the failure of
another item.
Failure, sudden: Failure that could not be anticipated by prior examination or monitoring.
Failure, wearout: A failure that occurs as a result of deterioration processes or mechanical
wear and whose probability of occurrence increases with time.
Fatigue: The weakening of a material under repeated stress.
Fault: A physical condition that causes a device, a component, or an element to fail to
perform in a required manner, for example, a short-circuit, a broken wire, an intermittent
connection.
Fault tree analysis (FTA): 1) Analysis to determine which fault modes of the elements of
an item and/or which external events may result in a stated fault mode of the item,
presented in the form of a fault tree. 2) FTA is a systems engineering technique which
provides an organised, illustrative approach to the identification of high risk areas.
Field-reliability test: A reliability compliance or determination test made in the field where
the operating and environmental conditions are recorded and the degree of conformity
found.
Field-effect transistor: A transistor in which current carriers (holes or electrons) are
injected at one terminal (the source) and pass to another (the drain) through a channel of
semiconductor material whose resistivity depends mainly on the extent to which it is
penetrated by a depletion region.
Hot spot: A small area on a circuit that is unable to dissipate the generated heat and
therefore operates at an elevated temperature above the surrounding area.
Hybrid circuit: A microcircuit consisting of elements which are a combination of the film
circuit type and the semiconductor circuit type, or a combination of one or both of these
types, and may include discrete add-on components.
Imbedded layer: A conductor layer having been deposited between insulating layers.
IMPATT diode: Device whose negative-resistance characteristic is produced by a combination of impact
avalanche breakdown and charge-carrier transit-time effects. Avalanche breakdown
occurs when the electric field across the diode is high enough for the charge carriers
(holes or electrons) to create electron-hole pairs. With the diode mounted in an
appropriate cavity, the field patterns and drift distance permit microwave oscillations or
amplification.
Infant mortality failures: A characteristic pattern of failure - sometimes experienced with
new equipment which may contain marginal components - wherein the number of
failures per unit of time decreases rapidly as the number of operating hours increases. A
burn-in period may be utilised to age (or mature) an equipment to reduce the number of
marginal components.
Inherent defects: The underlying cause of intrinsic failures, in the useful life period.
Integrated circuit: A microcircuit (monolithic) consisting of interconnected elements
inseparably associated and formed in situ on or within a single substrate (usually silicon)
to perform an electronic circuit function.
Intermetallic bond: The ohmic contact made when two metal conductors are welded or
fused together.
Intraconnections: Those connections of conductors made within a circuit on the same
substrate.
Intrinsic failure mechanisms: Mechanisms inherent to the semiconductor die itself,
including crystal defects, dislocations, and processing defects.
Intrinsic reliability: The reliability a system can achieve based on the types of devices and
manufacturing processes used.
Ion migration: The movement of free ions within a material or across the boundary between
two materials under the influence of an applied electric field.
Item: 1) An all-inclusive term to denote any level of hardware assembly: that is, system,
segment of a system, subsystem, equipment, component, part, etc. 2) Any level of
hardware assembly - system, equipment, component, part, and so on.
Item, non-repaired: An item that is not repaired after a failure.
Item, repaired: An item that is repaired after a failure.
Junction temperature: The temperature of the region of transition between the p- and n-
type semiconductor material in a transistor or diode element.
Kirkendall voids: The formation of voids by diffusion across the interface between two
different materials, in the material having the greater diffusion rate into the other.
Lands: Widened conductor areas on the major substrate used as attachment points for wire
bonds or the bonding of chip devices.
Laser bonding: Effecting a metal-to-metal bond of two conductors by welding the two
materials using a laser beam for a heat source.
Latch-up: A condition of a CMOS IC in which parasitic bipolar transistors are switched on,
drawing large currents that may destroy the device.
Latch-up tests simulate voltage overstresses on signal and power supply lines as well as
power-on / power-off sequences.
Latent defect: Defect which will escape the normal quality control procedures; it requires
component stressing in order to be detected by inspection at the propagated failure level.
Lead frames: 1) The metallic portion of the device package that connects the hybrid circuit elements to the outside world. 2) A sheet metal framework to which a chip is attached and wire-bonded, and which is then moulded with plastic.
Leakage current: An undesirable small stray current which flows through or across an
insulator between two or more electrodes, or across a reverse-biased junction.
Leakage, input and output: Excessive leakage currents above specified limits.
Life cycle costs (LCC): Sum of the costs for acquisition, operation, maintenance, and
disposal or recycling of an item.
Life test: Test of a component or circuit under load over the rated life of the device.
Lifetime: Time span between initial operation and failure of a nonrepairable item.
Linear energy transfer: The energy per unit length transferred from an ionising particle to a
solid as the particle passes through it.
Human factors: A body of scientific facts about human characteristics. The term covers
biomedical and psycho-social considerations in the areas of human engineering, per-
sonnel selection, training, life support, job performance aid, and human performance
evaluation.
Maintainability: 1) A characteristic of design and installation expressed as the probability
that an item will be retained in or restored to a specified condition within a given period
of time, when the maintenance is performed in accordance with prescribed procedures
and resources. 2) Probability that preventive maintenance or repair of an item will be
performed within a stated time interval for given procedures and resources.
Maintenance: The combination of all technical and corresponding administrative activities
intended to retain an item, or restore it, to a specified state. Maintenance is thus
subdivided into preventive (carried out at predetermined intervals and according to
prescribed procedures, to reduce the probability of failures or the degradation of the
functionality of the item), and corrective (carried out after fault recognition and intended
to bring the item into a state in which it can again perform the required function).
Man-function: The function allocated to the human component of a system.
Mask: The photographic negative that serves as the master for making patterns.
Mathematical expectation (expected value) of a probability distribution: E(x) = ∫ x f(x) dx (integrated from -∞ to +∞) for a continuous distribution, and E(x) = Σ x_i f(x_i) (summed over i = 1, 2, ...) for a discrete distribution.
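Both formulas can be checked numerically; the distributions below (a fair die and an exponential density) are illustrative choices, not examples from this glossary.

```python
import math

# Discrete case: E(x) = sum of x_i * f(x_i); a fair six-sided die,
# f(x_i) = 1/6 for each face, so the expectation is 3.5.
faces = [1, 2, 3, 4, 5, 6]
e_discrete = sum(x * (1 / 6) for x in faces)

# Continuous case: E(x) = integral of x * f(x) dx; an exponential density
# f(x) = lam * exp(-lam * x) has mean 1/lam. The integral is approximated
# by a Riemann sum out to x = 60, where the tail is negligible.
lam = 0.5
dx = 1e-3
e_continuous = sum(i * dx * lam * math.exp(-lam * i * dx) * dx
                   for i in range(1, 60_000))

print(round(e_discrete, 2))    # 3.5
print(round(e_continuous, 2))  # 2.0, i.e. 1/lam
```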
Mean maintenance time: The total preventive and corrective maintenance time divided by
the number of preventive and corrective maintenance actions, during a specified period of
time.
Mean time between failures (MTBF): For a particular interval, the total operating life of a
population of an item divided by the total number of failures within the population,
during the measurement involved.
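The MTBF definition above is a direct ratio; the sketch below applies it to invented observation data.

```python
# MTBF sketch: total operating life of a population divided by the number
# of failures observed in the interval. Data are assumed for illustration.
def mtbf(total_operating_hours, n_failures):
    return total_operating_hours / n_failures

# 200 units observed for 5000 h each, with 8 failures recorded:
m = mtbf(200 * 5000, 8)
print(m)  # 125000.0 hours
```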
Mean time between maintenance (MTBM): The mean of the distribution of the time
intervals between maintenance actions (preventive, corrective or both).
Mean time to repair (MTTR): The total corrective maintenance time divided by the total
number of corrective maintenance actions during a given period of time.
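MTTR combines with MTBF in the standard steady-state availability relation A = MTBF/(MTBF + MTTR), a textbook result not stated in this entry; the repair data below are assumed for illustration.

```python
# MTTR sketch: total corrective maintenance time divided by the number of
# corrective maintenance actions, plus the usual availability relation.
def mttr(total_repair_hours, n_repairs):
    return total_repair_hours / n_repairs

def availability(mtbf_h, mttr_h):
    """Steady-state availability A = MTBF / (MTBF + MTTR)."""
    return mtbf_h / (mtbf_h + mttr_h)

r = mttr(36.0, 9)          # 9 corrective actions totalling 36 h (assumed)
a = availability(2000.0, r)
print(round(a, 5))  # 0.998
```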
Metallisation: A film pattern (single or multilayer) of conductive material deposited on a
substrate to interconnect electronic components, or the metal film on the bonding area of
a substrate which becomes a part of the bond and performs both an electrical and a
mechanical function.
Microcracks: Thin cracks in a substrate or chip device that can only be seen under magnification and which can contribute to latent failure phenomena.
Micro-via: Miniature holes (up to 6 µm in diameter) for connections between different layers
of a multilayer printed-circuit board.
Migration: An undesirable phenomenon whereby metal ions, notably silver, are transmitted
through another metal, or across an insulated surface, in the presence of moisture and an
electrical potential.
Mil: A unit equal to 0.001 inch or 0.0254 mm.
Mission profile: Specific task which must be fulfilled by an item during a stated time under
given conditions.
Mother board: A circuit board used to interconnect smaller circuit boards called "daughter
boards".
Multichip module: An electronic package that contains more than one die.
Multilayer substrates: Substrates that have buried conductors so that complex circuitry can
be handled.
Noise: Random small variations in voltage or current in a circuit due to the quantum nature
of electronic current flow, thermal considerations, etc.
Nonconformity: Nonfulfilment of a specified requirement.
Observability: The possibility to check internal signals at the outputs.
Observed reliability of non-repaired items (for a stated period of time): The ratio of the
number of items which performed their functions satisfactorily at the end of the period to
the total number of items in the sample at the beginning of the period. The criteria for
what constitutes satisfactory function shall be stated.
Operating conditions: The loading or demand cyclic operation, or both, of an item between
zero and 100% of its rated capability(ies).
Operating time: The period of time during which an item performs its intended function.
Operational reliability (software): The reliability of a system or software subsystem in its
actual use environment. Operational reliability may differ considerably from reliability in
the specified or test environment.
Optoelectronics: Technology dealing with the coupling of functional electronic blocks by
light beams.
Overlay: One material applied over another material.
Package: The container for an electronic component with terminals to provide electrical
access to the inside of the container.
Pad: 1) A device inserted into a circuit to introduce transmission loss or to match
impedances. 2) A metal electrode that is connected to the output of a diathermy machine
and placed on the body over the region being treated.
Passivation (corrosion): 1) The process(es) (physical or chemical) by means of which a
metal becomes passive. 2) A process in which a dielectric material is diffused over the
entire wafer to provide mechanical and environmental protection for the circuits. Also
called glassivation.
Pattern sensitivity: The device response varies with the test pattern, reflecting differences in
address and/or data sequences; may also reflect timing and voltage specifications being
too close to actual failure regions.
Photoresist: A photosensitive plastic coating material which, when exposed to UV light,
becomes hardened and is resistant to etching solutions. Typical use is as a mask in
photochemical etching of thin films.
Pitch (plastic packages): Separation between adjacent conductors.
Planar technique: The formation of p-type and/or n-type regions in a semiconductor crystal
by diffusing impurity atoms into the crystal through holes in an oxide mask, which is on
the surface. The latter is left to protect the junctions so formed against surface
contamination.
Pinhole: Small holes occurring as imperfections which penetrate entirely through film
elements, such as metallisation films or dielectric films.
Power dissipation: The dispersion of the heat generated from a film circuit when a current
flows through it. (Pd = dissipated power)
Preconditioning: A stress test (or combination of stress tests) applied to devices (i. e. high
temperature storage, operating life, storage life, blocking life, humidity life, HTRB,
temperature cycles, mechanical sequence - which includes solderability, etc.), after which
a screening criterion is applied to separate good units from bad ones. This criterion may be
any combination of absolute value and parameter shift levels agreed to by the parties
involved.
Preform: To aid in soldering or adhesion, small circles or squares of the solder or epoxy are
punched out of thin sheets. These preforms are placed on the spot to be soldered or
bonded, prior to the placing of the object to be attached.
Probability density function: The first derivative of the probability distribution function; it
represents the probability of obtaining a given value.
Product assurance: All planned and systematic activities necessary to reach specified
targets for the reliability, maintainability, availability, and safety of an item, as well as to
provide adequate confidence that the item will meet given requirements for quality.
Product liability: Responsibility on a manufacturer (or others) to compensate for losses
related to injury to persons, material damage, or other unacceptable consequences caused
by a product.
Pull test: A test for bond strength of a lead, interconnecting wire, or a conductor.
Purple plague: One of several gold-aluminium compounds (very brittle, potentially leading
to time-based failure of the bonds), formed when bonding gold to aluminium and
activated by re-exposure to moisture and high temperature (> 340°C).
Quality: 1) A measure of the degree to which a device conforms to applicable specification
and workmanship standards. 2) Totality of features and characteristics of an item (product
or service) that bear on its ability to satisfy stated or implied needs.
Quality assurance: All planned and systematic activities necessary to provide adequate
confidence that an item (product or service) will satisfy given requirements for quality.
Quality, average outgoing: The ultimate average quality of products shipped to the
customer that results from composite sampling and screening techniques.
Quality defect: A defect which may be found by employing normal quality control
inspection equipment and procedures, without stressing the component.
Quality test: Test to verify whether an item conforms to specified requirements.
Rad(Si): A unit of energy absorbed by silicon from radiation, equivalent to 0.01 J/kg.
Randomness: The occurrence of an event in accordance with the laws of chance.
Redundancy: In an item, the existence of more than one means of performing its function.
Redundancy, active: 1) That redundancy wherein all means for performing a given function
are operating simultaneously. 2) That redundancy wherein all redundant items are
operating simultaneously rather than being switched on when needed.
Redundancy, standby: That redundancy wherein the alternative means of performing the
function is inoperative until needed and is switched on upon failure of the primary means
of performing the function.
Reflow soldering: A method of soldering involving application of solder prior to the actual joining. To solder, the parts are joined and heated, causing the solder to remelt, or reflow.
Refresh sensitivity: A dynamic RAM fails to retain data reliably during the specified minimum interval between refresh cycles. Failures are due to excessive voltage or current leakage from the storage element, or to faults in the rewrite circuits.
Relay: An electromechanical device in which contacts are opened and/or closed by
variations in the conditions of one electric circuit and thereby affect the operation of other
devices in the same or other electric circuits.
Reliability: Collective name for those measures of quality that reflect the effect of time in
storage or use of a product, as distinct from those measures that show the state of the
product at the time of delivery.
In the general sense, reliability is defined as the ability of an item to perform a required
function under stated conditions for a stated period of time.
Reliability assurance: The management and technical integration of the reliability activities
essential in maintaining reliability achievements, including design, production and
product assurance.
Reliability data: Data related to the frequency of failure of an item, equipment, or system.
These data may be expressed in terms of failure rate, MTBF, or probability of success.
Reliability engineering (design for reliability): The establishment, during design, of an
inherently high reliability in a product.
Reliability growth: A condition characterised by a progressive improvement of the
reliability of an item with time, through successful correction of design or production
weaknesses.
Reliability growth testing: The improvement process during which hardware reliability
increases to an acceptable level.
Reliability, inherent: The potential reliability of an item present in its design.
Reliability, intrinsic: 1) The probability that a device will perform its specified function,
determined by statistical analysis of the failure rates and other characteristics of the parts
and components the device comprises. 2) The reliability a system can achieve based on
the types of devices and manufacturing processes used.
Reliability test: Test and analyses carried out in addition to other type tests and designed to
evaluate the level of reliability in a product, etc. as well as the dependability, or stability,
of this level relative to time and use under various environmental conditions.
Replaceability: A measure of the degree to which an item will be replaced within a given
time under specified conditions.
Required function: Function (or combination of functions) of an item which is considered
necessary to provide a given service.
Resist: A protective coating that will keep another material from attaching itself or coating
something, as in solder resist, plating resist, or photoresist.
Resistance: A property of conductors which - depending on their dimensions, material and
temperature - determines the current produced by a given difference of potential; that
property of substance which impedes current and results in the dissipation of power in the
form of heat.
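As a minimal numeric illustration of this definition, Ohm's law gives the current produced by a given potential difference and the power dissipated as heat (all figures assumed):

```python
V = 5.0      # potential difference, volts (assumed)
R = 250.0    # resistance, ohms (assumed)

I = V / R    # current produced by the given potential difference
P = V * I    # power dissipated in the resistance as heat

print(I)            # 0.02
print(round(P, 3))  # 0.1
```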
Risk: The probability of making the wrong decision based on pessimistic data or analysis.
SA (Selective Availability): Encryption of P-code signal from GPS satellites, usually by
dithering the frequency, to deny unauthorised users access to precise positioning.
Safety: 1) The conservation of human life and its effectiveness, and the prevention of
damage to items, consistent with mission requirements. 2) Ability of an item to cause
Glossary of microelectronics and reliability terms 469
Storage life: The length of time an item can be stored under specified conditions and still
meet specified requirements.
Stress: Any influence, or part of the influences, to which an item is exposed at a certain
instant.
Stress-accelerated corrosion: Corrosion that is accelerated by stress.
Stress, component: The stresses on component parts during testing or use that affect the
failure rate and hence, the reliability of the parts; (voltage, power, temperature, and
thermal environmental stress are included).
Substrate: 1) The supporting material on or in which the parts of an integrated circuit are
attached or made. The substrate may be passive (thin film, hybrid) or active (monolithic
compatible). 2) A material on the surface of which an adhesive substance is spread for
bonding or coating; any material which provides a supporting surface for other materials,
especially materials used to support printed-circuit patterns.
Survivability: The measure of the degree to which an item will withstand hostile man-made
environments and not suffer abortive impairment of its ability to accomplish its
designated mission.
System: Aggregate of components, assemblies, and subsystems, as well as skills and
techniques, capable of performing and/or supporting autonomously an operational role.
System effectiveness: A measure of the degree to which an item can be expected to fulfil a
set of specified mission requirements, which may be expressed as a function of
availability, dependability, and capability.
Systems engineering: Application of the mathematical and physical sciences to develop
systems that utilise resources economically for the benefit of society.
Testability: Procedure that includes the degrees of failure detection and isolation, the
correctness of the results, and the test duration. It is achieved by improving observability
and controllability.
Test to failure: The practice of inducing increased electrical and mechanical stresses in
order to determine the maximum capability of a device so that conservative use in
subsequent applications will thereby increase its life through the derating determined by
these tests.
Thermal noise: Noise that is generated by the random thermal motion of charged particles
in an electronic device.
Thermal resistance: A measure of the ability of an interface to evacuate heat (e.g. Rth(j-a)
= thermal resistance, junction to ambient).
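The usual application of this definition is the junction-temperature estimate Tj = Ta + P * Rth(j-a); the figures below are assumed for illustration only:

```python
# Junction temperature from thermal resistance: Tj = Ta + P * Rth(j-a).
# All figures below are assumed for illustration.
Ta = 25.0        # ambient temperature, deg C
P = 0.5          # dissipated power, W
Rth_ja = 120.0   # junction-to-ambient thermal resistance, K/W

Tj = Ta + P * Rth_ja
print(Tj)  # 85.0
```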
Time, mission: That element of uptime during which the item is performing its designated
mission.
Time, up (uptime): The element of active time during which an item is either alert, reacting,
or performing a mission.
Time, down (downtime): That element of time during which the item is not in condition to
perform its intended function.
TO package: Abbreviation for transistor outline, established as an industry standard by
JEDEC of the EIA.
Transfer molding: An automated type of compression molding in which a preform of
plastic is melted and forced into a hot mold cavity.
Underencapsulant: The plastic material that is dispensed between a flip-chip and package
in liquid form and then thermally cured to provide mechanical and environmental
protection to the active surface of a die.
Uptime ratio: The quotient of uptime divided by uptime plus downtime.
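The uptime ratio (steady-state availability) follows directly from this quotient; the hour figures below are assumed:

```python
# Uptime ratio = uptime / (uptime + downtime); figures assumed.
uptime = 950.0    # hours in operable condition
downtime = 50.0   # hours not in condition to perform the intended function

uptime_ratio = uptime / (uptime + downtime)
print(uptime_ratio)  # 0.95
```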
Useful life: Total operating time of an item, ending for a nonrepairable item when the failure
probability becomes too high or the item's functionality is obsolete, and for repairable
item when the intensity of failures becomes unacceptable or when a fault occurs and the
item is considered to be no longer repairable.
Varistor: A two-electrode semiconductor device with a voltage-dependent non-linear re-
sistance that drops markedly as the applied voltage is increased.
Vibration: An oscillation wherein the quantity is a parameter that defines the motion of a
mechanical system.
WAAS (Wide-Area Augmentation System): A service that uses geostationary satellites
and a network of ground stations to compute GPS integrity and differential correction
information and to transmit that data to mobile receivers.
Wafer: A thin semiconductor slice (of silicon, germanium or GaAs) with parallel faces on
which matrices of microcircuits or individual semiconductors can be formed. After
processing, the wafer is separated into dice or chips containing individual circuits.
Wearout: The process of attrition that results in an increase of hazard rate with increasing
age (cycles, time, miles, events, and so on as applicable for the item). Wearout and/or
fatigue could perhaps be explained with the theory of the limiting distribution (extreme-
value theory): a mechanical piece would fail if any single ("weak") spot fails.
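The increasing hazard rate that characterises wearout can be sketched with a Weibull hazard function, whose hazard rises with age whenever the shape parameter exceeds 1; the shape and scale values below are assumed, not taken from the text:

```python
# Weibull hazard h(t) = (beta/eta) * (t/eta)**(beta - 1); for beta > 1 the
# hazard rises with age, which is the signature of wearout.
def weibull_hazard(t, beta=2.5, eta=10000.0):  # assumed shape/scale values
    return (beta / eta) * (t / eta) ** (beta - 1)

early, late = weibull_hazard(1000.0), weibull_hazard(9000.0)
print(late > early)  # True
```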
Wearout failure: A failure caused by a mechanism that is related to the physics of a device,
its design, and process parameters. Wearout failures should be distinguished from random
failures, which are associated with the variability of workmanship quality.
Whisker: A very small, hairlike metallic growth (a micron size single crystal with a tensile
strength of the order of one million psi) on a metallic circuit component.
Wire bond: Includes all the constituent components of a wire electrical connection such as
between the terminal and the semiconductor.
Wire bonding: The method used to attach very fine wire to semiconductor components to
interconnect these components with each other or with package leads.
Yield: The ratio of usable components at the end of a manufacturing process to the number
of components initially submitted for processing.
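As a one-line computation of this ratio (component counts assumed):

```python
# Yield = usable components at end of process / components submitted; figures assumed.
submitted = 20000
usable = 17400

yield_ratio = usable / submitted
print(yield_ratio)  # 0.87
```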
Zener breakdown: A breakdown that is caused by the field emission of charge carriers in
the depletion layer.
Zener diode: A class of silicon diodes that exhibit, in the avalanche-breakdown region, a
large change in reverse current over a very narrow range of reverse voltage.
Zener effect: A reverse-current breakdown due to the presence of a high electric field at the
junction of a semiconductor or insulator.
Sources
7 Brewer, R. (1972): Reliability terms and definitions based on the conceptual relationship between
reliability and quality. Microelectronics and Reliability, vol. 11, pp. 435-461
8 Caillat, J. (1976): Contribution au test des CI logiques. Thèse, Université de Grenoble
9 Calabro, S. R. (1962): Reliability principles and practices, Appendix I, McGraw-Hill, New York
10 CEI-Publication 134, (1961)
11 CEI-Publication 147-0, (1966)
12 CEI-56 (Bureau central) 62
13 CEI-56 (Secr.) 84
14 DIN 40040
15 DIN 40041
16 DIN 40042
17 DIN 40043
18 EOQC Glossary
19 Glossary of terms (1982). International Society for Hybrid Microelectronics
20 Graf, R. F. (1977): Modern Dictionary of Electronics. Howard W. Sams & Co., Inc., Indianapolis,
Indiana 46268 U. S. A.
21 Greene, A. E.; Bourne, A. J. (1972): Reliability technology. Wiley Interscience, London, pp. 622-
627
22 Harper, C. A. (editor in chief) (1991): Electronic packaging and interconnection handbook. McGraw-
Hill Inc., New York
23 IEC Publication 271
24 IEEE Standard 352 (1975)
25 ISO (1977-07-01): Norme internationale 3534 - Statistique - Vocabulaire et symboles.
26 Jay, F. (editor in chief) 1984: IEEE Standard Dictionary of Electrical and Electronics Terms. The
Institute of Electrical and Electronics Engineers, Inc., New York, NY
27 Jensen, E.; Schneider, B. (1979): Characterization of random access memories. ECR-93
28 Jones, R. D. (1982): Hybrid Circuit Design and Manufacture. M. Dekker, New York and Basel
29 Lyon-Caen, R.; Crozet, J. M. (1977): Microprocesseurs et microordinateurs. Masson, Paris
30 Metzger, G.; Vabre, J. P. (1974): Les mémoires électriques. Masson, Paris
31 MIL-STD-721B
32 Naresky, J. J. (1958): RADC reliability notebook; glossary. McGraw-Hill, New York
33 Neufang, O. (1983): Lexikon der Elektronik. Braunschweig, Wiesbaden
34 Reiche, H. (1972): Reliability definitions. Microelectronics and Reliability, vol. 11, pp. 425-427
35 Ryerson, C. M. (1957): Glossary and dictionary of terms and definitions relating specifically to
reliability. Third national symposium for reliability and quality control. Washington D. C., 15th
January, Papers, pp. 59-84
36 Thompson, P. (1997): Chip-scale packaging. IEEE Spectrum, August, pp. 36-43
37 Tummala, R. R.; Rymaszewski, E. J. (1989): Microelectronics packaging handbook. Van Nostrand
Reinhold, New York
38 UIT - Répertoire des définitions des termes essentiels utilisés dans le domaine des
télécommunications; partie 1; 2ème impression, Genève, 1961
List of abbreviations
eV electron volts
F Illumination. Total luminous flux incident on a receiver,
normally in lumens
FAB-MS Fast Atom Bombardment-Mass Spectroscopy
FAMOS Floating-gate Avalanche-injection Metal-Oxide Semiconductor
FEM Field Electron Microscopy
FET Field-Effect Transistor
FIM Field Ion Microscopy
FIT Failure In unit Time (1 failure per 10^9 device hours)
FLOTOX FLOating-gate Tunnel OXide
Ga Gallium
GaAs Gallium Arsenide
GDMS Glow Discharge Mass Spectrometry
GDOES Glow Discharge Optical Emission Spectrometry
GIDL Gate-Induced Drain Leakage
H Irradiance. Radiant flux density incident on a receiver, usually
in watts per unit area
HAST Highly Accelerated Stress Test
HDBH High Day Busy Hour
HE Effective irradiance. The irradiance perceived by a given
receiver, usually in effective watts per unit area
HEIS High-Energy Ion Scattering
HEMT High Electron Mobility Transistors (MODFET)
hFE Current gain of a transistor biased common emitter. The ratio
of collector current to base current at specified bias conditions
HMOS High-performance, n-channel silicon gate MOS
HTOT High-Temperature Operating Tests
HTRB High Temperature Reverse Bias operating life test
IB Transistor base current
IBSCA Ion Bombardment Surface Chemical Analysis
IC Integrated Circuit
IC Transistor collector current
ID Dark current. The leakage current of an unilluminated
photodetector
IE Transistor emitter current
IF Forward bias current, usually of an IRED.
Subscripts denote measurement or stress bias conditions, if
required
IETS Inelastic Electron Tunneling Spectroscopy
IFR Increasing Failure Rate
IL Light current. The current through an illuminated
photodetector, at specified bias conditions
ILEED Inelastic Low-Energy Electron Diffraction
ILS Ionisation-Loss Spectrometry
IMMA Ion Microprobe Mass Analysis
IR Infrared
IRAS Infrared Absorption Spectrometry
IRED Infrared emitting diode
ISS Ion Scattering Spectrometry
Si Silicon
SILOX SILicon Oxide
SIMS Secondary-Ion Mass Spectrometry
Si3N4 silicon nitride
SiO2 silicon dioxide
Sn tin
SNMS Sputtered Neutral Mass Spectrometry
SOA Safe Operating Area
SOI Silicon On Insulator
SOS Silicon On Sapphire
SQPA Software Quality Program Analysis
SRAM static RAM
SRQAC Software Reliability and Quality Acceptance Criteria
SSI Small Scale Integration
SSMS Spark Source Mass Spectrometry
STEM Scanning Transmission Electron Microscopy
SXAPS Soft X-ray Appearance Potential Spectroscopy
SWC Solderless Wire Wrap Connecting
T temperature (°C or K)
t Time
TA Ambient temperature
TC Case temperature
TEELS Transmission Electron Energy-Loss Spectrometry
TEM Transmission Electron Microscope
TEM-ED Transmission Electron Microscope - Electron Diffraction
THB Temperature Humidity Bias
THDBH Ten High Day Busy Hour
TIR Testing-In Reliability
TJ Junction temperature
TO Transistor Outline
TRXRFA Total Reflection X-ray Fluorescence Analysis
TTL Transistor-Transistor Logic
TTL-LS Transistor-Transistor Logic - Low power Schottky
TTS Transistor-Transistor logic Schottky barrier
UCL Upper Confidence Level
UJT UniJunction Transistor
UL Underwriters Laboratories
ULSI Ultra Large Scale Integration
UPS Ultraviolet Photoelectron Spectrometry
UV UltraViolet
V Voltage / Volts
VLSI Very Large Scale Integration
VMOS V-groove MOS / Vertical MOS
VPE Vapor Phase Epitaxy
VT threshold voltage
W Radiant emittance
X-ray energetic high-frequency electromagnetic radiation
XAES X-ray-induced Auger Electron Spectrometry
XPD X-ray Photoelectron Diffraction
Index
accelerated testing 41, 42, 74, 91, 180, 184, 186, 188, 189, 205, 219, 220, 257, 293, 317, 343, 347, 356, 357, 415, 424
accelerated thermal stress 221
acceleration factors 221, 351, 352, 358, 366
acceleration stress 76
accidental failures 16
Acquisition Reform 340, 357
acquisition reform 41, 42, 48
activation energy 76, 221, 222, 226, 227, 229, 244, 423
active tests 346
AES (Auger Electron Spectroscopy) 284
ageing models 223
ageing of substrate 269
ageing problem 318, 327
aggressive liquids 342
alpha particles 292
alphanumeric display 313
aluminium conductor 227
aluminium electrolytic capacitors 105, 125
analysis 176, 186, 191
ANOVA method 52
AQL (Acceptable Quality Level) 233
Arrhenius model 64, 71, 76, 139, 165, 166, 192, 207, 219, 226, 227, 316
assembly process 358
ATE (Automatic Test Equipment) 299
Au-Al bond failures 423
Auger electron spectroscopy 390
automatic surface mounting 287
automatic wafer processing 287
automotive environment 310
availability 83, 86
average lifetime 349
average quality 294
average value 334
baking process 247
beam-leads 255
binomial probability function 7
bipolar chips 273
bipolar IC 215, 218, 221, 226, 229, 230, 233, 234, 238, 240, 241
bipolar technologies 282, 294, 295, 308
bipolar transistors 171, 172, 173, 188, 195
bistable noise in operational amplifiers 329
bit defects 350
block refresh 300
bonding strength 272
bonding techniques 272, 276
breakdown voltage 110, 117, 119, 131, 135, 173, 176
breakthrough 419
brittle fracture 229
bubble test 210
building-in reliability 41, 42, 46, 282
bulk resistivity 271, 272
buried oxide 291
burn-in 42, 49, 52, 54, 55-58, 62-65, 82, 89, 203, 209, 230, 233, 234, 236, 237, 244, 261, 283, 285, 301, 310, 322, 324, 366, 367, 375, 420, 423, 424
capacitor-chip 264
capacitors 416
CAS (Column Address Strobe) 300
catastrophic failures 161-163, 221, 222, 283, 308, 320, 323
cathodic spraying technique 259
cause-effect diagrams (Ishikawa) 52
causes of failure 205
CCD (Charge Coupled Devices) 290
cell type 289
ceramic substrate 247, 250, 252, 258, 267, 315
CERDIP (ceramic dual-in-line packages) 309, 355
challenge-response 64
channel degradation 295
characterisation test 23, 298
charge induced failures 301
charge injection 423
charge loss 296, 297, 315
charge pumping 292
charging phenomenon 225
chemical means 388
cleanliness 283
climatic tests 128
clock rates 289
CMOS 218, 283, 290, 291, 295, 307, 312, 313
early failures 9, 53-55, 57, 60, 64, 91, 99, 102, 103, 105, 141, 164, 187, 229-231, 233, 234, 243
early life test 316
EAROM (electrically alterable ROM) 290
ECL 289
economic considerations 367
electric field 350, 351
electrical characteristics 343, 350
electrical measuring 381
electrical overstress 294
electrical stress 159, 231
electrical tests 366
electrochemical stability 365, 366
electromigration 216, 223, 227, 231, 232, 243, 244, 297
electron charge 211
electron collisions 227
electron microprobe 284
electron probe microanalysis 390
electronic systems 3, 21, 24, 30, 38, 39, 41, 42, 233
encapsulation 262, 263
energy barrier 226
environment 2, 18, 19, 41
environmental conditions 55, 65, 66, 74
environmental reliability testing 42, 65, 96, 102, 106, 107, 109, 111, 112, 115, 123, 124, 132, 138, 139, 141
epoxy resins 248, 273, 275, 315, 323, 341, 347
EPROM (erasable PROM) 290
equipped cards 417
equipped card control 233
erase cycles 281, 296
error 364, 366, 369, 370, 373
ESD (Electrostatic Discharge) 223, 229, 283, 294, 295, 298, 301, 303
evaporation 247, 248, 250
excess low-frequency noise 329
excess noise 330, 336, 338
exponential failure distribution 261, 358
extrinsic failure mechanisms 315
Eyring model 231
fabrication cycle 367
fabrication defects 167
face bond 276
failure 1, 2, 4, 7-26, 29-38, 41, 363-365, 367, 375, 377
failure analysis 171, 181, 183, 184, 186, 224, 282, 284, 294, 303, 310, 346, 353, 381-386, 389-393
failure cause 381, 383, 390, 392
failure criteria 220, 221, 421
failure mechanisms 50, 52-55, 60, 63, 64, 68-71, 74, 81, 82, 84, 88, 148, 158, 160, 182, 183, 186, 194, 197, 203, 205, 207, 219, 220-225, 228, 230-232, 234, 236, 239, 244, 261, 263, 276, 281, 294, 304, 341, 350, 356-358, 381-383, 387, 390, 393
failure mode 162, 183, 184, 341, 344, 346, 350, 357, 381, 415, 419, 421, 422
failure probability 419
failure rate 2, 8-13, 18-24, 29-31, 35, 36, 67, 91, 93, 102-106, 109, 111, 112, 115, 123, 126, 127, 145-150, 154, 157, 158, 161, 162, 165-167, 171, 179, 180, 187, 189-192, 206, 215, 219, 220, 224, 230, 236, 238, 252, 255, 256, 261, 276, 297, 327, 339-341, 346, 349, 352-354, 358, 415
failure rate prediction 295, 413
failure risk 84
failure types 16, 132, 254, 418
FAMOS technology 296
fatigue 100, 228
ferroelectrics 279
FET (Field Effect Transistors) 174
fibre optic 314
field data 414
field-effect characteristic 325
final control 233
final electrical tests 264
fine adjustment of resistors 270
fine leak test 59, 302
flash memories 281
flat pack packages 348
flaws 363
flicker noise 329, 330, 333-335, 337
FMEA (Failure mode and effect analysis) 415
FMEA/FMECA method 30
FMECA (Failure mode, effect and criticality analysis) 415
FTA (Fault Tree Analysis) 32
functional test 229, 283, 303, 304
functional testers 376
functioning 68
fuzzy logic 60, 61, 75, 87, 90
GaAs FET chip 273
GaAs LED 317
generation of hole-electron pairs 298
generation-recombination noise 329
getter 226
GIDL (gate-induced drain leakage) 291
gigabit memories 310
glass passivation 350, 351
Z diodes 154-163