
Failure Analysis and Principles Involved

Semiconductor Failure Analysis


Semiconductor failure analysis (FA) is the process of determining how or why a semiconductor device has failed, and is often performed as a series of steps known as FA techniques. Device failure is defined as any non-conformance of the device to its electrical and/or visual/mechanical specifications. Failure analysis is necessary to understand what caused the failure and how it can be prevented in the future.

Electrical failure can be either functional or parametric. Functional failure refers to the inability of a device to perform its intended function. Parametric failure refers to the inability of a device to meet the electrical specifications for a measurable characteristic (such as leakage current) that does not directly pertain to functionality. Thus, a parametric failure may be present even if the device is still able to perform its intended function. For example, a DAC that converts digital data into the correct analog voltage but draws excessive supply current is a parametric failure, whereas one that does not convert data at all is a functional failure. A device is said to be failing catastrophically if it grossly fails all parametric and functional test blocks.

Failure analysis starts with failure verification. It is important to validate the failure of a sample prior to failure analysis in order to conserve valuable FA resources. Failure verification is also done to characterize the failure mode; good characterization of the failure mode is necessary to make the FA efficient and accurate. After failure verification, the analyst subjects the sample to various FA techniques step by step, collecting attributes and other observations along the way. Non-destructive FA techniques are performed before destructive ones. Also, the results of these various FA techniques must be consistent or corroborative. Any inconsistency in results must be resolved before proceeding to the next step.
For example, a pin cannot both exhibit a broken wire during X-ray inspection and show an acceptable curve during curve tracing, so such an inconsistency must be resolved by verifying which of the two results is correct. In general, the results of the various FA techniques should collectively point to the real failure site. The FA process is finished once there is enough information to reach a conclusion about the location of the failure site and the cause or mechanism of failure.

FA Terminology

Failure Mode - a description of how a device is failing, usually in terms of how much it deviates from the specification that it is failing, e.g., excessive supply current, excessive offset voltage, excessive bias current
Failure Mechanism - the physical phenomenon behind the failure of a device, e.g., metal corrosion, electrostatic discharge, electrical overstress
Root Cause - the first event or condition that triggered, whether directly or indirectly, the occurrence of the failure, e.g., improper equipment grounding that resulted in ESD damage, a system problem that caused the usage of an incorrect mask set

The objective of a failure analyst when conducting FA is to determine the failure mechanism that led to the failure mode of the device. Once the failure mechanism has been determined, the process owner or expert can work with the failure analyst to determine the root cause of the problem. The process owner must always address the root cause of the failure mechanism, not just the intermediate failure causes that occurred after the root cause had already happened.
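The distinction between functional, parametric, and catastrophic failures defined at the start of this section can be expressed as a small classification rule. The sketch below is illustrative only: `TestBlock` and `classify_failure` are hypothetical names, and "grossly failing" is simplified to "failing", since the text gives no numeric margin for a gross failure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TestBlock:
    """One block of a test program (hypothetical structure for this sketch)."""
    name: str
    kind: str    # "functional" or "parametric"
    passed: bool


def classify_failure(blocks: List[TestBlock]) -> str:
    """Return 'pass', 'catastrophic', 'functional', or 'parametric'.

    Simplification: 'catastrophic' here means every block failed; how badly
    each block failed ("grossly failing") is not modeled.
    """
    failed = [b for b in blocks if not b.passed]
    if not failed:
        return "pass"
    if len(failed) == len(blocks):
        return "catastrophic"
    if any(b.kind == "functional" for b in failed):
        return "functional"
    return "parametric"


# The DAC example from the text: conversion works, but supply current is too high.
dac = [
    TestBlock("code-to-voltage conversion", "functional", True),
    TestBlock("supply current", "parametric", False),
]
print(classify_failure(dac))  # parametric
```

A functional failure takes precedence over a parametric one in this sketch, because a device that cannot perform its function is a functional failure regardless of its other readings.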

Failure Analysis (FA) Techniques


Failure Analysis Techniques, or simply FA Techniques, are the individual analytical steps performed to complete the failure analysis process. Each FA technique in the FA process is designed to provide its own specialized information that contributes to the determination of the failure mechanism of the sample. Although FA techniques are generally independent of each other, their results must nonetheless be consistent and corroborative in order to arrive at a strong conclusion for the FA cycle. During the FA process, all applicable non-destructive FA techniques must be performed before any destructive FA technique. An FA technique that permanently alters a sample in any way (whether visually, mechanically, chemically, or electrically) is considered destructive. On the other hand, non-destructive techniques are those that, ideally, do not cause any permanent change in the sample. Table 1 lists FA techniques commonly used in the semiconductor industry.

Table 1. Semiconductor FA Techniques

Non-destructive Techniques
Technique               Application
Failure Verification    validation of reported failure
Optical Microscopy      external or internal visual inspection
X-ray Radiography       internal x-ray imaging
Curve Tracing           current-voltage characterization
Hermeticity Testing     check for hermetic sealing
SAM                     detection of delaminations

Destructive Techniques
Technique               Application
Decapsulation           opening of the IC package
Sectioning              cross-sectioning of the sample
Hot Spot Detection      detection of heat-generating defects
LEM                     detection of light-emitting defects
Microprobing            direct electrical analysis of the die circuit
SEM/TEM                 high magnification real-time imaging
EDX/WDX                 elemental analysis
Focused Ion Beam        high resolution die sectioning/imaging
FTIR Spectroscopy       chemical analysis
Auger Analysis          surface analysis
SIMS                    compositional analysis
LIMS                    compositional analysis
ESCA or XPS             surface analysis
AFM / STM               high-resolution probe imaging
EBIC                    induced current imaging of defects
OBIC                    induced current imaging of defects
Chromatography          chemical analysis
RGA                     residual gas/moisture analysis

Note that some non-destructive techniques can become destructive if improperly performed on the sample. Examples of these are curve tracing and bench testing, which can lead to electrical overstressing of the sample if improperly undertaken.

Basic Failure Analysis (FA) Flows


Every experienced failure analyst knows that every FA is unique. Nobody can truly say that he or she has developed a standard failure analysis flow for every FA request that will come his or her way. FAs tend to direct themselves, with each subsequent step depending on the outcome of the previous one. The flow of failure analysis is influenced by a multitude of factors: the device itself, the application in which it failed, the stresses that the device has undergone prior to failure, the point of failure, the failure rate, the failure mode, the failure attributes, and of course, the failure mechanism. Nonetheless, FA is FA, so it is indeed possible to define, to a certain degree, a 'standard' FA flow for each failure mechanism. This article aims to give the reader a basic idea of how the FA flow for a given failure mechanism may be standardized. 'Standardization' in this context does not mean defining a step-by-step FA procedure to follow, but rather knowing what to look for when analyzing failures, depending on the observed or suspected failure mechanism.

Basic Die-level FA Flow

1) Failure Information Review. Understand thoroughly the customer's description of the failure. Determine: a) the specific electrical failure mode that the customer is experiencing; b) the point of failure, or where the failure was encountered (field or manufacturing line, and at which step?); c) what conditions the samples have already gone through or been subjected to; and d) the failure rate observed by the customer.
2) Failure Verification. Verify the customer's failure mode by electrical testing. Check the datalog results for consistency with what the customer is reporting.
3) External Visual Inspection. Perform a thorough external visual inspection on the sample. Note all markings on the package and look for external anomalies, e.g., missing/bent leads, package discolorations, package cracks/chip-outs/scratches, contamination, lead oxidation/corrosion, illegible marks, non-standard fonts, etc.
4) Bench Testing. Verify the electrical test results by bench testing to ensure that the ATE failures are not merely due to contact issues. Ideally, the customer's reported failure mode, the ATE results, and the bench test results are all consistent with each other.
5) Curve Tracing. Perform curve tracing to identify which pins exhibit current/voltage (I/V) anomalies. The objective of curve tracing is to look for open or shorted pins and pins with abnormal I/V characteristics (excessive leakage, abnormal breakdown voltages, etc.). FA may then be focused on circuits involving these anomalous pins. Dynamic curve tracing, wherein the unit is powered up while undergoing curve tracing, may be performed if static curve tracing does not reveal any anomalies.
6) X-ray Inspection. Perform x-ray inspection to look for internal package anomalies such as broken wires, missing wires, incorrect or missing die, excessive die attach voids, etc., without having to open the package. X-ray inspection results must be consistent with curve trace results, e.g., if x-ray inspection revealed a broken wire at a pin, then curve tracing should reveal that pin to be open.

7) CSAM. Perform CSAM on plastic packages to determine whether the samples have any internal delaminations that may lead to other failure attributes such as corrosion, broken wires, and lifted bonds.
8) Decapsulation. Once all the non-destructive steps such as those above have been completed, the samples may be decapsulated to expose the die and other internal features of the device for further FA.
9) Internal Visual Inspection. Perform internal visual inspection after decap. This is usually done using a low-power microscope and then a high-power microscope, proceeding from low magnification to higher ones. Look for wire/bond anomalies, die cracks, wire and die corrosion, die scratches, EOS/ESD sites, fab defects, and the like. SEM inspection may be needed in some instances.
10) Hot Spot Detection. If curve trace results indicate major discrepancies between the I/V characteristics (especially with regard to power dissipation) of the samples and those of known good units, then the samples may have localized heating on the die. For example, an abnormally large current flowing between an input pin and GND may mean a short circuit from this input pin to GND. Shorts such as this dissipate heat that can be located by hot spot detection techniques.
11) Light Emission Microscopy. If the device does not exhibit abnormalities in power dissipation that may indicate hot spots, light emission microscopy may be performed to look for defects that emit light. Note that an emission site is not necessarily the failure site.
12) Microprobing. Microprobing becomes necessary if neither hot spots nor abnormal photoemissions were seen on the samples. Microprobing may entail extensive circuit analysis wherein the failure site is pinpointed by analyzing the die circuit stage by stage or section by section. The thought process used when troubleshooting a full-size circuit also applies to die circuit troubleshooting.
13) Die Deprocessing. Perform die deprocessing to look for subsurface damage or defects if the above FA steps were not successful in locating the failure site.
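Steps 5 and 6 rely on one cross-check: an X-ray finding that implies an open pin must agree with the curve trace. Below is a minimal sketch of that check, assuming a forced-current, pin-to-GND curve trace whose voltage reading is compared against a known-good unit. The thresholds (`COMPLIANCE_V`, `SHORT_V`, `TOLERANCE_V`), the function names, and the readings are hypothetical placeholders, not values from this article; real limits depend on the device, the forced current, and the curve tracer.

```python
from typing import Dict, List

# Hypothetical limits for a forced-current, pin-to-GND curve trace.
COMPLIANCE_V = 2.0   # |V| at or above the tester's compliance: pin does not conduct
SHORT_V = 0.05       # |V| at or below this: pin is effectively shorted to GND
TOLERANCE_V = 0.10   # allowed deviation from the known-good unit's reading

# X-ray findings that should show up electrically as an open pin.
OPEN_IMPLYING_FINDINGS = {"broken wire", "missing wire", "lifted ball"}


def classify_pin(measured_v: float, reference_v: float) -> str:
    """Classify one pin's curve-trace voltage against a known-good unit."""
    if abs(measured_v) >= COMPLIANCE_V:
        return "open"
    if abs(measured_v) <= SHORT_V:
        return "short"
    if abs(measured_v - reference_v) > TOLERANCE_V:
        return "abnormal"
    return "normal"


def find_inconsistencies(curve_trace: Dict[str, str], xray: Dict[str, str]) -> List[str]:
    """List pins whose X-ray finding implies an open that the curve trace does not show."""
    issues = []
    for pin, finding in xray.items():
        result = curve_trace.get(pin, "not traced")
        if finding in OPEN_IMPLYING_FINDINGS and result != "open":
            issues.append(f"{pin}: x-ray shows {finding}, curve trace shows {result}")
    return issues


# Illustrative readings as (sample V, known-good V) per pin; not real data.
readings = {"IN+": (-0.61, -0.60), "OUT": (-2.00, -0.62), "VREF": (-0.01, -0.58)}
trace = {pin: classify_pin(v, ref) for pin, (v, ref) in readings.items()}
print(trace)  # {'IN+': 'normal', 'OUT': 'open', 'VREF': 'short'}
print(find_inconsistencies(trace, {"OUT": "broken wire", "IN+": "broken wire"}))
# ['IN+: x-ray shows broken wire, curve trace shows normal']
```

Any entry returned by `find_inconsistencies` is a conflict that, per the flow above, must be resolved before moving on to decapsulation.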

Basic Ball Lifting FA Flow


1) Failure Information Review. Check the customer's description of the failure for telltale signs of ball lifting, namely: a) functional or catastrophic failures that may indicate an open bond; b) pins that become intermittently open when pressure is applied to the package or when the device is subjected to elevated or extremely low temperature; or c) high-resistance or permanently open pins.
2) Device/Lot History Review. Check the FA history of the device to determine whether it has exhibited ball lifting returns previously. Check the assembly and test history of the lot to determine whether the lot has exhibited any yield or process issues potentially related to ball lifting. Sad to say, most ball lifting issues have assignable causes and are non-random in nature, so containment or bounding of the problem must be meticulously pursued.
3) Failure Verification. Verify the customer's failure mode by electrical testing. If ball lifting is suspected but the unit passes e-test, test the unit several times, because intermittently good bonds may allow it to pass. E-test must also be performed at elevated temperature if possible.
4) External Visual Inspection. Perform a thorough external visual inspection on the sample. Note all package anomalies that may indicate that the unit has been subjected to thermo-mechanical stresses.
5) Bench Testing. Verify the electrical test results by bench testing at the temperature where the failure was seen. If e-test at high temperature did not verify the failure reported by the customer, perform the bench test at elevated temperature as well.

6) Curve Tracing. Perform curve tracing at ambient, elevated (125C to 150C), and low temperature (-10C to -40C). This is the turning point of any ball lifting FA, because a lifted ball bond should appear as an open pin at elevated, if not at ambient, temperature. Some lifted balls manifest at low temperature, although not as frequently. Note that the sample is unlikely to be a ball lifting failure if none of its pins is open, whether permanently or intermittently.
7) X-ray Inspection. Perform x-ray inspection as part of the FA routine. Don't expect to find any lifted balls in the x-ray image if no open pins were seen during curve tracing. On the other hand, if you do see a lifted ball during x-ray inspection, consider this a gross case of ball lifting and ask yourself how the unit could have passed electrical testing.
8) CSAM. Perform CSAM on plastic packages to determine whether the samples have any internal delaminations that may lead to ball lifting. Delaminations play an important part in aggravating, if not directly causing, lifted ball bonds. Movement of the plastic compound parallel to or away from the die surface as a result of delamination can shear ball bonds off their bond pads.
9) Decapsulation/Internal Visual Inspection. Perform internal visual inspection after decap. SEM inspection is most useful in verifying lifted ball bonds, since some lifted balls may not be visible optically due to the limited depth of field of optical microscopes. Once a lifted ball is found, perform further visual inspection on the affected bond pad, looking for signs of contaminants, deep probe marks/exposed oxide, cratering, metal lifting, corrosion, and other attributes that may lead to ball lifting.
10) Microprobing (optional). Some ball bonds will not appear 'lifted' visually, even under SEM inspection. In such cases, it is necessary to confirm by microprobing that the ball bond has no electrical contact with the bond pad. Of course, this works best if you've already identified the anomalous pin during curve tracing.
11) Aspect Ratio Quantification. Use your SEM to estimate the aspect ratio of your ball bond. Ball bond aspect ratio is defined as the ratio of the ball diameter to the ball height, so flatter bonds exhibit higher aspect ratios. Well-formed ball bonds exhibit aspect ratios between 3 and 5. Balls well outside this range are considered underbonded (AR < 2.5) or overbonded (AR > 5.5). Poorly formed bonds point to a processing problem at wirebond that can lead to ball lifting.
12) IMC Quantification. Use your optical microscope to quantify the intermetallic coverage (IMC) of the ball bond. This is done by determining the percentage of the ball bond surface covered by intermetallic formation. An IMC of less than 50% (i.e., less than 50% of the bonded surface has intermetallics) indicates insufficient intermetallic formation. Try to correlate the amount and geometry of the IMC with whatever visual attributes are observed on the bond pad. Remember that poor IMC formation is most often due to bond pad anomalies that impede bonding.
13) EDX Analysis. Perform EDX analysis on the bond pads and ball bond surface to look for contaminants that may have impeded intermetallic formation. Note that silicon over the bond pad (unetched glass or Si saw dust) is a very common cause of ball lifting, so don't immediately presume that the silicon peak came from the wafer/substrate. Silicon is on top of the bond pad if its peak increases relative to that of aluminum when the SEM EHT is lowered.
14) Wire Pull Test/Ball Shear Test. If only one or two bonds have lifted, it may be useful to check the strengths of the other bonds of the sample(s). This will indicate whether the bonding problem is localized to a particular area of the die or affects all the bonds. This test is highly destructive and must only be done as one of the last steps (if not the last one) of the analysis.
15) Conclusion. As may be discerned from above, the basic flow of a ball lifting FA consists of the following: a) looking for intermittent or open pins prior to decap; b) visually and electrically confirming the ball lifting after decap; c) assessment of the IMC; d) identification of the physical and chemical abnormalities on the bond pad and the ball itself that correlate with the observed IMC; and e) subsequent investigations/simulations/evaluations to identify the root cause of these anomalies.
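Steps 11 and 12 reduce to two numeric checks. Here is a minimal sketch, assuming the ball diameter and height have already been measured on the SEM and the intermetallic and bonded areas estimated from the optical image (any units, as long as they are consistent). The function names are hypothetical; the thresholds are the ones given in the text, and the "marginal" label for values between the stated bands is an assumption of this sketch.

```python
from typing import Tuple


def classify_aspect_ratio(diameter_um: float, height_um: float) -> Tuple[float, str]:
    """Ball bond aspect ratio (diameter / height) and its category per the text."""
    ar = diameter_um / height_um
    if ar < 2.5:
        return ar, "underbonded"
    if ar > 5.5:
        return ar, "overbonded"
    if 3.0 <= ar <= 5.0:
        return ar, "well-formed"
    # 2.5-3.0 and 5.0-5.5 fall between the bands given in the text;
    # "marginal" is a label assumed for this sketch.
    return ar, "marginal"


def imc_coverage(imc_area: float, bonded_area: float) -> Tuple[float, bool]:
    """Percent intermetallic coverage, and whether it is below the 50% limit."""
    coverage = 100.0 * imc_area / bonded_area
    return coverage, coverage < 50.0


print(classify_aspect_ratio(50.0, 12.5))   # (4.0, 'well-formed')
print(classify_aspect_ratio(60.0, 10.0))   # (6.0, 'overbonded')
print(imc_coverage(300.0, 1000.0))         # (30.0, True)
```

The numbers only flag a suspect bond; as step 12 notes, the coverage result still has to be correlated with what is seen on the bond pad.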

Basic Die Cracking FA Flow


1) Failure Information/Device and Lot History Review. Understand the customer's description of the failure, i.e., the failure mode, where it was encountered, what conditions the sample was subjected to, etc. Check the FA history of the device to determine whether it has exhibited die cracking returns before. Check the assembly and test history of the lot to determine whether the lot has exhibited any yield or process issues potentially related to die cracking.
2) Failure Verification. Verify the customer's failure mode by electrical testing.
3) External Visual Inspection. Perform a thorough external visual inspection on the sample. Note all package anomalies that may indicate that the unit has been subjected to thermo-mechanical stresses, e.g., package cracks/chip-outs, tool marks, bent leads, discolored/burned package, etc.
4) Bench Testing. Verify the electrical test results by bench testing.
5) Curve Tracing. Perform curve tracing at ambient, elevated (125C to 150C), and low temperature (-10C to -40C). Look for open or shorted pins, which may indicate gross die cracking. Note, however, that some die crack failures may only exhibit subtle I/V curve anomalies.
6) X-ray Inspection. Perform x-ray inspection on the sample. Check for die attach problems such as excessive voids, die overhang, insufficient die attach coverage, and insufficient fillet. Check also for molding compound voids and cracks. Gross die cracks may also be found using sophisticated x-ray equipment.
7) CSAM. Perform CSAM on plastic packages to determine whether the samples have any internal delaminations indicative of the unit having been subjected to extremely high temperatures. Units with severe die attach abnormalities will exhibit die cracking upon exposure to temperature extremes.
8) Decapsulation/Internal Visual Inspection. Perform internal visual inspection after decap to confirm the die crack. The crack pattern on the die surface as well as on the die edge must be fully understood through extensive optical and SEM inspection.
9) Full Decapsulation. Many die cracking issues involve cracks that originate from the backside of the die. If SEM inspection of the die surface and die edge indicates that the cracks most likely originated from the die backside, then full decapsulation must be done. Full decapsulation consists of immersing the entire unit in acid to dissolve the entire package, leaving only the die behind. The die backside crack pattern may then be inspected freely.
10) Fractography. Fractography is the systematic and scientific process of determining the origin and propagation of a crack by studying the attributes of the fracture surface of the die. Fractography is a complicated process that can only be done reliably after years of study and experience. Once mastered, however, it is an indispensable tool for analyzing die crack issues. Note that Steps 8, 9, and 10 all have one objective: to understand the crack origin and propagation pattern in order to determine what stresses were applied to the die.
11) Conclusion. As may be discerned from above, the basic flow of a die cracking FA consists of the following: a) taking note of all electrical and visual/mechanical attributes of the sample before decap; b) confirmation of the die crack after decap; c) determination of the point of origin and propagation pattern of the die crack; d) determination of the points of application and direction of the stresses most likely experienced by the die, based on the crack origin and propagation; and e) subsequent investigations, simulations, or evaluations to identify the root cause of the stresses.

Basic Package Cracking FA Flow


1) Failure Information/Device and Lot History Review. Understand the customer's description of the package crack failure. Check the FA history of the device to determine whether it has exhibited package cracking before, whether in the field or in the manufacturing line. Check the assembly and test history of the lot to determine whether the lot has exhibited any yield or process issues potentially related to package cracking.
2) Failure Verification. Perform external visual inspection on the sample to confirm the package cracks reported by the customer. Note the similarities and differences between the customer's description of the package crack and the actual package crack.
3) External Visual Inspection. Perform a more thorough external visual inspection on the sample to completely characterize the package crack. Note all other package anomalies that may indicate that the unit has been subjected to thermo-mechanical stresses, e.g., package chip-outs, tool marks, bent/non-coplanar leads, discolored/burned package, etc.
4) Look for Origin/Propagation Patterns. Check how many distinct crack lines there are, where they originate and where they end, and how they propagated from these end points. If several units are affected, check for specific patterns in how the cracks are localized. Are they on one side of the package only? Do they affect certain pins only? Do they always occur at certain features of the package only, e.g., at the top-bottom package interface, at the tie bar, at the leads, etc.?
5) CSAM. Perform CSAM on the samples to check for any internal delaminations that are indicative of the unit having been subjected to extremely high temperatures. Check also for localized delaminations that correlate with the locations of the package cracks.
6) Stress Analysis. Analyze the package crack characteristics and internal delaminations to formulate your best hypothesis (or hypotheses) on how the unit was stressed. A good guideline to follow for this is that fractures always occur under tensile stresses. List as many scenarios or conditions as you can that could result in these cracks. Pay particular attention to the possibility that they were caused in the manufacturing line. Be sure to enlist the help of the Back-end Assembly experts in generating the list of hypotheses.
7) Simulations. Perform simulations on good units to verify each of your hypothetical root causes. For example, if you think that debris under the package during DTF caused the problem, then perform DTF on units with debris underneath them. You know you've pinned down the actual cause when you've duplicated the exact package crack pattern.

Reliability Models for Failure Mechanisms


Failure Mechanism Reliability Modeling, or reliability modeling, or acceleration modeling, or simply modeling, is the mathematical representation of a failure mechanism in terms of a set of algebraic or differential equations from the perspective of its reliability implications. The term failure mechanism refers to the actual physical phenomenon behind a failure occurrence. Modeling is a means of determining and understanding the different variables or factors that bring out and accelerate a failure mechanism. Being able to model a mechanism and quantify how it is affected by various environmental factors will allow a reliability engineer to develop appropriate reliability tests for estimating field failure rates and predicting when failures will begin to occur. Modeling is often expressed in the form of time to failure, or tf, or the acceleration factor, AF.

The Arrhenius Equation

Everything in this universe will decay or degrade with time, and the Second Law of Thermodynamics is there to make sure of this. Destruction or degradation of matter is generally due to atomic or molecular changes accelerated by external factors, one of which is temperature. The temperature dependence of degradation or failure mechanisms is given by the Arrhenius equation:

$$R = A\,e^{-E_a/kT}$$

where $R$ = reaction rate, $A$ = constant, $E_a$ = activation energy, $k$ = Boltzmann's constant ($8.617 \times 10^{-5}$ eV/K), and $T$ = absolute temperature.

For any given reaction obeying the Arrhenius equation, $R_1 t_1 = R_2 t_2$ = constant, where $R$ is the reaction rate and $t$ is the elapsed reaction time. To illustrate this, consider a reaction process that occurs at a high temperature $T_1$ and at a low temperature $T_2$. Since temperature increases the reaction rate, $R_1 > R_2$. However, the reaction process also takes a shorter time at $T_1$, i.e., $t_1 < t_2$, such that $R_1 t_1 = R_2 t_2$ = constant. Now, let $t_f$ = time to failure; then $R t_f$ = constant, or $t_f = C_1/R$. Thus,

$$t_f = \frac{C_1}{A\,e^{-E_a/kT}} = C\,e^{E_a/kT}, \qquad C = \frac{C_1}{A}$$

Let the acceleration factor $AF$ be the ratio $t_{f,use}/t_{f,test}$. Thus,

$$AF = \frac{C\,e^{E_a/kT_{use}}}{C\,e^{E_a/kT_{test}}} = e^{(E_a/k)\left(1/T_{use} - 1/T_{test}\right)}$$

Estimating Ea and tf using Arrhenius Plots

Recall that $t_f = C\,e^{E_a/kT}$. Then,

$$\ln t_f = \ln C + \frac{E_a}{k}\cdot\frac{1}{T}$$

Thus, the plot of $\ln t_f$ vs. $1/T$ yields a straight line whose slope corresponds to $E_a/k$ and whose intercept is $\ln C$.

Electromigration

Electromigration is the movement of the metal atoms of a metal line in the direction of the electron flow through that line (i.e., opposite to the conventional current). This mechanism is similar to pebbles in a stream, which are picked up and transported by the water in the direction of the water current. As such, during electromigration, metal atoms are removed from the end of the line where the electrons enter and accumulate at the other end, forming voids at the entrance and hillocks at the exit of the metal line. Thus, electromigration can result in open circuits (due to the voids) or line-to-line short circuits (due to the hillocks). Electromigration is accelerated by temperature and current density, and is modeled as follows (Black's equation):

$$t_f = C\,J^{-n}\,e^{E_a/kT}$$

$$AF = \frac{t_{f,use}}{t_{f,test}} = \left(\frac{J_{test}}{J_{use}}\right)^{n} e^{(E_a/k)\left(1/T_{use} - 1/T_{test}\right)}$$

where:
$C$ = a constant based on metal line properties
$n$ = integer constant from 1 to 7
$T_{use}$, $T_{test}$ = temperature during use and under test, respectively
$J_{use}$, $J_{test}$ = current density during use and under test, respectively
$E_a$ = 0.5-0.7 eV for pure Al
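The Arrhenius and electromigration relations above can be evaluated directly, and $E_a$ can be extracted from an Arrhenius plot with an ordinary least-squares line fit of $\ln t_f$ against $1/T$. The sketch below assumes temperatures entered in deg C and converted to kelvin; the function names, the exponent n = 2, and the synthetic data are illustrative choices, not values from the text (which only bounds n between 1 and 7).

```python
import math
from typing import Sequence, Tuple

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann's constant in eV/K


def to_kelvin(t_celsius: float) -> float:
    return t_celsius + 273.15


def arrhenius_af(ea_ev: float, t_use_c: float, t_test_c: float) -> float:
    """AF = exp[(Ea/k)(1/Tuse - 1/Ttest)], with temperatures given in deg C."""
    inv_diff = 1.0 / to_kelvin(t_use_c) - 1.0 / to_kelvin(t_test_c)
    return math.exp((ea_ev / K_BOLTZMANN_EV) * inv_diff)


def electromigration_af(ea_ev: float, n: float, j_use: float, j_test: float,
                        t_use_c: float, t_test_c: float) -> float:
    """AF = (Jtest/Juse)^n * exp[(Ea/k)(1/Tuse - 1/Ttest)]."""
    return (j_test / j_use) ** n * arrhenius_af(ea_ev, t_use_c, t_test_c)


def fit_arrhenius(temps_c: Sequence[float], tf: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line through ln(tf) vs 1/T; returns (Ea in eV, C).

    Uses ln(tf) = ln(C) + (Ea/k)(1/T): slope = Ea/k, intercept = ln(C).
    """
    xs = [1.0 / to_kelvin(t) for t in temps_c]
    ys = [math.log(t) for t in tf]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope * K_BOLTZMANN_EV, math.exp(y_mean - slope * x_mean)


# Synthetic times to failure generated from the model itself (Ea = 0.7 eV, C = 1e-5).
temps = [125.0, 150.0, 175.0]
times = [1e-5 * math.exp(0.7 / (K_BOLTZMANN_EV * to_kelvin(t))) for t in temps]
ea_fit, _ = fit_arrhenius(temps, times)
print(round(ea_fit, 3))                            # 0.7
print(round(arrhenius_af(0.7, 55.0, 125.0), 1))    # use at 55 C, test at 125 C
print(electromigration_af(0.6, 2, 1.0, 2.0, 105.0, 105.0))  # 4.0
```

With real data, the fitted slope only means something if all the points share the same failure mechanism; a kink in the Arrhenius plot usually signals a mechanism change between temperatures.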

Corrosion

Corrosion is metal degradation due to chemical or electrolytic reactions in the presence of moisture, contaminants, and bias. The corrosion rate is a function of temperature ($T$), relative humidity ($RH$), and bias ($V$). Let $AF = t_{f,use}/t_{f,test}$ and

$$t_f = C\,(RH)^{-3}\,e^{0.9/kT}$$

With no applied voltage:

$$AF = \left(\frac{RH_{test}}{RH_{use}}\right)^{3} e^{(0.9/k)\left(1/T_{use} - 1/T_{test}\right)}$$

With voltage $V$ applied:

$$AF = V\left(\frac{RH_{test}}{RH_{use}}\right)^{3} e^{(0.9/k)\left(1/T_{use} - 1/T_{test}\right)}$$

where:
$C$ = a constant
$RH_{use}$, $RH_{test}$ = relative humidity during use and under test, respectively
$T_{use}$, $T_{test}$ = temperature during use and under test, respectively
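For the no-bias case, the corrosion acceleration factor above can be computed as follows. This is a sketch: the 30 deg C / 60% RH use condition and the 85 deg C / 85% RH test condition are assumed examples, and the biased form is left out because the text gives its voltage factor only as $V$, without stating how it is normalized.

```python
import math

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann's constant in eV/K


def corrosion_af(rh_use: float, rh_test: float, t_use_c: float, t_test_c: float,
                 ea_ev: float = 0.9, rh_exponent: float = 3.0) -> float:
    """No-bias AF = (RHtest/RHuse)^3 * exp[(0.9/k)(1/Tuse - 1/Ttest)]."""
    t_use = t_use_c + 273.15
    t_test = t_test_c + 273.15
    humidity_term = (rh_test / rh_use) ** rh_exponent
    thermal_term = math.exp((ea_ev / K_BOLTZMANN_EV) * (1.0 / t_use - 1.0 / t_test))
    return humidity_term * thermal_term


# Assumed example: 85 C / 85% RH test versus a 30 C / 60% RH use condition.
print(round(corrosion_af(60.0, 85.0, 30.0, 85.0)))
```

The defaults `ea_ev=0.9` and `rh_exponent=3.0` mirror the constants in the model above; they are exposed as parameters only so the two terms can be inspected separately.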

Time-dependent Dielectric Breakdown (TDDB)

Time-dependent dielectric breakdown, or TDDB, is the destruction of dielectric layers occurring over time. Its rate is modeled as

$$R = A_1\,e^{-E_a/kT + CV}$$

$$AF = \frac{t_{f,use}}{t_{f,test}} = \frac{R_{test}}{R_{use}} = e^{(-E_a/k)\left(1/T_{test} - 1/T_{use}\right) + C\left(V_{test} - V_{use}\right)}$$

where:
$A_1$, $C$ = constants
$E_a$ = 0.8-0.9 eV
$V_{use}$, $V_{test}$ = voltage applied during use and under test, respectively

Hot Carrier Effects

Hot carrier effects refer to a phenomenon involving the injection of highly energetic carriers into the gate oxide layer and the silicon substrate, resulting in a volume charge build-up that can shift transistor threshold voltages. This mechanism is accelerated by low temperatures.

$$AF = \frac{t_{f,use}}{t_{f,test}} = e^{(E_a/k)\left(1/T_{use} - 1/T_{test}\right) + C\left(V_{test} - V_{use}\right)}$$

where:
$V$ = voltage accelerating the carriers
$E_a$ = -0.2 eV to -0.06 eV
$C$ = constant
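TDDB and hot carrier effects share the same acceleration-factor form; only the sign and size of $E_a$ differ. A minimal sketch follows. The voltage coefficient `C_HYPOTHETICAL` and the use/test conditions are illustrative assumptions, since the text gives no value for $C$, which has to be fitted for each process.

```python
import math

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann's constant in eV/K


def voltage_temperature_af(ea_ev: float, c_per_volt: float,
                           t_use_c: float, t_test_c: float,
                           v_use: float, v_test: float) -> float:
    """AF = exp[(Ea/k)(1/Tuse - 1/Ttest) + C(Vtest - Vuse)].

    TDDB uses a positive Ea (about 0.8-0.9 eV); hot carrier effects use a
    negative Ea (about -0.2 to -0.06 eV), so a colder test accelerates them.
    """
    t_use = t_use_c + 273.15
    t_test = t_test_c + 273.15
    exponent = (ea_ev / K_BOLTZMANN_EV) * (1.0 / t_use - 1.0 / t_test)
    exponent += c_per_volt * (v_test - v_use)
    return math.exp(exponent)


C_HYPOTHETICAL = 2.0  # 1/V, illustrative only; not a value from the text

# TDDB: hotter and higher-voltage test. Hot carriers: colder and higher-voltage test.
tddb = voltage_temperature_af(0.85, C_HYPOTHETICAL, 55.0, 125.0, 3.3, 4.0)
hci = voltage_temperature_af(-0.1, C_HYPOTHETICAL, 25.0, -40.0, 3.3, 4.0)
print(f"TDDB AF = {tddb:.0f}, hot carrier AF = {hci:.1f}")
```

Note how the negative activation energy flips the temperature term: testing hot carriers at a higher temperature than use would give an AF below 1 from temperature alone.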

Bond/Solderability Failures

Bond/solderability failures related to intermetallic growth, e.g., ball lifting due to Kirkendall voids, Cu-Sn intermetallic growth towards the lead finish surface, etc., are modeled as follows:

$$t_f = A\,e^{E_a/kT}$$

$$AF = \frac{t_{f,use}}{t_{f,test}} = e^{(E_a/k)\left(1/T_{use} - 1/T_{test}\right)}$$

where:
$A$ = constant
$E_a$ = 1 eV for Au-Al bonds
$E_a$ = 0.5-0.75 eV for Sn-based lead finish

TC-induced Package Cracking

The occurrence of fracture anywhere in the package after it has undergone several temperature cycles has also been modeled. Since the zero-stress condition of the package is at a high temperature (around 175 deg C), the low-temperature (cooling) portion of the cycle has the main effect on this mechanism.

$$AF = \left(\frac{T_{accel}}{T_{use}}\right)^{m}$$

where:
$T_{accel} = T_{min(accel)} - T_{neutral}$
$T_{use} = T_{min(use)} - T_{neutral}$
$T_{neutral}$ = zero-stress temperature (approx. 175 deg C)
$m$ = 20 (fracture property-dependent)

Fatigue Failures

Fatigue failures are failures due to the application of cyclical stresses.

$$AF = \left(\frac{T_{accel}}{T_{use}}\right)^{n}, \qquad N_f = C\,(\Delta T)^{-n}$$

where:
$N_f$ = cycles to failure
$\Delta T$ = temperature difference
$n$ = temperature difference factor
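The temperature-cycling models above are power laws, so they are simple to evaluate; the fatigue form is commonly known as the Coffin-Manson model. The sketch below uses the text's defaults for package cracking ($T_{neutral}$ = 175 deg C, $m$ = 20); the cycle extremes, the exponent n = 2, and the function names are assumed examples only.

```python
def tc_cracking_af(tmin_accel_c: float, tmin_use_c: float,
                   t_neutral_c: float = 175.0, m: float = 20.0) -> float:
    """AF = (Taccel/Tuse)^m, each term measured down from the zero-stress temperature."""
    t_accel = tmin_accel_c - t_neutral_c
    t_use = tmin_use_c - t_neutral_c
    return (t_accel / t_use) ** m


def fatigue_af(delta_t_accel: float, delta_t_use: float, n: float) -> float:
    """Coffin-Manson-type AF = (dT_accel / dT_use)^n."""
    return (delta_t_accel / delta_t_use) ** n


def cycles_to_failure(c: float, delta_t: float, n: float) -> float:
    """Nf = C * dT^(-n)."""
    return c * delta_t ** (-n)


# Assumed example: -65 C minimum in test versus 0 C minimum in use.
print(round(tc_cracking_af(-65.0, 0.0)))
# Assumed example: 165 C swing in test versus 60 C swing in use, with n = 2.
print(round(fatigue_af(165.0, 60.0, 2.0), 2))
```

Because $m$ = 20 is so large, the package cracking AF is very sensitive to the chosen minimum temperature: a few degrees change in $T_{min(accel)}$ moves the result noticeably.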

Die-related Failure Mechanisms and Attributes I


Contact Migration

Contact migration refers to the diffusion of the metal atoms of a contact (usually Al or an alloy thereof) into the silicon substrate. This phenomenon is due to the natural interdiffusion that occurs between two different interdiffusible materials in contact with each other, in this case Al and Si. The interdiffusion occurs both ways, i.e., Al diffuses into Si and Si diffuses into Al. It is not current-related and must not be confused with electromigration, which is a different mechanism. Junction spiking occurs when Al has migrated deep enough into the silicon substrate to short a p-n junction in its path. At that point, an Al spike is said to have shorted the junction, damaging the device permanently. The reverse, wherein Si atoms have entirely penetrated the Al layer above, may also happen and can result in an open circuit due to voids in the metal contact. Silicon aggregates that have diffused through the Al layer and reached the surface are known as silicon nodules. Silicon nodules are often observed over the bond pads as small but numerous hillocks, and are known to cause wirebonding problems as well. Al migration is usually reduced by doping the Al with Si or Cu or both, forming an alloy that is more resistant to Al-Si interdiffusion. A barrier metal such as TiW or Pt-Si may also be deposited between the Al layer and the silicon substrate.

Corrosion in Die and Package

Corrosion is the degradation of metals as a result of electrochemical activity. Corrosion requires four components to occur: 1) an anode; 2) a cathode; 3) an electrolyte; and 4) an electrical connection between the anode and the cathode. Thus, the key to corrosion prevention is the elimination of at least one of these components.
The presence of an anode and a cathode implies that there is a potential difference between them, i.e., the anode has a greater tendency to lose electrons while the cathode has a greater tendency to gain them. This potential difference is the primary driver of corrosion. The anode is the metal or site with a higher tendency to oxidize (lose electrons). Thus, a metal undergoing corrosion is said to be 'anodic' if it is where the oxidation reaction takes place: $M \rightarrow M^{x+} + x e^{-}$. The anode is often one of the following: 1) the more active metal; 2) a stressed region such as a crack, a scratch, a grain boundary, or a deformed structure; 3) an area that is starved of oxygen; and 4) an area with variations in its composition. The cathode is the metal or site with a higher tendency toward reduction (gaining electrons), or a lower tendency to oxidize. It is often one of the following: 1) a noble metal; 2) an unstressed region; 3) an area with high oxygen concentration; and 4) a non-metallic component. The electrolyte is the medium through which ions may move, the most common of which is water.

Die Corrosion

Die corrosion refers to the corrosion of the metal areas on the surface of the die. Aluminum (Al) metal areas are the most prevalent on a typical die circuit, so Al corrosion is quite commonly encountered. Other thin-film layers on the die, such as sichrome resistors, can also corrode. Gross cases of chemical corrosion can lead to either electrically open or electrically shorted metal lines, the latter being due to corrosion byproducts that bridge two metal lines together. Corroded metal lines appear dark under an optical microscope (as shown in the picture on the right). Chemical corrosion of Al is triggered by the presence of moisture and contaminants on the die surface, and Al can corrode whether it is acting as an anode or as a cathode. Al bond pads, being unglassivated, are more vulnerable to corrosion. However, corrosion can also occur in subsurface Al lines that moisture reaches through imperfections in their protective glassivation or inter-metal dielectric layers. Die corrosion often results from contamination problems in wafer fab or packaging. Improper rinsing, or excessive exposure to corrosive contaminants such as P, S, and Cl during wafer fab, can make the die highly susceptible to corrosion. Packaging and passivation defects that allow excessive ingress of moisture and contaminants into the die can also lead to die corrosion. Plastic molding compounds with corrosive ingredients, and die attach materials that exhibit resin bleeding of corrosive contaminants, may likewise trigger die corrosion.

Fig 1. SEM photo of corroded aluminum metal lines

Fig 2. Photo of a corroded bond pad that exhibited ball lifting; this corrosion was caused by Cl contamination

Lead/Leadframe Corrosion

Lead corrosion, as the name implies, refers to the corrosion of the lead itself. It is often due to inadequate lead finish, the presence of contaminants on the leads, or exposure of the leads to excessive moisture, and it can be accelerated by higher temperatures and the presence of electrical bias on the leads. Leadframe corrosion refers to the corrosion of any part of the leadframe. Although this mechanism is more critical if it occurs on the silver-plated areas of the leadframe (the die pad, where the die is attached, and the bonding fingers), corrosion on any part of the leadframe must be rejected, because the contaminants in the corroded area can easily spread in the presence of moisture. The contaminants most frequently encountered on fresh leadframes are chlorine, phosphorus, sulfur, and potassium. Newly delivered leadframes from suppliers must undergo strict incoming quality screening for contaminants and foreign materials to minimize the risk of internal corrosion in semiconductor products.

Fig 3. SEM photos of corroded areas on contaminated die pads of various leadframes

Wire Corrosion

Corrosion of the bond wires within a package can also occur; gross cases can lead to wire breakage or even total disintegration of the wire. This mechanism is more commonly encountered in aluminum wires that have been contaminated by chlorine, although rare cases involving gold wires have also been observed. In the case of gold wires, delaminated areas around the wire can act as conduits for Cl-contaminated moisture, exposing the entire length of the wire to Cl and making it vulnerable to massive corrosion. Other contaminants that accelerate gold wire corrosion include bromide, iodide, and cyanide ions.

Fig 4. SEM photo of a corroded aluminum wire and wedge bond

Electrode Reduction and Oxidation Potential

Corrosion, the degradation of metals as a result of electrochemical activity, requires an anode and a cathode in order to occur. The anode is the metal or site with a higher potential to oxidize (lose electrons), while the cathode is the metal or site with a higher potential for reduction (gain electrons). In other words, the cathode has a lower potential to oxidize than the anode. A material's tendency to oxidize, or lose electrons, is measured by its 'oxidation potential.' A difference between the oxidation potentials of two metals or sites can lead to corrosion that consumes the more anodic metal or site. This assumes that the two other requirements for corrosion are also present: an electrical connection between the two metals or sites, and an electrolyte (such as water) to conduct ions between them. Table 1 presents the standard reduction and oxidation potentials of various elements and reactions, relative to the hydrogen electrode, which is arbitrarily assigned 0.00 V. These values are meaningful relative to each other: they are used to determine whether a metal will tend to act as the cathode or the anode with respect to another metal, and therefore whether corrosion can occur.

Table 1. Standard Electrode Reduction and Oxidation Potential Values
Anodic: exhibits greater tendency to lose electrons

| Reduction reaction | E°red (V) | Oxidation reaction | E°ox (V) |
|---|---|---|---|
| Li⁺ + e⁻ → Li | -3.04 | Li → Li⁺ + e⁻ | 3.04 |
| K⁺ + e⁻ → K | -2.92 | K → K⁺ + e⁻ | 2.92 |
| Ba²⁺ + 2e⁻ → Ba | -2.90 | Ba → Ba²⁺ + 2e⁻ | 2.90 |
| Ca²⁺ + 2e⁻ → Ca | -2.87 | Ca → Ca²⁺ + 2e⁻ | 2.87 |
| Na⁺ + e⁻ → Na | -2.71 | Na → Na⁺ + e⁻ | 2.71 |
| Mg²⁺ + 2e⁻ → Mg | -2.37 | Mg → Mg²⁺ + 2e⁻ | 2.37 |
| Al³⁺ + 3e⁻ → Al | -1.66 | Al → Al³⁺ + 3e⁻ | 1.66 |
| Mn²⁺ + 2e⁻ → Mn | -1.18 | Mn → Mn²⁺ + 2e⁻ | 1.18 |
| 2H2O + 2e⁻ → H2 + 2OH⁻ | -0.83 | H2 + 2OH⁻ → 2H2O + 2e⁻ | 0.83 |
| Zn²⁺ + 2e⁻ → Zn | -0.76 | Zn → Zn²⁺ + 2e⁻ | 0.76 |
| Cr³⁺ + 3e⁻ → Cr | -0.74 | Cr → Cr³⁺ + 3e⁻ | 0.74 |
| Fe²⁺ + 2e⁻ → Fe | -0.44 | Fe → Fe²⁺ + 2e⁻ | 0.44 |
| Cr³⁺ + e⁻ → Cr²⁺ | -0.41 | Cr²⁺ → Cr³⁺ + e⁻ | 0.41 |
| Cd²⁺ + 2e⁻ → Cd | -0.40 | Cd → Cd²⁺ + 2e⁻ | 0.40 |
| Co²⁺ + 2e⁻ → Co | -0.28 | Co → Co²⁺ + 2e⁻ | 0.28 |
| Ni²⁺ + 2e⁻ → Ni | -0.25 | Ni → Ni²⁺ + 2e⁻ | 0.25 |
| Sn²⁺ + 2e⁻ → Sn | -0.14 | Sn → Sn²⁺ + 2e⁻ | 0.14 |
| Pb²⁺ + 2e⁻ → Pb | -0.13 | Pb → Pb²⁺ + 2e⁻ | 0.13 |
| Fe³⁺ + 3e⁻ → Fe | -0.04 | Fe → Fe³⁺ + 3e⁻ | 0.04 |

Arbitrary neutral: H2

| Reduction reaction | E°red (V) | Oxidation reaction | E°ox (V) |
|---|---|---|---|
| 2H⁺ + 2e⁻ → H2 | 0.00 | H2 → 2H⁺ + 2e⁻ | 0.00 |

Cathodic: exhibits greater tendency to gain electrons

| Reduction reaction | E°red (V) | Oxidation reaction | E°ox (V) |
|---|---|---|---|
| S + 2H⁺ + 2e⁻ → H2S | 0.14 | H2S → S + 2H⁺ + 2e⁻ | -0.14 |
| Sn⁴⁺ + 2e⁻ → Sn²⁺ | 0.15 | Sn²⁺ → Sn⁴⁺ + 2e⁻ | -0.15 |
| Cu²⁺ + e⁻ → Cu⁺ | 0.16 | Cu⁺ → Cu²⁺ + e⁻ | -0.16 |
| SO4²⁻ + 4H⁺ + 2e⁻ → SO2 + 2H2O | 0.17 | SO2 + 2H2O → SO4²⁻ + 4H⁺ + 2e⁻ | -0.17 |
| AgCl + e⁻ → Ag + Cl⁻ | 0.22 | Ag + Cl⁻ → AgCl + e⁻ | -0.22 |
| Cu²⁺ + 2e⁻ → Cu | 0.34 | Cu → Cu²⁺ + 2e⁻ | -0.34 |
| ClO3⁻ + H2O + 2e⁻ → ClO2⁻ + 2OH⁻ | 0.35 | ClO2⁻ + 2OH⁻ → ClO3⁻ + H2O + 2e⁻ | -0.35 |
| 2H2O + O2 + 4e⁻ → 4OH⁻ | 0.40 | 4OH⁻ → 2H2O + O2 + 4e⁻ | -0.40 |
| Cu⁺ + e⁻ → Cu | 0.52 | Cu → Cu⁺ + e⁻ | -0.52 |
| I2 + 2e⁻ → 2I⁻ | 0.54 | 2I⁻ → I2 + 2e⁻ | -0.54 |
| O2 + 2H⁺ + 2e⁻ → H2O2 | 0.68 | H2O2 → O2 + 2H⁺ + 2e⁻ | -0.68 |
| Fe³⁺ + e⁻ → Fe²⁺ | 0.77 | Fe²⁺ → Fe³⁺ + e⁻ | -0.77 |
| NO3⁻ + 2H⁺ + e⁻ → NO2 + H2O | 0.78 | NO2 + H2O → NO3⁻ + 2H⁺ + e⁻ | -0.78 |
| Hg²⁺ + 2e⁻ → Hg | 0.78 | Hg → Hg²⁺ + 2e⁻ | -0.78 |
| Ag⁺ + e⁻ → Ag | 0.80 | Ag → Ag⁺ + e⁻ | -0.80 |
| NO3⁻ + 4H⁺ + 3e⁻ → NO + 2H2O | 0.96 | NO + 2H2O → NO3⁻ + 4H⁺ + 3e⁻ | -0.96 |
| Br2 + 2e⁻ → 2Br⁻ | 1.06 | 2Br⁻ → Br2 + 2e⁻ | -1.06 |
| O2 + 4H⁺ + 4e⁻ → 2H2O | 1.23 | 2H2O → O2 + 4H⁺ + 4e⁻ | -1.23 |
| MnO2 + 4H⁺ + 2e⁻ → Mn²⁺ + 2H2O | 1.28 | Mn²⁺ + 2H2O → MnO2 + 4H⁺ + 2e⁻ | -1.28 |
| Cr2O7²⁻ + 14H⁺ + 6e⁻ → 2Cr³⁺ + 7H2O | 1.33 | 2Cr³⁺ + 7H2O → Cr2O7²⁻ + 14H⁺ + 6e⁻ | -1.33 |
| Cl2 + 2e⁻ → 2Cl⁻ | 1.36 | 2Cl⁻ → Cl2 + 2e⁻ | -1.36 |
| Ce⁴⁺ + e⁻ → Ce³⁺ | 1.44 | Ce³⁺ → Ce⁴⁺ + e⁻ | -1.44 |
| Au³⁺ + 3e⁻ → Au | 1.50 | Au → Au³⁺ + 3e⁻ | -1.50 |
| MnO4⁻ + 8H⁺ + 5e⁻ → Mn²⁺ + 4H2O | 1.52 | Mn²⁺ + 4H2O → MnO4⁻ + 8H⁺ + 5e⁻ | -1.52 |
| H2O2 + 2H⁺ + 2e⁻ → 2H2O | 1.78 | 2H2O → H2O2 + 2H⁺ + 2e⁻ | -1.78 |
| Co³⁺ + e⁻ → Co²⁺ | 1.82 | Co²⁺ → Co³⁺ + e⁻ | -1.82 |
| S2O8²⁻ + 2e⁻ → 2SO4²⁻ | 2.01 | 2SO4²⁻ → S2O8²⁻ + 2e⁻ | -2.01 |
| O3 + 2H⁺ + 2e⁻ → O2 + H2O | 2.07 | O2 + H2O → O3 + 2H⁺ + 2e⁻ | -2.07 |
| F2 + 2e⁻ → 2F⁻ | 2.87 | 2F⁻ → F2 + 2e⁻ | -2.87 |

For example, if tin is deposited over copper, then there is a possibility for corrosion to occur. From Table 1, copper has a lower oxidation potential (-0.34 V) than tin (0.14 V), so Cu can serve as the cathode while Sn can serve as the anode, creating the potential difference necessary for corrosion to occur.
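The tin-over-copper comparison can be sketched in a few lines of Python. This is a minimal illustration, assuming only the standard-state reduction potentials of a handful of metal couples copied from Table 1; the dictionary and the `galvanic_pair` helper are hypothetical names, and real galvanic behavior also depends on the electrolyte, ion concentrations, and surface films (such as the native oxide on Al), none of which are modelled here.

```python
# Standard reduction potentials (V, relative to the hydrogen electrode) copied from Table 1.
# A metal's oxidation potential is the negative of its reduction potential.
STANDARD_REDUCTION_V = {
    "Mg": -2.37,  # Mg2+ + 2e- -> Mg
    "Al": -1.66,  # Al3+ + 3e- -> Al
    "Zn": -0.76,
    "Fe": -0.44,  # Fe2+ + 2e- -> Fe
    "Ni": -0.25,
    "Sn": -0.14,  # Sn2+ + 2e- -> Sn
    "Cu": 0.34,   # Cu2+ + 2e- -> Cu
    "Ag": 0.80,
    "Au": 1.50,
}


def galvanic_pair(metal_a: str, metal_b: str) -> tuple[str, str, float]:
    """Return (anode, cathode, cell potential in V) for two metals in contact.

    The metal with the lower reduction potential (higher oxidation potential)
    is the anode; the cell potential is E_red(cathode) - E_red(anode).
    """
    if STANDARD_REDUCTION_V[metal_a] < STANDARD_REDUCTION_V[metal_b]:
        anode, cathode = metal_a, metal_b
    else:
        anode, cathode = metal_b, metal_a
    cell_v = STANDARD_REDUCTION_V[cathode] - STANDARD_REDUCTION_V[anode]
    return anode, cathode, cell_v


anode, cathode, volts = galvanic_pair("Sn", "Cu")
print(f"anode={anode}, cathode={cathode}, E_cell={volts:.2f} V")
```

For Sn and Cu the sketch picks Sn as the anode and Cu as the cathode, matching the reasoning above; the 0.48 V driving force is simply 0.34 V - (-0.14 V).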

Die Scratches

A die scratch is abrasion, scraping, or laceration damage on the surface of the die. Die scratching is a failure mechanism wherein the surface of the die is mechanically damaged by a rigid object that is accidentally dragged across or moved over it, usually resulting in gross abrasion, scraping, or laceration of the die's active circuit (see Figure 1). The damage itself is referred to as a 'die scratch', while the damaged die is referred to as a 'scratched die.' Die scratches are caused by mechanical means, usually mishandling. 'Mishandling' in this context also includes the improper or careless use of tools and accessories by an operator while working. It is common to see die scratches that resulted from a pointed object such as a probe needle or tweezer accidentally touching the die and sweeping across its surface. Scratches that reach the active circuit beneath the glassivation will immediately lead to electrical failure due to shorted and open metal lines. Metal shorts are commonly seen in narrowly spaced metal lines, where the displaced metal bridges the lines together. Open circuits are often induced in narrow, isolated lines. Shallow scratches that do not reach the active circuit will not cause immediate electrical failure, but may pose reliability risks if the top passivation has been breached. For instance, moisture and contaminants seeping through a damaged portion of the glassivation can cause die corrosion.

Figure 1. Optical photo (left) and SEM photo (right) of die scratches

Die scratching can occur anywhere from wafer fab to assembly prior to encapsulation. Picking up a die carelessly with a tweezer for eutectic die attach can cause the tweezer to slip out of position and scratch the die surface. Improper equipment set-up can cause probe needles, die overcoat dispense tools, and the like to land on and scratch the surface of the die. Foreign materials and dirt embedded in the pick-up tool tips of pick-and-place machines during die attach can also cause die scratches, as can defective, worn-out, or damaged pick-up tools. Manual capping of ceramic packages prior to sealing may also scratch the die if the cap or lid inadvertently comes into contact with the die surface. Die scratches are easy to confirm by optical microscopy, since they closely resemble the scratches seen every day on common objects.

Dielectric Breakdown

Dielectric breakdown refers to the destruction of a dielectric layer, usually as a result of excessive potential difference or voltage across it. It usually manifests as a short or leakage at the point of breakdown. There are many types of dielectric in a typical die circuit, varying not only in purpose but in chemical composition as well. The most commonly used dielectric is SiO2, an oxide of silicon. The permanent breakdown of an oxide dielectric is also commonly referred to as 'oxide rupture' or 'oxide breakdown.' In devices with no wafer fab problem, the most common cause of dielectric breakdown is EOS/ESD, since this can expose the dielectric layer to high voltages. Non-EOS/ESD-related dielectric breakdowns may be classified as either early life dielectric breakdown (ELDB) or time-dependent dielectric breakdown (TDDB), depending on when in the device lifetime they occur. Early life dielectric breakdown, usually occurring within the device's first year of operation, is just a special case of early life failure (ELF) involving a dielectric layer. A dielectric breakdown is usually classified as TDDB if the device has already been in operation for at least two years. These are only guidelines, because the point at which a dielectric breakdown occurs depends not only on time but on other factors as well. ELDB and TDDB failures are usually caused by a defect in the dielectric layer, such as stray particles that decrease the effective thickness of the dielectric and make it prone to breakdown. Since SiO2 is a very common dielectric material, its breakdown mechanism has been studied extensively over the years. SiO2 breakdown is believed to be due to charge injection, and may be divided into two stages. During the first stage, current starts to flow through the oxide as a result of the voltage applied across it. High field/high current regions are then formed as charges are trapped in the oxide.
Eventually, these abnormal regions reach stage 2, a critical point wherein the oxide heats up and allows a greater current flow. This results in an electrical and thermal runaway that quickly leads to the physical destruction of the oxide.

Oxide Breakdown
Oxide breakdown refers to the destruction of an oxide layer (usually silicon dioxide, SiO2) in a semiconductor device. Oxide layers are used in many parts of the device: as the gate oxide between the metal and the semiconductor in MOS transistors, as the dielectric layer in capacitors, as inter-layer dielectric to isolate conductors from each other, etc. Oxide breakdown is also referred to as 'oxide rupture' or 'oxide punch-through.' Oxide breakdown has always been a serious reliability concern in the semiconductor industry because of the continuing drive toward smaller devices. As other features of the device are scaled down, oxide thickness must be reduced as well, and oxides become more vulnerable to the voltages applied to the device as they get thinner. The thinnest oxide layers today are already less than 50 angstroms thick. An oxide layer can break down instantaneously at 8-11 MV per cm of thickness, or 0.08-0.11 V per angstrom of thickness. Oxide breakdowns may be classified as one of the following: 1) EOS/ESD-induced dielectric breakdown; 2) early-life dielectric breakdown; or 3) time-dependent dielectric breakdown (TDDB). Oxide rupture due to EOS/ESD events generally involves a high voltage being applied across the oxide layer, causing a 'weak' spot within it to break down and allow current to flow. This current flow, which results from the loss of dielectric isolation at that spot, causes localized heating, which induces the flow of a larger current. A vicious cycle of increasing current flow and localized heating ensues, eventually melting the silicon, dielectric, and other materials at the 'hot spot'. This meltdown creates a short circuit between the layers that the oxide is supposed to isolate. See also: EOS/ESD Failures.
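The 8-11 MV/cm figure converts directly into a breakdown voltage for a given oxide thickness. The sketch below assumes only that the field equals the applied voltage divided by the thickness and uses the range quoted above; it shows why a sub-50 Å oxide leaves little margin, since at 50 Å the instantaneous breakdown voltage is only about 4.0-5.5 V. The function and constant names are illustrative, not from any library.

```python
# Instantaneous SiO2 breakdown field range quoted in the text, in MV/cm.
BREAKDOWN_FIELD_RANGE_MV_PER_CM = (8.0, 11.0)
CM_PER_ANGSTROM = 1e-8


def breakdown_voltage_range(thickness_angstrom: float) -> tuple[float, float]:
    """Return the (low, high) voltage at which an oxide of this thickness breaks down instantly."""
    thickness_cm = thickness_angstrom * CM_PER_ANGSTROM
    low_mv, high_mv = BREAKDOWN_FIELD_RANGE_MV_PER_CM
    # V = E * t, with E converted from MV/cm to V/cm.
    return low_mv * 1e6 * thickness_cm, high_mv * 1e6 * thickness_cm


for t in (50.0, 100.0, 200.0):
    low, high = breakdown_voltage_range(t)
    print(f"{t:.0f} A oxide: breaks down at roughly {low:.1f} to {high:.1f} V")
```

Because the relation is linear, halving the oxide thickness halves the voltage it can withstand.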

Figure 1. Photo of an ESD-induced Oxide Breakdown

Early-life and time-dependent oxide breakdowns result in the same failure attributes, but the former involves a breakdown that occurs early in the life of the device (say, within the first 2 years of normal operation), while the latter involves a breakdown that occurs after a much longer time of use (mainly in the 'wear-out' stage). Both categories involve destruction of the oxide under normal bias or operation. Early life and time-dependent dielectric breakdowns are primarily due to weak spots within the oxide layer arising from poor processing or uneven growth. These weak spots or dielectric defects may be caused by: 1) the presence of mobile sodium (Na) ions in the oxide; 2) radiation damage; 3) contamination, wherein particles or impurities are trapped on the silicon prior to oxidation; and 4) crystalline defects in the silicon such as stacking faults and dislocations. The risk of dielectric breakdown generally increases with the area of the oxide layer, since a larger area means more defects and greater exposure to contaminants. The worst oxide defects are the ones that result in early life dielectric breakdowns. It must be pointed out, however, that even very high quality oxides can suffer breakdown with time, especially in the 'wear-out' period of their lifetime. This latter case is the classic 'TDDB' mechanism.

The SiO2 TDDB Process


Previous studies have shown that SiO2 Time-Dependent Dielectric Breakdown (TDDB) is a charge injection mechanism, the process of which may be divided into 2 stages - the build-up stage and the runaway stage. During the build-up stage, charges invariably get trapped in various parts of the oxide as current flows in the oxide. The trapped charges increase in number with time, forming high electric fields (electric field = voltage/oxide thickness) and high current regions along the way. This process of electric field build-up continues until the runaway stage is reached. During the runaway stage, the sum of the electric field built up by charge injection and the electric fields applied to the device exceeds the dielectric breakdown threshold in some of the weakest points of the dielectric. These points start conducting large currents that further heat up the dielectric, which further increases the current flow. This positive feedback loop eventually results in electrical and thermal runaway, destroying the oxide in the end. The runaway stage happens in a very short period of time. The presence of defects in the dielectric greatly reduces the time needed to transition from the build-up to the runaway stage. These defects actually have the effect of 'thinning' down the oxide where they are located, since they are occupying space that should have been occupied by the dielectric. The effective electric field is higher in these thinned-out areas compared to defect-free areas for any given voltage. This is why it takes a lower voltage and shorter time to break down the dielectric at its defect points.
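The 'thinning' argument above can be made concrete with the relation electric field = voltage / oxide thickness. The sketch below assumes a hypothetical 100 Å oxide with a defect occupying 30 Å of it, a hypothetical 5 V bias, and the 8 MV/cm lower bound of the SiO2 breakdown field quoted in the oxide breakdown discussion; it ignores charge trapping and heating, so it only illustrates why a defect point reaches a critical field at a lower voltage.

```python
# Lower bound of the instantaneous SiO2 breakdown field, in MV/cm.
BREAKDOWN_FIELD_LOW_MV_PER_CM = 8.0


def field_mv_per_cm(voltage_v: float, thickness_angstrom: float) -> float:
    """Electric field = voltage / oxide thickness, in MV/cm (1 A = 1e-8 cm)."""
    return voltage_v / (thickness_angstrom * 1e-8) / 1e6


def voltage_for_field(field_mv_cm: float, thickness_angstrom: float) -> float:
    """Voltage needed to reach the given field across the given thickness."""
    return field_mv_cm * 1e6 * thickness_angstrom * 1e-8


NOMINAL_A = 100.0   # hypothetical defect-free oxide thickness
DEFECT_A = 30.0     # hypothetical defect size intruding into the oxide
APPLIED_V = 5.0     # hypothetical applied voltage

effective_a = NOMINAL_A - DEFECT_A  # remaining dielectric at the defect
print(f"field, defect-free: {field_mv_per_cm(APPLIED_V, NOMINAL_A):.2f} MV/cm")
print(f"field, at defect:   {field_mv_per_cm(APPLIED_V, effective_a):.2f} MV/cm")
print(f"voltage to reach {BREAKDOWN_FIELD_LOW_MV_PER_CM} MV/cm: "
      f"{voltage_for_field(BREAKDOWN_FIELD_LOW_MV_PER_CM, NOMINAL_A):.1f} V defect-free, "
      f"{voltage_for_field(BREAKDOWN_FIELD_LOW_MV_PER_CM, effective_a):.1f} V at the defect")
```

Under these assumptions the defect site sees about 7.1 MV/cm instead of 5.0 MV/cm, and reaches the 8 MV/cm level at 5.6 V instead of 8.0 V.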

There are many lifetime equations used in the industry today to model the reliability of an oxide layer. One of the simplest can be seen in www.semicon.toshiba.co.jp. According to this site, TDDB may be modelled by:

$$T_f = A e^{-BV}$$

where: Tf = the time to failure; A = a constant; V = the voltage applied across the dielectric layer; and B = a voltage acceleration constant that depends on the properties of the oxide. Numerous studies have shown that oxide breakdown is accelerated not just by the voltage applied across the oxide, but by elevated temperature as well. Thus, the tendency of a lot to fail by oxide breakdown is usually assessed by burn-in, which subjects the samples to both electrical and thermal stresses.
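Because the constant A cancels when two voltages are compared, the exponential model above can be used to translate time under an accelerated voltage into an equivalent time at the operating voltage. The sketch below is illustrative only: the value of B, the operating and stress voltages, and the 1000-hour stress time are hypothetical, and it covers only the voltage term, not the temperature acceleration mentioned above.

```python
import math


def tddb_time_to_failure(a: float, b_per_volt: float, voltage: float) -> float:
    """Exponential TDDB model: Tf = A * exp(-B * V)."""
    return a * math.exp(-b_per_volt * voltage)


def voltage_acceleration_factor(b_per_volt: float, v_stress: float, v_use: float) -> float:
    """Tf(v_use) / Tf(v_stress) under the same model; the constant A cancels out."""
    return math.exp(b_per_volt * (v_stress - v_use))


B_PER_VOLT = 3.0      # hypothetical voltage acceleration constant (1/V)
V_USE = 3.3           # hypothetical operating voltage (V)
V_STRESS = 4.5        # hypothetical accelerated-stress voltage (V)
STRESS_HOURS = 1000.0

af = voltage_acceleration_factor(B_PER_VOLT, V_STRESS, V_USE)
print(f"acceleration factor: {af:.1f}")
print(f"{STRESS_HOURS:.0f} h at {V_STRESS} V ~ {af * STRESS_HOURS:.0f} h at {V_USE} V")
```

In practice B has to be extracted from stress data on the actual oxide; the sketch only shows how the model is applied once B is known.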

Die-related Failure Mechanisms and Attributes II


Electrical Overstress (EOS)

Electrical Overstress, or EOS, refers to the destruction of the circuit by excessive voltage, current, or power. EOS damage is usually very obvious. Metal lines are discolored, burnt, or melted (see photo on the right and article on metal burn-out). Thin-film resistors are severed, with the severing usually showing up as straight, whitish lines. Transistors and diodes exhibit metal migration from one terminal to another. The glassivation may even show mechanical damage. EOS is usually caused by improper application of excitation to the device, whether it is still being tested in the manufacturing line or is already in the field. Simple socketing violations such as device misorientation and shifting can cause EOS damage, especially if voltages intended for the power supply pins are applied to stress-sensitive or power-limited pins. Improper excitation settings or voltage spikes in the excitation source are also common causes of EOS damage. EOS damage is not always obvious, though. Some EOS events leave no apparent physical manifestation on the die surface at all, yet can still render the affected component non-functional. Weak EOS events may also occur, merely shifting the parametric performance of the affected component but nonetheless affecting the overall performance of the device. Latch-up and Electrostatic Discharge (ESD) are special cases of EOS, and are discussed in more detail as separate failure mechanisms in this reference.

EOS and ESD Failures and their Attributes


Electrical Overstress, or EOS, is a failure mechanism wherein the device is subjected to excessive voltage, current, or power. Electrostatic Discharge, or ESD, is a special type of EOS in the form of a single-event, rapid transfer of electrostatic charge between two objects. Many people distinguish ESD from other, non-ESD EOS mechanisms, so this discussion does the same and treats ESD as a mechanism separate from conventional EOS. EOS and ESD can destroy a semiconductor device in many ways, leaving observable signs of damage or failure attributes. There are, however, three (3) frequently encountered and basic mechanisms by which a device is damaged by EOS or ESD: 1) dielectric or oxide punchthrough; 2) fusing of a conductor or resistor; and 3) junction damage or burn-out.

Dielectric or Oxide Punchthrough

Dielectric or oxide punchthrough refers to the EOS/ESD mechanism involving a voltage pulse large enough to rupture an oxide or dielectric layer. This problem is prevalent in MOS circuits because the thin oxide isolating the gate from the channel of the MOS transistor can easily be 'punched through' by large voltage spikes. The trend in new fab processes toward thinner oxide layers also aggravates this mechanism. A typical dielectric punchthrough event may occur in the following stages: 1) a high voltage spike occurs between two pins connected to opposite sides of a dielectric layer, in effect applying a large potential difference across the dielectric layer; 2) this potential difference exceeds the breakdown voltage of the dielectric layer; 3) the dielectric breaks down and starts conducting current; 4) adiabatic or localized heating of the dielectric occurs at the point of current conduction; and 5) the conduction site melts down, forming a filament that shorts the metal layer above the dielectric (connected to one of the pins) to the metal layer below the dielectric (connected to the other pin).

Figure 1. Photo of an oxide punchthrough after the top metal layer has been removed

Dielectric punchthrough is minimized by using adequate ESD protection circuits and by preventing EOS occurrences, such as the inadvertent or random generation of voltage spikes in the circuit.

Conductor / Resistor Fusing

The phrase 'Conductor/Resistor Fusing' literally pertains to a metal line or resistor that has acted as a 'fuse', i.e., one that has become open due to excessive current. Such melting of a metal or resistor line is often due to the intense heat produced by excessive power dissipation, or joule heating, caused by an EOS/ESD event that drives a large current through the conductor or resistor. Conductor/resistor fusing is also sometimes referred to as 'metal burn-out' or 'resistor burn-out.' The high power generated during the EOS/ESD event is $P = I_e^2 R$, where $I_e$ is the EOS/ESD current and $R$ is the resistance of the metal or resistor line. If this power produces enough localized heat to bring the EOS/ESD site's temperature above the melting temperature of the conductor or resistor, then fusing, meltdown, or burn-out of the conductor/resistor occurs.
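The fusing criterion can be sketched numerically. This is a rough illustration that assumes a hypothetical aluminum line (2 um wide, 0.5 um thick, 50 um long, 1 ohm), a 100 ns pulse, and purely adiabatic heating: all of the $I_e^2 R$ energy stays in the line, with no heat conduction to the surroundings and no latent heat of fusion. The aluminum density (2.70 g/cm³), specific heat (0.897 J/g·K), and melting point (about 660 °C) are textbook bulk values assumed here; the function and constant names are hypothetical.

```python
# Assumed textbook properties of bulk aluminum; thin films may differ.
AL_DENSITY_G_PER_CM3 = 2.70
AL_SPECIFIC_HEAT_J_PER_G_K = 0.897
AL_MELTING_POINT_C = 660.0

# Hypothetical metal line and EOS/ESD pulse.
LINE_VOLUME_CM3 = 2e-4 * 0.5e-4 * 50e-4  # 2 um x 0.5 um x 50 um, converted to cm
LINE_RESISTANCE_OHM = 1.0
PULSE_S = 100e-9


def joule_power_w(current_a: float, resistance_ohm: float) -> float:
    """Power dissipated in the line: P = Ie^2 * R."""
    return current_a ** 2 * resistance_ohm


def adiabatic_peak_temp_c(power_w: float, pulse_s: float, volume_cm3: float,
                          start_temp_c: float = 25.0) -> float:
    """Temperature reached if all pulse energy heats the line (no heat loss, no latent heat)."""
    energy_j = power_w * pulse_s
    heat_capacity_j_per_k = AL_DENSITY_G_PER_CM3 * volume_cm3 * AL_SPECIFIC_HEAT_J_PER_G_K
    return start_temp_c + energy_j / heat_capacity_j_per_k


for current_a in (0.5, 1.0):
    p = joule_power_w(current_a, LINE_RESISTANCE_OHM)
    peak_c = adiabatic_peak_temp_c(p, PULSE_S, LINE_VOLUME_CM3)
    print(f"Ie = {current_a} A: P = {p:.2f} W, peak ~ {peak_c:.0f} C, "
          f"reaches Al melting point: {peak_c >= AL_MELTING_POINT_C}")
```

Because power scales with the square of the current, doubling $I_e$ from 0.5 A to 1 A quadruples the deposited energy; in this hypothetical case that is the difference between a line that stays near 230 °C and one whose estimated temperature passes the melting point at about 850 °C. A real estimate would also account for heat flow into the surrounding dielectric.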

Figure 2. Photo of a fused metal line

Conductor/resistor fusing is often just a secondary mechanism of another EOS/ESD failure, such as dielectric or junction damage that has created a short circuit through which large currents can flow and subsequently melt or burn out the conductor/resistor line.

Junction Damage or Burn-out

Junction damage or burn-out refers to the destruction of a p-n junction by joule heating caused by an EOS/ESD event, leaving the junction either open- or short-circuited. This type of damage is more prevalent in bipolar devices. Hot spots arise in the junction as it undergoes joule heating, especially in parts where there are nonhomogeneities and geometrical shifts. The silicon at these hot spots becomes intrinsic, so its resistivity decreases as its temperature rises. The reduced resistivity draws still more current, raising the temperature further. This cycle continues, resulting in a thermal runaway that eventually melts the silicon at the hot spot once its temperature exceeds the melting point of silicon. The silicon meltdown often creates a short across the junction, although high-energy transient EOS/ESD events can also result in open junctions.

Figure 3. Photo of a junction short

The power that heats up the junction is $P = I_e V_{BD}$, where $I_e$ is the EOS or ESD current and $V_{BD}$ is the breakdown voltage of the junction. Reverse-biased junctions are more vulnerable to EOS/ESD damage than forward-biased ones because their higher breakdown voltage results in higher power dissipation in the depletion layer, so a smaller current is needed to cause damage.

Electromigration

Electromigration refers to the gradual displacement or mass transport of the metal atoms of a conductor as a result of current flowing through that conductor. The process is analogous to the movement of small pebbles in a stream from one point to another as the water gushes through them. Because of this mass transport of metal atoms, electromigration leads to the formation of voids at some points in the metal line and hillocks or extrusions at other points. It can therefore result in either: 1) an open circuit if the void(s) formed in the metal line become big enough to sever it; or 2) a short circuit if the extrusions become long enough to bridge the affected metal and an adjacent one. Electromigration is actually a function not of current but of current density. It is also accelerated by elevated temperature. Thus, electromigration is easily observed in Al metal lines that are subjected to high current densities at high temperature over time. Electromigration is widely believed to be the effect of momentum transfer from the electrons of the metal, which move according to the applied electric field, to the ions that constitute the lattice of the metal.
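Because electromigration depends on current density rather than on current alone, the same current is far more stressful in a narrower or thinner line. The sketch below assumes a hypothetical 1 mA current in a hypothetical 0.5 um thick line and computes J = I / (width x thickness) for a few widths; the numbers are illustrative, not design limits.

```python
def current_density_a_per_cm2(current_a: float, width_um: float, thickness_um: float) -> float:
    """J = I / (w * t), with the cross-section converted from um^2 to cm^2 (1 um = 1e-4 cm)."""
    area_cm2 = (width_um * 1e-4) * (thickness_um * 1e-4)
    return current_a / area_cm2


CURRENT_A = 1e-3      # hypothetical 1 mA
THICKNESS_UM = 0.5    # hypothetical film thickness

for width_um in (2.0, 1.0, 0.5):
    j = current_density_a_per_cm2(CURRENT_A, width_um, THICKNESS_UM)
    print(f"width {width_um} um -> J = {j:.1e} A/cm^2")
```

Halving the width doubles J at the same current (here from 1e5 to 2e5 A/cm²), which is why current-density limits, not current limits, are the quantity checked in layout.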
There are two major driving factors that make electromigration happen: 1) the direct action of the electric field on the charged atoms or ions of the metal; and 2) the frictional force, or momentum exchange, between the flowing electrons and these ions. The total driving force is the sum of the effects of these two factors. All metal films have imperfections or microstructural variations that cause the atomic flow rates through them to be non-uniformly distributed. These non-uniform atomic flow rates (flux divergence) through different sections of the conductor result in mass depletion (which causes voids) and mass accumulation (which causes hillocks) as mass transport proceeds during electromigration. In Al films, atomic migration occurs predominantly along grain boundaries and surfaces. Lattice mismatches (such as those between adjacent large and small grains, or where three grain boundaries meet) can create interconnected grain boundaries that provide shorter paths for the atoms, enabling them to move faster through the film. Another important point regarding how grain structure affects electromigration failure rates is the conclusion of various studies that electromigration is impeded below a critical metal line width. This improvement is not gradual but appears at a critical limit: when the width of the metal line becomes smaller than the grain size itself, all grain boundaries lie perpendicular to the current flow. Such a structure is known as a 'bamboo structure.' It results in a longer path for mass transport, thereby reducing the atomic flux and the electromigration failure rate. There is also a critical minimum line length, for a given current density, below which electromigration does not occur. Known as the Blech length, this limit means that any metal line shorter than it will not fail by electromigration. Thus, the Blech length must be considered when designing test structures for electromigration; otherwise, no failures may be observed, leading to an incorrect conclusion.
The accelerating effect of high temperature on electromigration becomes pronounced only once a void has started to form in the metal line. Before any void forms, the temperature along the metal line can still be fairly uniform. Once a void forms, however, the current density in the section where the void is present increases because of the conductor's reduced cross-sectional area, leading to current crowding around the void. The higher current density around the void causes localized heating that further accelerates the growth of the void, which again increases the current density. The cycle continues until the void becomes large enough to cause the metal line to fuse open. Electromigration may be modeled by the following equation, known as Black's Equation:

$$t_{50} = C J^{-n} e^{E_a/kT}$$

where: t50 = the median lifetime of the population of metal lines subjected to electromigration; C = a constant based on metal line properties; J = the current density; n = an empirical constant from 1 to 7 (many experts believe that n = 2); T = temperature in K; k = the Boltzmann constant; and Ea = the activation energy, 0.5-0.7 eV for pure Al. Electromigration failures take time to develop, and are therefore very difficult to detect until they happen. Thus, the best solution to electromigration problems is to prevent them from taking place. Electromigration can be prevented by: 1) designing the device so that current densities in all parts of the circuit are kept within practical limits; 2) increasing the grain sizes of the metal lines until they are comparable to the line widths (whereby a bamboo structure is achieved); and 3) good selection and deposition of the passivation or thin films placed over the metal lines in order to limit extrusions caused by electromigration. Electromigration must not be confused with EOS-induced metal reflow, which is a different phenomenon: electromigration occurs gradually, whereas EOS-induced metal reflow is gross and abrupt.

Electrostatic Discharge (ESD)

Electrostatic Discharge, or ESD, is a single-event, rapid transfer of electrostatic charge between two objects, usually occurring when two objects at different potentials come into direct contact with each other. ESD can also occur when a high electrostatic field develops between two objects in close proximity. ESD is one of the major causes of device failures in the semiconductor industry. Electrostatic charge build-up occurs as a result of an imbalance of electrons on the surface of a material. Such a charge build-up develops an electric field that has measurable effects on other objects at a distance. The process of electron transfer that results from two objects coming into contact and then separating is known as 'triboelectric charging'. This charging process results in one object gaining electrons on its surface, thereby becoming negatively charged, and the other object losing electrons from its surface, thereby becoming positively charged. A person can become triboelectrically charged in a number of ways, even by just walking across a room. The tendencies of various materials to charge up either positively or negatively are ranked in the triboelectric series. There are three (3) predominant ESD models for ICs: 1) the Human Body Model (HBM); 2) the Charged Device Model (CDM); and 3) the Machine Model (MM). The HBM simulates the ESD event when a person charged to either a positive or negative potential touches an IC that is at another potential.
The CDM simulates the ESD event wherein a device charges to a certain potential and then comes into contact with a conductive surface at a different potential. The MM simulates the ESD event that occurs when a piece of equipment or a tool comes into contact with a device at a different potential. HBM and CDM are considered to be more 'real world' models than the MM. ESD-related failures manifest in a number of ways, exhibiting one or more of these attributes: junction leakage, short, or burn-out; dielectric rupture; resistor-metal interface rupture; resistor/metal fusing; and die surface charging.

ESD Controls

ESD controls come in a vast variety of forms. However, they may be classified into three major categories: 1) prevention of static charge build-up; 2) safe dissipation of any charge build-up; and 3) improvements in the ESD robustness of the product.

Fig. 1. Example of a bench-top ionizer; see ESD Controls for more examples

The first category works on the basic premise of 'No Charge/No Discharge.' Eliminating charge build-up involves using materials in the work area that have less tendency to generate static charges, i.e., antistatic and static dissipative materials. All equipment must be free of moving parts that may generate charges, e.g., rubber rollers, plastic stoppers, etc. Anything that the devices may come into contact with or be transported on must also be antistatic or conductive. The use of ionizers to neutralize newly generated charges will also prevent charge build-up. Minimizing movement in the work area, as well as using ESD-safe apparel, will help minimize static charges generated by personnel. The second category, safe dissipation of any charge build-up, relies mainly on a good grounding system built around a single, common ground. Everything in the production line, from equipment to work tables to cabinets and racks, must be connected to this common ground. If the factory uses conductive flooring, then this should also be connected at regular intervals to the common ground. Having a single or common ground ensures that everything on the production floor remains at the same potential, and any charge build-up is immediately dissipated. Properly grounded wrist and foot straps or conductive shoes also fall under this category, since they bring any charge build-up on personnel to the common ground.

Fig. 2. Examples of personnel grounding accessories: wrist strap, sole grounder, and conductive shoes
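To put "immediately dissipated" in numbers, a grounded person can be modeled as a capacitor discharging through the wrist strap to the common ground. The sketch below assumes a 1 MΩ current-limiting resistor in the strap cord (a common personnel-safety feature) and a body capacitance of about 150 pF; both values are illustrative assumptions, and real contact resistance would add to R.

```python
import math

# Illustrative assumptions (not from the text): a 1 MOhm current-limiting
# resistor in the wrist strap cord and ~150 pF of body capacitance.
STRAP_RESISTANCE_OHM = 1e6
BODY_CAPACITANCE_F = 150e-12


def body_voltage(v0: float, t_s: float,
                 r_ohm: float = STRAP_RESISTANCE_OHM,
                 c_f: float = BODY_CAPACITANCE_F) -> float:
    """Voltage (V) left on a grounded person t_s seconds after connection."""
    return v0 * math.exp(-t_s / (r_ohm * c_f))


def time_to_decay(v0: float, v_target: float,
                  r_ohm: float = STRAP_RESISTANCE_OHM,
                  c_f: float = BODY_CAPACITANCE_F) -> float:
    """Seconds for the body voltage to fall from v0 to v_target: RC * ln(v0 / v_target)."""
    if not 0 < v_target < v0:
        raise ValueError("expected 0 < v_target < v0")
    return r_ohm * c_f * math.log(v0 / v_target)


if __name__ == "__main__":
    # A person charged to ~1 kV by walking, then connected to the common ground.
    t = time_to_decay(1000.0, 10.0)
    print(f"1 kV -> 10 V in about {t * 1e6:.0f} microseconds")
```

Under these assumptions the charge bleeds off in well under a millisecond, while the resistor still limits the current through the operator if the strap touches a live conductor.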

Control of relative humidity (RH) is also important, since moisture in the air provides a conductive path that can bring static charges to the common ground. Thus, a very dry environment invites ESD. Care must be exercised, though, because excessive RH might trigger corrosion. The third category does not actually control the ESD phenomenon per se, but pertains to making devices more resistant to ESD damage. This involves incorporating ESD protection cells in the design of the IC, and using physically robust features that can withstand the high current brought about by an ESD event. Proper training of personnel on ESD precautions is also a must. A good ESD control program therefore incorporates a training scheme that ensures everyone is aware of the company's ESD controls and SOPs. Regular audits of the manufacturing line for ESD control compliance are also important.

Gate Oxide Breakdown
In a MOS transistor, the electrical characteristics of the channel through which the carriers flow are controlled by a gate. This gate is isolated from the channel by a thin layer of oxide. Gate oxide breakdown is simply the destruction of this dielectric layer. Gate oxide breakdown is also sometimes referred to as gate oxide rupture, and often manifests as a short or leakage path from the gate to the channel or substrate. Gate oxide breakdowns are usually caused by electrical overstress (EOS) or electrostatic discharge (ESD), although imperfections or defects in the gate oxide layer can also lead to its early-life or time-dependent breakdown. These defects may be in the form of mobile ions, stray particles, or insufficient coverage.
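A quick calculation shows why thin gate oxides are so vulnerable to EOS and ESD. The sketch computes the average field across an oxide of a given thickness and compares it with an assumed intrinsic breakdown field of roughly 10 MV/cm for good-quality SiO2; that figure is an approximate assumption, not taken from the text, and the defects listed above lower the effective value.

```python
# Approximate intrinsic breakdown field of good-quality SiO2 (assumed value).
BREAKDOWN_FIELD_V_PER_CM = 10e6

NM_TO_CM = 1e-7  # 1 nm = 1e-7 cm


def oxide_field_v_per_cm(voltage_v: float, tox_nm: float) -> float:
    """Average field (V/cm) across a gate oxide of thickness tox_nm."""
    return abs(voltage_v) / (tox_nm * NM_TO_CM)


def breakdown_voltage_v(tox_nm: float,
                        e_bd_v_per_cm: float = BREAKDOWN_FIELD_V_PER_CM) -> float:
    """Approximate gate voltage (V) at which the oxide field reaches e_bd."""
    return e_bd_v_per_cm * tox_nm * NM_TO_CM


def exceeds_breakdown(voltage_v: float, tox_nm: float,
                      e_bd_v_per_cm: float = BREAKDOWN_FIELD_V_PER_CM) -> bool:
    """True if the applied voltage pushes the average oxide field past e_bd."""
    return oxide_field_v_per_cm(voltage_v, tox_nm) > e_bd_v_per_cm


if __name__ == "__main__":
    tox = 5.0  # nm, illustrative oxide thickness
    print(f"~{breakdown_voltage_v(tox):.1f} V breaks down a {tox} nm oxide")
    print("100 V transient exceeds breakdown:", exceeds_breakdown(100.0, tox))
```

With these assumptions a 5 nm oxide reaches the breakdown field at only about 5 V, so even a small fraction of a kilovolt-level ESD pulse reaching an unprotected gate is enough to rupture it.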

Die-related Failure Mechanisms and Attributes III


Hot Carrier Effects
The term 'hot carriers' refers to holes or electrons (the latter also referred to as 'hot electrons') that have gained very high kinetic energy after being accelerated by a strong electric field in regions of high field intensity within a semiconductor (especially MOS) device. Because of their high kinetic energy, hot carriers can get injected and trapped in areas of the device where they shouldn't be, forming a space charge that causes the device to degrade or become unstable. The term 'hot carrier effects', therefore, refers to device degradation or instability caused by hot carrier injection. According to the 5th Edition of the Hitachi Semiconductor Device Reliability Handbook, there are four (4) commonly encountered hot carrier injection mechanisms: 1) drain avalanche hot carrier injection; 2) channel hot electron injection; 3) substrate hot electron injection; and 4) secondary generated hot electron injection. Drain avalanche hot carrier (DAHC) injection is said to produce the worst device degradation within the normal operating temperature range. This occurs when a high voltage applied at the drain under nonsaturated conditions (VD > VG) results in very high electric fields near the drain, which accelerate channel carriers into the drain's depletion region. Studies have shown that the worst effects occur when VD = 2VG. The accelerated channel carriers collide with Si lattice atoms, creating electron-hole pairs in the process. This phenomenon is known as impact ionization, and some of the generated e-h pairs gain enough energy to overcome the potential barrier between the silicon substrate and the gate oxide. Under the influence of the drain-to-gate field, hot carriers that surmount the substrate-gate oxide barrier get injected into the gate oxide layer, where they are sometimes trapped.
This hot carrier injection process occurs mainly in a narrow injection zone at the drain end of the device where the lateral field is at its maximum. Hot carriers can be trapped at the Si-SiO2 interface (in 'interface states') or within the oxide itself, forming a space charge (volume charge) that increases over time as more charges are trapped. These trapped charges shift some of the characteristics of the device, such as its threshold voltage (Vth) and its transconductance (gm).

Figure 1. DAHC injection involves impact ionization of carriers near the drain area; source: Hitachi Semiconductor Reliability Handbook

Injected carriers that do not get trapped in the gate oxide become gate current. On the other hand, the majority of the holes from the e-h pairs generated by impact ionization flow back to the substrate, comprising a large portion of the substrate's drift current. Excessive substrate current may therefore be an indication of hot carrier degradation. In gross cases, abnormally high substrate current can upset the balance of carrier flow and facilitate latch-up. Channel hot electron (CHE) injection occurs when both the gate voltage and the drain voltage are significantly higher than the source voltage, with VG ≈ VD. Because of the high gate voltage, channel carriers that travel from the source to the drain are sometimes driven towards the gate oxide even before they reach the drain.

Figure 2. CHE injection involves propelling of carriers in the channel toward the oxide even before they reach the drain area; source: Hitachi Semiconductor Reliability Handbook
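The threshold voltage shift caused by trapped hot carriers, noted in the DAHC discussion above, can be estimated to first order from the trapped charge density using the textbook relation ΔVth = -Q/Cox. This relation is standard MOS theory rather than something quoted from the Hitachi handbook. The sketch assumes all trapped charge forms a sheet at the Si-SiO2 interface, which gives the largest shift for a given amount of charge; the oxide thickness and trap density in the example are hypothetical.

```python
EPS0_F_PER_M = 8.854e-12   # vacuum permittivity (F/m)
K_SIO2 = 3.9               # relative permittivity of SiO2
Q_E_C = 1.602e-19          # elementary charge (C)


def oxide_capacitance_f_per_m2(tox_nm: float) -> float:
    """Gate oxide capacitance per unit area, Cox = eps_ox / tox (F/m^2)."""
    return K_SIO2 * EPS0_F_PER_M / (tox_nm * 1e-9)


def threshold_shift_v(trapped_per_cm2: float, tox_nm: float,
                      carrier_sign: int = -1) -> float:
    """Threshold shift dVth = -Q / Cox for a sheet of trapped charge.

    Assumes the charge sits at the Si-SiO2 interface (worst case for a
    given charge). carrier_sign is -1 for trapped electrons, +1 for holes.
    """
    q_c_per_m2 = carrier_sign * Q_E_C * trapped_per_cm2 * 1e4  # cm^-2 -> m^-2
    return -q_c_per_m2 / oxide_capacitance_f_per_m2(tox_nm)


if __name__ == "__main__":
    # 1e11 trapped electrons per cm^2 in a 5 nm oxide (illustrative numbers).
    dv = threshold_shift_v(1e11, 5.0)
    print(f"dVth = {dv * 1e3:+.1f} mV")  # prints dVth = +23.2 mV
```

Trapped electrons give a positive shift and trapped holes a negative one; charge distributed through the oxide bulk produces a smaller shift than this interface-sheet estimate.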

Substrate hot electron (SHE) injection occurs when the substrate back bias is very positive or very negative, i.e., |VB| >> 0. Under this condition, carriers of one type in the substrate are driven by the substrate field toward the Si-SiO2 interface. As they move toward the substrate-oxide interface, they gain further kinetic energy from the high field in the surface depletion region. They eventually overcome the surface energy barrier and get injected into the gate oxide, where some of them are trapped.

Figure 3. SHE injection involves trapping of carriers from the substrate; source: Hitachi Semiconductor Reliability Handbook

Secondary generated hot electron (SGHE) injection involves the generation of hot carriers by impact ionization involving a secondary carrier that was itself created by an earlier impact ionization event. This occurs under conditions similar to DAHC, i.e., a high drain voltage with VD > VG, which is the driving condition for impact ionization. The main difference is the influence of the substrate's back bias on hot carrier generation. This back bias produces a field that tends to drive the hot carriers generated by the secondary carriers toward the surface region, where they gain further kinetic energy to overcome the surface energy barrier.
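The bias conditions given above for the four injection mechanisms can be summarized in a small helper, for example when choosing stress conditions or reviewing a failing application's operating point. This is a hypothetical sketch: the relative tolerance standing in for "VG ≈ VD" and the voltage standing in for "|VB| >> 0" are invented thresholds, "significantly higher than the source" is reduced to a simple positive-voltage check, and the mechanisms are not mutually exclusive.

```python
from dataclasses import dataclass


@dataclass
class Bias:
    """Terminal voltages of a MOS transistor, all relative to the source (V)."""
    vd: float          # drain
    vg: float          # gate
    vb: float = 0.0    # substrate (back) bias


def candidate_mechanisms(b: Bias, match_tol: float = 0.1,
                         back_bias_min_v: float = 2.0) -> list[str]:
    """List hot carrier injection mechanisms whose bias condition is met.

    match_tol (relative) and back_bias_min_v are hypothetical thresholds
    standing in for 'VG ~ VD' and '|VB| >> 0'.
    """
    mechanisms = []
    strong_back_bias = abs(b.vb) >= back_bias_min_v
    if b.vd > b.vg:                                   # DAHC: VD > VG
        mechanisms.append("DAHC")
    if b.vg > 0 and b.vd > 0 and abs(b.vg - b.vd) <= match_tol * max(b.vg, b.vd):
        mechanisms.append("CHE")                      # CHE: VG ~ VD, both high
    if strong_back_bias:                              # SHE: |VB| >> 0
        mechanisms.append("SHE")
    if b.vd > b.vg and strong_back_bias:              # SGHE: DAHC-like plus back bias
        mechanisms.append("SGHE")
    return mechanisms


def worst_case_dahc_gate_bias(vd: float) -> float:
    """Gate bias for worst-case DAHC stress, using the VD = 2VG rule."""
    return vd / 2.0


if __name__ == "__main__":
    print(candidate_mechanisms(Bias(vd=5.0, vg=2.5)))            # ['DAHC']
    print(candidate_mechanisms(Bias(vd=3.3, vg=3.3)))            # ['CHE']
    print(candidate_mechanisms(Bias(vd=5.0, vg=2.5, vb=-3.0)))   # ['DAHC', 'SHE', 'SGHE']
    print(worst_case_dahc_gate_bias(5.0))                        # 2.5
```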

Figure 4. SGHE injection involves hot carriers generated by secondary carriers; source: Hitachi Semiconductor Reliability Handbook

Hot carrier effects are brought about or aggravated by reductions in device dimensions without corresponding reductions in operating voltages, resulting in higher electric fields internal to the device.

Problems due to hot carrier injection therefore constitute a major obstacle towards higher circuit densities. Recent studies have even shown that voltage reduction alone will not eliminate hot carrier effects, which were observed to manifest even at reduced drain voltages, e.g., 1.8 V. Thus, optimum device design to minimize, if not prevent, hot carrier effects is the best solution for hot carrier problems. Common design techniques for preventing hot carrier effects include: 1) increasing channel lengths; 2) n+/n- double diffusion of sources and drains; 3) use of graded drain junctions; 4) introduction of self-aligned n- regions between the channel and the n+ junctions to create an offset gate; and 5) use of buried p+ channels. Hot carrier phenomena are accelerated by low temperature, mainly because this condition reduces charge detrapping. A simple acceleration model for hot carrier effects is as follows:

$$AF = \frac{R_2}{R_1} = e^{\left(\frac{E_a}{k}\right)\left(\frac{1}{T_1} - \frac{1}{T_2}\right) + C\,(V_2 - V_1)}$$

where: AF = acceleration factor of the mechanism; R1 = rate at which the hot carrier effects occur under conditions V1 and T1; R2 = rate at which the hot carrier effects occur under conditions V2 and T2; V1 and V2 = applied voltages for R1 and R2, respectively; T1 and T2 = applied temperatures (in kelvin) for R1 and R2, respectively; Ea = activation energy, -0.2 eV to -0.06 eV (negative, so that a lower T2 gives AF > 1); k = Boltzmann's constant (8.617 × 10^-5 eV/K); and C = a constant.

Junction Burn-out
Junction burn-out refers to the destruction of a p-n junction as a result of excessive power dissipation from an electrical overstress (EOS) or electrostatic discharge (ESD) event. It is usually in the form of a silicon meltdown at the junction itself, causing the junction to become open or shorted.

Junction Spiking
See Contact Migration.

Metal Burn-out
Metal burn-out refers to the gross destruction of a metal line from excessive current or power dissipation. This is the most obvious attribute of gross electrical overstress (EOS) damage, although not all EOS-damaged devices will exhibit a metal burn-out.
Metal burn-outs are often accompanied by carbonized plastic, metal reflow, and discoloration of the surrounding metal. Metal lines that become open after a metal burn-out are said to have 'fused.' The photo attached to the article on EOS shows metal burn-outs. On the right is another photo of a failure site with metal burn-outs.

Mobile Ionic Contamination
Mobile ionic contamination refers to the presence of mobile ions such as Na+, Cl-, and K+ in the device structures of an integrated circuit. These mobile ions can come from the environment, humans, wafer processing materials, and packaging materials. Mobile ionic contamination is commonly observed in the gate oxide of a MOS transistor, where the ions can accumulate and cause charge build-ups that shift the threshold voltage of the transistor. Parasitic inversion channels may also form in MOS transistors. In bipolar devices, mobile ions can affect carrier concentrations, changing the beta of the transistor.

Mobile ions respond to temperature and voltage, so failures due to mobile ionic contamination can be accelerated by burn-in. Mobile ionic contamination failures can also be made to recover by subjecting the device to an unbiased bake, since this redistributes the ions by promoting their random movement. Thus, a device is most likely a mobile ionic contamination failure if it fails after burn-in but recovers after unbiased bake.

Oxide Rupture
See Dielectric Breakdown.

Silicon Nodules
Silicon nodules are silicon aggregates that come out of silicon-doped aluminum metal lines, causing the device to fail in several ways. Here are some key points about silicon nodules:
1) The aluminum metal lines used in die circuits are doped with silicon atoms in a very controlled manner to enhance their properties. A typical process involves sintering or alloying at 400-450 deg C, wherein the aluminum lines are doped with about 1-2% silicon.
2) During this alloying process, not all of the silicon dopants dissolve in the aluminum metal lines. Instead of going into solution, some Si atoms remain as silicon precipitates. Only about 0.4% silicon dissolves in the aluminum.
3) As the metal cools down after the alloying process, more silicon atoms separate from and come out of the aluminum solution.
4) The elemental silicon precipitates existing in the metal (as discussed in point 2) act as nucleation sites for silicon atoms that emerge from the solution during the cool-down phase. The silicon atoms that nucleate eventually form larger aggregates of silicon known as silicon nodules.
5) Silicon nodules grow bigger with long exposure to elevated temperatures. Studies have shown that silicon nodules can attain diameters greater than 1 micron.
6) The growth of silicon nodules to large diameters exerts stress on the metal lines. In fact, narrow metal lines, i.e., those less than 3 microns wide, can fracture and become open in the presence of silicon nodules with diameters greater than 1 micron. This phenomenon is often referred to as 'aluminum stress cracking.'
7) Aluminum stress cracking is aggravated by factors other than silicon nodules. During sputter deposition of the aluminum, for instance, nitrogen may be trapped within the layer, producing additional strain on the aluminum. Differences among the coefficients of thermal expansion of silicon, silicon dioxide, and aluminum also result in stresses within the die circuit that can aggravate aluminum cracking.
8) Aside from aluminum stress cracking, the formation of silicon nodules on bond pads also impedes wire bonding. As a result, excessive silicon nodule formation on bond pads has been confirmed to cause ball bond lifting issues as well.
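The numeric rules of thumb in the points above can be captured in a small screening helper, for example to flag FA samples whose metal line widths and observed nodule sizes match the aluminum stress cracking profile of point 6. The thresholds come directly from the text; the function names are hypothetical, and the undissolved-silicon figure only covers the alloying step (more silicon comes out of solution on cool-down, per point 3).

```python
# Rules of thumb taken from the points above.
SI_SOLUBILITY_PCT = 0.4      # silicon that dissolves in the Al during alloying
NARROW_LINE_MAX_UM = 3.0     # "narrow" lines are less than 3 microns wide
CRITICAL_NODULE_UM = 1.0     # nodules above ~1 micron threaten narrow lines


def undissolved_si_pct(si_doping_pct: float) -> float:
    """Silicon (%) left undissolved as precipitates at the alloying step."""
    return max(0.0, si_doping_pct - SI_SOLUBILITY_PCT)


def stress_crack_risk(line_width_um: float, nodule_diameter_um: float) -> bool:
    """True for the narrow-line / large-nodule combination linked to cracking."""
    return line_width_um < NARROW_LINE_MAX_UM and nodule_diameter_um > CRITICAL_NODULE_UM


if __name__ == "__main__":
    print(f"{undissolved_si_pct(1.0):.1f}% Si left as precipitates for 1% doping")
    print("2 um line, 1.5 um nodule at risk:", stress_crack_risk(2.0, 1.5))
    print("4 um line, 1.5 um nodule at risk:", stress_crack_risk(4.0, 1.5))
```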

Slow Charge Trapping
Slow charge trapping refers to the long-term retention of electrons in the gate oxide of a MOS device due to imperfections at the gate oxide interface. These imperfections or 'traps' include structural damage, defects, and impurities in the oxide. Thus, improved oxide growth that minimizes trap density will minimize the occurrence of slow trapping.

Slow trapping is prevalent in memory devices that require carrier movement in the oxide for proper operation. Trapped charges in the oxide can shift the threshold voltage of the device.

Time-Dependent Dielectric Breakdown (TDDB)

Bond Lifting
Bond lifting refers to any of several phenomena in which a wire bond that connects the device to the outside world becomes detached from its position, resulting in loss or degradation of the electrical and mechanical connection between that bond and its bonding site. In this context, a bond may be one that attaches to a bond pad of the die (also referred to as the first bond) or one that attaches to a lead or post of the package (also referred to as the second bond). First bonds are usually in the form of gold ball bonds or aluminum wedge bonds, while second bonds are usually gold or aluminum crescent bonds (also known as 'fishtail' bonds). Ball bond lifting, or simply ball lifting, is the detachment of a ball bond from the bond pad of a semiconductor device. It can be due to a variety of factors. Poor wirebonder set-up and bond pad surface contamination are the primary causes of ball lifting. Poor set-up includes improper wirebond parameter settings, unstable workpiece holders, and worn-out wirebonding tools. These result in poor initial welding and inadequate intermetallic formation between the bond pad and the ball. Ball lifting can also be due to contaminants on the bond pad, which act as barriers between the ball and the bond pad. Common contaminants that inhibit good bonding include unetched glass, unremoved photoresist, and Si saw dust. Resin bleed-out from the die attach material can also impede good bonding and result in ball lifting. Halides such as Cl on the bond pad can trigger corrosion, which is yet another source of ball lifting. A disturbed or uneven bond pad surface also inhibits bonding: excessive probe digging results in aluminum heaps and an exposed substrate or barrier metal area, which prevent good intermetallic formation. Silicon nodules on the surface of bond pads can also result in poor ball bonding.

Fig 1. Photo of a lifted ball bond

Fig 2. Photos of bond pads with contamination that prevented good intermetallic formation and led to ball lifting

Lifted balls may also result from excessive interdiffusion between the bond pad and ball bond metals. Kirkendall voiding, which is the formation of voids underneath the ball bond due to excessive diffusion of Al from the bond pad to the Au ball bond to form purple plague, is an example of this mechanism. The reflow of thermoplastic die attach material at the bonding temperature also results in ball lifting, because it allows movement of the die during the thermosonic bonding itself. Cratering, which is considered to be a different failure mechanism, can also manifest as a lifted ball, with the Si underneath the bond pad coming off with the bond. Excessive probing and overbonding are common causes of cratering. Similarly, bond pad peel-off, or the mechanism wherein the bond pad metal peels off from the barrier metal or substrate, can result in ball lifting.

Fig 3. Photo of a bond pad crater

Fig 4. Photo of a bond pad metal peel-off that led to ball lifting
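The distinctions drawn above (what comes off with the ball, and whether intermetallics formed) suggest a simple triage table for lifted balls. The sketch below is a hypothetical lookup, not a standard FA procedure: the signature names are invented for illustration and the mapping is a simplification of the causes listed in this section. Real samples often show more than one signature, so the result is only a starting point for further analysis.

```python
from enum import Enum, auto


class LiftSignature(Enum):
    """Hypothetical labels for what is seen on the pad after a ball lifts."""
    SILICON_REMOVED = auto()      # Si came off with the ball (crater)
    PAD_METAL_REMOVED = auto()    # pad metal came off the barrier/substrate
    VOIDS_UNDER_BALL = auto()     # voids at the ball-pad interface
    RESIDUE_NO_IMC = auto()       # foreign material, little or no intermetallic
    DISTURBED_PAD = auto()        # aluminum heaps / exposed barrier from probing
    CLEAN_PAD_NO_IMC = auto()     # pad looks clean but intermetallic is sparse


# Simplified mapping distilled from the causes discussed in this section.
LIKELY_CAUSES = {
    LiftSignature.SILICON_REMOVED: "cratering (excessive probing or overbonding)",
    LiftSignature.PAD_METAL_REMOVED: "bond pad peel-off from the barrier metal or substrate",
    LiftSignature.VOIDS_UNDER_BALL: "excessive Au-Al interdiffusion (Kirkendall voiding)",
    LiftSignature.RESIDUE_NO_IMC: "pad contamination (glass, photoresist, Si saw dust, resin bleed-out)",
    LiftSignature.DISTURBED_PAD: "excessive probe digging",
    LiftSignature.CLEAN_PAD_NO_IMC: "poor wirebonder set-up (parameters, workholder, worn tool)",
}


def triage_lifted_ball(signature: LiftSignature) -> str:
    """Return the first cause to investigate for the observed lift signature."""
    return LIKELY_CAUSES[signature]


if __name__ == "__main__":
    for sig in LiftSignature:
        print(f"{sig.name:<18} -> {triage_lifted_ball(sig)}")
```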

Wedge lifting is the detachment of a wedge bond from the bond pad or bonding post, or of the crescent bond from the leadframe bonding finger. Like ball lifting, it can be due to a variety of factors, primarily poor wirebonder set-up and bond pad surface contamination. Poor set-up includes improper parameter settings, unstable workpiece holders, and worn-out tools. These result in poor bonding between the bond pad, post, or finger and the wedge. Wedge lifting can also be due to contaminants on the bond pad, post, or bonding finger, which act as barriers between the wedge and the bonding area. Common contaminants that inhibit good bonding include unetched glass, unremoved photoresist, and Si saw dust. Halides such as Cl on the bond pad can also trigger corrosion, which is another cause of wedge lifting. Silicon nodules on the surface of bond pads with no barrier metallization underneath can also result in poor wedge bonding. Sub-bond pad cratering, which is considered to be a different failure mechanism, can also manifest as a lifted wedge, with the Si underneath the bond pad coming off with the bond. Similarly, bond pad peel-off from the barrier metal can result in wedge lifting. Wedge lifting due to metallization peel-off from the bonding posts and fingers is likewise possible. Studies have also shown that excessive probing damage on the bond pad can cause wedge lifting.