ABSTRACT The reliability of an electronic device, i.e., whether it can function correctly over its designated lifetime in the field (such as 10 or 15 years), has become increasingly important in today's safety-critical applications such as automotive electronics. Traditionally, ageing analysis has been performed in an offline setting, where a stress test is applied to accelerate the ageing process and a model is then established to predict future behavior. This kind of offline method has the drawback that it cannot take into account the unique operating conditions and environment that a device experiences in the field. In this work, we present what is, to the best of our knowledge, the first cloud-based ageing monitoring system for Internet-of-Things (IoT) devices. It has several advantages. First, one can know the ageing status of an IoT device remotely and continuously. Second, through data analysis in a cloud server, more accurate prediction can be achieved. Third, an alarm can be raised before an ageing hazard actually strikes, so that precautionary actions (such as online repair, or even a call-for-maintenance request) can be taken in advance to avoid a fatal system failure. A prototype system using test chips with built-in design-for-ageing-monitoring circuitry is demonstrated with measurement data collected through a cloud server.
INDEX TERMS Ageing monitoring, reliability, Internet of Things, stress test, ring oscillator.
I. INTRODUCTION
The aggressive scaling of integrated circuit technology brings numerous advantages, such as the reduction of the form factor, higher speed, and lower power. However, ensuring reliability over a long lifetime, as required by safety-critical applications (e.g., automotive electronics, biomedical electronics, and various IoT devices), is becoming more and more challenging [1], [2]. For example, an automotive IC is often required to operate in the field for more than 10 to 15 years under hostile conditions (e.g., −40 °C to 150 °C).
It is well known that reliability (in terms of the failure rate) is a function of time that follows a bathtub curve, divided into three stages: the infant mortality stage, the normal lifetime stage, and the final ageing stage. Ideally, the high infant mortality stage can be skipped by applying stress tests (with higher temperature and/or higher VDD) to eliminate the weak devices with potential latent defects. It is hoped that most shipped devices can then operate in a system during their normal lifetime stage with a reasonably low failure rate for a certain amount of time. After that, however, the ageing effect may start to take its toll and deteriorate a device's performance or even disrupt its functionality.
There are several different types of ageing mechanisms [3]–[6], e.g., Bias Temperature Instability (BTI), Electro-Migration (EM), Hot Carrier Injection (HCI), and Thin-Oxide Breakdown. When the ageing effect turns serious, sudden functional failure could occur. As a result, a safety-critical electronic device should have a set of anti-ageing solutions during the design stage, manufacturing test stage, and in-the-field stage, so as to meet the target reliability requirement.
Ageing prediction can be done statically by offline ageing analysis, as depicted in Fig. 1. At the cell level, the ageing phenomenon of a transistor or an interconnect under a specific process technology is first characterized by embedded ageing sensors (e.g., ring oscillators) that measure the Vth and/or performance degradation under certain temperature or VDD stress conditions (e.g., 195 °C for 6 months) [7]–[10]. This offline characterization process produces various fundamental cell-level ageing models.
The associate editor coordinating the review of this manuscript and approving it for publication was Giovanni Merlino.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
135964 VOLUME 7, 2019
G.-H. Lian et al.: Cloud-Based Online Ageing Monitoring for IoT Devices
FIGURE 4. The user, cloud, and IoT edge parts of our cloud-based ageing monitoring system [19].
FIGURE 5. The overall flow for ageing and remaining lifetime prediction.
It is divided into four phases. Phase 1 and Phase 2 are a one-time effort made on a number of selected ICs after they are fabricated. On the other hand, Phase 3 and Phase 4 are performed in the field at pre-defined monitoring intervals for each edge device.
Phase 1 (Perform Accelerated Ageing Process):
It is known that a stress test can be applied to accelerate the ageing process. This phase observes how a handful of ICs, selected for characterization during offline testing, age under stressed conditions. For example, we apply a "boosted supply voltage" of 2 times the rated VDD level in our test case, and their aged behaviors (reflected by the average ROCP values) are then recorded over time. The ageing acceleration process continues until the IC under stress test has aged beyond a preset threshold (say 10%). For example, in our test case, we have three ICs under stress, and
Just like the timing model for a standard cell, our "accelerated ageing model" is derived in three cases: (1) worst-case, (2) typical-case, and (3) best-case. The worst-case "accelerated ageing model" is acquired by taking the top envelope of the "accelerated ageing data" at each recording time obtained during the accelerated ageing process. Similarly, the best-case model is acquired by taking the bottom envelope, while the typical-case model is acquired by taking the average of the "accelerated ageing data". As shown in Fig. 6, our accelerated ageing process reaches 10% ageing on these test chips in 14 days in the worst case, 24 days in the typical case, and 32 days in the best case. We use the following terms to assist our subsequent discussions.
• Tworst: The time needed to reach an ageing threshold (e.g., 10% ageing) using the worst-case accelerated ageing model.
• Ttypical: The time needed to reach an ageing threshold using the typical-case accelerated ageing model.
• Tbest: The time needed to reach an ageing threshold (e.g., 10% ageing) using the best-case accelerated ageing model.
Then, in our test case, Tworst = 14 days, Ttypical = 24 days, and Tbest = 32 days. Taking Ttypical as a reference, we further calculate two ratios to be used later when we conduct the remaining lifetime prediction, namely, Ratio(worst-to-typical) = 14/24 = 0.58 (or -42%), and the
TABLE 1. The ageing prediction error (%) versus the coarse-stretching parameter, n.
FIGURE 10. An IoT edge device harboring one of our test chips.
its "retire time in the typical case". Then, the retire time in the typical case is multiplied by the two ratios derived previously from the "accelerated ageing models", i.e., Ratio(worst-to-typical) and Ratio(best-to-typical), to obtain the retire time in the worst case and the retire time in the best case. Once we know the retire times, we can calculate the remaining lifetimes as follows:
(Remaining Lifetime) = (Retire Time) - (Current Time)
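As a rough illustration of this calculation, the following Python sketch scales a typical-case retire time by the two ratios and then subtracts the current operating time. It uses the typical-case retire time of 1020 days and the 80-day operating time reported in Section D. Several choices are assumptions made only for this sketch: the ratio 1.33 (taken as 32/24 rounded to two decimals, by analogy with the stated 0.58), rounding to whole days, a 365-day year, and the function names.

```python
DAYS_PER_YEAR = 365  # assumption: simple calendar-year conversion

def retire_times(typical_retire_days, ratio_worst, ratio_best):
    """Scale the typical-case retire time to the worst and best cases."""
    return {
        "worst": round(typical_retire_days * ratio_worst),
        "typical": typical_retire_days,
        "best": round(typical_retire_days * ratio_best),
    }

def remaining_lifetimes(retire, current_days):
    """(Remaining Lifetime) = (Retire Time) - (Current Time), per case."""
    return {case: days - current_days for case, days in retire.items()}

# 0.58 is the ratio stated in the text; 1.33 is assumed to be 32/24 rounded.
retire = retire_times(1020, ratio_worst=0.58, ratio_best=1.33)
remaining = remaining_lifetimes(retire, current_days=80)
remaining_years = {
    case: round(days / DAYS_PER_YEAR, 1) for case, days in remaining.items()
}

print(retire)
print(remaining)
print(remaining_years)
```

Under these assumptions the sketch yields the retire times {592, 1020, 1357} days and remaining lifetimes of {512, 940, 1277} days, or about {1.4, 2.6, 3.5} years, which agrees with the figures given in Section D.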
FIGURE 11. Final ageing model and remaining retire times (for one fabricated chip).
console interface, with an additional 400 lines of JavaScript code to communicate with the Cloud.
(2) In the Cloud part, there are four software components: the User communication agent, the Database manager, PVT and ageing prediction, and the Edge communication agent. The two communication agents provide a bridge between the User part and the Edge part for transmitting commands, ROCP raw data, and PVT and ageing information, with about 850 lines of PHP code. The database manager has 142 lines of PHP+SQL code to support {insert, delete, and search} database functions. The ROCP modeling and PVT and ageing prediction software component is the major software component of this system, realized by 1360 lines of code in Perl and Python. It is responsible for process calibration, temperature-aware VDD-drop prediction, and remaining lifetime estimation.
(3) In the Edge part, there are 336 lines of C code, used to perform wireless network connection (using API functions provided with LinkIt-ONE) and edge device regulation.
In the following, we present measurement results of our cloud-based monitoring system for ageing prediction.
D. AGEING AND REMAINING LIFETIME PREDICTION
We have applied the proposed "accelerated ageing process" on three test chips to derive the worst-case, typical-case, and best-case "accelerated ageing models". Then, we use the three models to predict the "final ageing model" for one chip by the coarse-stretching and fine-stretching operations. The retire times of this chip are indicated in Fig. 11 as {592, 1020, 1357} days in the worst case, the typical case, and the best case, respectively. With these retire times, the remaining lifetime can be calculated. For example, if the retire time is predicted when the chip has been operated for 80 days, then its remaining lifetime would be {592-80, 1020-80, 1357-80} = {512, 940, 1277} days = {1.4, 2.6, 3.5} years in the worst case, the typical case, and the best case, respectively. Note that these lifetimes may seem relatively short. This is partly because we operated this test chip in a non-stop manner when we collected its "normal ageing data". If a chip is operated in a more relaxed manner (e.g., spending most of its time in the OFF-state), then it could age more slowly.
E. APPLICATION NOTES
In reality, an IC could have many functional blocks. Each functional block could experience its own unique workload and thus age at a different pace. Even though the proposed methodology is only demonstrated on the ring oscillators used as the ageing sensors, it can be integrated with other run-time ageing sensors as well [21]–[25]. Each ageing sensor is used to collect the online ageing history of a specific functional block (such as cache memory, SRAM memory, logic blocks, etc.). Then, based on the prebuilt accelerated ageing model of that specific functional block and the workload-aware online ageing history, one can accurately predict the ageing status and the remaining lifetime of each functional block. At an "ageing monitoring center", a scoreboard can be maintained to keep track of the ageing condition of each individual functional block.
V. CONCLUSION
For safety-critical applications, a convincing methodology that can demonstrate how long an IC can operate reliably under the influence of ageing is desperately needed. Traditional ways of using purely software-based prediction may not be adequate, as the results could be rough. In light of this, we have proposed a more accurate method in this paper. Our contributions can be summarized in two aspects. First, the ageing and lifetime prediction becomes more credible since it is derived from not only the offline "accelerated ageing model" but also the online "ageing history" under normal workload. Also, one can produce a unique "lifetime" prediction for each individual IC. Second, an ageing monitoring system contains both hardware and software, and involves data analysis for a large number of ICs spread around the globe. By adopting a cloud-based method, system integration and maintenance become easier as well. Measurement data of test chips have been used to demonstrate the effectiveness of the proposed methodology.
ACKNOWLEDGMENT
The authors would like to thank the Taiwan Semiconductor Research Institute (TSRI) for providing the access to the EDA tools [26].
REFERENCES
[1] V. Prasanth, D. Foley, and S. Ravi, "Demystifying automotive safety and security for semiconductor developer," in Proc. IEEE Int. Test Conf. (ITC), Oct./Nov. 2017, pp. 1–10.
[2] G. A. Klutke, P. C. Kiessler, and M. A. Wortman, "A critical look at the bathtub curve," IEEE Trans. Rel., vol. 52, no. 1, pp. 125–129, Mar. 2003.
[3] J. H. Stathis and S. Zafar, "The negative bias temperature instability in MOS devices: A review," Microelectron. Rel., vol. 46, nos. 2–4, pp. 270–286, 2006.
[4] T. Wang, L.-P. Chiang, N.-K. Zous, C.-F. Hsu, L.-Y. Huang, and T.-S. Chao, "A comprehensive study of hot carrier stress-induced drain leakage current degradation in thin-oxide n-MOSFETs," IEEE Trans. Electron Devices, vol. 46, no. 9, pp. 1877–1882, Sep. 1999.
[5] M. Kimura, "Field and temperature acceleration model for time-dependent dielectric breakdown," IEEE Trans. Electron Devices, vol. 46, no. 1, pp. 220–229, Jan. 1999.
[6] K. N. Tu, "Recent advances on electromigration in very-large-scale-integration of interconnects," J. Appl. Phys., vol. 94, no. 9, pp. 5451–5473, 2003.
[7] K. K. Kim, W. Wang, and K. Choi, "On-chip aging sensor circuits for reliable nanometer MOSFET digital circuits," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 57, no. 10, pp. 798–802, Oct. 2010.
[8] B. Jang, J. K. Lee, M. Choi, and K. K. Kim, "On-chip aging prediction circuit in nanometer digital circuits," in Proc. IEEE SoC Design Conf. (ISOCC), Nov. 2014, pp. 68–69.
[9] S. Majerus, X. Tang, J. Liang, and S. Mandal, "Embedded silicon odometers for monitoring the aging of high-temperature integrated circuits," in Proc. IEEE Nat. Aerosp. Electron. Conf. (NAECON), Jun. 2017, pp. 98–103.
[10] D. Sengupta and S. S. Sapatnekar, "Estimating circuit aging due to BTI and HCI using ring-oscillator-based sensors," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 36, no. 10, pp. 1688–1701, Oct. 2017.
[11] A. Goel and R. J. Graves, "Electronic system reliability: Collating prediction models," IEEE Trans. Device Mater. Rel., vol. 6, no. 2, pp. 258–265, Jun. 2006.
[12] IEEE Standard Framework for the Reliability Prediction of Hardware, IEEE Standard 1413, 2009.
[13] D. Lorenz, G. Georgakos, and U. Schlichtmann, "Aging analysis of circuit timing considering NBTI and HCI," in Proc. IEEE Int.-Line Test. Symp., Jun. 2009, pp. 3–8.
[14] J. G. Elerath and M. Pecht, "IEEE 1413: A standard for reliability predictions," IEEE Trans. Rel., vol. 61, no. 1, pp. 125–129, Mar. 2012.
[15] Z. Abuhamdeh, V. D'Alassandro, R. Pico, D. Montrone, A. Crouch, and A. Tracy, "Separating temperature effects from ring-oscillator readings to measure true IR-drop on a chip," in Proc. IEEE Proc. Int. Test Conf. (ITC), Oct. 2007, pp. 1–10.
[16] Y. Miura, Y. Sato, Y. Miyake, and S. Kajihara, "On-chip temperature and voltage measurement for field testing," in Proc. Eur. Test Symp., 2012, pp. 28–31.
[17] C.-H. Hsu, S.-Y. Huang, D.-M. Kwai, and Y.-F. Chou, "Worst-case IR-drop monitoring with 1 GHz sampling rate," in Proc. VLSI Design, Automat., Test (VLSI-DAT), Apr. 2013, pp. 1–4.
[18] Y. Miyake, Y. Sato, S. Kajihara, and Y. Miura, "Temperature and voltage estimation using ring-oscillator-based monitor for field test," in Proc. IEEE Asian Test Symp., Nov. 2014, pp. 156–161.
[19] G.-H. Lian, S.-Y. Huang, and W.-Y. Chen, "Cloud-based PVT monitoring system for IoT devices," in Proc. IEEE Asian Test Symp. (ATS), Nov. 2017, pp. 76–81.
[20] C.-W. Tzeng, S.-Y. Huang, and P.-Y. Chao, "Parameterized all-digital PLL architecture and its compiler to support easy process migration," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 22, no. 3, pp. 621–630, Mar. 2014.
[21] R. Nazari, N. Rohbani, H. Farbeh, Z. Shirmohammadi, and S. G. Miremadi, "A2CM2: Aging-aware cache memory management technique," in Proc. IEEE CSI Symp. Real-Time Embedded Syst. Technol. (RTEST), Oct. 2015, pp. 1–8.
[22] M. Karimi, N. Rohbani, and S.-G. Miremadi, "A low area overhead NBTI/PBTI sensor for SRAM memories," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 25, no. 11, pp. 3138–3151, Nov. 2017.
[23] A. Vijayan, A. Koneru, S. Kiamehr, K. Chakrabarty, and M. B. Tahoori, "Fine-grained aging-induced delay prediction based on the monitoring of run-time stress," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 37, no. 5, pp. 1064–1075, May 2018.
[24] N. Rohbani and S.-G. Miremadi, "A low-overhead integrated aging and SEU sensor," IEEE Trans. Device Mater. Rel., vol. 18, no. 2, pp. 205–213, Jun. 2018.
[25] H. A. Balef, K. Goossens, and J. P. de Gyvez, "Chip health tracking using dynamic in-situ delay monitoring," in Proc. IEEE Design, Autom. Test Europe Conf. Exhibit. (DATE), Mar. 2019, pp. 304–307.
[26] CIC Reference Flow for Cell-Based IC Design, document CIC-DSD-RD-08-01, Chip Implementation Center, CIC, Taipei, Taiwan, 2008.
GUAN-HAO LIAN received the B.S. degree in electronic and computer engineering from the National Taiwan University of Science and Technology, Taipei, Taiwan, in 2015, and the M.S. degree in electrical engineering from National Tsing Hua University, Hsinchu, Taiwan, in 2017. His research interests include PVTA conditions monitoring inside an IC and cloud-based system designs.
WEI-YI CHEN received the B.S. degree in electrical engineering from National Chung Hsing University, Taiwan, in 2014, and the M.S. degree in electrical engineering from National Tsing Hua University, Hsinchu, Taiwan, in 2017. His research interests include PVT conditions monitoring inside an IC and system construction on FPGA.
SHI-YU HUANG (S'94–M'97–SM'12) received the B.S. and M.S. degrees in electrical engineering from National Taiwan University, in 1988 and 1992, respectively, and the Ph.D. degree in electrical and computer engineering from the University of California at Santa Barbara, Santa Barbara, in 1997. He has been on the Faculty of Electrical Engineering Department, National Tsing Hua University, Taiwan, since 1999. His research interests include VLSI design, automation, and testing, with a current emphasis on all-digital phase-locked loop (ADPLL) design and its application in parametric fault testing and monitoring in 3-D ICs. He has served as a Program Co-Chair or the General Co-Chair in a number of symposia (2004, 2009 Asian Test Symposium, and 2014, 2015 VLSI, Design, Automation, and Test Symposium). He served as an Associate Editor for the IEEE TRANSACTIONS ON COMPUTERS, from 2015 to 2018.