You are on page 1of 8

Received August 28, 2019, accepted September 14, 2019, date of publication September 17, 2019, date of current

version October 1, 2019.


Digital Object Identifier 10.1109/ACCESS.2019.2941998

Cloud-Based Online Ageing Monitoring


for IoT Devices
GUAN-HAO LIAN, WEI-YI CHEN, AND SHI-YU HUANG , (Senior Member, IEEE)
Electrical Engineering Department, National Tsing Hua University, Hsinchu 30013, Taiwan
Corresponding author: Shi-Yu Huang (syhuang@ee.nthu.edu.tw)
This work was supported in part by the Ministry of Science and Technology (MOST) of Taiwan under Grant MOST-108-2221-E-007-084.

ABSTRACT Reliability of an electronic device, concerning if it can function reliably over its designated
lifetime in the field (such as 10 or 15 years), has become more and more important in today’s safety-critical
applications such as automotive electronics. Traditionally, the ageing has been performed in an offline setting
where stress test has been applied to accelerate the ageing process and then a model is established to make
the futuristic prediction. This kind of offline method has a drawback of not being able to take into account
the factor of the unique operating condition and environment that a device could have experienced in the
field. In this work, we present the first cloud-based ageing monitoring system to the best of our knowledge,
for the Internet-of-Things (IoT) devices. It has many advantages. First of all, one can know of the ageing
status of an IoT device remotely and continuously. Secondly, through data analysis in a cloud server, more
accurate prediction can be achieved. Thirdly, an ageing hazard can be alarmed in advance before it actually
strikes, and thereby pre-caution actions (such as online repair, or even call-for-maintenance request) can be
taken in advance to avoid unnecessary system fatal failure. A prototype system using test chips with built-
in design-for-ageing-monitoring circuitry will be demonstrated with measurement data collected through a
cloud server.

INDEX TERMS Ageing monitoring, reliability, Internet of Things, stress test, ring oscillator.

I. INTRODUCTION amount of time. But after that, the ageing effect may start
The aggressive scaling of integrated circuit technology brings to take its toll and deteriorate a device’s performance or even
numerous advantages, such as the reduction of the form fac- disrupt its functionality.
tor, higher speed, and lower power. However, the concern of There are several different types of ageing mecha-
reliability over a long lifetime as required by safety critical nisms [3]–[6], e.g., Bias Temperature Instability (BTI),
applications (e.g., automobile electronics, biomedical elec- Electro-Migration (EM), Hot Carrier Injection (HCI), and
tronics, and various IoT devices) is getting more and more Thin-Oxide Breakdown, etc. When the ageing effect turns
challenging [1], [2]. For example, an automotive IC is often serious, sudden functional failure could occur. As a result,
required to operate in the field for more than 10 to 15 years a safety-critical electronic device should have a set of anti-
under hostile conditions (e.g., −40◦ C to 150◦ C). ageing solutions during the design stage, manufacturing test
It is well known that the reliability (in terms of the failure stage, and in-the-field stage, so as to meet the target reliability
rate) is a function of time, following a bathtub curve, divided requirement.
into three stages - including the infant mortality stage, normal The ageing prediction can be done statically by offline
lifetime stage, and final ageing stage. Ideally, the high infant ageing analysis, as depicted in Fig. 1. At the cell
mortality stage can be skipped by applying stress tests (with level, the ageing phenomenon of a transistor or an
higher temperature and/or higher VDD) to eliminate the weak interconnect under a specific process technology is first
devices with potential latent defects. It is hopeful that, most characterized by embedded ageing sensors (e.g., ring-
shipped devices can operate in a system during its normal oscillator) to measure the Vth and/or performance degra-
lifetime stage with reasonably low failure rate for certain dation under certain temperature or VDD stress conditions,
(e.g., 195◦ C for 6 months) [7]–[10]. This offline characteriza-
The associate editor coordinating the review of this manuscript and
tion process produces various fundamental cell-level ageing
approving it for publication was Giovanni Merlino. models.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
135964 VOLUME 7, 2019
G.-H. Lian et al.: Cloud-Based Online Ageing Monitoring for IoT Devices

FIGURE 2. Ring-Oscillator based monitor [19].

FIGURE 1. Traditional static ageing analysis to derive the potential


lifetime of an IC based on a given set of cell-level ageing models and
circuit operation statistics. fabricated chips to derive the so-called collective ‘‘acceler-
ated ageing models’’.
(3) For the online operation, each individual chip sends its
ageing data periodically to a designated cloud server as ‘‘age-
With the cell-level ageing models in place, a designer ing history’’. Then, over the time, more accurate IC-specific
can then perform the ageing analysis [10]–[14], taking into ageing and lifetime prediction is performed in the cloud and
account the Circuit Operation Statistics (COS), such as signal warning message is sent to each specific chips with the threat
probability, current information at each node, and tempera- of ageing. During this process, the collective ‘‘accelerated
ture profile, etc. In some sense, COS information reflects how ageing models’’ is converted into a IC-specific ageing model,
rigorously each transistor and interconnect has been exer- which takes into account its unique ‘‘ageing history’’.
cised or stressed. Then, a component-level ageing model can The proposed cloud-based online ageing monitoring and
be derived, describing the performance degradation of each prediction is particularly effective as compared to the previ-
circuit block or component over the time. In some previous ous works in two aspects. First, the genuine COS informa-
works, COS might be referred to as workload. Throughout tion through an IC’s life cycle is faithfully reflected in the
this paper, we will use these two terms without distinction. ageing history, to render accurate lifetime prediction. Sec-
Next, at the chip level, the ageing model for each component ond, the ageing threat can be detected quickly and therefore
can be combined to predict the overall lifetime of a chip, and preventive measures can be taken in a more timely manner
hopefully revealing the ageing-vulnerable spots at the same to avert the ageing-induced catastrophe. The preventive mea-
time. sures could take multiple forms, such as reconfiguration of
The above ageing analysis is becoming more and more the functional units in a chip, immediate replacement by a
inevitable during the design process for a reliability conscious refreshing unit, or a warning signal calling for maintenance.
IC product. Even though it may not be very accurate, it does We believe that our work is the first to use the ‘‘ageing
provide first-degree estimation and early feedback to the history’’ in conjunction with the ‘‘accelerated ageing model’’
design process for lifetime enhancement. to produce IC-specific ageing model, and thereby making
The inaccuracy of static ageing analysis mainly arises from accurate lifetime estimation possible for each individual IC.
the hypotheses used in mapping the cell-level ageing models The rest of this paper is organized as follows. In Section II,
to the component-level models. These hypotheses may not be we first provide preliminaries. In Section III, we present the
very consistent with the reality. For example, it is an still issue proposed online ageing monitoring and prediction method-
how a cell-level model derived based on measurement data ology – including the ageing monitor, the overall system
over a timing window of just 6 months can be extrapolated architecture, and the model translation scheme and system
to a extended lifetime such as 10 years. Also, the assumed integration. In Section IV, we report the silicon results based
COS information is often too simplified and does not fully on fabricated test chips, and in Section V we conclude.
reflect the true situations in the field. In reality, each chip
is likely to have an unique ageing process based on its own II. PRELIMINARIES
Circuit Operation Statistics (COS). To take this practicality It is quite common that Ring-Oscillators (RO) are used as
into account, a more integrated method is needed. monitors for the measuring the effects of Process, VDD
To address this challenge, we propose a highly inte- voltage, Temperature, and Ageing (or jointly called PVTA
grated cloud-based online ageing monitoring and predic- effects) [15]–[19]. Fig. 2 shows an example. The ring oscil-
tion methodology in this work, with the following key lator produces a clock signal, with the clock period, called
features: ROCP, bearing the information of combined PVTA effects.
(1) Ageing monitors are inserted in the chip to collect After a time-to-digital converter, the time-varying clock
online ageing information, as some previous works [9], [10]. period is converted to some digital codes for further analysis,
But how these data are utilized afterwards is different. some capturing the average ROCP, and some capturing the
(2) An offline ageing characterization process (with some worst-case ROCP. It is possible to further decipher from the
assumed stress test procedure) is performed on a number of resulting digital ROCP codes the individual effect of each

VOLUME 7, 2019 135965


G.-H. Lian et al.: Cloud-Based Online Ageing Monitoring for IoT Devices

contributing factor, leading to useful information about the


process status, the average and worst-case VDD voltages, and
the temperature, respectively [15], [19].
In this work, we use ROs for the purpose of ageing monitor-
ing. In general, it is similar to the monitoring of PVT effects,
but different in the following two aspects:
(1) Ageing effect occurs much more slowly than the effects
of the VDD voltage (changing in nano-seconds), and the
temperature (changing in mini-seconds). Usually, it takes
weeks or even months for ageing effect to be noticeable under
a typical workload. Therefore, one only needs to take the
ageing samples once in a while (e.g., one sample per day).
(2) For monitoring dynamic VDD drop, one need to FIGURE 3. A two-level row-column daisy-chain for transferring a selected
capture and analyze not only the average, but also the worst- RO_CLK to the CPM circuit [19].
case Ring-Oscillator Clock Period (ROCP) over a monitoring
interval. For monitoring ageing effects, we only need to (2) In the cloud, the ageing histories of all ICs of the same
capture the average ROCP values at some specific sampling type can be gathered and compared, and therefore, an IC with
times. However, these average ROCP values need to be mea- abnormal ageing conditions from the population can be easily
sured at a ‘‘proper condition’’ so the average performance of identified as an outlier in terms of some features. This kind of
an RO monitor is only affected by the ageing effect alone, not peer-based over-ageing detection cannot be easily performed
by any other wanted PVT effects. In our system, we use the when each IC is only monitored individually inside its own
following condition: system.
Whenever we attempt to sample an average ROCP value (3) As discussed earlier, a monitoring system is composed
bearing only the ageing effect, we set the chip to operate in of not only hardware, but also software. In a cloud-based
the idle mode for a while (e.g., a few seconds) and so it will monitoring system, the software responsible for the ‘‘ageing
cool down to the ambient temperature with only little leakage data analysis’’ can be run in the cloud, rather than on the
current. Also, the current drained from the power/ground edge device containing the IC. Such an arrangement is more
pads are stable and little, and thus the VDD voltage driving modular and flexible. Whenever there is a new version of
our RO monitor is also stable to remove the unwanted effect of ‘‘ageing analysis algorithm’’, we only need to install it in the
dynamic VDD variation. The resulting average ROCP value cloud. There is no need to update the software at the numerous
is further compared to its time zero reference (recorded when edge devices, which can save a lot of efforts and costs.
the chip was just installed on the system board), and their In terms of the architecture, a cloud-based monitoring
difference reflects only the accumulated ageing effect from system can be divided into three different domains as shown
time zero up to the current sampling time. in Fig. 4, namely, (1) User domain, (2) Cloud domain, and
Certainly, one may need many RO monitors inserted in the (3) IoT Edge domain. This framework is not only scalable,
chip, one for each selected site of interest. These RO monitors but also easy-to-integrate in a way that the IoT edge devices
can be arranged into a special architecture, as shown in Fig. 3, are only responsible for producing the raw data, while the
so they can share the same clock period measurement circuit. complicated back-end data analysis (which maps the raw data
At any given time, only one RO monitor is active and trans- into meaningful ageing information) is performed at the cloud
fers its output clock signal, RO_CLK, to the Clock Period server. By such a hardware/software collaboration, the system
Measurement (CPM) circuit. becomes more flexible in its ability to distribute the functions
among the edge devices and the server in a cost-effective
III. PROPOSED CLOUD-BASED AGEING MONITORING manner.
In this section, we introduce the proposed cloud-based online In more detail, the User part comprises a ‘‘control console’’
ageing monitoring methodology, including its benefits, archi- that regulates the overall monitoring flow of all IoT devices.
tecture, operation flow, and test chip design. The Cloud part is located in a designated server (e.g., one
in our lab), which stores the gathered raw data (mainly the
A. OVERVIEW average ROCP values) and also supports the subsequent data
For an IC used in an Internet of Things (IoT) system, the data analysis to predict ageing and lifetime. The IoT Edge part is
produced during the monitoring process can be sent to the associated with each device under monitoring, responsible for
cloud via existing wireless internet connection [19]. Such a producing the ROCP raw data and transmitting the raw data
cloud-based monitoring methodology can have several bene- through the internet to the cloud part.
fits.
(1) The ageing history of an IC (consisting of the average B. AGEING MONITORING FLOW
ROCP samples recorded over its lifetime) can be checked at In this subsection, we explain our monitoring flow for ‘‘age-
anytime and from anywhere in the world. ing and remaining lifetime prediction’’, as outlined in Fig. 5.

135966 VOLUME 7, 2019


G.-H. Lian et al.: Cloud-Based Online Ageing Monitoring for IoT Devices

we recorded the average ROCP values of all 16 RO monitors


inserted in the IC once per day until all three ICs have aged
by a threshold value of 10% (as compared to their time zero
references).
Phase 2 (Derive ‘‘Accelerated Ageing Model’’):
Having the results from the above ageing acceleration
process, an ‘‘accelerated ageing model’’ is built by fitting the
accelerated ageing data. We use a polynomial of degree 5
as the fitting function as shown in Fig. 6. The resulting
‘‘accelerated ageing model’’ is a function of time, denoted
as A(t).

FIGURE 4. The user, cloud, and IoT edge parts of our cloud-based ageing
monitoring system [19].

FIGURE 6. Accelerated ageing models derived by curve-fitting using


polynomials of degree 5. Note that the accelerated ageing data in this
figure are based on actual data based on the measurement results of
fabricated chips.

Just like the timing model for a standard cell, our ‘‘accel-
erated ageing model’’ are derive in three cases: (1) worst-
case, (2) typical-case, and (3) best-case, where the worst-case
‘‘accelerated ageing model’’ is acquired by taking the top
envelop of those ‘‘accelerated ageing data’’ at each record-
ing time obtained during the accelerated ageing process.
Similarly, the best-case is acquired by taking the bottom
FIGURE 5. The overall flow for ageing and remaining lifetime prediction. envelop, while the typical-case is acquired by taking the
average ‘‘accelerated ageing data’’. As shown in Fig. 6, our
accelerated ageing process reaches a 10% ageing on these test
It is divided into four phases. Phase 1 and Phase 2 are one- chips in 14 days in the worse case, 24 days in the typical case,
time effort made on a number of selected ICs after they and 32 days in the best case. We use the following terms to
are fabricated. On the other hand, Phase 3 and Phase 4 are assist our subsequent discussions.
performed in the field at pre-defined monitoring intervals for • Tworst : The time needed to reach an ageing threshold
each edge device. (e.g., 10% ageing) using the worst-case accelerated age-
Phase 1 (Perform Accelerated Ageing Process): ing model.
It is known that stress test can be applied to accelerate • Ttypical : The time needed to reach an ageing threshold
the ageing process. This phase is to observe how a handful using the typical-case accelerated ageing model.
of ICs selected for characterization during the offline testing • Tbest : The time needed to reach an ageing threshold
will age under stressed conditions. For example, we apply (e.g., 10% ageing) using the best-case accelerated age-
a ‘‘boosted supply voltage’’ 2 times the rated VDD level ing model.
in our test case, and then, their aged behaviors (reflected Then, in our test case, Tworst = 14 days, Ttypical =
by the average ROCP values) are recorded over time. The 24 days, Tbest = 32 days. Taking Ttypical as a reference,
ageing acceleration process is continued until the IC under we further calculated two ratios to be used later when
stress test has aged beyond a preset threshold (say 10%). For we conduct the remaining lifetime prediction, namely, the
example, in our test case, we have three ICs under stress, and Ratio(worst−to−typical) = 14/24 = 0.58 (or -42%), and the

VOLUME 7, 2019 135967


G.-H. Lian et al.: Cloud-Based Online Ageing Monitoring for IoT Devices

TABLE 1. The ageing prediction error (%) versus the coarse- stretching
parameter, n.

ageing prediction error is minimized. Table 1 lists the ageing


prediction error for n = 1, 2, . . . , 43. It can be seen that the
error monotonically drops from 3032% to 2.781% (when n
FIGURE 7. Predicted ageing model by coarse stretching. increases from 1 to 42) and then starts to increase slightly to
2.789% when n is further incremented to 43. As a result, this
coarse-stretching procedure will select n = 42 as the final
Ratio(best−to−typical) = 32/24 = 1.33 (or +33%). In some coarse-stretching parameter for this test case.
sense, the variation of accelerated ageing is in a range of (Step 3.2) Derive the fine ageing model, by another suc-
[−42%, 33%]. cessive approximation procedure.
Phase 3 (Derive ‘‘Predicted’’ Ageing Model):
While the accelerated ageing model dictates how a device 2) FINE-STRETCHING OPERATION
will age under a particular stress condition, the ‘‘predicted Slightly stretch the time-axis of the typical-case accelerated
ageing model’’ in our system predicts the actual ageing ageing model called A(t) into a fine ageing model, by the
process under normal workload in the field. This model is following formula:
progressive and specific for each IC, in a sense that it takes
t
into account the real ageing information of each IC when A(t) → A( 1
)
it is operated in the field, and we dynamically update this n+ Ttypical k
‘‘predicted ageing model’’ over the entire lifetime of an edge
Note that n is the previously determined coarse-stretching
device based on the ageing information seen so far for each
parameter, while k is a fine-stretching parameter to be further
specific IC, and store them in the database in the cloud.
decided in this stage.
In general, the predicted ageing model is a time-
In our test case, Ttypical = 24 days. Again, we increase
stretched version of the accelerated ageing model in our
the value of k from 1 gradually, to seek a proper integer
methodology and derived by a successive approximation pro-
value such that the overall ageing prediction error (against
cedure in two stages – coarse-stretching and fine-stretching
the actual ageing history) is minimized.
as described in the following.
(Step 3.1) Derive the coarse ageing model, based on the
TABLE 2. The ageing prediction error (%) versus the fine-stretching
following operation. parameter, k. (Note that n has been fixed at 42 in the previous
coarse-stretching process.)
1) COARSE-STRETCHING OPERATION
Stretch the time-axis of the typical-case accelerated ageing
model called A(t), shown as the BLUE curve in Fig. 7, with
a known stretching time unit: Ttypical , into a function denoted
as A(t/n), shown as the GREEN curve in Fig. 7. That is,
t Table 2 lists the ageing prediction error for k = 1,
A(t) → A( )
n 2, . . . , 14. It can be seen that the error monotonically drops
where n is an integer, or called coarse-stretching param- from 2.7809% to 2.7726% (when k increases from 1 to 13)
eter, which is incrementally determined by our successive and then starts to increase slightly to 2.7742% when n is 14.
approximation procedure, which tries to minimize the ageing As a result, this fine-stretching procedure selects k = 13
prediction error. as the final coarse-stretching parameter for this test case. As
Definition 1(Ageing Prediction Error): At any given time, illustrated in Fig. 8, the final ageing model for this device will
the actual ageing data accumulated so far for a device under be determined as (Ttypical, n, k) = (24 days, 42, 13):
monitoring forms an ageing history. The mean square error t t
between an ‘‘ageing model’’ and the actual ‘‘ageing history’’ A( 1
) = A( )
42 + 13· 24 42.54
can be used for the calculation of the ageing prediction error.
Based on the above definition, the determination of the Phase 4 (Determine the Range of Remaining Lifetime):
coarse-stretching parameter will become easier – it finds Once we have derived the final ageing model in the typ-
a positive integer n by an iterative process such that the ical case, for a specific IoT edge device, we can find out

135968 VOLUME 7, 2019


G.-H. Lian et al.: Cloud-Based Online Ageing Monitoring for IoT Devices

FIGURE 8. Final ageing model (in RED color) after fine-stretching.

FIGURE 10. An IoT edge device harboring one of our test chips.

TABLE 3. List of software components in our cloud-based PVTA


monitoring system.

FIGURE 9. A benchmark circuit ‘‘B17’’ embedded with 16 monitors,


a logic Built-In Self-Test circuit, and a CPM circuit. Note that, in addition
to supporting ageing monitoring, this test chip also supports PVT
monitoring proposed in [19].

its ‘‘retire time in the typical case’’. Then, the retire time
in the typical case is multiplied by the two ratios we have
derived previously in the ‘‘accelerated ageing models’’ - i.e.,
Ratio(worst−to−typical) and Ratio((best−to−typical) - to derive the
retire times in the worst case, and the retire time in the best
case. Once we know the retire times, we can calculate the
remaining lifetimes as follows:
(Remaining Lifetime) = (Retire Time) - (Current Time)

IV. MEASUREMENT RESULTS


A. TEST CHIP DESIGN
A benchmark circuit (B17) adopting the above ageing mon- B. HARDWARE COMPONENTS
itoring methodology has been designed and fabricated as Our test chips are fabricated in a 90nm CMOS process. A chip
shown in Fig. 9. This is also the chip we have used to conduct under monitoring is put into the socket on the device board,
the experiments for PVT (Process, Voltage, and Tempera- which is further connected to the FPGA interface board, sen-
ture) monitoring in [19]. However, the methodology used in sor hub, ambient temperature sensor, and two battery modules
this paper for ageing prediction is very different from those as shown in Fig. 10, in our prototype system. We have built
used in [19], which is more concerned about capturing the three of such prototype systems.
worst-case dynamic VDD drops and the temperature vari-
ation. This test chip design is embedded with 16 monitors C. SOFTWARE COMPONENTS
and a logic BIST (Built-In Self-Test) circuit. The CPM cir- In addition to hardware, there are software components in our
cuit contains a Time-to-Digital Converter [17] as well as a prototype system as listed in Table. 3. These software compo-
cell-based Phase-Locked Loop [20] produced by in-house nents are implemented in six different kinds of programming
compilers. Based on our test flow, the 16 monitors inserted languages, including HTML, Javascript, PHP+SQL, Perl,
would take turn to be observed, each of which produces an Python, and C.
RO_CLK signal reflecting the ageing condition at the site it (1) In the User part, there are about 500 lines of code
monitors. using HTML language to build the overall structure of control

VOLUME 7, 2019 135969


G.-H. Lian et al.: Cloud-Based Online Ageing Monitoring for IoT Devices

E. APPLICATION NOTES
In reality, an IC could have many functional blocks. Each
functional block could experience its own unique workload
and thus age at a different pace. Even though the proposed
methodology is only demonstrated on the ring oscillators
used as the ageing sensors, it can be integrated with other
run-time ageing sensors as well [21]–[25]. Each ageing sen-
sor is used to collect the online ageing history of a specific
functional block (such as cache memory, SRAM memory,
logic blocks, etc.), and then based on the prebuilt accel-
FIGURE 11. Final ageing model and remaining retire times (for one
fabricated chip). erated ageing model of that specific functional block and
the workload-aware online ageing history, one can thereby
accurately predict the aging status and the remaining lifetime
of each function block. At an ‘‘ageing monitoring center’’,
console interface, with additional 400 lines of Javascript code a scoreboard can be maintained to keep track of the ageing
to communicate with the Cloud. condition of each individual functional block.
(2) In the Cloud part, there are four software components:
including User communication agent, Database manager,
PVT and ageing prediction, and Edge communication agent. V. CONCLUSION
The two communication agent provides bridge between User For safety-critical applications, a convincing methodology
part and an Edge part for transmitting commands, ROCP that can demonstrate how long an IC can operate reliably
raw data, and PVT and ageing information, with about under the influence of ageing is desperately needed. Tradi-
850 lines of PHP code. The database manager has 142 lines tional ways of using purely software-based prediction may
of PHP+SQL code to support {insert, delete, and search} not be adequate, as the results could be rough. In light of
database functions. The ROCP modeling and PVT and this, we have proposed a more accurate method in this paper.
ageing prediction software component is the major software Our contributions can be summarized in two aspects. First,
component of this system, realized by 1360 lines of code the ageing and lifetime prediction can become more credible
in both Perl and Python language. It is responsible for pro- since it is derived by not only the offline ‘‘accelerated age-
cess calibration, temperature-aware VDD-drop prediction, ing model’’, but also online ‘‘ageing history’’ under normal
and remaining lifetime estimation. workload. Also, one can produce a unique ‘‘lifetime’’ pre-
(3) In the Edge part, there are 336 lines of C code, used to diction for each individual IC. Second, an ageing monitoring
perform wireless network connection (using API functions system contains both hardware and software, and involves the
provided with LinkIt-ONE) and edge device regulation. data analysis for a larger number of ICs spreading around
In the following, we present measurement results of our the globe. By adopting a cloud-based method, the system
cloud-based monitoring system for ageing prediction. integration and maintenance become easier as well. Measure-
ment data of test chips have been used to demonstrate the
effectiveness of the proposed methodology.
D. AGEING AND REMAINING LIFETIME PREDICTION
We have applied the proposed ‘‘accelerated ageing process’’
ACKNOWLEDGMENT
on three test chips to derive the worst-case, typical-case, and
The authors would like to thank the Taiwan Semiconductor
worst-case ‘‘accelerated ageing models’’. Then, we use the
Research Institute (TSRI) for providing the access to the EDA
three models to predict the ‘‘final ageing model’’ for one chip
tools [26].
by the coarse-stretching and fine-stretching operations. The
retire times of this chip are indicated in Fig. 11, as {592,
1020, 1357} days, in the worst case, the typical case, and the REFERENCES
best case, respectively. With the information of these retire [1] V. Prasanth, D. Foley, and S. Ravi, ‘‘Demystifying automotive safety and
times, the remaining lifetime can be calculated thereby. For security for semiconductor developer,’’ in Proc. IEEE Int. Test Conf. (ITC),
Oct./Nov. 2017, pp. 1–10.
example, if the retire time is predicted when the chip has [2] G. A. Klutke, P. C. Kiessler, and M. A. Wortman, ‘‘A critical look at the
been operated for 80 days, then its remaining lifetime would bathtub curve,’’ IEEE Trans. Rel., vol. 52, no. 1, pp. 125–129, Mar. 2003.
be{592-80, 1020-80, 1357-80} = {512, 940, 1277} days = [3] J. H. Stathis and S. Zafar, ‘‘The negative bias temperature instability
{1.4, 2.6, 3.5} years in the worst case, the typical case, and in MOS devices: A review,’’ Microelectron. Rel., vol. 46, nos. 2–4,
pp. 270–286, 2006.
the best case, respectively. Note that these lifetimes may seem [4] T. Wang, L.-P. Chiang, N.-K. Zous, C.-F. Hsu, L.-Y. Huang, and
relatively short. It is partly because we operate this test chip T.-S. Chao, ‘‘A comprehensive study of hot carrier stress-induced drain
in a non-stop manner when we collected its ‘‘normal ageing leakage current degradation in thin-oxide n-MOSFETs,’’ IEEE Trans.
Electron Devices, vol. 46, no. 9, pp. 1877–1882, Sep. 1999.
data’’. If a chip is operated in a more relaxed manner (e.g.,
[5] M. Kimura, ‘‘Field and temperature acceleration model for time-dependent
spending most of its time in the OFF-state), then it could age dielectric breakdown,’’ IEEE Trans. Electron Devices, vol. 46, no. 1,
more slowly. pp. 220–229, Jan. 1999.

135970 VOLUME 7, 2019


G.-H. Lian et al.: Cloud-Based Online Ageing Monitoring for IoT Devices

[6] K. N. Tu, ‘‘Recent advances on electromigration in very-large-scale- [24] N. Rohbani and S.-G. Miremadi, ‘‘A low-overhead integrated aging and
integration of interconnects,’’ J. Appl. Phys., vol. 94, no. 9, pp. 5451–5473, SEU sensor,’’ IEEE Trans. Device Mater. Rel., vol. 18, no. 2, pp. 205–213,
2003. Jun. 2018.
[7] K. K. Kim, W. Wang, and K. Choi, ‘‘On-chip aging sensor circuits for [25] H. A. Balef, K. Goossens, and J. P. de Gyvez, ‘‘Chip health tracking using
reliable nanometer MOSFET digital circuits,’’ IEEE Trans. Circuits Syst. dynamic in-situ delay monitoring,’’ in Proc. IEEE Design, Autom. Test
II, Exp. Briefs, vol. 57, no. 10, pp. 798–802, Oct. 2010. Europe Conf. Exhibit. (DATE), Mar. 2019, pp. 304–307.
[8] B. Jang, J. K. Lee, M. Choi, and K. K. Kim, ‘‘On-chip aging prediction [26] CIC Reference Flow for Cell-Based IC Design, document CIC-DSD-RD-
circuit in nanometer digital circuits,’’ in Proc. IEEE SoC Design Conf. 08-01, Chip Implementation Center, CIC, Taipei, Taiwan, 2008.
(ISOCC), Nov. 2014, pp. 68–69.
[9] S. Majerus, X. Tang, J. Liang, and S. Mandal, ‘‘Embedded silicon
odometers for monitoring the aging of high-temperature integrated cir-
cuits,’’ in Proc. IEEE Nat. Aerosp. Electron. Conf. (NAECON), Jun. 2017,
pp. 98–103. GUAN-HAO LIAN received the B.S. degree in
[10] D. Sengupta and S. S. Sapatnekar, ‘‘Estimating circuit aging due to BTI electronic and computer engineering from the
and HCI using ring-oscillator-based sensors,’’ IEEE Trans. Comput.-Aided National Taiwan University of Science and Tech-
Design Integr. Circuits Syst., vol. 36, no. 10, pp. 1688–1701, Oct. 2017. nology, Taipei, Taiwan, in 2015, and the M.S.
[11] A. Goel and R. J. Graves, ‘‘Electronic system reliability: Collating predic- degree in electrical engineering from National
tion models,’’ IEEE Trans. Device Mater. Rel., vol. 6, no. 2, pp. 258–265, Tsing Hua University, Hsinchu, Taiwan, in 2017.
Jun. 2006. His research interests include PVTA conditions
[12] IEEE Standard Framework for the Reliability Prediction of Hardware monitoring inside an IC and cloud-based system
IEEE Standard 1413, 2009. designs.
[13] D. Lorenz, G. Georgakos, and U. Schlichtmann, ‘‘Aging analysis of circuit
timing considering NBTI and HCI,’’ in Proc. IEEE Int.-Line Test. Symp.,
Jun. 2009, pp. 3–8.
[14] J. G. Elerath and M. Pecht, ‘‘IEEE 1413: A standard for reliability predic-
tions,’’ IEEE Trans. Rel., vol. 61, no. 1, pp. 125–129, Mar. 2012.
[15] Z. Abuhamdeh, V. D’Alassandro, R. Pico, D. Montrone, A. Crouch, and WEI-YI CHEN received the B.S. degree in electri-
A. Tracy, ‘‘Separating temperature effects from ring-oscillator readings to cal engineering from National Chung Hsing Uni-
measure true IR-drop on a chip,’’ in Proc. IEEE Proc. Int. Test Conf. (ITC), versity, Taiwan, in 2014, and the M.S. degree
Oct. 2007, pp. 1–10. in electrical engineering from National Tsing
[16] Y. Miura, Y. Sato, Y. Miyake, and S. Kajihara, ‘‘On-chip temperature and Hua University, Hsinchu, Taiwan, in 2017. His
voltage measurement for field testing,’’ in Proc. Eur. Test Symp., 2012, research interests include PVT conditions moni-
pp. 28–31. toring inside an IC and system construction on
[17] C.-H. Hsu, S.-Y. Huang, D.-M. Kwai, and Y.-F. Chou, ‘‘Worst-case IR-drop FPGA.
monitoring with 1 GHz sampling rate,’’ in Proc. VLSI Design, Automat.,
Test (VLSI-DAT), Apr. 2013, pp. 1–4.
[18] Y. Miyake, Y. Sato, S. Kajihara, and Y. Miura, ‘‘Temperature and voltage
estimation using ring-oscillator-based monitor for field test,’’ in Proc.
IEEE Asian Test Symp., Nov. 2014, pp. 156–161.
[19] G.-H. Lian, S.-Y. Huang, and W.-Y. Chen, ‘‘Cloud-based PVT monitoring SHI-YU HUANG (S’94–M’97–SM’12) received
system for IoT devices,’’ in Proc. IEEE Asian Test Symp. (ATS), Nov. 2017, the B.S. and M.S. degrees in electrical engineer-
pp. 76–81. ing from National Taiwan University, in 1988 and
[20] C.-W. Tzeng, S.-Y. Huang, and P.-Y. Chao, ‘‘Parameterized all-digital PLL 1992, respectively, and the Ph.D. degree in elec-
architecture and its compiler to support easy process migration,’’ IEEE trical and computer engineering from the Univer-
Trans. Very Large Scale Integr. (VLSI) Syst., vol. 22, no. 3, pp. 621–630, sity of California at Santa Barbara, Santa Barbara,
Mar. 2014. in 1997. He has been on the Faculty of Electrical
[21] R. Nazari, N. Rohbani, H. Farbeh, Z. Shirmohammadi, and S. G. Miremadi,
Engineering Department, National Tsing Hua Uni-
‘‘A2 CM2 : Aging-aware cache memory management technique,’’ in Proc.
versity, Taiwan, since 1999. His research interests
IEEE CSI Symp. Real-Time Embedded Syst. Technol. (RTEST), Oct. 2015,
pp. 1–8.
include VLSI design, automation, and testing, with
[22] M. Karimi, N. Rohbani, and S.-G. Miremadi, ‘‘A low area overhead a current emphasis on all-digital phase-locked loop (ADPLL) design and its
NBTI/PBTI sensor for SRAM memories,’’ IEEE Trans. Very Large Scale application in parametric fault testing and monitoring in 3-D ICs. He has
Integr. (VLSI) Syst., vol. 25, no. 11, pp. 3138–3151, Nov. 2017. served as a Program Co-Chair or the General Co-Chair in a number of sym-
[23] A. Vijayan, A. Koneru, S. Kiamehr, K. Chakrabarty, and M. B. Tahoori, posia (2004, 2009 Asian Test Symposium, and 2014, 2015 VLSI, Design,
‘‘Fine-grained aging-induced delay prediction based on the monitoring of Automation, and Test Symposium). He served as an Associate Editor for the
run-time stress,’’ IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., IEEE TRANSACTIONS ON COMPUTERS, from 2015 to 2018.
vol. 37, no. 5, pp. 1064–1075, May 2018.

VOLUME 7, 2019 135971

You might also like