
Cloud-Based Online Ageing Monitoring for IoT Devices

Harshad Danawale, Mohan Yelpale

Department of Computer Engineering, NBN Sinhagad School of Engineering, Pune
danawaleharshad@gmail.com

Department of Computer Engineering, NBN Sinhagad School of Engineering, Pune
mohansaeit@gmail.com

Abstract: In today's safety-critical applications, such as automotive electronics, the reliability of an electronic device, that is, whether it can operate reliably throughout its designated lifetime in the field (such as 10 or 15 years), is becoming increasingly vital. Traditionally, ageing prediction has been done offline: stress tests are used to accelerate the ageing process, and a model is then built to predict future behaviour. Such an offline method cannot account for the unique operating conditions and environment that a device encounters in the field. In this paper we present the first cloud-based ageing monitoring system. It offers several benefits. First, the ageing state of an IoT device can be monitored remotely and in real time. Second, by analysing the data on a cloud server, more information is available and a more accurate prediction can be made. Finally, an ageing threat can be flagged before it strikes, so that precautionary measures (such as online repair or a call-for-maintenance request) can be taken ahead of time to avert a disastrous system failure. Measurement data are acquired through a prototype system that uses test chips with built-in design-for-ageing-monitoring circuits and a server in the cloud.

Key words: ageing monitoring, IoT devices, cloud

Introduction

The aggressive scaling of integrated circuit technology brings a number of benefits, including a smaller form factor, higher speed, and lower power consumption. However, meeting the requirement for long-term reliability in safety-critical applications (e.g., vehicle electronics, biomedical electronics, and other IoT devices) is becoming increasingly difficult [1], [2]. An automotive IC, for example, is frequently required to operate in the field for more than 10 to 15 years.

It is well known that reliability (in terms of failure rate) is a function of time, following a bathtub curve with three stages: the infant mortality stage, the normal lifetime stage, and the ageing stage. The high failure rate of the infant mortality stage can be bypassed by stress testing, in which the temperature is raised and/or VDD is increased to weed out weak devices that may have hidden flaws. Ideally, shipped devices then operate mostly in the normal lifetime stage, which has a low failure rate for a certain period of time. After that, however, the ageing effect may set in and take its toll on a device's performance or even disrupt its functionality.

Ageing mechanisms include Bias Temperature Instability (BTI), Electro-Migration (EM), Hot Carrier Injection (HCI), and thin-oxide breakdown [3]–[6]. Once the ageing effect sets in, a significant and unexpected functional failure may occur. Consequently, for safety-critical electronic equipment, a set of anti-ageing methods should be applied at the design stage, the manufacturing stage, and the in-field stage in order to achieve the required level of reliability.

III. PROPOSED CLOUD-BASED AGEING MONITORING

This section describes the proposed cloud-based online ageing monitoring methodology, along with its benefits, architecture, operation flow, and test chip.

A. INTRODUCTION

The data generated while monitoring an IC used in an Internet of Things (IoT) system can be transferred to the cloud over an existing wireless internet connection [19]. A cloud-based monitoring system has several advantages.

(1) An IC's ageing history (consisting of the average ROCP samples collected over its lifetime) can be viewed at any time and from anywhere.

(2) The ageing histories of all ICs of the same type can be gathered and compared in the cloud, so that an IC with anomalous ageing behaviour is easily detected as an outlier with respect to particular attributes. This kind of peer-based over-ageing detection is difficult to achieve when each IC is monitored only inside its own system.

(3) A monitoring system consists of both hardware and software. In a cloud-based monitoring system, the software responsible for "ageing data analysis" can run in the cloud rather than on the edge device hosting the IC. This setup is more modular and adaptable: whenever a new version of the "ageing analysis algorithm" is released, it only needs to be installed in the cloud. There is no need to update the software on the many edge devices, which saves both time and money.

As shown in Fig. 4, the architecture of a cloud-based monitoring system can be divided into three domains: (1) the user domain, (2) the cloud domain, and (3) the IoT edge domain.
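The peer comparison in advantage (2) can be illustrated with a short sketch. The text does not specify how outliers are identified, so the modified z-score used here (based on the median and the median absolute deviation) is an assumption. The function name `find_ageing_outliers`, the device IDs, the ageing percentages, and the threshold of 3.5 are likewise hypothetical.

```python
from statistics import median

def find_ageing_outliers(ageing_by_ic, threshold=3.5):
    """Return the IDs of ICs whose ageing deviates strongly from their peers.

    ageing_by_ic maps an IC identifier to its latest ageing value in percent.
    The modified z-score 0.6745 * |x - median| / MAD is an assumed criterion,
    not one taken from the paper.
    """
    values = list(ageing_by_ic.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    if mad == 0:
        return []  # all peers identical: nothing can be flagged this way
    return sorted(
        ic for ic, v in ageing_by_ic.items()
        if 0.6745 * abs(v - med) / mad > threshold
    )

# Hypothetical fleet of same-type ICs; "ic-05" has aged far more than its peers.
fleet = {"ic-01": 2.1, "ic-02": 2.3, "ic-03": 1.9,
         "ic-04": 2.2, "ic-05": 6.8, "ic-06": 2.0}
outliers = find_ageing_outliers(fleet)
print(outliers)
```

A median-based score is used in this sketch because a single badly aged IC would inflate a mean and standard deviation and could mask itself. Such a comparison is only possible where the histories of many ICs are available in one place, which is the point of advantage (2).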
The operation flow consists of four phases. Phase 1 and Phase 2 are one-time efforts, carried out after a number of selected ICs have been produced. Phases 3 and 4, on the other hand, are carried out in the field at pre-determined monitoring intervals for each edge device.

Phase 1 (Perform the Accelerated Ageing Process):

Stress tests are well known for their ability to speed up the ageing process. The goal of this phase is to observe how a small number of ICs, chosen for characterisation during offline testing, age under stress. In our test case, for example, we apply a "boosted supply voltage" of 2 times the rated VDD level and then monitor the ageing behaviour of the ICs (as indicated by the average ROCP values) over time. The ageing acceleration process is repeated until the IC under stress has reached a predetermined ageing threshold (say, 10 percent).

In our test scenario, we put three ICs under stress and recorded the average ROCP values of all 16 RO monitors in each IC once a day until all three ICs had aged by the threshold value of 10% (relative to their time-zero references).

Phase 2 (Develop an "Accelerated Ageing Model"):

An "accelerated ageing model" is produced by fitting the accelerated ageing data obtained from the above ageing acceleration process. As illustrated in Fig. 6, we use a polynomial of degree 5 as the fitting function. The resulting "accelerated ageing model", denoted A, is a function of time (t).

Like the timing model of a standard cell, our "accelerated ageing model" is derived for three cases: (1) worst-case, (2) typical-case, and (3) best-case. The worst-case model is obtained by taking the top envelope of the "accelerated ageing data" at each recording time during the accelerated ageing process. Similarly, the best-case model is obtained from the bottom envelope, and the typical-case model from the average of the "accelerated ageing data".
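The construction of the three cases can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the ageing readings are synthetic, the helper names `envelopes`, `polyfit`, and `polyval` are hypothetical, and the least-squares fit is solved through the normal equations only to keep the example dependency-free.

```python
def envelopes(readings):
    """readings: one list per IC, holding ageing (%) at each recording day.
    Returns (worst, typical, best) = (top envelope, average, bottom envelope)."""
    per_day = list(zip(*readings))
    worst = [max(day) for day in per_day]
    typical = [sum(day) / len(day) for day in per_day]
    best = [min(day) for day in per_day]
    return worst, typical, best

def polyfit(ts, ys, degree=5):
    """Least-squares polynomial fit; returns coefficients, lowest order first."""
    m = degree + 1
    a = [[sum(t ** (i + j) for t in ts) for j in range(m)] for i in range(m)]
    b = [sum(y * t ** i for t, y in zip(ts, ys)) for i in range(m)]
    for col in range(m):  # Gaussian elimination with partial pivoting
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            f = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    coeffs = [0.0] * m
    for i in reversed(range(m)):  # back substitution
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs

def polyval(coeffs, t):
    """Evaluate the fitted polynomial at time t."""
    return sum(c * t ** i for i, c in enumerate(coeffs))

# Synthetic (hypothetical) daily ageing readings in percent for three ICs.
days = list(range(21))
ics = [[0.70 * d - 0.010 * d * d for d in days],
       [0.45 * d - 0.004 * d * d for d in days],
       [0.35 * d - 0.003 * d * d for d in days]]
worst, typical, best = envelopes(ics)

# Time is scaled to [0, 1] before fitting to keep the problem well conditioned.
scaled = [d / days[-1] for d in days]
coeffs = polyfit(scaled, typical, degree=5)
max_err = max(abs(polyval(coeffs, s) - y) for s, y in zip(scaled, typical))
```

The same fit can be applied to `worst` and `best` to obtain the other two models. In practice a numerical library routine would normally replace the hand-written solver.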
As shown in Fig. 6, our accelerated ageing process brings these test chips to 10% ageing in 14 days in the worst case, 24 days in the typical case, and 32 days in the best case. We define:

• Tworst: the time required to reach an ageing threshold (e.g., 10% ageing) under the worst-case accelerated ageing model.

• Ttypical: the time required to reach the ageing threshold under the typical-case accelerated ageing model.

• Tbest: the time required to reach the ageing threshold (e.g., 10% ageing) under the best-case accelerated ageing model.

In our test case, Tworst = 14 days, Ttypical = 24 days, and Tbest = 32 days. Using Ttypical as the reference, we derive two ratios for later use in the remaining lifetime prediction: Ratio(worst-to-typical) = 14/24 = 0.58 (or -42%) and Ratio(best-to-typical) = 32/24 = 1.33 (or +33%). In other words, the time to reach the threshold under accelerated ageing varies within [-42%, +33%] of the typical case.

Phase 3 (Develop a "Predicted Ageing Model"):

While the accelerated ageing model predicts how a device ages under a given stress condition, the "predicted ageing model" in our system predicts how a device will age in the field under normal workload. This model is progressive and specific to each IC: it takes into account the actual ageing information of each IC as it is used in the field. We dynamically update it over the lifetime of an edge device, based on the ageing information observed so far, and store it in the cloud.

In our technique, the predicted ageing model is in general a time-stretched version of the accelerated ageing model. It is obtained by a successive approximation procedure in two steps, coarse-stretching and fine-stretching, as detailed below.
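The idea of a time-stretched model found by successive approximation can be made concrete with a short sketch. It rests on several assumptions. The square-root function stands in for the fitted degree-5 model A(t), and the field history is synthetic. The error measure is the mean square error against the ageing history. The fine step is read as a stretch factor of n + k/Ttypical. All function and variable names are hypothetical. The last lines apply the two Phase 2 ratios to the typical-case retire time to bracket the remaining lifetime.

```python
import math

T_TYPICAL = 24.0   # days to reach the threshold in the typical accelerated model
THRESHOLD = 10.0   # ageing threshold in percent

def accelerated_model(t):
    """Hypothetical stand-in for A(t); it reaches THRESHOLD at t = T_TYPICAL."""
    return THRESHOLD * math.sqrt(t / T_TYPICAL)

def prediction_error(stretch, history):
    """Mean square error between A(t / stretch) and the (day, ageing %) history."""
    return sum((accelerated_model(t / stretch) - a) ** 2
               for t, a in history) / len(history)

def coarse_stretch(history):
    """Increase the integer n from 1 until the error stops decreasing."""
    n = 1
    while prediction_error(n + 1, history) < prediction_error(n, history):
        n += 1
    return n

def fine_stretch(n, history):
    """Increase k while the stretch n + k / T_TYPICAL keeps lowering the error."""
    k = 0
    while (prediction_error(n + (k + 1) / T_TYPICAL, history)
           < prediction_error(n + k / T_TYPICAL, history)):
        k += 1
    return k

# Synthetic field history of a device that ages 42.25 times slower than under stress.
history = [(day, accelerated_model(day / 42.25)) for day in range(30, 400, 30)]

n = coarse_stretch(history)
k = fine_stretch(n, history)
stretch = n + k / T_TYPICAL

# A(t / stretch) reaches the threshold at t = stretch * T_TYPICAL.
retire_typical = stretch * T_TYPICAL
retire_worst = retire_typical * 14 / 24   # worst-to-typical ratio from Phase 2
retire_best = retire_typical * 32 / 24    # best-to-typical ratio from Phase 2

current_day = history[-1][0]
remaining = (retire_worst - current_day,
             retire_typical - current_day,
             retire_best - current_day)
print(n, k, stretch, remaining)
```

With this synthetic history the search settles on n = 42 and k = 6, that is, a stretch of 42.25. Because the model is re-fitted whenever new history arrives, the estimated range can tighten or shift over the lifetime of the device.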
(Section 3.1) The coarse ageing model is derived by the following operation.

1) OPERATION OF COARSE-STRETCHING

Stretch the time axis of the typical-case accelerated ageing model A(t), shown as the BLUE curve in Fig. 7, into a function A(t/n), shown as the GREEN curve in Fig. 7:

A(t) → A(t/n)

Here n is a positive integer called the coarse-stretching parameter. It is determined incrementally by our successive approximation procedure, which tries to minimise the ageing prediction error.

Definition 1 (Ageing Prediction Error): At any given time, the actual ageing data acquired so far for a device under monitoring constitute its ageing history. The ageing prediction error is the mean square error between an "ageing model" and the actual "ageing history."

With this definition, determining the coarse-stretching parameter is straightforward: an iterative procedure finds the positive integer n that minimises the ageing prediction error. Table 1 lists the ageing prediction error for n = 1, 2, ..., 43. The error decreases monotonically from 3032 percent to 2.781 percent as n increases from 1 to 42, and then rises slightly to 2.789 percent when n is increased to 43. For this test case, the coarse-stretching operation therefore selects n = 42 as the final coarse-stretching parameter.

2) OPERATION OF FINE-STRETCHING

To obtain the fine ageing model, stretch the time axis of the typical-case accelerated ageing model A(t) slightly further, using the following formula:

A(t) → A( t / (n + (1/Ttypical) · k) )

Here n is the coarse-stretching parameter established previously, and k is the fine-stretching parameter, which is determined next. In our test case, Ttypical = 24 days. We gradually increase k from 1 to find the integer value that minimises the total ageing prediction error (against the actual ageing history).

Phase 4 (Establish the Remaining Lifetime Range):

Once the final ageing model for the typical case has been computed, we can determine the "retire time in the typical case" of a given IoT edge device. This typical-case retire time is then multiplied by the two ratios derived earlier from the "accelerated ageing models", Ratio(worst-to-typical) and Ratio(best-to-typical), to obtain the worst-case and best-case retire times. Once the retire times are known, the remaining lifetime follows as:

(Remaining Lifetime) = (Retire Time) - (Current Time)

V. CONCLUSION

A credible methodology that can show how long an IC can operate reliably under the influence of ageing is badly needed for safety-critical applications. Traditional, purely software-based prediction may be insufficient because its results can be imprecise. In this study, we therefore propose a more accurate method.
Our contributions are twofold. First, because the ageing and lifetime prediction is based on both an offline "accelerated ageing model" and an online "ageing history" under normal workload, it is more credible, and a unique "lifetime" prediction can be generated for each individual IC. Second, an ageing monitoring system involves hardware, software, and data analysis for a large number of ICs distributed globally, and a cloud-based approach simplifies system integration and maintenance. The effectiveness of the proposed methodology has been demonstrated using test chip measurement data.

ACKNOWLEDGMENT

The authors would like to thank the Taiwan Semiconductor Research Institute (TSRI) for providing access to the EDA tools [26].

REFERENCES

[1] V. Prasanth, D. Foley, and S. Ravi, "Demystifying automotive safety and security for semiconductor developer," in Proc. IEEE Int. Test Conf. (ITC), Oct./Nov. 2017, pp. 1–10.
[2] G. A. Klutke, P. C. Kiessler, and M. A. Wortman, "A critical look at the bathtub curve," IEEE Trans. Rel., vol. 52, no. 1, pp. 125–129, Mar. 2003.
[3] J. H. Stathis and S. Zafar, "The negative bias temperature instability in MOS devices: A review," Microelectron. Rel., vol. 46, nos. 2–4, pp. 270–286, 2006.
[4] T. Wang, L.-P. Chiang, N.-K. Zous, C.-F. Hsu, L.-Y. Huang, and T.-S. Chao, "A comprehensive study of hot carrier stress-induced drain leakage current degradation in thin-oxide n-MOSFETs," IEEE Trans. Electron Devices, vol. 46, no. 9, pp. 1877–1882, Sep. 1999.
[5] M. Kimura, "Field and temperature acceleration model for time-dependent dielectric breakdown," IEEE Trans. Electron Devices, vol. 46, no. 1, pp. 220–229, Jan. 1999.
[6] K. N. Tu, "Recent advances on electromigration in very-large-scale-integration of interconnects," J. Appl. Phys., vol. 94, no. 9, pp. 5451–5473, 2003.
[7] K. K. Kim, W. Wang, and K. Choi, "On-chip aging sensor circuits for reliable nanometer MOSFET digital circuits," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 57, no. 10, pp. 798–802, Oct. 2010.
[8] B. Jang, J. K. Lee, M. Choi, and K. K. Kim, "On-chip aging prediction circuit in nanometer digital circuits," in Proc. IEEE SoC Design Conf. (ISOCC), Nov. 2014, pp. 68–69.
[9] S. Majerus, X. Tang, J. Liang, and S. Mandal, "Embedded silicon odometers for monitoring the aging of high-temperature integrated circuits," in Proc. IEEE Nat. Aerosp. Electron. Conf. (NAECON), Jun. 2017, pp. 98–103.
[10] D. Sengupta and S. S. Sapatnekar, "Estimating circuit aging due to BTI and HCI using ring-oscillator-based sensors," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 36, no. 10, pp. 1688–1701, Oct. 2017.
[11] A. Goel and R. J. Graves, "Electronic system reliability: Collating prediction models," IEEE Trans. Device Mater. Rel., vol. 6, no. 2, pp. 258–265, Jun. 2006.
[12] IEEE Standard Framework for the Reliability Prediction of Hardware, IEEE Standard 1413, 2009.
[13] D. Lorenz, G. Georgakos, and U. Schlichtmann, "Aging analysis of circuit timing considering NBTI and HCI," in Proc. IEEE Int. On-Line Test. Symp., Jun. 2009, pp. 3–8.
[14] J. G. Elerath and M. Pecht, "IEEE 1413: A standard for reliability predictions," IEEE Trans. Rel., vol. 61, no. 1, pp. 125–129, Mar. 2012.
[15] Z. Abuhamdeh, V. D'Alassandro, R. Pico, D. Montrone, A. Crouch, and A. Tracy, "Separating temperature effects from ring-oscillator readings to measure true IR-drop on a chip," in Proc. IEEE Int. Test Conf. (ITC), Oct. 2007, pp. 1–10.
[16] Y. Miura, Y. Sato, Y. Miyake, and S. Kajihara, "On-chip temperature and voltage measurement for field testing," in Proc. Eur. Test Symp., 2012, pp. 28–31.
[17] C.-H. Hsu, S.-Y. Huang, D.-M. Kwai, and Y.-F. Chou, "Worst-case IR-drop monitoring with 1 GHz sampling rate," in Proc. VLSI Design, Automat., Test (VLSI-DAT), Apr. 2013, pp. 1–4.
[18] Y. Miyake, Y. Sato, S. Kajihara, and Y. Miura, "Temperature and voltage estimation using ring-oscillator-based monitor for field test," in Proc. IEEE Asian Test Symp., Nov. 2014, pp. 156–161.
[19] G.-H. Lian, S.-Y. Huang, and W.-Y. Chen, "Cloud-based PVT monitoring system for IoT devices," in Proc. IEEE Asian Test Symp. (ATS), Nov. 2017, pp. 76–81.
[20] C.-W. Tzeng, S.-Y. Huang, and P.-Y. Chao, "Parameterized all-digital PLL architecture and its compiler to support easy process migration," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 22, no. 3, pp. 621–630, Mar. 2014.
[21] R. Nazari, N. Rohbani, H. Farbeh, Z. Shirmohammadi, and S. G. Miremadi, "A2CM2: Aging-aware cache memory management technique," in Proc. IEEE CSI Symp. Real-Time Embedded Syst. Technol. (RTEST), Oct. 2015, pp. 1–8.
[22] M. Karimi, N. Rohbani, and S.-G. Miremadi, "A low area overhead NBTI/PBTI sensor for SRAM memories," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 25, no. 11, pp. 3138–3151, Nov. 2017.
[23] A. Vijayan, A. Koneru, S. Kiamehr, K. Chakrabarty, and M. B. Tahoori, "Fine-grained aging-induced delay prediction based on the monitoring of run-time stress," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 37, no. 5, pp. 1064–1075, May 2018.
[24] N. Rohbani and S.-G. Miremadi, "A low-overhead integrated aging and SEU sensor," IEEE Trans. Device Mater. Rel., vol. 18, no. 2, pp. 205–213, Jun. 2018.
[25] H. A. Balef, K. Goossens, and J. P. de Gyvez, "Chip health tracking using dynamic in-situ delay monitoring," in Proc. IEEE Design, Autom. Test Europe Conf. Exhibit. (DATE), Mar. 2019, pp. 304–307.
[26] CIC Reference Flow for Cell-Based IC Design, document CIC-DSD-RD08-01, Chip Implementation Center, CIC,
