IoT Reliability and Maintainability
Maintainability of
IoT systems
Sensor Networks
[Figure: map of sensor network sites (station labels omitted), with long-distance links marked "70+ miles to SCI" and "approximately 50 miles"]
[Figure: sensor station instrumentation: anemometer, 3D ultrasonic anemometer, solar radiation sensor, pan-tilt-zoom camera, tipping rain bucket, temperature sensor, relative humidity sensor, support equipment]
http://www.csp.navy.mil/csda5/dsu/dsu.htm
CitiSense Deployment: Wearables and
Environmental Sensors
Local SAM connection at the ICP Eagle Fire as seen from HWY79 on May 3rd, 2004
Volcan Fire HPWREN connection, September 2005
[Figure labels: Incident Command Post, Volcan relay site, relay site]
Summary of Learnings from
Firefighting Over the Last Decade
• Rapid response is key:
– It started with enabling connectivity for Incident Command Posts deployed for large wildfires, and connectivity to fire stations, camps and air bases
• Successfully done for fires in 2003, 2004, 2005, 2006 & 2007
– Real-time event detection based on weather sensors sent automated pager and email alarms to public safety personnel, alerting them in real time to Santa Ana conditions
• During the 2007 fire, HPWREN provided data on current conditions to both firefighters and the public, enabling much more accurate and scalable troop deployment
• Key problem: we need scalable, adaptable, manageable infrastructure
– The question is not if failures will happen, but how proactive the infrastructure will be at avoiding them and solving them -> big issue for the IoT!
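The sensor-driven alarms mentioned above can be illustrated with a minimal threshold check. This is only a sketch: the thresholds, field names, and station names are hypothetical and are not taken from the HPWREN deployment, which may have used different rules entirely.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration.
MAX_HUMIDITY_PCT = 15.0   # "very dry" if at or below this value
MIN_WIND_SPEED_MS = 11.0  # "very windy" if at or above this value


@dataclass
class WeatherReading:
    station: str                  # hypothetical station name
    relative_humidity_pct: float
    wind_speed_ms: float


def santa_ana_alert(reading: WeatherReading) -> bool:
    """Return True when a reading is both very dry and very windy."""
    return (reading.relative_humidity_pct <= MAX_HUMIDITY_PCT
            and reading.wind_speed_ms >= MIN_WIND_SPEED_MS)


def stations_to_alert(readings):
    """Names of the stations whose latest reading should raise an alarm."""
    return [r.station for r in readings if santa_ana_alert(r)]
```

In a real deployment the returned station list would feed whatever notification channel is in use (pager, email); that part is left out here.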
What is IoT?
Source: ZTE
IoT HW & SW
HW components used
SW features used
The changing nature of computing
Cisco 2011
Performance-related metrics
Energy Optimization
Max(Perf) s.t. Target Battery Lifetime
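[Figure: energy consumed vs. time. Labels: normal operation; algorithm activation point (algorithm activation time, available energy); available range for the selected target, from maximum performance (minimum battery lifetime) to minimum performance (maximum battery lifetime extension)]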
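The objective Max(Perf) s.t. Target Battery Lifetime can be sketched as a selection over discrete operating points. The sketch assumes a constant power draw per operating point and a known remaining battery energy; the function name and the (performance, power) tuples are hypothetical.

```python
def max_perf_setting(settings, battery_energy_j, target_lifetime_s):
    """Pick the highest-performance setting that meets the lifetime target.

    settings: list of (performance, power_w) tuples (hypothetical operating points).
    Assumes constant power draw, so lifetime = battery_energy_j / power_w.
    Returns the chosen tuple, or None if no setting is feasible.
    """
    # Largest average power that still reaches the target lifetime.
    power_budget_w = battery_energy_j / target_lifetime_s
    feasible = [s for s in settings if s[1] <= power_budget_w]
    return max(feasible, key=lambda s: s[0]) if feasible else None
```

With 7200 J remaining and a one-hour target (3600 s), the power budget is 2 W, so among the points (1.0, 0.5 W), (2.0, 1.2 W), (3.0, 2.5 W) the second one is selected.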
Example: 𝐹 = 0
𝑇𝑘+1 = 𝐴𝑇𝑘
– Fatigue failures: MTTF ∝ 1 / (|∆T|^q · f)
• ∆T: magnitude of variation
• f: frequency of cycles
– ∆T increases by 10 °C (q = 4 for metallic structures)
• Failures happen 16 times sooner
[Figure: temperature trace, T (°C) over time]
[Figure: MTTF [years] for EM, TDDB, TC and the overall system vs. power savings [%]]
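The "16 times sooner" figure can be checked with a short calculation, reading the fatigue relation on this slide as MTTF ∝ 1 / (|∆T|^q · f). The 10 °C baseline below is an assumption: it is chosen because a 10 °C increase then doubles ∆T, and 2^4 = 16 matches the stated result for q = 4.

```python
def cycling_speedup(delta_t_old_c, delta_t_new_c, q=4.0, freq_ratio=1.0):
    """Factor by which fatigue failures arrive sooner.

    Assumes MTTF is proportional to 1 / (delta_T**q * f), so the ratio
    MTTF_old / MTTF_new = (delta_T_new / delta_T_old)**q * (f_new / f_old).
    freq_ratio is f_new / f_old (1.0 means the cycle frequency is unchanged).
    """
    return (delta_t_new_c / delta_t_old_c) ** q * freq_ratio


# Assumed baseline: cycles of 10 °C growing to 20 °C, same frequency.
speedup = cycling_speedup(10.0, 20.0, q=4.0)
```

Because of the fourth-power dependence, even modest growth in cycle amplitude dominates the effect of cycle frequency, which enters only linearly.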
Optimal Reliability Controller
[Figure: controller block diagram. Inputs: performance and reliability. The optimal long-term controller outputs 𝑉𝐿𝑇𝐶 and 𝑇𝐿𝑇𝐶; the optimal short-term controller outputs the applied frequency 𝐹 and the allocation 𝑋𝑎𝑙𝑙𝑜𝑐 to the MPSoC hardware (core1 ... coreN). Feedback signals: T(t), V(t), Freq(t+1)]
Thermal Controller:
- Medium Intervals (MI, s)
- Constraint: 𝑻𝒂𝒗𝒆𝒓𝒂𝒈𝒆 ≤ 𝑻𝑳𝑻𝑪
- Assigns H tasks to cooler cores first (𝑋𝑎𝑙𝑙𝑜𝑐 )
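The cooler-cores-first rule of the thermal controller can be sketched as a sort over per-core temperatures, together with a check of the average-temperature constraint. This is an illustrative sketch, not the controller's actual implementation: the function names are hypothetical, and what qualifies a task as an "H task" is not defined here, so the tasks are simply passed in.

```python
def allocate_coolest_first(task_ids, core_temps_c):
    """Map tasks to cores, coolest core first, one task per core.

    Tasks beyond the number of cores are left unassigned.
    Returns a dict {task_id: core_index}.
    """
    coolest = sorted(range(len(core_temps_c)), key=lambda c: core_temps_c[c])
    return dict(zip(task_ids, coolest))


def average_within_target(core_temps_c, t_ltc_c):
    """Check the constraint T_average <= T_LTC over all cores."""
    return sum(core_temps_c) / len(core_temps_c) <= t_ltc_c
```

For core temperatures [70, 55, 62, 58] °C, two tasks land on cores 1 and 3, the two coolest ones.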
Implementation
[Figure: software implementation. Userspace: user applications, app monitor (currently executing apps, foreground app PID), config process (favorite apps), reliability model, OLTC (slow rate; reference voltage and temperature), TC (medium rate; allocation). Kernel: app monitor driver, WARM driver, reliability driver (average voltage and temperature), STC (fast rate)]
Temperature model
Virtual platform:
- 4 cores
- configured based on measurements from the Cortex A15 cores of the ARM big.LITTLE
- CVX for solving optimization
Temperature model: 𝑇𝑛𝑒𝑥𝑡 = 𝑡ℎ𝑒𝑟𝑚𝑎𝑙_𝑚𝑜𝑑𝑒𝑙(𝑇, 𝑃) (state space representation)
Power model: 𝑃(𝑓) = 𝑎𝑓³ + 𝑏𝑓
[Figure: the control policy applies frequency adaptation (𝑓0, 𝑓1, 𝑓2, 𝑓3) to four virtual cores, each with power 𝑃(𝑓)]
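The power model P(f) = a·f³ + b·f and a state-space temperature update can be sketched as follows. The linear form T_next = A·T + B·P is an assumption about what thermal_model looks like; the matrices and coefficients below are placeholders, not values measured on the Cortex A15.

```python
def core_power(f, a, b):
    """Power model P(f) = a*f**3 + b*f for one core."""
    return a * f ** 3 + b * f


def thermal_step(temps, powers, A, B):
    """One step of an assumed linear state-space model: T_next = A*T + B*P.

    temps, powers: per-core lists; A, B: square matrices as nested lists.
    """
    n = len(temps)
    return [
        sum(A[i][j] * temps[j] for j in range(n))
        + sum(B[i][j] * powers[j] for j in range(n))
        for i in range(n)
    ]


# Placeholder two-core example with hypothetical coefficients.
A = [[0.5, 0.0], [0.0, 0.5]]
B = [[2.0, 0.0], [0.0, 2.0]]
next_temps = thermal_step([10.0, 20.0], [1.0, 2.0], A, B)
```

Setting all powers to zero reduces the update to T_next = A·T, the homogeneous case.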
Comparison Results: Optimal Policy
vs. Heuristic (simulation)
[Figure: simulation results comparing the max frequency, proposed, ondemand, interactive, and min frequency policies]
Impact of Variability
STATIC VARIATIONS (transistor):
- Channel length
- Threshold voltage
DYNAMIC VARIATIONS:
- Voltage
- Temperature
- Inputs shown: workload, ambient temperature, user
PROCESSOR:
- Operating frequency (performance)
- Power consumption
- Degradation rate (DR)
DYNAMIC VARIABILITY MANAGEMENT (DVM)
[Figure: distribution (# occurrences) of performance / power / degradation rate, without DVM and with DVM, relative to the product requirement]
A comprehensive management framework for
power, temperature, reliability, & variability
management in individual devices
LTC: Long Term Controller
STC: Short Term Controller
[Figure: comprehensive online management framework. Labeled blocks: user experience requirements, variability emulation, online reliability emulation, LTC, STC, target system. Labeled signals: variability, reliability target, average temperature, unexploited margins, system feedback (app performance, battery, power), control decisions]
Networks of IoT Devices
Internet of Things:
- Energy and reliability are important issues -> maintenance costs
- Challenges: heterogeneity, distributed nature, large scale
Research directions:
- Framework for energy / reliability management of the IoT infrastructure
- Control of multiple heterogeneous control variables
- Automatic recognition of user experience requirements
[Figure: two-layer control engine on top of the IoT network. Upper layer: reliability, power and QoS control under an average utilization constraint. Lower layer: utilization control. Signals: control decisions, utilization metrics, log stream]
Reliability of Networks of Devices
• Series combination: all components must operate for the system to function correctly, so R = R_A × R_B × R_C
[Figure: Input → A → B → C → Output]
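The series rule can be written as a product over component reliabilities. The sketch below assumes the components fail independently, which is what makes the product valid; the function name is hypothetical.

```python
from math import prod


def series_reliability(component_reliabilities):
    """Reliability of a series system: the product of component reliabilities.

    Assumes independent failures; each value must lie in [0, 1].
    """
    for r in component_reliabilities:
        if not 0.0 <= r <= 1.0:
            raise ValueError(f"reliability out of range: {r}")
    return prod(component_reliabilities)


# Three components A, B, C, each 90% reliable (illustrative values).
r_system = series_reliability([0.9, 0.9, 0.9])
```

Three components at 0.9 each give a system reliability of about 0.729, lower than any single component. This is why long chains of devices in an IoT network are a maintainability concern.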