IcySOC RTD 2013

IcySoC: Inexact Sub- and Near-Threshold Systems for Ultra-Low Power Devices

A Toolbox for Tera-Scale Computing in Nano-Scale Devices
EPFL, EM, ETHZ, CSEM
2017

Smart Autonomous Devices with Ambient Intelligence

Application                                Data volume   Latency
Environmental monitoring                   0.01 Mbps     hours
Health monitoring                          0.1 Mbps      minutes
Smart cities, Surveillance, Interaction    100 Mbps      seconds
Autonomous driving & Tactile Internet      1000 Mbps     milliseconds

Sensing and processing requirements

Collection of large amounts of data to capture complex contexts


Short latency for complex processing to interpret the data

Cloud Based Services are Not a Solution


Sensor data    Input bandwidth per user    Output (response) bandwidth per user
Image          1000 Kbps                   10 bps
Voice          256 Kbps                    0.02 Kbps
Inertial       2.4 Kbps                    0.02 Kbps
Biometrics     16 Kbps                     0.08 Kbps

Central intelligence in the cloud: Exa-Flops computing power, serving 1000s of users/devices
Latency: 10s to 100s of milliseconds

Large bandwidth and power for communication of sensor data


Limited QoS guarantees due to limited network availability
Long to very long latencies
L. Benini


Ultra-Low Power Near-Sensor Processing to the Rescue

Sense: MEMS IMU, MEMS Microphone, ULP Imager, EMG/ECG/EEG (100 uW to 2 mW)
Analyze and Classify: Extracting the Essence, 1 to 2000 MOps (1 to 10 mW)
Communicate: Storage, Interaction, Cloud; Low-rate (Kbps to Mbps); Idle: ~1 uW, Active: ~50 mW
L. Benini
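
The idle (~1 uW) and active (~50 mW) figures above suggest why a low duty cycle matters so much for the power budget. The following is a minimal sketch, not taken from the slides: it assumes both figures describe the same block and that the block simply alternates between the two states. The function name and the duty-cycle values are hypothetical.

```python
def average_power_w(duty_cycle, active_w=50e-3, idle_w=1e-6):
    """Average power of a block that is active for `duty_cycle` of the time.

    Defaults use the ~50 mW active and ~1 uW idle figures from the slide;
    treating them as one two-state block is an assumption of this sketch.
    """
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be in [0, 1]")
    return duty_cycle * active_w + (1.0 - duty_cycle) * idle_w


if __name__ == "__main__":
    # Hypothetical duty cycles, chosen only to show the trend.
    for d in (1.0, 0.1, 0.01, 0.001):
        print(f"duty cycle {d:>6.3f}: {average_power_w(d) * 1e3:.4f} mW")
```

Under these assumptions the average power scales almost linearly with the duty cycle until the idle floor is reached, which is consistent with the slide's emphasis on low-rate communication.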


Objective of the project


Develop a toolbox for ultra-low-power computing in
various technologies, ranging from 180 nm down to 28 nm


IcySoC Applications and Approach


IcySoC follows a bottom-up approach to energy-efficient and robust near- and sub-Vt design

Layers, listed top-down from narrow to wide field of use (the approach works bottom-up):
Applications: Image Enhancement, Audio Quality Enhancement, Visual Recognition, Deep Learning
Key Kernels: Search Algorithms, Filters, DCT/FFT
Software Support: Variable Accuracy Support, Software Error Handling
Micro Architecture: PULP Multicore & Configurable Accelerators, Variable Pipeline Depth
Gate Level: Variable Precision Arithmetic, Hardware Error Handling
Circuit Level Measures: Body Biasing, Robust Cells & Memories


PULP: A Parallel Ultra Low Power Platform (ETHZ)


Multi-core platform ideally suited for parallel workloads
OpenRISC / RISC-V processors
Configurable number of cores keeps platform highly flexible
Allows for easy integration of custom (e.g., approximate computing) accelerators and FP units based on LNS
Tightly coupled data memories avoid management overhead of autonomous caches
Complete programming and emulation environment


PULP is Available as OpenSource Hardware Platform

Various versions of the PULP system are available for evaluation and use in real-world applications
Thunderboard: a PULP system with 4 cores and 2 accelerators
OpenSource HW PULPINO platform with a single core


Near- and Sub-Threshold Design


The IcySoC project has developed the basic building blocks to drive voltage scaling to new frontiers with best-in-class tradeoffs for various speed/energy requirements

Robust ultra-low voltage standard cell libraries
Less susceptible to variation with proper transistor sizing
Low leakage for energy efficiency

Robust low-voltage custom sub-VT SRAM
New 6-T and 7-T bit-cell structures for reliable sub-VT operation
Low leakage in sub-VT operation

Synthesizable robust low-voltage memories
Robust ultra-low voltage memory for any technology, including FDSOI with BB
Lower energy than SRAM


Near- and Sub-Threshold Design Demonstrators

IcySoC near-sensor processors for sub- and near-threshold operation outperform commercial IoT processors by orders of magnitude

Icyflex2: a near-sensor sub-VT ULP DSP in 180nm CMOS
IcySoC custom cells and memories enable operation down to 0.37V for leading energy efficiency: 17.1pJ/cycle @ 19kHz with leakage as low as 2.8 nW at 0.48 V
Energy-autonomous operation from Photo-Voltaic Supply

PULP v2/v3: two near-sensor near-VT ULP many core accelerators in 28nm FDSOI
Exploiting the full potential of new technologies:
FDSOI body bias compensates for variability at near-threshold voltages and reduces leakage
Standard-cell memories with reconfigurable memory maps enable ULP operation for a wide voltage range

Designed and manufactured in Switzerland
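
To put the quoted energy-efficiency figure in perspective, the short sketch below converts energy per cycle and clock frequency into average power. It assumes that the 17.1 pJ/cycle value is the total energy per cycle at the 19 kHz operating point; the slide does not say whether leakage is included, and the 2.8 nW leakage figure is quoted at a different voltage. The function name is hypothetical.

```python
def power_from_energy_per_cycle(energy_per_cycle_j, clock_hz):
    """Average power in watts: energy per cycle times cycles per second."""
    return energy_per_cycle_j * clock_hz


if __name__ == "__main__":
    energy_per_cycle_j = 17.1e-12  # 17.1 pJ/cycle, from the slide
    clock_hz = 19e3                # 19 kHz, from the slide
    power_w = power_from_energy_per_cycle(energy_per_cycle_j, clock_hz)
    print(f"{power_w * 1e9:.1f} nW")  # about 325 nW
```

A power draw of a few hundred nanowatts is the range in which the energy-autonomous operation from a photovoltaic supply mentioned on the slide becomes plausible.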


Approximate Computing: A New Design Paradigm
Many applications tolerate a certain amount of loss in QoS

Use the potential of trading off accuracy against speed to partially compensate for the performance loss due to voltage scaling
Approximation makes it possible to tolerate a certain number of circuit failures due to reliability issues, which reduces the overhead of worst-case design


A Toolbox for Approximate Computing


The IcySoC project has developed design techniques for
deterministic approximation of fundamental arithmetic
primitives to reduce area, delay, and power at design time
Gate-level Pruning (GLP)
Selective pruning of gates with large impact on power, but small impact on accuracy
Up to 5x gains for 1% mean relative error in a 32 bit adder

Speculative Adders
Speculation avoids long delays that are only relevant for rare events
Approximate compensation avoids large errors
50% power savings and 75% PDAP reduction with 10^-3 relative RMS error

Logarithmic number arithmetic
Alternative to floating-point for high dynamic range applications
Shifts the main complexity from MUL/DIV/SQRT to addition
New area/quality tradeoffs compared to floating point

Many design-time tradeoffs for specialized hardware accelerators
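
The speculation idea described above (long carry chains matter only for rare operand patterns) can be illustrated with a generic block-based speculative adder. This is a minimal software sketch of the general technique, not the IcySoC circuit: the block size, the function names, and the choice to predict each carry from the preceding block alone are assumptions, and the approximate error compensation mentioned on the slide is not modeled.

```python
import random


def speculative_add(a, b, width=32, block=8):
    """Approximate (a + b) mod 2**width using block-local carry speculation.

    Each block's carry-in is guessed from the previous block alone, so no
    carry chain is longer than 2 * block bits. Assumes width is a multiple
    of block (an illustrative restriction).
    """
    mask = (1 << block) - 1
    result = 0
    for lo in range(0, width, block):
        a_blk = (a >> lo) & mask
        b_blk = (b >> lo) & mask
        if lo == 0:
            carry_in = 0
        else:
            # Speculated carry: what the previous block generates on its own,
            # assuming that block itself received no carry-in.
            prev_a = (a >> (lo - block)) & mask
            prev_b = (b >> (lo - block)) & mask
            carry_in = (prev_a + prev_b) >> block
        result |= ((a_blk + b_blk + carry_in) & mask) << lo
    return result


def error_rate(trials=20000, width=32, block=8, seed=0):
    """Fraction of uniformly random operand pairs that give a wrong sum."""
    rng = random.Random(seed)
    full_mask = (1 << width) - 1
    wrong = 0
    for _ in range(trials):
        a = rng.getrandbits(width)
        b = rng.getrandbits(width)
        if speculative_add(a, b, width, block) != ((a + b) & full_mask):
            wrong += 1
    return wrong / trials


if __name__ == "__main__":
    print(f"error rate with 8-bit blocks: {error_rate():.4f}")
```

The sum is wrong only when a carry has to ripple through an entire block, for example 0x00FFFF + 0x000001 with 8-bit blocks. Shrinking the block shortens the critical path but makes such events more frequent, which is the kind of design-time tradeoff the slide refers to.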


Approximate Computing at Work: Proof of Concept
IcySoC uses approximate computing on full near-sensor
PULP platforms for video and audio real-time processing
PULP SoC with pruned arithmetic
Image and video compression
20-50% power-area savings on image applications

PULP SoC with shared LNS unit
HDR tone-mapping with approximate FPU
Key application kernels show gains of up to 5.54x in energy efficiency (1.71x on average)

(Figure: original, exact, and approximate result images)
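
The shared LNS unit mentioned above relies on logarithmic number arithmetic, where a value is stored as its base-2 logarithm. The sketch below shows in software why multiplication, division and square root become cheap while addition becomes the hard operation. It is an illustration only: the 12-bit fraction width, the function names, and the restriction to positive values are assumptions, and the nonlinear term in `lns_add` is computed with `math.log2` here rather than with the tables or approximations a hardware unit would need.

```python
import math

FRAC_BITS = 12          # hypothetical fixed-point precision of the log value
SCALE = 1 << FRAC_BITS


def to_lns(x):
    """Encode a positive real as a fixed-point log2 value (sign omitted)."""
    if x <= 0:
        raise ValueError("this sketch handles positive values only")
    return round(math.log2(x) * SCALE)


def from_lns(e):
    """Decode a fixed-point log2 value back to a real number."""
    return 2.0 ** (e / SCALE)


def lns_mul(ea, eb):
    return ea + eb      # multiplication is an integer addition


def lns_div(ea, eb):
    return ea - eb      # division is an integer subtraction


def lns_sqrt(ea):
    return ea >> 1      # square root halves the exponent (one-bit shift)


def lns_add(ea, eb):
    """Addition needs the nonlinear term log2(1 + 2**d) with d <= 0."""
    hi, lo = max(ea, eb), min(ea, eb)
    d = (lo - hi) / SCALE
    return hi + round(math.log2(1.0 + 2.0 ** d) * SCALE)


if __name__ == "__main__":
    a, b = to_lns(3.0), to_lns(5.0)
    print("3 * 5   ~", from_lns(lns_mul(a, b)))
    print("3 + 5   ~", from_lns(lns_add(a, b)))
    print("sqrt(5) ~", from_lns(lns_sqrt(b)))
```

Results are approximate because the logarithm is quantized; the fraction width sets the accuracy, which is one way to read the area/quality tradeoff against floating point.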


Approximate Computing for Always the Right Quality

IcySoC provides techniques for run-time adaptation of energy/quality tradeoff:

Low-power filters for audio applications
Exploits data dependencies of power consumption for >40% power reduction

ULP wakeup receivers
Careful balance between false-alarm and missed events: 80% power reduction

Circuits with graceful performance degradation on unreliable silicon
Avoid overhead for 100% reliable operation in processors and memories
20 to 80% power savings


IcySoC Chips with Help of All Partners


(Figure: chip layout with contributions from all partners. Recoverable labels: Three versions; EM-Marin; CSEM; ETHZ; ICLAB; TCL; Chip manufactured using the ALP 180nm technology; Optimized cells for Sub-VT operation; Sub-VT SRAM Macro; Four core PULP system with approximate LNUs; Pruned and speculative adders; Approx. FIR; Low Leakage; Low VT)

Chips sent to Manufacturing by ETH Zürich


System Demonstrations of PULP (right) and DynOR (left)


Thank you for your attention
