
QUALITY PROGRAMMABLE VECTOR PROCESSORS FOR APPROXIMATE COMPUTING

Swagath Venkataramani (1), Vinay Chippa (1), Srimat Chakradhar (2), Kaushik Roy (1), Anand Raghunathan (1)
(1) Integrated Systems Laboratory, School of ECE, Purdue University
(2) NEC Laboratories America
International Symposium on Microarchitecture 2013

COMPUTERS == PRECISE CALCULATORS

Task: is 433/21 > 20.4?

float x = 433.0 / 21
float y = 20.4
(x > y) ? YES : NO

Answer: YES

Task: is 433/21 > 1?

float x = 433.0 / 21
float y = 1
(x > y) ? YES : NO

Answer: YES. But I worked harder than needed.

Leads to inefficiency
And is overkill (for many applications)

EVOLVING APPLICATION LANDSCAPE

Search
Mining
Recognition

Vision
Video Processing

Relaxed notion of correctness


Results cannot be arbitrary either

Good enough answers !!!

INTRINSIC APPLICATION RESILIENCE: A NEW DIMENSION TO OPTIMIZE HW & SW

Search
Vision
Mining
Recognition
Video Processing

Intrinsic application resilience: the ability of applications to produce outputs of acceptable quality despite many of their computations being executed imprecisely

INTRINSIC APPLICATION RESILIENCE: SOURCES

Perceptual limitations
Statistical / probabilistic computations
Redundant input data
Noisy real-world inputs
Self-healing

Examples shown: Principal Component Analysis; an iterative clustering loop:
Repeat until convergence:
  Compute distances & assign points to clusters
  Update cluster means
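The self-healing loop above can be sketched in a few lines. This is a minimal illustration, not code from the talk: the function names `kmeans_1d` and `coarse_dist`, the one-dimensional data, and the choice of dropping one low-order bit of each distance are all assumptions made for the example.

```python
def kmeans_1d(points, means, dist=lambda p, m: abs(p - m), max_iters=100):
    """Repeat until convergence: assign points to the nearest mean, update means."""
    means = list(means)
    for _ in range(max_iters):
        clusters = [[] for _ in means]
        for p in points:
            # Compute distances and assign the point to the closest cluster.
            nearest = min(range(len(means)), key=lambda k: dist(p, means[k]))
            clusters[nearest].append(p)
        # Update cluster means (an empty cluster keeps its old mean).
        new_means = [sum(c) / len(c) if c else m for c, m in zip(clusters, means)]
        if new_means == means:
            break
        means = new_means
    return means


def coarse_dist(p, m):
    """Hypothetical imprecise distance: drop the lowest bit of the integer part."""
    return int(abs(p - m)) & ~1


points = [1, 2, 3, 10, 11, 12]
exact = kmeans_1d(points, [1.0, 12.0])
approx = kmeans_1d(points, [1.0, 12.0], dist=coarse_dist)
print(exact, approx)
```

On this small data set the imprecise distance leads to the same cluster means as the exact one, which is the kind of behavior the slide calls self-healing. Larger errors or less separated data would not necessarily converge to the same result.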

APPROXIMATE COMPUTING: DESIGN PHILOSOPHY

Systems that can modulate the effort expended towards quality of results
Higher effort: higher quality, but higher energy
How do we get the best Q vs. E tradeoff?

[Figure: effort dial from MIN to MAX; plots of energy and quality against effort (Min to Max), annotated "Disproportionate benefit"]

APPROXIMATE COMPUTING DESIGN TECHNIQUES: OVERVIEW

Application: Quality specifications
  ARC, Chippa et al., DAC 2013
  Verifying quantitative reliability, Carbin et al., OOPSLA 2013

Software: Approximate algorithm
  GREEN, Baek et al., PLDI 2010
  Best effort computing, Chakradhar et al., DAC 2010
  Power dial, Hoffmann et al., ASPLOS 2011
  Approximate neural acceleration, Esmaeilzadeh et al., MICRO 2012

Architecture: Approximate architecture
  Let's take a closer look

Circuits: Approximate circuit
  Redundancy propagation, Shin et al., DATE 2011
  Probabilistic pruning, Lingamneni et al., DATE 2011
  Don't-care based approximation, Venkataramani et al., DAC 2012
  Substitute-and-simplify, Venkataramani et al., DATE 2013

Layout: Approximate implementation

APPROXIMATE ARCHITECTURE

Algorithm-specific accelerators / Domain-specific accelerators (image, video)
  ANT, Hegde et al., ISLPED 1999
  Significance driven computing, Mohapatra et al., ISLPED 2009
  Scalable effort hardware, Chippa et al., DAC 2010, DAC 2011
  Application specific designs

Programmable accelerators (GPGPUs, MIC) / Vector processors

General purpose processors / Multicores
  ERSA, Leem et al., DATE 2010
  Stochastic processor, Narayanan et al., DATE 2010
  Cores of different reliabilities
  Truffle, Esmaeilzadeh et al., ASPLOS 2012
  EnerJ, Sampson et al., PLDI 2011
  Accurate and approximate instructions

APPROXIMATE ARCHITECTURE

Algorithm-specific / domain-specific accelerators
  Pros: Large energy benefits
  Challenges: Limited applicability

General purpose processors / Multicores
  Pros: Broader applicability
  Challenges: Inherently limited energy benefit
    Dominated by control front-ends that cannot be approximated
    Allowing arbitrary errors in hardware limits the fraction of computations that can be approximated

APPROXIMATE ARCHITECTURE

Programmable accelerators (GPGPUs, MIC) / Vector processors
  Opportunity:
    Wide range of applications with fine-grained parallelism
    SIMD: Control overheads amortized over many execution units
    Need quality guarantees from HW
    We will address that!

CONTRIBUTIONS

Quality programmable processors
  An abstract model for programmable approximate processors
QUORA
  A quality programmable 1D/2D vector processor

Quality programmable processors: requirements
  HW/SW interface for applications to expose resilience
  Micro-architecture that can translate resilience to efficiency

QUALITY PROGRAMMABLE PROCESSORS

Quality programmability: the ability to specify to the HW more than just what can be accurate and what can be approximate
Notion of quality explicitly built into the instruction set

Quality Programmable ISA

HW/SW INTERFACE

Quality fields in instructions

qpADD dest, op1, op2, MAG, 1%

qpADD: quality-programmable add
MAG, 1%: error magnitude < 1% of the maximum numeric value of the output

Purely based on instruction semantics
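A minimal sketch of what the `MAG, 1%` field promises, written as a software check. The helper names `within_mag_bound` and `approx_add`, the 16-bit unsigned output range, and the use of operand truncation as the approximation are assumptions for illustration only; they are not part of the instruction set shown on the slide.

```python
MAX_OUTPUT = 0xFFFF  # assumed: 16-bit unsigned output range


def within_mag_bound(exact, approx, max_output, pct):
    """True if |approx - exact| is below pct% of the maximum output value."""
    return abs(approx - exact) < (pct / 100.0) * max_output


def approx_add(a, b, dropped_bits):
    """Hypothetical approximate add: clear the low bits of both operands first."""
    mask = ~((1 << dropped_bits) - 1)
    return (a & mask) + (b & mask)


exact = 1000 + 2000
approx = approx_add(1000, 2000, dropped_bits=4)
ok = within_mag_bound(exact, approx, MAX_OUTPUT, 1.0)
print(exact, approx, ok)
```

The bound is relative to the largest value the output can take, not to the exact result, so small results tolerate the same absolute error as large ones.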

NEED FOR QUALITY PROGRAMMABILITY

Errors injected randomly in x86 instructions: arbitrary vs. bounded

[Figure: two plots of % of approximate instructions vs. % loss in output quality, for Image Segmentation (K-means) and Handwritten Digit Recognition (SVM). Curves compare arbitrary errors with bounded error magnitudes; bounds shown include < 50%, < 25%, < 12.5%, < 7.5%, and < 2.5%.]

25-100X improvement in number of approximate instructions
Constraining errors gives much greater opportunity to approximate!

QUALITY PROGRAMMABLE MICRO-ARCHITECTURE

Micro-architecture guarantees instruction-level quality

[Figure: HW/SW interface with the quality programmable ISA (qpADD dest, op1, op2, MAG, 1%) on top of the quality programmable microarchitecture: Inst. Fetch, Decode & Control, Register File, Quality Control Logic, Quality Configurable Execution Unit]

Quality Control Logic
  Translates the instruction quality specification into accuracy knobs built in hardware
Quality Configurable Execution Unit
  Capable of executing instructions with different quality levels
  Any approximate HW design technique, e.g. precision scaling
QUALITY MONITORS AND ERROR FEEDBACK

Micro-architecture provides feedback on instruction-level quality to software

[Figure: HW/SW interface with the quality programmable ISA (qpADD dest, op1, op2, MAG, 1%) on top of the quality programmable microarchitecture: Inst. Fetch, Decode & Control, Register File, Software-visible Error Registers, Quality Control Logic, Quality Configurable Execution Unit, Instruction accuracy monitor]

Quality Control Logic
  Translates instruction quality into HW accuracy knobs
Quality Configurable Execution Unit
  Capable of executing instructions with different quality levels
uArch may not be able to use all the flexibility
  Conservative quality translation
  Error variance over i/p data
Instruction accuracy monitor
  Estimates actual error

Multimedia

Synthesis

Vision

Recognition

Search
Video Processing
Image Analysis

Mining

QUORA
Quality programmable 1D/2D vector processor

QUORA: OVERVIEW

3-tier processing element hierarchy
  2D array PEs
  2 sets of 1D array PEs
  One scalar PE
2 streaming memory banks along the array borders

[Figure: 2D array of m rows by n columns, flanked by 2 x 1D arrays and 2 x streaming memory banks]

QUORA: OVERVIEW

Application characteristic: 2 levels of reduction operations
  First level: 2D array
    All-to-all vector reduction of inputs
    Generates large intermediate data
  Second level: 1D array
    Reduction of intermediate data to a small number of outputs
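The two levels of reduction can be pictured with a small software model. This is only a sketch under assumptions: the function name `two_level_reduction`, the use of dot products for the first level, and a row-wise minimum for the second level are illustrative choices, not a description of any particular QUORA program.

```python
def two_level_reduction(inputs, refs):
    """Model of the 2D-array then 1D-array reduction pattern."""
    # First level (2D array): all-to-all vector reduction.
    # Every input vector is reduced against every reference vector,
    # producing a len(inputs) x len(refs) grid of intermediate values.
    partial = [
        [sum(a * b for a, b in zip(x, r)) for r in refs]
        for x in inputs
    ]
    # Second level (1D array): reduce each row of intermediate data
    # to a single output value.
    outputs = [min(row) for row in partial]
    return partial, outputs


inputs = [[1, 2], [3, 4]]
refs = [[1, 0], [0, 1], [1, 1]]
partial, outputs = two_level_reduction(inputs, refs)
print(partial, outputs)
```

The intermediate grid grows with the product of the two input counts, while the final output grows with only one of them, which matches the large-intermediate, small-output shape described above.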

PROCESSING ELEMENT HIERARCHY

Scalar PE: CAPE (Completely Accurate Processing Element)
  PE count: 1
  Similar to scalar uProcessor
1D-array PE: MAPE (Mixed Accuracy Processing Elements)
  PE count: m+n
  Small register file, complex execution units
  Energy scalability under approximate operation: > 70%
2D-array PE: APE (Approximate Processing Elements)
  PE count: m*n
  Simple accumulator based data path
  Energy scalability under approximate operation: > 90%

[Figure: the three tiers compared by functionality / size, complexity, PE count, energy, and scope for approximation]

3-tiered PE hierarchy enables larger energy benefits from approximate computing (while matching application characteristics)
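The PE counts per tier follow directly from the array dimensions. As a quick worked example (the function name `pe_counts` is made up for this sketch), a 16 x 16 array gives the 256 + 32 + 1 = 289 PEs reported in the experimental setup:

```python
def pe_counts(m, n):
    """PEs per tier for an array with m rows and n columns."""
    return {"APE": m * n, "MAPE": m + n, "CAPE": 1}


counts = pe_counts(16, 16)
total = sum(counts.values())
print(counts, total)
```

Because the APE count grows as m*n while the other tiers grow as m+n or stay constant, most of the hardware sits in the tier with the largest scope for approximation.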

QUORA: INSTRUCTION SET ARCHITECTURE

47 instructions: 9 APE, 22 MAPE, 13 CAPE, 3 SM

Inst. Type: Instruction

Scalar Instructions
  LDRI Rd, value
  ADDR Rd, Rs1, Rs2
  BEZ Rs, Rel. address
  HALT
Streaming Memory Instructions
  LDSM R_length, stride, burst, R_st_add
2D Array Instructions
  qpMAC R_length, R_row_enb, R_col_enb, R_q_type, R_q_amt
  qpMOD2 R_length, R_row_enb, R_col_enb, R_q_type, R_q_amt
1D Array Reduction Instructions
  qpACC <r/c>, R_row_enb, R_col_enb, R_q_type, R_q_amt
  qpMIN <r/c>, R_row_enb, R_col_enb, R_q_type, R_q_amt
1D Array Streaming Instructions
  SEQ R_length, SReg, R_row_enb, R_col_enb
  STR <r/c>, R_stride, R_burst, R_st_add, R_row_enb, R_col_enb
1D Array Self-Operand Instructions
  MVASR <r/c>, R_<r/c>_enb, SReg
  qpADDX <r/c>, R_<r/c>_enb, SReg, R_q_type, R_q_amt
  qpMUL <r/c>, R_<r/c>_enb, SReg, R_q_type, R_q_amt
  STMCG <r/c>, R_<r/c>_enb, SReg
QUORA QUALITY PROGRAMMABLE INSTRUCTIONS

APE and MAPE instructions extended with 2 additional quality fields
e.g. qpMAC R_length, R_row_enb, R_col_enb, R_q_type, R_q_amt

R_q_type: Type of error, 3 quality metrics
[Equations: definitions of the 3 quality metrics]
R_q_amt: Amount of error

QUORA: QP 1D/2D VECTOR PROCESSOR

[Figure: QUORA block diagram. Labeled blocks: interface to instruction memory (Inst. Add, Inst. Read, Instruction) and data memory (Data. Add, Data. Read, Data. Write, Data. IN, Data. OUT); instruction decode & control unit with program counter; CAPE with ALU, scalar register file, and scratch registers; MAPEs with ALU, ACC, and Reg; streaming memory (SM) elements with 1-to-many DEMUXes and MUXes; Approximate Processing Element Array (APE array); Quality Control Unit & Quality Monitors. Signals: CLK, RESET, Halt, SM_row_sel, SM_col_sel, MAPE_row_sel, MAPE_col_sel.]

QUORA: QP 1D/2D VECTOR PROCESSOR

[Figure: QUORA block diagram with callouts: Mixed Processing Element Array (MAPEs), Streaming Memory Banks (SM), Completely Accurate Processing Element (CAPE)]

QUORA: QP 1D/2D VECTOR PROCESSOR

[Figure: QUORA block diagram with callouts: Decode and Control Logic; Quality Control Unit]

Enable quality configurable execution
Monitor error and provide feedback

QUALITY CONFIGURABLE EXECUTION: PRECISION SCALING

Scale the precision of input operands to APEs/MAPEs
4 different flavors
Up/down operand round-off
  e.g., PSc == 3: X[2:0] >= 4 ? Round up : Round down

[Figure: Up/Down Precision Scaling unit. Inputs: input operand X[i] and PSc (no. of bits to scale precision for). MUXes select between 1<< PSc and 0. Output: precision scaled operand Xpsc[i].]
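The up/down round-off rule can be sketched in software as follows. This is an illustrative model, assuming non-negative integer operands and ignoring overflow at the top of a fixed-width word; the function name `precision_scale` and the `mode` argument are invented for the example, and only truncation and up/down rounding are modeled.

```python
def precision_scale(x, psc, mode="updown"):
    """Scale the precision of a non-negative integer operand by psc bits.

    mode="trunc":  clear the psc low-order bits.
    mode="updown": round to the nearest multiple of 2**psc
                   (low bits >= half a step round up, otherwise down).
    """
    if psc == 0:
        return x
    step = 1 << psc
    low = x & (step - 1)      # the bits being scaled away, e.g. X[2:0] for psc == 3
    base = x - low            # operand with the low bits cleared
    if mode == "trunc":
        return base
    return base + step if low >= (step >> 1) else base


up = precision_scale(22, 3)              # low bits 6 >= 4, rounds up
down = precision_scale(19, 3)            # low bits 3 < 4, rounds down
trunc = precision_scale(22, 3, "trunc")  # low bits simply dropped
print(up, down, trunc)
```

With PSc == 3 the check `low >= 4` is the same comparison as `X[2:0] >= 4` in the example above.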

PRECISION SCALING WITH ERROR COMPENSATION

Quality specified at the output of vector operations
Key idea: Compensate errors across many scalar operations to reduce overall instruction-level error
Track the error in positive and negative directions (P.Err, N.Err)
Modulate the threshold for round-off
Enables error feedback to software

[Figure: precision scaling unit with error compensation. Inputs: X[i], PSc, Err.[i]; round-off threshold Xt; outputs: Xpsc[i], Err.[i+1].]
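One way to picture the compensation idea is a greedy software model that picks, for each operand, the rounding direction that keeps the running net error closest to zero. This is an assumption-laden sketch rather than the hardware scheme: the function name `scale_with_compensation` and the greedy rule are invented here, and operands are assumed to be non-negative integers.

```python
def scale_with_compensation(values, psc):
    """Precision-scale a stream of operands while balancing accumulated error."""
    step = 1 << psc
    mask = step - 1
    pos_err = 0  # total error added by rounding up
    neg_err = 0  # total error removed by rounding down
    scaled = []
    for x in values:
        low = x & mask
        base = x - low
        if low == 0:
            scaled.append(x)  # already representable, no error
            continue
        up_cost = step - low
        net = pos_err - neg_err
        # Modulate the round-off decision using the error seen so far.
        if abs(net + up_cost) < abs(net - low):
            scaled.append(base + step)
            pos_err += up_cost
        else:
            scaled.append(base)
            neg_err += low
    return scaled, pos_err, neg_err


values = [5, 5, 5, 5]
scaled, pos_err, neg_err = scale_with_compensation(values, 3)
net_error = sum(scaled) - sum(values)
print(scaled, pos_err, neg_err, net_error)
```

For these four operands, plain up/down rounding would turn every 5 into 8 and overshoot the sum by 12, while the compensated stream ends with a net error of 4 in magnitude. The accumulated `pos_err` and `neg_err` values are the kind of information that could be exposed to software as error feedback.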

PRECISION SCALING: ARRAY LEVEL VIEW

PSc. units located along the borders of the 2D array
Benefits: Dynamic power, clock gating, precision of future instructions
Overheads: PSc. units shared across PEs in a row/column. <1% of overall energy

[Figure: array-level view. PSc. units sit next to the MAPEs and feed the rows and columns of APEs; the Quality Control unit is shown with the signals Error target, Op-code, Actual Error, Error, PSc, R.PSc, C.PSc, and Gated CLK.]

QUALITY CONTROL UNIT

Set PSc based on required instruction level quality bound
E.g., MAC operation
More details in paper

[Figure: array-level view of MAPEs, PSc. units, and APEs with the Quality control unit. Signals: Error target, Op-code, Actual Error, Error, PSc, R.PSc, C.PSc, Err.Reg, Gated CLK. Annotations shown: E. Reg = 2, R.PSc = 0, C.PSc = 1 + ..., max(|.|, |.|).]
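A conservative translation from an instruction-level error bound to a precision-scaling amount might look like the sketch below. It is a software model built on assumptions: the names `worst_case_error` and `choose_psc`, the worst-case model (every operand of an accumulation is off by half a step in the same direction), and the 20-bit output range are all hypothetical, and the actual quality control unit is described in the paper rather than here.

```python
def worst_case_error(n_operands, psc):
    """Worst-case accumulated error when each operand is rounded to 2**psc."""
    if psc == 0:
        return 0
    return n_operands * (1 << (psc - 1))


def choose_psc(n_operands, max_output, pct, max_psc=16):
    """Largest PSc whose worst-case error stays below pct% of max_output."""
    budget = (pct / 100.0) * max_output
    best = 0
    for psc in range(1, max_psc + 1):
        if worst_case_error(n_operands, psc) < budget:
            best = psc
        else:
            break
    return best


# Assumed example: accumulate 16 operands into a 20-bit result, 1% error bound.
psc = choose_psc(16, (1 << 20) - 1, 1.0)
print(psc)
```

Because the bound is computed for the worst case, the actual error on real data is usually smaller, which is why measuring the actual error and feeding it back can recover some of the unused flexibility.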

EXPERIMENTAL METHODOLOGY

RTL implementation of QUORA using Verilog HDL
Synthesized to IBM 45nm technology node
Design flow: Synopsys Design Compiler, ModelSim, Synopsys Power Compiler

Micro-architectural Parameters         Value
Array Dimensions                       16 x 16
No. of PEs (APEs + MAPEs + CAPE)       289 (256 + 32 + 1)
Size of Register File (CAPE / MAPE)    32 / 8
No. of SM elements                     32
Depth of SM elements                   64
Operating Frequency                    250 MHz

Metric          Value
Feature Size    45nm
Area            2.6 mm^2
Power           367.8 mW
Gate Count      502042

[Figure: percentage breakdown across APEs, MAPEs, CAPE, SMs, PScE, and Misc. (slice labels: 1%, 0%, 1%, 19%, 51%, 28%)]
BENCHMARKS

Applications                                   Algorithm    Dataset
Handwritten Digit Recognition (SVM-MNIST)      SVM          MNIST
Object Recognition (SVM-NORB)                  SVM          NORB
Digit Classification (CNN)                     CNN          MNIST
Eye Detection (GLVQ)                           GLVQ         Image set from NEC labs.
Optical Character Recognition (k-NN)           k-NN         OCR digits
Census Data Analysis (ANN)                     ANN          Adult
Document Search (SSI)                          SSI          Subset of Wikipedia
Image Segmentation (K-Means-Seg)               K-means      Berkeley dataset
Optical Character Clustering (K-Means-OCR)     K-means      OCR digits

Quality Metric
  Recognition and classification applications: percentage classification accuracy
  Document search: no. correct in top 25 results
  K-means applications: mean distance of clustered points from respective centroids

[Figure: image search example using Principal Component Analysis and an SVM classifier; ranked results 0: Burger, 1: Bread, 2: Food, ..., 25: McDonald's]

RESULTS SUMMARY

Energy savings

[Figure: normalized energy (0 to 1.2) for No Approx., < 0.5%, ~ 2.5%, and ~ 7.5% quality loss]

1.05-1.7X savings for NO loss in output quality
1.18-2.1X savings for < 2.5% quality loss
> 2.5X savings for < 7.5% quality loss

RESULTS SUMMARY

QP-instructions in QUORA

[Figure: two pie charts split into QP-APE, QP-MAPE, and Accurate. Dynamic instruction count: QP-APE 96%, remaining slices 1% and 3%. Energy: QP-APE 88%, remaining slices 2% and 10%.]

90% of application energy in quality programmable instructions

RESULTS SUMMARY

Precision scaling mechanisms

[Figure: two plots of energy vs. average error (%), comparing Trunc, Up/Down, and Err. Comp. Panels: MAC - APE (APE energy) and ACC - MAPE (MAPE energy).]

Precision scaling with error compensation provides superior energy vs. quality trade-off

SUMMARY

Intrinsic application resilience: A new dimension to optimize HW and SW
Objective: Energy-efficient & programmable processor for approximate computing
Quality programmable processors: Quality codified as part of the instruction set
QUORA: Quality programmable 1D/2D vector processor
  Quality programmable ISA and microarchitecture