
SCRATCHPAD MEMORIES: A DESIGN ALTERNATIVE FOR CACHE ON-CHIP MEMORY IN EMBEDDED SYSTEMS
- Nalini Kumar
Gaurav Chitroda
Komal Kasat
OUTLINE

04/09/2010
Spring 2010, EEL 6935, Embedded Systems

 Introduction
 Scratch pad memory
 Cache memory
 Proposed methodology
 Results
 Conclusions
INTRODUCTION

 Scratch pad memory: a high-speed internal memory used for temporary storage of calculations, data, and other work in progress.
 It is the next closest memory to the ALU after the internal registers.
 Scratch pad based systems have NUMA (Non-Uniform Memory Access) latencies and use explicit instructions to move data. DMA-based data transfer is often used.
 On-chip caches using SRAM consume 25% to 45% of the total chip power.
 Current embedded processors for multimedia applications have on-chip scratch pad memories.
INTRODUCTION

 Scratchpad vs. Cache:
 A scratchpad doesn’t contain a copy of data that is stored in the main memory.
 Scratchpad memory is directly manipulated by applications.
 In cache memory systems, the mapping of program elements is done at runtime; in scratch pad memory systems, it is done either by the user or by the compiler using a suitable algorithm.
 Prior studies on scratch pad memories do not address the impact on area.
CONTRIBUTIONS

 The paper proposes scratchpad memory as an alternative to cache memory as on-chip memory for computationally intensive applications.
 The CACTI tool is used to compute area and energy for the AT91M40400 target architecture.
 The results establish scratchpad memory as a low-power alternative in most situations, with an average energy reduction of 40%.
SCRATCH PAD MEMORY

 Memory array with the decoding and the column circuitry logic
 Memory objects are mapped to the scratch pad in the last stage of the compiler
 It occupies one distinct part of the memory address space. There is no need to check for data/instruction availability in the scratch pad
 Reduces the comparator and the signal miss/hit acknowledging circuitry

Figure: Scratch Memory Array (labels: Memory Cell, Memory Array, 6 Transistor Static RAM)
SCRATCH PAD MEMORY

 Area of the scratchpad, $A_s$:
$A_s = A_{sde} + A_{sda} + A_{sco} + A_{spr} + A_{sse} + A_{sou}$
 Energy consumption is estimated from the energy consumption of the components:
$E_{scratchpad} = E_{decoder} + E_{memcol}$
 Components: data decoder, data array area, column multiplexers, precharge circuit, data sense amplifiers, output driver circuitry
 The memory array is the major consumer of energy
 The CACTI tool first computes the capacitances for each unit, then estimates the energy
ESTIMATING THE ENERGY CONSUMPTION

 For the memory array:
$E_{memcol} = C_{memcol} \cdot V_{dd}^2 \cdot P_{0 \to 1}$
 $C_{memcol}$ is the capacitance of the memory array unit and is calculated as
$C_{memcol} = n_{cols} \cdot (C_{pre} + C_{readwrite})$
 $P_{0 \to 1}$ is the probability of a bit toggle, 0.5
 Only two word lines are switched regardless of the change in the address bits
 Total energy spent in the scratch pad memory is
$E_{sptotal} = SP_{access} \cdot E_{scratchpad}$
 The only access case for the scratch pad is a read or a write (there is no hit/miss distinction)
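The energy formulas on this slide can be exercised with a short Python sketch. The function names and all numeric inputs (column count, capacitances, supply voltage, decoder energy, access count) are hypothetical placeholders, since the slides give no such values; only the toggle probability of 0.5 comes from the slide.

```python
def memcol_energy(n_cols, c_pre, c_readwrite, vdd, p_toggle=0.5):
    """E_memcol = C_memcol * Vdd^2 * P(0->1), with C_memcol = n_cols * (C_pre + C_readwrite)."""
    c_memcol = n_cols * (c_pre + c_readwrite)
    return c_memcol * vdd ** 2 * p_toggle


def scratchpad_total_energy(sp_accesses, e_decoder, e_memcol):
    """E_sptotal = SP_access * E_scratchpad, where E_scratchpad = E_decoder + E_memcol."""
    return sp_accesses * (e_decoder + e_memcol)


# Hypothetical inputs, chosen only to exercise the formulas (not taken from the paper).
e_memcol = memcol_energy(n_cols=128, c_pre=10e-15, c_readwrite=15e-15, vdd=3.3)
e_total = scratchpad_total_energy(sp_accesses=1000, e_decoder=0.2e-9, e_memcol=e_memcol)
print(f"E_memcol = {e_memcol:.3e} J, E_sptotal = {e_total:.3e} J")
```

Because every scratch pad access is a plain read or write, the total is a single multiplication by the access count.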
CACHE MEMORY

 The area model is based on the transistor count in the circuitry
 Area of the cache:
$A_c = A_{tag} + A_{data}$
where
$A_{tag} = A_{dt} + A_{ta} + A_{co} + A_{pr} + A_{se} + A_{com} + A_{mu}$
and
$A_{data} = A_{de} + A_{da} + A_{col} + A_{pre} + A_{sen} + A_{out}$

Figure: Cache Memory Organization (labels: Tag Array, Data Array)
PROPOSED METHODOLOGY
EXPERIMENTAL SETUP

 Compare a cache with a scratchpad memory of the same size (the delay of the cache is higher than that of the scratchpad for the same technology)
 Identification and assignment of critical data structures to the scratch pad is based on a packing algorithm
 The total number of clock cycles determines the performance
 The larger the number of clock cycles, the lower the performance, because the on-chip configuration doesn’t change the clock period
SCRATCH PAD MEMORY ACCESS

 Performance is estimated from the trace file.
 An appropriate latency is added to the overall program delay for each memory access:
 one cycle for a scratch pad read/write access,
 one cycle plus one wait cycle for a 16-bit main memory access,
 one cycle plus three wait cycles for a 32-bit main memory access

Access               Number of cycles
Cache                Using cache calculations
Scratch pad          1 cycle
Main memory, 16 bit  1 cycle + 1 wait cycle
Main memory, 32 bit  1 cycle + 3 wait cycles
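As a minimal sketch of how these latencies add up over a trace, the Python snippet below sums the cycles per access using the values from the bullet list (three wait cycles for a 32-bit main memory access). The access-kind labels and the sample trace are hypothetical and do not reflect the ARMulator trace format.

```python
# Cycles per access, as listed on the slide (1 base cycle plus wait cycles).
CYCLES_PER_ACCESS = {
    "scratchpad": 1,      # 1 cycle
    "main_16bit": 1 + 1,  # 1 cycle + 1 wait cycle
    "main_32bit": 1 + 3,  # 1 cycle + 3 wait cycles
}


def memory_cycles(trace):
    """Sum the access latency over a trace given as a list of access-kind labels."""
    return sum(CYCLES_PER_ACCESS[kind] for kind in trace)


# Hypothetical trace; the labels are illustrative, not real simulator output.
trace = ["scratchpad", "scratchpad", "main_16bit", "main_32bit"]
print(memory_cycles(trace))  # 1 + 1 + 2 + 4 = 8
```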
CACHE MEMORY ACCESS

 The authors assume a write-through cache
 Read hit: Tag array is accessed. No write to cache and no access to main memory
 Read miss: One cache read operation, L (line size) words written to cache. One main memory read event of size L and no main memory write
 Write hit: Cache write followed by a main memory write
 Write miss: One cache tag read and a main memory write. No cache update.

Access type  Ca_read  Ca_write  Mm_read  Mm_write
Read hit     1        0         0        0
Read miss    1        L         L        0
Write hit    0        1         0        1
Write miss   1        0         0        1
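The table can be turned into event totals with a small Python sketch. The function name, the dictionary keys, and the hit/miss counts in the example are hypothetical; the per-access multipliers are the ones in the table, with L taken as the line size in words.

```python
def cache_events(read_hits, read_misses, write_hits, write_misses, line_words):
    """Count cache and main-memory events for a write-through cache, following the table."""
    L = line_words
    return {
        # Read hit, read miss and write miss each cost one cache read.
        "ca_read": read_hits + read_misses + write_misses,
        # A read miss writes L words to the cache; a write hit writes one.
        "ca_write": L * read_misses + write_hits,
        # Only a read miss reads main memory (L words).
        "mm_read": L * read_misses,
        # Write-through: every write reaches main memory.
        "mm_write": write_hits + write_misses,
    }


# Hypothetical hit/miss counts, for illustration only.
events = cache_events(read_hits=90, read_misses=10, write_hits=20, write_misses=5, line_words=4)
print(events)
```

Multiplying each total by the matching energy per access would give an energy estimate for the cache-based system.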
FLOW DIAGRAM
Figure: Flow diagram. Block labels: C Benchmark; Energy Aware Compiler; Mapping Algorithm; Compiler Support; ARMulator; Cache trace analysis; Scratchpad Trace Analysis; Number of Cycles; Cache/Scratch Pad Size; CACTI; Analytical model; Energy Estimates; Area Estimates
EXPERIMENTAL SETUP

 Target architecture:
 AT91M40400, based on the embedded ARM7TDMI processor
 High-performance RISC processor with very low power consumption
 On-chip scratch pad memory of 4 KB, a 32-bit data path, and two instruction sets.
 encc, an energy-aware compiler, uses a special packing algorithm (a knapsack algorithm) to assign code and data blocks to the scratch pad memory
 The binary output of the compiler is simulated on the ARMulator to produce a trace file.
 ARMulator accepts the cache size as a parameter for the on-chip cache configuration and reports the performance as a number of cycles.
 The area and performance estimates are made for 0.5 µm technology
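The slides name a knapsack algorithm but do not show it, so the following is only a generic 0/1 knapsack sketch in Python, not encc's actual implementation. The block names, sizes, and benefit values (for example, estimated energy saved by placing a block in the scratch pad) are hypothetical.

```python
def select_for_scratchpad(blocks, capacity):
    """0/1 knapsack: choose blocks that maximize total benefit within `capacity` bytes.

    blocks: list of (name, size_bytes, benefit) tuples with integer sizes.
    Returns (best_benefit, chosen_names).
    """
    # best[c] holds the best (benefit, names) achievable with at most c bytes.
    best = [(0, [])] * (capacity + 1)
    for name, size, benefit in blocks:
        # Walk capacities downward so each block is placed at most once.
        for c in range(capacity, size - 1, -1):
            candidate = best[c - size][0] + benefit
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - size][1] + [name])
    return best[capacity]


# Hypothetical code/data blocks: (name, size in bytes, benefit).
blocks = [("sort_loop", 48, 120), ("array_a", 64, 150), ("lookup_table", 32, 60)]
print(select_for_scratchpad(blocks, capacity=128))  # (270, ['sort_loop', 'array_a'])
```

With a 128-byte scratch pad, all three blocks (144 bytes) do not fit, so the sketch keeps the two with the highest combined benefit.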
RESULTS

 The average area, time, and AT product reductions are 34%, 18%, and 46%

Table: Energy per access of various devices

Cache per access (2 kB)             4.57 nJ
Scratch pad per access (2 kB)       1.53 nJ
Main memory read access, 2 bytes    24.00 nJ
Main memory read access, 4 bytes    49.30 nJ
Main memory write access, 4 bytes   41.10 nJ

Table: Area/Performance ratios for bubble-sort

Size (bytes)  Area, cache  Area, scratchpad  CPU cycles, cache  CPU cycles, scratchpad  Area reduction  Time reduction  Area-time product
64            6744         4032              481.9              347.5                   0.40            0.28            0.44
128           11238        7104              302.4              239.9                   0.37            0.21            0.51
256           21586        14306             264.0              237.9                   0.34            0.10            0.55
512           38630        26722             242.6              237.9                   0.31            0.10            0.61
1024          74680        53444             241.7              192.0                   0.28            0.21            0.55
2048          142224       102852            241.5              192.0                   0.28            0.20            0.57
Average                                                                                 0.33            0.18            0.54
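As a worked example of the reduction columns, the Python sketch below assumes that each reduction is defined as 1 minus the scratchpad value divided by the cache value. The slides do not state this definition; it is an assumption, and the function name is hypothetical. Under that assumption the 64-byte row is reproduced.

```python
def reductions(area_cache, area_sp, cycles_cache, cycles_sp):
    """Return (area_reduction, time_reduction), assuming reduction = 1 - scratchpad / cache."""
    area_red = 1 - area_sp / area_cache
    time_red = 1 - cycles_sp / cycles_cache
    return area_red, time_red


# 64-byte row of the bubble-sort table.
area_red, time_red = reductions(6744, 4032, 481.9, 347.5)
print(f"{area_red:.2f} {time_red:.2f}")  # 0.40 0.28
```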
RESULTS

Figure: Energy consumed by the memory system
Figure: Comparison of cache and scratch pad memory area
CONCLUSION

 Presents an approach for selecting on-chip memory configurations
 Results show that scratch pad based compile-time memory outperforms cache-based run-time memory on almost all counts.
 40% average energy reduction for the application considered
 The authors propose a study of DRAM-based memory comparisons, since memory bandwidth and on-chip memory capacity are limiting factors for many applications.
 Also, the energy models for both cache and scratchpad need to be validated by real measurements
QUESTIONS
