Smart Memory

www.studymafia.org

Seminar on Smart Memories

Submitted To:
Submitted By:

CONTENT
- Packet Processing Workload Challenges
- Solution: Smart Memory
- Introduction to Smart Memory
- Smart Memory Architecture
- Packet Processing Bottlenecks
- Smart Memory
- Advantages
- Reference
OVERVIEW
1. High-Performance Packet Processing Challenges
2. Solution: Smart Memory
3. Smart Memory Architecture

PACKET PROCESSING WORKLOAD CHALLENGES
• Sequential memory references
- For lookups (L2, L3, L4, and L7)
- Finite automata traversal
• Read-modify-write: many memory references and minimal compute
- Statistics, counters, token buckets, mutexes, etc.
• Pointer and linked-list management
- Buffer management, packet queues, etc.
• Traditional implementations use
- Commodity memory to store data
- NPs and ASICs to process data in memory
[Figure: several memories shared by many processors (P)]
Performance Barriers:
1. Memory and chip I/O bandwidth
2. Memory latency
3. Locks for atomic access
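As an illustration of the read-modify-write pattern, the sketch below models a token-bucket policer in Python. The class and its parameters are hypothetical and not taken from the slides. Each packet reads two stored fields, does a little arithmetic, and writes both fields back. When that state sits in commodity memory far from the NP, each of those reads and writes becomes a separate memory transaction.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Hypothetical token-bucket policer state: rate in tokens/s, burst in tokens."""

    rate: float
    burst: float
    tokens: float = 0.0
    last_time: float = 0.0

    def conform(self, now: float, size: float) -> bool:
        # Read: fetch the stored token count and last-update timestamp.
        tokens, last = self.tokens, self.last_time
        # Modify: refill for the elapsed time, capped at the burst size,
        # then spend tokens if the packet conforms.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        conforms = tokens >= size
        if conforms:
            tokens -= size
        # Write: store both fields back (two more memory writes).
        self.tokens, self.last_time = tokens, now
        return conforms


bucket = TokenBucket(rate=1000.0, burst=1500.0, tokens=1500.0)
print(bucket.conform(0.0, 1500))  # True: the bucket starts full
print(bucket.conform(0.1, 1500))  # False: only 100 tokens refilled
```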
ILLUSTRATION OF PERFORMANCE BARRIER I
[Figure: processors (P) reach memories through an interconnection network; an IP lookup tree with numbered nodes and prefixes P1-P5]
Walking the IP lookup tree requires several transactions between memory and processors:
- More latency in the interconnect
- Low IPC
- More processors needed
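The tree walk in the figure can be sketched as a unibit (one bit per level) binary trie doing longest-prefix match. This is a generic textbook structure, not necessarily the exact tree on the slide, and the `reads` counter assumes that every node visit is one round trip between a processor and external memory. Because each read depends on the result of the previous one, the reads cannot be overlapped, so their latencies add up.

```python
from __future__ import annotations

import ipaddress
from typing import Optional


class Node:
    """One binary-trie node; in a traditional design it sits in external memory."""

    def __init__(self) -> None:
        self.children: list[Optional[Node]] = [None, None]
        self.next_hop: Optional[str] = None


def insert(root: Node, prefix: int, length: int, next_hop: str) -> None:
    node = root
    for depth in range(length):
        bit = (prefix >> (31 - depth)) & 1
        if node.children[bit] is None:
            node.children[bit] = Node()
        node = node.children[bit]
    node.next_hop = next_hop


def lookup(root: Node, addr: int) -> tuple[Optional[str], int]:
    """Longest-prefix match; also returns the number of dependent node reads."""
    node: Optional[Node] = root
    best, reads = None, 0
    for depth in range(33):
        if node is None:
            break
        reads += 1  # each visited node costs one memory round trip
        if node.next_hop is not None:
            best = node.next_hop  # remember the longest prefix matched so far
        if depth < 32:
            node = node.children[(addr >> (31 - depth)) & 1]
    return best, reads


def ip(text: str) -> int:
    return int(ipaddress.IPv4Address(text))


root = Node()
insert(root, ip("10.0.0.0"), 8, "A")
insert(root, ip("10.1.0.0"), 16, "B")
print(lookup(root, ip("10.1.2.3")))  # ('B', 17)
print(lookup(root, ip("10.2.0.0")))  # ('A', 15)
```

Even this tiny two-prefix table costs 17 dependent reads to resolve a /16 match.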
ILLUSTRATION OF PERFORMANCE BARRIER II
[Figure: processors (P) and memories connected by an interconnection network]
• Lookups are read-only, so they are relatively easy
• Linked lists, counters, policers, etc. are read-modify-write
• They require a per-memory-address lock in multi-core systems
- Locks are often kept in memory, which requires another transaction
- This adds significant latency
- Operations on a single queue or a single counter are extremely slow
Enqueue/Dequeue (enqueue shown):
1. Lock free-list
2. Get free node
3. Unlock free-list
4. Lock list tail
5. Read list tail
6. Link free node
7. Update list tail
8. Unlock list tail
Counters:
1. Lock counter
2. Read counter
3. Write counter
4. Unlock counter
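To put numbers on the sequences above, this sketch counts processor-to-memory transactions for one enqueue and one counter update. It assumes a hypothetical single-threaded `RemoteMemory` model in which every read, write, lock, and unlock is one transaction over the interconnect; the queue layout (a sentinel tail node plus a free list) is also an assumption. Real counts depend on the data layout, but the pattern is the same: a handful of dependent round trips for a tiny amount of compute.

```python
from __future__ import annotations


class RemoteMemory:
    """Hypothetical far memory: every read, write, lock or unlock is one transaction."""

    def __init__(self) -> None:
        self.cells: dict[str, object] = {}
        self.locks: set[str] = set()
        self.transactions = 0

    def read(self, addr: str) -> object:
        self.transactions += 1
        return self.cells.get(addr)

    def write(self, addr: str, value: object) -> None:
        self.transactions += 1
        self.cells[addr] = value

    def lock(self, addr: str) -> None:
        self.transactions += 1  # the lock word itself lives in memory
        if addr in self.locks:
            raise RuntimeError(f"{addr} is already locked")  # no spinning in this sketch
        self.locks.add(addr)

    def unlock(self, addr: str) -> None:
        self.transactions += 1
        self.locks.remove(addr)


def make_queue(mem: RemoteMemory, n_free: int) -> None:
    # Initialization is not counted: node 0 is a sentinel tail, nodes 1..n_free are free.
    for n in range(1, n_free + 1):
        mem.cells[f"next:{n}"] = n + 1 if n < n_free else None
    mem.cells["free_head"] = 1 if n_free else None
    mem.cells["next:0"] = None
    mem.cells["tail"] = 0


def enqueue(mem: RemoteMemory, value: object) -> None:
    mem.lock("free_list")                            # lock free-list
    node = mem.read("free_head")                     # get free node
    if node is None:
        mem.unlock("free_list")
        raise RuntimeError("no free nodes")
    mem.write("free_head", mem.read(f"next:{node}"))
    mem.unlock("free_list")                          # unlock free-list
    mem.write(f"val:{node}", value)
    mem.write(f"next:{node}", None)
    mem.lock("tail")                                 # lock list tail
    tail = mem.read("tail")                          # read list tail
    mem.write(f"next:{tail}", node)                  # link free node
    mem.write("tail", node)                          # update list tail
    mem.unlock("tail")                               # unlock list tail


def increment(mem: RemoteMemory, addr: str, delta: int = 1) -> None:
    mem.lock(addr)                                   # lock counter
    mem.write(addr, mem.read(addr) + delta)          # read counter, write counter
    mem.unlock(addr)                                 # unlock counter


mem = RemoteMemory()
make_queue(mem, n_free=4)
enqueue(mem, "pkt-A")
print(mem.transactions)  # 12
mem.cells["pkts"] = 0
increment(mem, "pkts")
print(mem.transactions)  # 16
```

With this layout one enqueue costs 12 transactions and one counter update costs 4, and every one of them is serialized behind a per-address lock.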
SOLUTION: SMART MEMORY
- Attach simple compute to data
- Attach locks to data
- Enable local communication between memories
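The first two ideas, compute and locks attached to the data, can be sketched as a memory block that executes simple operations itself. `SmartMemoryBlock` and its `increment`/`enqueue`/`dequeue` commands are hypothetical names chosen for illustration, not a documented Smart Memory command set, and Python threads stand in for packet processors. Each update is a single command, and the per-address lock is taken inside the memory block rather than by the processor across the interconnect.

```python
from __future__ import annotations

import threading
from collections import defaultdict, deque


class SmartMemoryBlock:
    """Hypothetical smart-memory block: data, per-address locks and simple compute together."""

    def __init__(self) -> None:
        self.cells: defaultdict[str, int] = defaultdict(int)
        self.queues: defaultdict[str, deque] = defaultdict(deque)
        self._locks: defaultdict[str, threading.Lock] = defaultdict(threading.Lock)
        self._locks_guard = threading.Lock()  # protects creation of per-address locks
        self._count_guard = threading.Lock()  # protects the transaction counter
        self.transactions = 0

    def _lock_for(self, addr: str) -> threading.Lock:
        with self._locks_guard:
            return self._locks[addr]

    def _command(self) -> None:
        # One command from a processor = one interconnect transaction.
        with self._count_guard:
            self.transactions += 1

    def increment(self, addr: str, delta: int = 1) -> int:
        """Read-modify-write executed inside the memory block."""
        self._command()
        with self._lock_for(addr):  # the lock is acquired next to the data
            self.cells[addr] += delta
            return self.cells[addr]

    def enqueue(self, qname: str, item: object) -> None:
        self._command()
        with self._lock_for(qname):
            self.queues[qname].append(item)

    def dequeue(self, qname: str) -> object | None:
        self._command()
        with self._lock_for(qname):
            q = self.queues[qname]
            return q.popleft() if q else None


sm = SmartMemoryBlock()


def worker() -> None:
    for _ in range(1000):
        sm.increment("pkt_count")  # one command instead of lock/read/write/unlock


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sm.cells["pkt_count"], sm.transactions)  # 4000 4000
```

Four threads issue 4000 increments in total and the block sees exactly 4000 transactions, one per update, compared with four per update (lock, read, write, unlock) when the processor holds the lock.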

INTRODUCTION TO SMART MEMORY
• What is the real problem?
- Compute occurs far away from data
- Lock acquire/release occurs far from data
Fortunately, the compute needed for packet processing jobs is very modest!
• Solution: Make memory smarter by:
- Keeping compute close to data
- Managing locks close to data
- Enabling local communication
[Figure: processors (P) and memories on an interconnection network, before and after a compute block is attached to each memory]
Smart Memory Advantages
(Get more out of fewer transactions!)
1. Lower I/O bandwidth
2. Lower processing latency
3. Higher IPC
4. Significantly higher single counter/queue performance
SMART MEMORY ARCHITECTURE
- Hybrid memory: eDRAM + DDR3 DRAM
- Serial chip I/O

SMART MEMORY CAPACITY AND BANDWIDTH @100G
[Figure: memory bandwidth (Billion accesses / packet, log scale from 0.15 to 40) versus memory capacity (2 MB to 512+ MB, log scale) for DPI (string), DPI (regex), ACL (algorithm), FIB (algorithm), queuing/scheduling, Layer 2 forwarding, statistics/counters, basic Layer 2 packet buffer, and video buffer]
Smart Memory uses intelligent algorithms to split the data structures across 64 banks of eDRAM and 8 channels of DDR3 DRAM.
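The y-axis can be tied to the 100G line rate with a back-of-the-envelope calculation, assuming it means memory accesses per second. With worst-case 64-byte Ethernet frames plus 20 bytes of preamble, start-of-frame delimiter, and inter-frame gap on the wire, the packet rate is 100 Gb/s / ((64 + 20) × 8 bit) ≈ 148.8 million packets/s, so each memory access per packet costs about 0.149 billion accesses/s. The access counts in the loop are arbitrary examples, not values read from the chart.

```python
LINE_RATE_BPS = 100e9          # 100 Gb/s line rate
MIN_FRAME_BYTES = 64           # minimum Ethernet frame, including FCS
WIRE_OVERHEAD_BYTES = 8 + 12   # preamble + SFD (8) and minimum inter-frame gap (12)


def packets_per_second(frame_bytes: int = MIN_FRAME_BYTES) -> float:
    """Back-to-back frames of a single size at line rate."""
    return LINE_RATE_BPS / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8)


def accesses_per_second(accesses_per_packet: float, frame_bytes: int = MIN_FRAME_BYTES) -> float:
    """Memory bandwidth needed if every packet makes the given number of accesses."""
    return accesses_per_packet * packets_per_second(frame_bytes)


print(f"{packets_per_second() / 1e6:.1f} Mpps")  # 148.8 Mpps
for n in (1, 4, 16):  # arbitrary example access counts, not taken from the chart
    print(f"{n:>2} accesses/packet -> {accesses_per_second(n) / 1e9:.2f} billion accesses/s")
```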
SMART MEMORY HIGH-LEVEL ARCHITECTURE
[Figure: a packet processor complex (processors P) connected through a global interconnect to the Smart Memory complex, a grid of eDRAM blocks each paired with an SM engine, backed by DDR3 DRAM]
- Global interconnect: provides fair communication between the processors and Smart Memory
- Local interconnect: provides local communication between Smart Memory blocks
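The value of the local interconnect can be sketched with a lookup table split into four stages, one per SM engine, where each engine forwards the request to its neighbour instead of returning to the processor. This is a simplified, hypothetical model: exact match on 4-byte keys with 8 bits per stage, and `SMEngine`, `build`, and the two lookup functions are illustrative names. `traditional_lookup` counts one global round trip per stage, while `smart_lookup` needs one global round trip plus local hops.

```python
from __future__ import annotations

from typing import Optional


class SMEngine:
    """Hypothetical SM engine holding one 8-bit stage of a lookup table in its eDRAM."""

    def __init__(self, stage: int) -> None:
        self.stage = stage
        self.table: dict[tuple[int, int], object] = {}  # (state, key byte) -> next state or result
        self.next_engine: Optional[SMEngine] = None     # neighbour on the local interconnect

    def process(self, state: int, key: bytes, hops: list[int]) -> object:
        entry = self.table.get((state, key[self.stage]))
        if entry is None or self.next_engine is None:
            return entry  # miss, or last stage: the result goes back to the processor
        hops[0] += 1      # forward over the local interconnect, not back to the processor
        return self.next_engine.process(entry, key, hops)


def build(routes: dict[bytes, str]) -> list[SMEngine]:
    """Exact-match table for 4-byte keys, split into one stage per engine."""
    engines = [SMEngine(i) for i in range(4)]
    for a, b in zip(engines, engines[1:]):
        a.next_engine = b
    fresh = 1  # state 0 is the root
    for key, result in routes.items():
        state = 0
        for i in range(3):
            entry = engines[i].table.get((state, key[i]))
            if entry is None:
                entry, fresh = fresh, fresh + 1
                engines[i].table[(state, key[i])] = entry
            state = entry
        engines[3].table[(state, key[3])] = result
    return engines


def traditional_lookup(engines: list[SMEngine], key: bytes) -> tuple[object, int]:
    """The processor reads every stage itself: one global round trip per stage."""
    state, trips = 0, 0
    for i, engine in enumerate(engines):
        trips += 1
        entry = engine.table.get((state, key[i]))
        if entry is None or i == len(engines) - 1:
            return entry, trips
        state = entry
    return None, trips


def smart_lookup(engines: list[SMEngine], key: bytes) -> tuple[object, int, int]:
    """One global round trip (request + result); the stages talk over local links."""
    hops = [0]
    result = engines[0].process(0, key, hops)
    return result, 1, hops[0]


engines = build({bytes([10, 1, 2, 3]): "port 7", bytes([10, 1, 9, 9]): "port 2"})
print(traditional_lookup(engines, bytes([10, 1, 2, 3])))  # ('port 7', 4)
print(smart_lookup(engines, bytes([10, 1, 2, 3])))        # ('port 7', 1, 3)
```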
[Figure: a processor issues one read to an SM engine, which reads its eDRAM and the DDR3 DRAM locally and returns only the result]
- Computation occurs close to memory, reducing latency
- Fewer memory transactions are required
- Tables are split between eDRAM and DRAM
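The slides do not say which algorithms Smart Memory uses to split tables between eDRAM and DRAM. One well-known technique of this kind, for statistics counters, keeps a narrow low-order counter in fast memory and adds it into a wide counter in slow memory only when it overflows. The sketch below shows that idea; the 8-bit eDRAM width and the class name are assumptions.

```python
from __future__ import annotations


class SplitCounterArray:
    """Hypothetical hybrid counters: narrow low-order parts in eDRAM, wide totals in DRAM."""

    def __init__(self, n: int, edram_bits: int = 8) -> None:
        self.limit = 1 << edram_bits  # an eDRAM counter wraps at this value
        self.edram = [0] * n          # small and fast: updated on every packet
        self.dram = [0] * n           # large and slow: updated only on overflow
        self.dram_writes = 0

    def increment(self, i: int, delta: int = 1) -> None:
        value = self.edram[i] + delta
        if value >= self.limit:
            # Flush whole multiples of the limit to DRAM, keep the remainder in eDRAM.
            self.dram[i] += value - value % self.limit
            self.edram[i] = value % self.limit
            self.dram_writes += 1
        else:
            self.edram[i] = value

    def read(self, i: int) -> int:
        """The exact count is the sum of both parts."""
        return self.dram[i] + self.edram[i]


c = SplitCounterArray(n=4)
for _ in range(1000):
    c.increment(2)
print(c.read(2), c.dram_writes)  # 1000 3
```

With 8-bit eDRAM counters, 1000 single-packet increments cause only 3 DRAM writes, while a read still returns the exact total.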

I/O TECHNOLOGY CHOICE IN SMART MEMORY
- Smart Memory reduces the chip I/O bandwidth significantly
- How can it be optimized further?
- The bandwidth and latency gap between on-chip and I/O access keeps growing
- On-chip bandwidth is much higher than memory I/O bandwidth
Smart Memory uses serial I/O (based on MoSys data):
- 4X the throughput of RLDRAM and QDR
- 3X fewer pins than DDR3 and DDR4
- 2.5X lower I/O power
HIGH-SPEED LINE CARD WITH SMART MEMORY
Traditional Line Card vs. Line Card with SM (2-3 times lower):
- Power: 540+ W vs. 212 W
- Area: 472+ cm^2 vs. 148 cm^2
- Cost: $5600+ vs. $2520
- Traditional card memory: DDR3, 10+ DIMMs, 900+ pins
[Figure: board layouts of the traditional line card and the line card with SM, showing PHYs, CIF, NPs, TMs, TCMs, SRAMs, DDR3 devices, SM chips, control plane memory, and the connection to the switch fabric]
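The "2-3 times" claim can be checked against the slide's own figures. This short calculation divides each traditional-card value by the corresponding Smart Memory value; since the traditional values carry a "+", the ratios are lower bounds.

```python
from __future__ import annotations

# Figures taken from the slide; the "+" on the traditional card's numbers means
# they are lower bounds, so the ratios below are lower bounds too.
TRADITIONAL = {"power_W": 540, "area_cm2": 472, "cost_usd": 5600}
WITH_SM = {"power_W": 212, "area_cm2": 148, "cost_usd": 2520}


def reduction_ratios(before: dict[str, float], after: dict[str, float]) -> dict[str, float]:
    """How many times smaller each metric is on the card with Smart Memory."""
    return {metric: before[metric] / after[metric] for metric in before}


for metric, ratio in reduction_ratios(TRADITIONAL, WITH_SM).items():
    print(f"{metric}: {ratio:.2f}x")
```

The ratios come out at about 2.55x for power, 3.19x for area, and 2.22x for cost, roughly the 2-3 times stated on the slide (area slightly above 3).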
CONCLUDING REMARKS
• Packet Processing Bottlenecks
- Data is far away from compute
- I/O and memory bandwidth
• Smart Memory
- Keep compute close to data
- Keep locking close to data
- Provide inter-memory connectivity
• Advantages
- Reduced chip I/O bandwidth
- High performance and low latency
- Feature-rich, flexible, and programmable
- Lower cost
- One chip for several functions
REFERENCE
- www.google.com
- www.wikipedia.com
- www.studymafia.org
- www.projectsreports.org
Thank You All
