Intelligent RAM (IRAM)
Richard Fromm, David Patterson, Krste Asanovic, Aaron Brown, Jason Golbus, Ben Gribstad, Kimberly Keeton, Christoforos Kozyrakis, David Martin, Stylianos Perissakis, Randi Thomas, Noah Treuhaft, Katherine Yelick, Tom Anderson, John Wawrzynek
rfromm@cs.berkeley.edu
http://iram.cs.berkeley.edu/
EECS, University of California
Berkeley, CA 94720-1776 USA
Richard Fromm, IRAM tutorial, ASP-DAC ’98, February 10, 1998

IRAM Vision Statement
Microprocessor & DRAM on a single chip:
• on-chip memory latency 5-10X, bandwidth 50-100X
• improve energy efficiency 2X-4X (no off-chip bus)
• serial I/O 5-10X vs. buses
• smaller board area/volume
• adjustable memory size/width
[Figure: a conventional system (Proc with caches, L2$, bus, I/O, and separate DRAM chips, built in a logic fab) compared with an IRAM (Proc, bus, I/O, and DRAM on one chip, built in a DRAM fab).]

Outline
• Today’s Situation: Microprocessor & DRAM
• IRAM Opportunities
• Initial Explorations
• Energy Efficiency
• Directions for New Architectures
• Vector Processing
• Serial I/O
• IRAM Potential, Challenges, & Industrial Impact

Processor-DRAM Gap (latency)
[Figure: relative performance (log scale, 1 to 1000) versus time, 1980 to 2000. µProc (“Moore’s Law”): 60%/yr. DRAM: 7%/yr. Processor-Memory Performance Gap: grows 50% / year.]
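The 50%/year figure follows from the two growth rates on the chart. A minimal sketch of that arithmetic, assuming both rates compound annually from a common starting point (the function and constant names are illustrative, not from the slides):

```python
# Annual improvement rates quoted on the slide.
CPU_RATE = 0.60   # microprocessor performance: 60%/yr
DRAM_RATE = 0.07  # DRAM performance (latency): 7%/yr


def gap_after(years: int) -> float:
    """Ratio of processor to DRAM performance after `years`, starting from 1.0."""
    return ((1 + CPU_RATE) / (1 + DRAM_RATE)) ** years


# Growth of the gap in a single year: 1.60 / 1.07 - 1, roughly 0.50.
annual_gap_growth = gap_after(1) - 1
```

Under these assumptions the gap compounds, so `gap_after(10)` is far larger than ten times the one-year figure.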

Processor-Memory Performance Gap “Tax”
Processor          % Area (~cost)   % Transistors (~power)
Alpha 21164        37%              77%
StrongARM SA110    61%              94%
Pentium Pro        64%              88%   (2 dies per package: Proc/I$/D$ + L2$)
• Caches have no inherent value, only try to close performance gap

Today’s Situation: Microprocessor
• Rely on caches to bridge gap
• Microprocessor-DRAM performance gap
  - time of a full cache miss in instructions executed
    1st Alpha (7000): 340 ns/5.0 ns = 68 clks x 2 or 136 instructions
    2nd Alpha (8400): 266 ns/3.3 ns = 80 clks x 4 or 320 instructions
    3rd Alpha (t.b.d.): 180 ns/1.7 ns = 108 clks x 6 or 648 instructions
  - 1/2X latency x 3X clock rate x 3X Instr/clock ⇒ ≈5X
• Power limits performance (battery, cooling)
• Shrinking number of desktop ISAs?
  - No more PA-RISC; questionable future for MIPS and Alpha
  - Future dominated by IA-64?
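The miss-cost figures can be re-derived by multiplying the clocks per miss by the issue width. A small sketch using only the clock counts and issue widths quoted on the slide (the names `ALPHAS` and `miss_cost_in_instructions` are illustrative):

```python
# (machine, clocks per full cache miss, instructions issued per clock),
# all taken from the slide.
ALPHAS = [
    ("1st Alpha (7000)", 68, 2),
    ("2nd Alpha (8400)", 80, 4),
    ("3rd Alpha (t.b.d.)", 108, 6),
]


def miss_cost_in_instructions(clocks: int, issue_width: int) -> int:
    """Instruction issue slots lost while one full cache miss is outstanding."""
    return clocks * issue_width


costs = [miss_cost_in_instructions(c, w) for _, c, w in ALPHAS]
# Growth in miss cost from the first to the third generation (close to 5X).
growth = costs[-1] / costs[0]
```

The growth factor comes out a little under 5, which is consistent with the "≈5X" summary line.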

Today’s Situation: DRAM
[Figure: DRAM revenue per quarter (millions of dollars, $0 to $20,000), 1Q 94 through 1Q 97; annotated values $16B and $7B.]
• Intel: 30%/year since 1987, 1/3 income profit

Today’s Situation: DRAM
• Commodity, second source industry ⇒ high volume, low profit, conservative
  - Little organization innovation (vs. processors) in 20 years: page mode, EDO, Synch DRAM
• DRAM industry at a crossroads:
  - Fewer DRAMs per computer over time
    - Growth bits/chip DRAM: 50%-60%/yr
    - Nathan Myhrvold (Microsoft): mature software growth (33%/yr for NT), growth MB/$ of DRAM (25%-30%/yr)
  - Starting to question buying larger DRAMs?

Fewer DRAMs/System over Time (from Pete MacWilliams, Intel)
Number of DRAM chips per system, by minimum memory size and DRAM generation:

                      DRAM Generation
Minimum       ’86     ’89     ’92     ’96     ’99      ’02
Memory Size   1 Mb    4 Mb    16 Mb   64 Mb   256 Mb   1 Gb
4 MB          32      8
8 MB                  16      4
16 MB                         8       2
32 MB                                 4       1
64 MB                                 8       2
128 MB                                        4        1
256 MB                                        8        2

• Memory per DRAM growth @ 60% / year
• Memory per System growth @ 25%-30% / year
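Each chip count on this slide is simply system capacity divided by chip capacity. A minimal sketch of that calculation, assuming 8 bits per byte and no extra chips for parity or ECC (the function name is illustrative):

```python
def chips_needed(system_mbytes: int, chip_mbits: int) -> int:
    """DRAM chips needed to build `system_mbytes` of memory from `chip_mbits` parts."""
    total_mbits = system_mbytes * 8
    # Ceiling division: a partially used chip still counts as a whole chip.
    return -(-total_mbits // chip_mbits)


# A 4 MB system from 1 Mb parts versus a 32 MB system from 256 Mb parts.
early = chips_needed(4, 1)
late = chips_needed(32, 256)
```

Because chip capacity grows faster than minimum system memory, the count per system shrinks toward one over successive generations.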

Multiple Motivations for IRAM
• Some apps: energy, board area, memory size
• Gap means performance challenge is memory
• DRAM companies at crossroads?
  - Dramatic price drop since January 1996
  - Dwindling interest in future DRAM?
    - Too much memory per chip?
• Alternatives to IRAM: fix capacity but shrink DRAM die, packaging breakthrough, more out-of-order CPU, ...

DRAM Density
• Density of DRAM (in DRAM process) is much higher than SRAM (in logic process)
• Pseudo-3-dimensional trench or stacked capacitors give very small DRAM cell sizes

                       StrongARM       64 Mbit DRAM    Ratio
Process                0.35 µm logic   0.40 µm DRAM
Transistors/cell       6               1               6:1
Cell size (µm²)        26.41           1.62            16:1
          (λ²)         216             10.1            21:1
Density (Kbits/µm²)    10.1            390             1:39
        (Kbits/λ²)     1.23            62.3            1:51

Potential IRAM Latency: 5 - 10X
• No parallel DRAMs, memory controller, bus to turn around, SIMM module, pins…
• New focus: Latency oriented DRAM?
  - Dominant delay = RC of the word lines
  - Keep wire length short & block sizes small?
• 10-30 ns for 64b-256b IRAM “RAS/CAS”?
• AlphaStation 600: 180 ns=128b, 270 ns=512b
  Next generation (21264): 180 ns for 512b?

Potential IRAM Bandwidth: 50-100X
• 1024 1Mbit modules (1Gb), each 256b wide
  - 20% @ 20 ns RAS/CAS = 320 GBytes/sec
• If cross bar switch delivers 1/3 to 2/3 of BW of 20% of modules ⇒ 100 - 200 GBytes/sec
• FYI: AlphaServer 8400 = 1.2 GBytes/sec
  - 75 MHz, 256-bit memory bus, 4 banks
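The 320 GBytes/sec estimate can be reproduced from the module count, width, active fraction, and cycle time on the slide. A rough sketch, assuming every active module delivers one full-width access per RAS/CAS cycle (constant and function names are illustrative):

```python
MODULES = 1024          # 1 Mbit modules in a 1 Gbit IRAM
WIDTH_BITS = 256        # bits delivered per module access
ACTIVE_FRACTION = 0.20  # share of modules busy at once
CYCLE_NS = 20.0         # RAS/CAS cycle time in nanoseconds


def peak_bandwidth_gbytes() -> float:
    """Aggregate module bandwidth in GBytes/sec (bytes per ns equals GBytes/sec)."""
    bytes_per_access = WIDTH_BITS / 8
    active_modules = MODULES * ACTIVE_FRACTION
    return active_modules * bytes_per_access / CYCLE_NS


peak = peak_bandwidth_gbytes()
# The crossbar is assumed to deliver between 1/3 and 2/3 of the peak.
delivered_low, delivered_high = peak / 3, 2 * peak / 3
```

The exact product is slightly above the slide's figure, which appears to be rounded down to 320, and the delivered range is likewise rounded to 100 - 200 GBytes/sec.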

Potential Energy Efficiency: 2X-4X
• Case study of StrongARM memory hierarchy vs. IRAM memory hierarchy (more later...)
  - cell size advantages ⇒ much larger cache ⇒ fewer off-chip references ⇒ up to 2X-4X energy efficiency for memory
  - less energy per bit access for DRAM

Potential Innovation in Standard DRAM Interfaces
• Optimizations when chip is a system vs. chip is a memory component
  - Lower power via on-demand memory module activation?
  - Improve yield with variable refresh rate?
  - “Map out” bad memory modules to improve yield?
  - Reduce test cases/testing time during manufacturing?
• IRAM advantages even greater if innovate inside DRAM memory interface? (ongoing work...)

“Vanilla” Approach to IRAM
• Estimate performance of IRAM implementations of conventional architectures
• Multiple studies:
  - “Intelligent RAM (IRAM): Chips that remember and compute”, 1997 Int’l. Solid-State Circuits Conf., Feb. 1997.
  - “Evaluation of Existing Architectures in IRAM Systems”, Workshop on Mixing Logic and DRAM, 24th Int’l. Symp. on Computer Architecture, June 1997.
  - “The Energy Efficiency of IRAM Architectures”, 24th Int’l. Symp. on Computer Architecture, June 1997.

“Vanilla” IRAM Performance Conclusions
• IRAM systems with existing architectures provide only moderate performance benefits
• High bandwidth / low latency used to speed up memory accesses but not computation
• Reason: existing architectures developed under the assumption of a low bandwidth memory system
  - Need something better than “build a bigger cache”
  - Important to investigate alternative architectures that better utilize high bandwidth and low latency of IRAM

IRAM Energy Advantages
• IRAM reduces the frequency of accesses to lower levels of the memory hierarchy, which require more energy
• IRAM reduces energy to access various levels of the memory hierarchy
• Consequently, IRAM reduces the average energy per instruction:
  Energy per memory access = AE_L1 + (MR_L1 × AE_L2 + (MR_L2 × AE_off-chip))
  where AE = access energy and MR = miss rate
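The energy model can be evaluated directly. The sketch below takes the formula exactly as written on the slide, which amounts to treating MR_L2 as misses per memory access (a global rate); if MR_L2 were a local miss rate, the last term would also be scaled by MR_L1. All numeric values are illustrative placeholders, not measurements from the study:

```python
def energy_per_access(ae_l1: float, ae_l2: float, ae_off_chip: float,
                      mr_l1: float, mr_l2: float) -> float:
    """Average energy per memory access (same unit as the AE inputs)."""
    return ae_l1 + (mr_l1 * ae_l2 + (mr_l2 * ae_off_chip))


# Hypothetical inputs: access energies in nJ, miss rates as fractions.
conventional = energy_per_access(0.5, 2.0, 300.0, mr_l1=0.05, mr_l2=0.01)
# Same hierarchy with a tenfold lower off-chip miss rate (larger on-chip L2).
larger_l2 = energy_per_access(0.5, 2.0, 300.0, mr_l1=0.05, mr_l2=0.001)
```

With these placeholder numbers the off-chip term dominates, so lowering the off-chip miss rate is what moves the total.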

Energy to Access Memory by Level of Memory Hierarchy
• For 1 access, measured in nJoules:

                                    Conventional   IRAM
on-chip L1$ (SRAM)                  0.5            0.5
on-chip L2$ (SRAM vs. DRAM)         2.4            1.6
L1 to Memory (off- vs. on-chip)     98.5           4.6
L2 to Memory (off-chip)             316.0          (n.a.)

• Based on Digital StrongARM, 0.35 µm technology
• Calculated energy efficiency (nanoJoules per instruction)
• See “The Energy Efficiency of IRAM Architectures,” 24th Int’l. Symp. on Computer Architecture, June 1997

IRAM Energy Efficiency Conclusions
• IRAM memory hierarchy consumes as little as 29% (Small) or 22% (Large) of corresponding conventional models
• In worst case, IRAM energy consumption is comparable to conventional: 116% (Small), 76% (Large)
• Total energy of IRAM CPU and memory as little as 40% of conventional, assuming StrongARM as CPU core
• Benefits depend on how memory-intensive the application is

A More Revolutionary Approach
• “...wires are not keeping pace with scaling of other features. … In fact, for CMOS processes below 0.25 micron ... an unacceptably small percentage of the die will be reachable during a single clock cycle.”
• “Architectures that require long-distance, rapid interaction will not scale well ...”
  - “Will Physical Scalability Sabotage Performance Gains?” Matzke, IEEE Computer (9/97)

New Architecture Directions
• “…media processing will become the dominant force in computer arch. & microprocessor design.”
• “... new media-rich applications... involve significant real-time processing of continuous media streams, and make heavy use of vectors of packed 8-, 16-, and 32-bit integer and floating pt.”
• Needs include high memory BW, high network BW, continuous media data types, real-time response, fine grain parallelism
  - “How Multimedia Workloads Will Change Processor Design”, Diefendorff & Dubey, IEEE Computer (9/97)

PDA of 2002?
• Pilot PDA (calendar, to do list, notes, address book, calculator, memo, ...)
• + Gameboy
• + Nikon Coolpix (camera, tape recorder, paint ...)
• + Cell phone
• + speech, vision recognition
• + wireless data connectivity (web browser, e-mail)
• Vision to see surroundings, scan documents
• Voice input/output for conversations

Potential IRAM Architecture (“New”?)
• Compact: Describe N operations with 1 short instruction
• Predictable (real-time) performance vs. statistical performance (cache)
• Multimedia ready: choose N * 64b, 2N * 32b, 4N * 16b
• Easy to get high performance; N operations:
  - are independent (⇒ short signal distance)
  - use same functional unit
  - access disjoint registers
  - access registers in same order as previous instructions
  - access contiguous memory words or known pattern
  - can exploit large memory bandwidth
  - hide memory latency (and any other latency)
• Scalable (get higher performance as more HW resources available)
• Energy-efficient
• Mature, developed compiler technology

Vector Processing
[Figure: SCALAR (1 operation): add r3, r1, r2 computes r3 = r1 + r2. VECTOR (N operations): add.vv v3, v1, v2 computes v3 = v1 + v2 element by element over the vector length.]
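The scalar/vector contrast in the figure can be mimicked in software. This is only a behavioral sketch in Python, assuming a vector register is modeled as a list and the vector length as an integer `vlr`; it says nothing about how the hardware executes the instruction:

```python
def scalar_add(r1: int, r2: int) -> int:
    """Model of `add r3, r1, r2`: one instruction, one operation."""
    return r1 + r2


def vector_add(v1: list, v2: list, vlr: int) -> list:
    """Model of `add.vv v3, v1, v2`: one instruction, `vlr` independent operations."""
    return [v1[i] + v2[i] for i in range(vlr)]


r3 = scalar_add(2, 3)
v3 = vector_add([1, 2, 3, 4], [10, 20, 30, 40], vlr=4)
```

Each element result depends only on the matching input elements, which is what lets a vector unit spread the work across parallel lanes.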

Vector Model
• Vector operations are SIMD operations on an array of virtual processors (VP)
• Number of VPs given by the vector length register vlr
• Width of each VP given by the virtual processor width register vpw

Vector Architectural State
[Figure: $vlr virtual processors, VP0, VP1, ..., VP$vlr-1. General purpose registers vr0, vr1, ..., vr31 (32), each $vpw wide. Flag registers vf0, vf1, ..., vf31 (32), each 1b. Control registers vcr0, vcr1, ..., vcr31, each 32b.]

Variable Virtual Processor Width
• Programmer thinks in terms of:
  - Virtual processors of width 16b / 32b / 64b (or vectors of data of width 16b / 32b / 64b)
• Good model for multimedia
  - Multimedia is highly vectorizable with long vectors
  - More elegant than MMX-style model
    - Many fewer instructions (SIMD)
    - Vector length explicitly controlled
    - Memory alignment / packing issues solved in vector memory pipeline
• Vectorization understood and compilers exist

Virtual Processor Abstraction
• Use vectors for inner loop parallelism (no surprise)
  - One dimension of array: A[0,0], A[0,1], A[0,2], ...
  - Think of machine as 32 vector regs each with 64 elements
  - 1 instruction updates 64 elements of 1 vector register
• and for outer loop parallelism!
  - 1 element from each column: A[0,0], A[1,0], A[2,0], ...
  - Think of machine as 64 “virtual processors” (VPs) each with 32 scalar registers! (~ multithreaded processor)
  - 1 instruction updates 1 scalar register in 64 VPs
• Hardware identical, just 2 compiler perspectives

. Overflow. inValid operation l Memory: speculative loads 30 Richard Fromm. IRAM tutorial.Flag Registers l Conditional Execution Most operations can be masked l No need for conditional move instructions l Flag processor allows chaining of flag operations l l Exception Processing l Integer: overflow. divide by Zero. saturation l IEEE Floating point: Inexact. 1998 98. ASP-DAC ‘ February 10. Underflow.

Overview of V-IRAM ISA
[Figure: ISA summary, with the recoverable contents listed below.]
• Scalar: standard scalar instruction set (e.g. ARM, MIPS)
• Vector ALU: alu op in .v, .vv, .vs, .sv forms; data types s.int, u.int, s.fp, d.fp; widths 16, 32, 64
• Vector Memory: load, store; data types s.int, u.int; widths 8, 16, 32, 64; unit stride, constant stride, indexed
• Vector Registers: 32 x 32 x 64b data, 32 x 64 x 32b data, 32 x 128 x 16b data; 32 x 32 x 1b flag, 32 x 64 x 1b flag, 32 x 128 x 1b flag
• All ALU / memory operations under mask
• Plus: flag, convert, fixed-point, and transfer operations

Memory operations
• Load/store operations move groups of data between registers and memory
• Three types of addressing
  - Unit stride
    - Fastest
  - Non-unit (constant) stride
  - Indexed (gather-scatter)
    - Vector equivalent of register indirect
    - Good for sparse arrays of data
    - Increases number of programs that vectorize
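The three addressing types differ only in how element addresses are generated. A behavioral sketch, assuming memory is modeled as a flat Python list of words and addresses are word indices (all function names are illustrative):

```python
def load_unit_stride(mem: list, base: int, vl: int) -> list:
    """Consecutive words starting at `base`."""
    return [mem[base + i] for i in range(vl)]


def load_strided(mem: list, base: int, stride: int, vl: int) -> list:
    """Every `stride`-th word, e.g. one column of a row-major matrix."""
    return [mem[base + i * stride] for i in range(vl)]


def load_indexed(mem: list, base: int, index: list) -> list:
    """Gather: one word per offset held in an index vector."""
    return [mem[base + offset] for offset in index]


memory = list(range(100, 132))          # 32 words holding 100..131
row = load_unit_stride(memory, 4, 4)
column = load_strided(memory, 0, 8, 4)  # stride 8 walks down an 8-wide matrix
sparse = load_indexed(memory, 0, [7, 0, 19])
```

A scatter store is the mirror image of `load_indexed`, writing each element to the address its index selects.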

Vector Implementation
• Vector register file
  - Each register is an array of elements
  - Size of each register determines maximum vector length
  - Vector length register determines vector length for a particular operation
• Multiple parallel execution units = “lanes”
• Chaining = forwarding for vector operations
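The split between maximum vector length and the vector length register is what makes strip mining work: a long loop is cut into strips of at most the maximum length, and the length register is set for each strip. A minimal sketch, assuming a hypothetical maximum vector length of 64 (the constant `MVL` and the function name are illustrative):

```python
MVL = 64  # hypothetical maximum vector length set by the register size


def strip_mined_add(a: list, b: list, mvl: int = MVL):
    """Add two arrays of any length in strips of at most `mvl` elements."""
    n = len(a)
    out = [0] * n
    strips = []
    start = 0
    while start < n:
        vlr = min(mvl, n - start)             # value written to the length register
        for i in range(start, start + vlr):   # stands in for one vector instruction
            out[i] = a[i] + b[i]
        strips.append(vlr)
        start += vlr
    return out, strips


result, strips = strip_mined_add(list(range(150)), [1] * 150)
```

Only the last strip is shorter than the maximum, so no scalar cleanup loop is needed.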

Vector Terminology
[Figure]

Example Execution of Vector Code
[Figure]

Aren’t Vectors Dead?
• High Cost: ~$1M each?
  - Single-chip CMOS µP/IRAM
• Low latency, high BW memory system?
  - IRAM = low latency, high bandwidth memory
• >≈ 1M xtors for vector processor?
  - Small % in future + scales to Bxtor chips
• High power, need elaborate cooling?
  - Vector micro can be low power + more energy eff. than superscalar
• Poor scalar performance?
  - Include modern, modest CPU ⇒ OK scalar (MIPS 5K v. 10k)
• No virtual memory?
  - Include demand-paged VM
• Limited to scientific applications?
  - Multimedia apps vectorizable too: N * 64b, 2N * 32b, 4N * 16b

More Vector IRAM Advantages
• Real-time?
  - Fixed pipeline, eliminate traditional caches and speculation ⇒ repeatable speed as vary input
• Vector Performance?
  - Easy to scale with technology
• Code density?
  - Much smaller than VLIW / EPIC
• Compilers?
  - For sale, mature (> 20 years)

Increasing Scalar Complexity
MIPS µPs                    R5000      R10000         10K/5K
Clock Rate                  200 MHz    195 MHz        1.0x
On-Chip Caches              32K/32K    32K/32K        1.0x
Instructions / Cycle        1(+ FP)    4              4.0x
Pipe stages                 5          5-7            1.2x
Model                       In-order   Out-of-order   ---
Die Size (mm²)              84         298            3.5x
  without cache, TLB        32         205            6.3x
Development (person-yr.)    60         300            5.0x
SPECint_base95              5.7        8.8            1.6x

Vectors Are Inexpensive
Scalar
• N ops per cycle ⇒ O(N²) circuitry
• HP PA-8000
  - 4-way issue
  - reorder buffer: 850K transistors
    - incl. 6,720 5-bit register number comparators

Vector
• N ops per cycle ⇒ O(N + εN²) circuitry
• T0 vector micro*
  - 24 ops per cycle
  - 730K transistors total
    - only 23 5-bit register number comparators

*See http://www.icsi.berkeley.edu/real/spert/t0-intro.html

MIPS R10000 vs. T0


Vectors Lower Power
Single-issue Scalar
• One instruction fetch, decode, dispatch per operation
• Arbitrary register accesses, adds area and power
• Loop unrolling and software pipelining for high performance increases instruction cache footprint
• All data passes through cache; waste power if no temporal locality
• One TLB lookup per load or store
• Off-chip access in whole cache lines

Vector
• One instruction fetch, decode, dispatch per vector
• Structured register accesses
• Smaller code for high performance, less power in instruction cache misses
• Bypass cache
• One TLB lookup per group of loads or stores
• Move only necessary data across chip boundary

Superscalar Energy Efficiency Worse
Superscalar
• Control logic grows quadratically with issue width
• Control logic consumes energy regardless of available parallelism
• Speculation to increase visible parallelism wastes energy

Vector
• Control logic grows linearly with issue width
• Vector unit switches off when not in use
• Vector instructions expose parallelism without speculation
• Software control of speculation when desired:
  - Whether to use vector mask or compress/expand for conditionals

Applications
Limited to scientific computing? NO!
• Standard benchmark kernels (Matrix Multiply, FFT, Convolution, Sort)
• Lossy Compression (JPEG, MPEG video and audio)
• Lossless Compression (Zero removal, RLE, Differencing, LZW)
• Cryptography (RSA, DES/IDEA, SHA/MD5)
• Multimedia Processing (compress., graphics, audio synth, image proc.)
• Speech and handwriting recognition
• Operating systems/Networking (memcpy, memset, parity, checksum)
• Databases (hash/join, data mining, image/video serving)
• Language run-time support (stdlib, garbage collection)
• even SPECint95
significant work by Krste Asanovic at UCB, other references available

Mediaprocessing Vector Lengths
Kernel                                  Vector length
Matrix transpose/multiply               # vertices at once
DCT (video, comm.)                      image width
FFT (audio)                             256-1024
Motion estimation (video)               image width, iw/16
Gamma correction (video)                image width
Haar transform (media mining)           image width
Median filter (image process.)          image width
Separable convolution (img. proc.)      image width
(from Pradeep Dubey - IBM, http://www.research.ibm.com/people/p/pradeep/tutor.html)

V-IRAM-2 Floorplan
• 0.13 µm, 1 Gbit DRAM
• >1B Xtors: 98% Memory, Xbar, Vector ⇒ regular design
• Spare Lane & Memory ⇒ 90% die repairable
• Short signal distance ⇒ speed scales <0.1 µm
[Figure: floorplan with Memory (512 Mbits / 64 MBytes) at top and bottom, a Crossbar Switch, 8 Vector Lanes (+ 1 spare), CPU+$, and I/O.]

Tentative V-IRAM-1 Floorplan
• 0.18 µm DRAM, 32 MB in 16 banks x 256b, 128 subbanks
• 0.25 µm, 5 Metal Logic
• 200 MHz CPU, 4K I$, 4K D$
• 4 Float. Pt./Integer vector units
• die: 16 x 16 mm
• xtors: 270M
• power: ~2 Watts
[Figure: floorplan with Memory (128 Mbits / 16 MBytes) at top and bottom, a Ring-based Switch, 4 Vector Pipes/Lanes, CPU+$, and I/O.]

What about I/O?
• Current system architectures have limitations
• I/O bus performance lags that of other system components
• Parallel I/O bus performance scaled by increasing clock speed and/or bus width
  - E.g. 32-bit PCI: ~50 pins; 64-bit PCI: ~90 pins
  - Greater number of pins ⇒ greater packaging costs
• Are there alternatives to parallel I/O buses for IRAM?

Serial I/O and IRAM
• Communication advances: fast (Gbps) serial I/O lines [YankHorowitz96], [DallyPoulton96]
  - Serial lines require 1-2 pins per unidirectional link
  - Access to standardized I/O devices
    - Fiber Channel-Arbitrated Loop (FC-AL) disks
    - Gbps Ethernet networks
• Serial I/O lines a natural match for IRAM
• Benefits
  - Serial lines provide high I/O bandwidth for I/O-intensive applications
  - I/O bandwidth incrementally scalable by adding more lines
    - Number of pins required still lower than parallel bus
• How to overcome limited memory capacity of single IRAM?
  - SmartSIMM: collection of IRAMs (and optionally external DRAMs)
  - Can leverage high-bandwidth I/O to compensate for limited memory

Example I/O-intensive Application: External (disk-to-disk) Sort
• Berkeley NOW cluster has world record sort: 8.6 GB disk-to-disk using 95 processors in 1 minute
• Balanced system ratios for processor:memory:I/O
  - Processor: N MIPS
  - Large memory: N Mbit/s disk I/O & 2N Mb/s Network
  - Small memory: 2N Mbit/s disk I/O & 2N Mb/s Network
• Serial I/O at 2-4 GHz today (v. 0.1 GHz bus)
• IRAM: 2-4 GIPS + 2 * 2-4Gb/s I/O + 2 * 2-4Gb/s Net
• ISIMM: 16 IRAMs + net switch + FC-AL links (+ disks)
• 1 IRAM sorts 9 GB, SmartSIMM sorts 100 GB
See “IRAM and SmartSIMM: Overcoming the I/O Bus Bottleneck”, Workshop on Mixing Logic and DRAM, 24th Int’l. Symp. on Computer Architecture, June 1997

Why IRAM now? Lower risk than before
• Faster Logic + DRAM available now/soon?
• DRAM manufacturers now willing to listen
  - Before not interested, so early IRAM = SRAM
• Past efforts memory limited ⇒ multiple chips ⇒ 1st solve the unsolved (parallel processing)
  - Gigabit DRAM ⇒ ~100 MB; OK for many apps?
• Systems headed to 2 chips: CPU + memory
• Embedded apps leverage energy efficiency, adjustable memory capacity, smaller board area ⇒ 115M embedded 32b RISC in 1996 [Microproc. Report]

IRAM Challenges
• Chip
  - Good performance and reasonable power?
  - Speed, area, power, yield, cost in DRAM process?
  - Testing time of IRAM vs. DRAM vs. microprocessor?
  - Bandwidth / Latency oriented DRAM tradeoffs?
  - Reconfigurable logic to make IRAM more generic?
• Architecture
  - How to turn high memory bandwidth into performance for real applications?
  - Extensible IRAM: Large program/data solution? (e.g., external DRAM, clusters, CC-NUMA, IDISK ...)

IRAM Conclusion
• IRAM potential in memory BW and latency, energy, board area, I/O
  - 10X-100X improvements based on technology shipping for 20 years (not JJ, photons, MEMS, ...)
• Challenges in power/performance, testing, yield
• Apps/metrics of future to design computer of future
• V-IRAM can show IRAM’s potential
  - multimedia, energy, size, scaling, code size, compilers
• Shift semiconductor balance of power? Who ships the most memory? Most microprocessors?

Interested in Participating?
• Looking for more ideas of IRAM enabled apps
• Looking for possible MIPS scalar core
• Contact us if you’re interested:
  http://iram.cs.berkeley.edu/
  rfromm@cs.berkeley.edu
  patterson@cs.berkeley.edu
• Thanks for advice/support: DARPA, California MICRO, ARM, IBM, Intel, LG Semiconductor, Microsoft, Neomagic, Samsung, SGI/Cray, Sun Microsystems

Backup Slides
The following slides are used to help answer questions, and/or go into more detail as time permits...

Today’s Situation: Microprocessor
MIPS µPs                    R5000      R10000         10K/5K
Clock Rate                  200 MHz    195 MHz        1.0x
On-Chip Caches              32K/32K    32K/32K        1.0x
Instructions/Cycle          1(+ FP)    4              4.0x
Pipe stages                 5          5-7            1.2x
Model                       In-order   Out-of-order   ---
Die Size (mm²)              84         298            3.5x
  without cache, TLB        32         205            6.3x
Development (person-yr.)    60         300            5.0x
SPECint_base95              5.7        8.8            1.6x

Speed Differences Today...
• Logic gates currently slower in DRAM vs. logic processes
• Processes optimized for different characteristics
  - Logic: fast transistors and interconnect
  - DRAM: density and retention
• Precise slowdown varies by manufacturer and process generation

… and in the Future
• Ongoing trends in DRAM industry likely to alleviate current disadvantages
  - DRAM processes adding more metal layers
    - Enables more optimal layout of logic
  - Some DRAM manufacturers (Mitsubishi, Toshiba) already developing merged logic and DRAM processes
• 1997 ISSCC panel predictions
  - Equal performance from logic transistors in DRAM process available soon
  - Modest (20-30%) increase in cost per wafer

DRAM Access
Steps:
1. Precharge
2. Data-Readout
3. Data-Restore
4. Column Access
[Figure: DRAM array of 1k rows by 2k columns with row decoder, word line, bitline, sense-amp, column decoder, and 256 bits I/O. Timing: 15 ns prechrg, 20 ns readout, 25 ns restore, 10 ns col.]
energy_row access = 5 × energy_column access

Possible DRAM Innovations #1
• More banks
  - Each bank can independently process a separate address stream
• Independent Sub-Banks
  - Hides memory latency
  - Increases effective cache size (sense-amps)

Possible DRAM Innovations #2
• Sub-rows
  - Save energy when not accessing all bits within a row

Possible DRAM Innovations #3
• Row buffers
  - Increase access bandwidth by overlapping precharge and read of next row access with col access of prev row

Commercial IRAM highway is governed by memory per IRAM?
[Figure: applications arranged by memory per IRAM (2 MB, 8 MB, 32 MB): Graphics Acc., Video Games, Super PDA/Phone, Network Computer, Laptop.]

“Vanilla” Approach to IRAM
• Estimate performance of IRAM implementations of conventional architectures
• Multiple studies:
  - #1: “Intelligent RAM (IRAM): Chips that remember and compute”, 1997 Int’l. Solid-State Circuits Conf., Feb. 1997.
  - #2 & #3: “Evaluation of Existing Architectures in IRAM Systems”, Workshop on Mixing Logic and DRAM, 24th Int’l. Symp. on Computer Architecture, June 1997.
  - #4: “The Energy Efficiency of IRAM Architectures”, 24th Int’l. Symp. on Computer Architecture, June 1997.

“Vanilla” IRAM - #1
• Methodology
  - Estimate performance of IRAM implementation of Alpha architecture
  - Same caches, benchmarks, standard DRAM
  - Used optimistic and pessimistic factors for logic (1.3-2.0X slower), SRAM (1.1-1.3X slower), DRAM speed (5-10X faster) for standard DRAM
• Results
  - Spec92 benchmark ⇒ 1.2 to 1.8 times slower
  - Database ⇒ 1.1 times slower to 1.1 times faster
  - Sparse matrix ⇒ 1.2 to 1.8 times faster

“Vanilla” IRAM - Methodology #2
• Execution time analysis of a simple (Alpha 21064) and a complex (Pentium Pro) architecture to predict performance of similar IRAM implementations
• Used hardware counters for execution time measurements
• Benchmarks: spec95int, mpeg_encode, linpack1000, sort
• IRAM implementations: same architectures with 24 MB of on-chip DRAM but no L2 caches; all benchmarks fit completely in on-chip memory
• IRAM execution time model:
  Execution time = (computation time / clock speedup) + (L1 miss count * memory access time / memory access speedup)
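The execution time model is simple enough to evaluate by hand. A sketch of it as a function, with purely hypothetical inputs (a 1.0 s compute phase, one million L1 misses at 200 ns each, equal clock speed, and a 5X faster memory access); none of these numbers come from the study:

```python
def iram_execution_time(computation_time: float, l1_miss_count: int,
                        memory_access_time: float, clock_speedup: float,
                        memory_access_speedup: float) -> float:
    """Predicted execution time in seconds under the two-term model."""
    compute = computation_time / clock_speedup
    memory = l1_miss_count * memory_access_time / memory_access_speedup
    return compute + memory


# Setting both speedups to 1.0 recovers the measured conventional time.
conventional = iram_execution_time(1.0, 1_000_000, 200e-9, 1.0, 1.0)
iram = iram_execution_time(1.0, 1_000_000, 200e-9, 1.0, 5.0)
speedup = conventional / iram
```

Because only the memory term shrinks, the overall speedup is bounded by the share of time the program spends waiting on memory.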

“Vanilla” IRAM - Results #2
• Equal clock speeds assumed for conventional and IRAM systems
• Maximum IRAM speedup compared to conventional:
  - Less than 2 for memory bound applications
  - Less than 1.1 for CPU bound applications

“Vanilla” IRAM - Methodology #3
• Used SimOS to simulate simple MIPS R4000-based IRAM and conventional architectures
• Equal die size comparison
  - Area for on-chip DRAM in IRAM systems same as area for level 2 cache in conventional system
• Wide memory bus for IRAM systems
• Main simulation parameters
  - On-chip DRAM access latency
  - Logic speed (CPU frequency)
• Benchmarks: spec95int (compress, li, ijpeg, perl, gcc), spec95fp (tomcatv, su2cor, wave5), linpack1000

“Vanilla” IRAM - Results #3
• Maximum speedup of 1.4 for equal clock speeds
• Slower than conventional for most other cases

“Vanilla” IRAM - Methodology #4
• Architectural models
  - Simple CPU
  - IRAM and Conventional memory configurations for two different die sizes (Small ≈ StrongARM, Large ≈ 64 Mb DRAM)
• Record base CPI and activity at each level of memory hierarchy with shade
• Estimate performance based on CPU speed and access times to each level of memory hierarchy
• Benchmarks: hsfsys, noway, nowsort, gs, ispell, compress, go, perl

“Vanilla” IRAM - Results #4
• Speedup of IRAM compared to Conventional
  - Higher numbers mean higher performance for IRAM
[Table: speedup for the Small-IRAM and Large-IRAM models, each at clock rates of 0.75 X and 1.0 X, for hsfsys, noway, nowsort, gs, ispell, compress, go, and perl. Individual cell values are not recoverable from the extracted text.]
• Performance is comparable: IRAM is 0.76 to 1.50 times as fast as conventional
• Dependent on memory behavior of application

Frequency of Accesses
• On-chip DRAM array has much higher capacity than SRAM array of same area
• IRAM reduces the frequency of accesses to lower levels of memory hierarchy, which require more energy
  - On-chip DRAM organized as L2 cache has lower off-chip miss rates than L2 SRAM, reducing the off-chip energy penalty
  - When entire main memory array is on-chip, high off-chip energy cost is avoided entirely

Energy of Accesses
• IRAM reduces energy to access various levels of the memory hierarchy
  - On-chip memory accesses use less energy than off-chip accesses by avoiding high-capacitance off-chip bus
  - Multiplexed address scheme of conventional DRAMs selects larger number of DRAM arrays than necessary
  - Narrow pin interface of external DRAM wastes energy in multiple column cycles needed to fill entire cache block

Energy Results 1/3
[Figure: stacked bar charts of Energy/Instruction (nJ, 0.00 to 5.00) for gs, ispell, and compress. Models: S-C, S-I-16, S-I-32, L-C-16, L-C-32, L-I. Components: Instruction cache, Data cache, L2 cache, Main memory bus, Main memory. Bars are annotated with IRAM to Conv. ratios at 16:1 and 32:1 density.]

Energy Results 2/3
[Figure: stacked bar charts of Energy/Instruction (nJ, 0.00 to 5.00) for go and perl. Models: S-C, S-I-16, S-I-32, L-C-16, L-C-32, L-I. Components: Instruction cache, Data cache, L2 cache, Main memory bus, Main memory. Bars are annotated with IRAM to Conv. ratios at 16:1 and 32:1 density.]

Energy Results 3/3
[Figure: stacked bar charts of Energy/Instruction (nJ) for hsfsys, noway, and nowsort. Bars per model: S-C, S-I-16, S-I-32, L-C-16, L-C-32, L-I. Stack components: instruction cache, data cache, L2 cache, main memory bus, main memory. Annotations give the IRAM to Conv. energy ratio at 16:1 and 32:1 density; the per-bar values are scrambled in the extracted text.]

Parallel Pipelines in Functional Units
[Figure: diagram only; no text survives in the extraction.]

Tolerating Memory Latency
Non-Delayed Pipeline
[Figure: pipeline diagram. Scalar stages: F D X M W. VLOAD: A T, memory access, VW. VALU: VR X1 X2 … XN VW. VSTORE: A T VR, memory access. Instruction stream: ld.v, add.v, st.v, ld.v, add.v, st.v, … Memory Latency: ~100 cycles. Load -> ALU RAW: ~100 cycles.]
- Load → ALU sees full memory latency (large)

Tolerating Memory Latency
Delayed Pipeline
[Figure: pipeline diagram. Scalar stages: F D X M W. VLOAD: A T, memory access, VW. VALU: FIFO, VR X1 X2 … XN VW. VSTORE: A T VR. Instruction stream: ld.v, add.v, st.v, ld.v, add.v, st.v, … Memory Latency: ~100 cycles. Load -> ALU RAW: ~6 cycles.]
- Delay ALU instructions until memory data returns
- Load → ALU sees functional unit latency (small)

Latency not always hidden...
- Scalar reads of vector unit state
  - Element reads for partially vectorized loops
  - Count trailing zeros in flags
  - Pop count of flags
- Indexed vector loads and stores
  - Need to get address from register file to address generator
- Masked vector loads and stores
  - Mask values from end of pipeline to address translation stage to cancel exceptions

..) l Sorting l l “ Radix Sort for Vector Multiprocessors” Zagha . 2D. IRAM tutorial. . ASP-DAC ‘ February 10.Standard Benchmark Kernels l Matrix Multiply (and other BLAS) l “ Implementation of level 2 and level 3 BLAS on the Cray Y-MP and Cray-2” Sheikh et al.) l Convolutions (1D. Journal . . and Blelloch. of Supercomputing. 1998 98. Supercomputing 91 80 Richard Fromm. Supercomputing. . Journal of .. 5:291-305 “ High-Performance Fast Fourier Transform A Algorithm for the Cray-2” Bailey.. 3D. 1:43-60 l FFT (1D. 2D.

Compression
- Lossless
  - Zero removal
  - Run-length encoding
  - Differencing
  - JPEG lossless mode
  - LZW
- Lossy
  - JPEG
    - source filtering and down-sample
    - YUV ↔ RGB color space conversion
    - DCT/iDCT
    - run-length encoding
  - MPEG video
    - Motion estimation (Cedric Krumbein, UCB)
  - MPEG audio
    - FFTs, filtering

91 4. ASP-DAC ‘ February 10.01 1.01 DES/IDEA (secret key ciphers) l l l l l SHA/MD5 (signature) l Partially vectorizable 82 Richard Fromm. (MB/s) (MB/s) (MB/s) 14.96 1.85 1.04 0. 1998 98. CBC dec. .70 13. IRAM tutorial.Cryptography l l RSA (public key) l Vectorize long integer arithmetic ECB mode encrypt/decrypt vectorizes IDEA CBC mode encrypt doesn’ vectorize (without interleave mode) t DES CBC mode encrypt can vectorize S-box lookups CBC mode decrypt vectorizes IDEA mode T0 (40 MHz) Ultra-1/170 (167 MHz) Alpha 21164 (500 MHz) ECB CBC enc.

Multimedia Processing
- Image/video/audio compression (JPEG/MPEG/GIF/png)
- Front-end of 3D graphics pipeline (geometry, lighting)
  - Pixar Cray X-MP, Stellar, Ardent, Microsoft Talisman MSP
- High Quality Additive Audio Synthesis
  - Todd Hodes, UCB
  - Vectorize across oscillators
- Image Processing
  - Adobe Photoshop

Speech and Handwriting Recognition
- Speech recognition
  - Front-end: filters/FFTs
  - Phoneme probabilities: Neural net
  - Back-end: Viterbi/Beam Search
- Newton handwriting recognition
  - Front-end: segment grouping/segmentation
  - Character classification: Neural net
  - Back-end: Beam Search
- Other handwriting recognizers/OCR systems
  - Kohonen nets
  - Nearest exemplar

Operating Systems / Networking
- Copying and data movement (memcpy)
- Zeroing pages (memset)
- Software RAID parity XOR
- TCP/IP checksum (Cray)
- RAM compression (Rizzo ’96, zero-removal)
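As one illustration of why these kernels suit a vector unit, here is a minimal sketch of software RAID parity: the parity block is a byte-wise XOR over equally sized data blocks, and a lost block is recovered by XORing the survivors with the parity. The function names and sample data are hypothetical, not taken from the slides.

```python
def xor_parity(blocks):
    """Byte-wise XOR of equally sized blocks (the RAID parity block)."""
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("all blocks must have the same length")
    parity = bytearray(size)
    for block in blocks:
        # Each byte position is independent of the others, so this inner
        # loop is an element-wise operation that maps onto vector lanes.
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)


def recover_block(surviving_blocks, parity):
    """Rebuild the single missing block from the survivors and the parity."""
    return xor_parity(list(surviving_blocks) + [parity])
```

For example, with three data blocks, dropping the middle one and calling `recover_block` on the other two plus the parity returns the dropped block.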

Databases
- Hash/Join (Rich Martin, UCB)
- Database mining
- Image/video serving
  - Format conversion
  - Query by image content

Language Run-time Support
- Structure copying
- Standard C libraries: mem*, str*
  - Dhrystone 1.1 on T0: 1.98 speedup with vectors
  - Dhrystone 2.1 on T0: 1.63 speedup with vectors
- Garbage Collection
  - “Vectorized Garbage Collection”, Appel and Bendiksen, Journal of Supercomputing, 3:151-160
  - Vector GC 9x faster than scalar GC on Cyber 205

SPECint95
- m88ksim - 42% speedup with vectorization
- compress - 36% speedup for decompression with vectorization (including code modifications)
- ijpeg - over 95% of runtime in vectorizable functions
- li - approx. 35% of runtime in mark/scan garbage collector
  - Previous work by Appel and Bendiksen on vectorized GC
- go - most time spent in linked list manipulation
  - could rewrite for vectors?
- perl - mostly non-vectorizable, but up to 10% of time in standard library functions (str*, mem*)
- gcc - not vectorizable
- vortex - ???
- eqntott (from SPECint92) - main loop (90% of runtime) vectorized by Cray C compiler

V-IRAM-1 Specs/Goals
  Technology          0.18-0.20 micron, 5-6 metal layers, fast transistor
  Memory              32 MB
  Die size            ~250 mm²
  Vector lanes        4 x 64-bit (or 8 x 32-bit or 16 x 16-bit)

  Target              Low Power                      High Performance
  Serial I/O          4 lines @ 1 Gbit/s             8 lines @ 2 Gbit/s
  Power               ~2 W @ 1-1.5 Volt logic        ~10 W @ 1.5-2 Volt logic
  Clock (university)  200 scalar / 100 vector MHz    250 scalar / 250 vector MHz
  Perf (university)   0.8 GFLOPS64 - 3 GFLOPS16      2 GFLOPS64 - 8 GFLOPS16
  Clock (industry)    400 scalar / 200 vector MHz    500 scalar / 500 vector MHz
  Perf (industry)     1.6 GFLOPS64 - 6 GFLOPS16      4 GFLOPS64 - 16 GFLOPS16
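The peak-performance entries in the specs above appear to follow from lanes times vector clock rate. The sketch below checks that arithmetic under one assumption that is not stated on the slide: each lane completes 2 operations per cycle (for example a multiply and an add). With that assumption the formula reproduces the slide's figures, with 3.2 and 6.4 shown on the slide as 3 and 6.

```python
def peak_gflops(lanes: int, vector_mhz: float, ops_per_lane_per_cycle: int = 2) -> float:
    """Peak rate in GFLOPS; ops_per_lane_per_cycle=2 is an assumption."""
    return lanes * vector_mhz * ops_per_lane_per_cycle / 1000.0


# (label, vector clock in MHz) taken from the specs above.
CONFIGS = [
    ("university, low power", 100),
    ("university, high performance", 250),
    ("industry, low power", 200),
    ("industry, high performance", 500),
]

if __name__ == "__main__":
    for label, mhz in CONFIGS:
        wide = peak_gflops(4, mhz)      # 4 lanes of 64-bit elements
        narrow = peak_gflops(16, mhz)   # 16 lanes of 16-bit elements
        print(f"{label}: {wide:.1f} GFLOPS64, {narrow:.1f} GFLOPS16")
```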

How to get Low Power, High Clock rate IRAM?
- Digital Strong ARM 110 (1996): 2.1M Xtors
  - 160 MHz @ 1.5 v = 184 “MIPS” < 0.5 W
  - 215 MHz @ 2.0 v = 245 “MIPS” < 1.0 W
- Start with Alpha 21064 @ 3.5v, 26 W
  - Vdd reduction ⇒ 5.3X ⇒ 4.9 W
  - Reduce functions ⇒ 3.0X ⇒ 1.6 W
  - Scale process ⇒ 2.0X ⇒ 0.8 W
  - Clock load ⇒ 1.3X ⇒ 0.6 W
  - Clock rate ⇒ 1.2X ⇒ 0.5 W
- 6/97: 233 MHz, 268 MIPS, 0.36W typ., $49
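The Alpha 21064 power-reduction chain on this slide is a sequence of divisions, starting from 26 W. A minimal sketch that replays it, using only the factors quoted on the slide (the variable and function names are illustrative):

```python
START_WATTS = 26.0  # Alpha 21064 @ 3.5 V, from the slide

# (step, reduction factor) in the order given on the slide.
REDUCTIONS = [
    ("Vdd reduction", 5.3),
    ("Reduce functions", 3.0),
    ("Scale process", 2.0),
    ("Clock load", 1.3),
    ("Clock rate", 1.2),
]


def power_after_each_step(start_watts, steps):
    """Return (step name, remaining watts) after applying each factor in turn."""
    watts = start_watts
    results = []
    for name, factor in steps:
        watts /= factor
        results.append((name, watts))
    return results


if __name__ == "__main__":
    for name, watts in power_after_each_step(START_WATTS, REDUCTIONS):
        print(f"{name:17s} -> {watts:.1f} W")
```

Rounded to one decimal place, the intermediate values come out as 4.9, 1.6, 0.8, 0.6, and 0.5 W, in line with the slide; the combined reduction is roughly 50X.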

Serial I/O
- Communication advances: fast (Gbps) serial I/O lines [YankHorowitz96], [DallyPoulton96]
  - Serial lines require 1-2 pins per unidirectional link
  - Access to standardized I/O devices
    - Fiber Channel-Arbitrated Loop (FC-AL) disks
    - Gbps Ethernet networks
- Serial I/O lines a natural match for IRAM
- Benefits
  - Avoids large number of pins for parallel I/O buses
  - IRAM can sink high I/O rate without interfering with computation
  - “System-on-a-chip” integration means chip can decide how to:
    - Notify processor of I/O events
    - Keep caches coherent
    - Update memory

Serial I/O and IRAM
- How well will serial I/O work for IRAM?
  - Serial lines provide high I/O bandwidth for I/O-intensive applications
  - I/O bandwidth incrementally scalable by adding more lines
    - Number of pins required still lower than parallel bus
- How to overcome limited memory capacity of single IRAM?
  - SmartSIMM: collection of IRAMs (and optionally external DRAMs)
  - Can leverage high-bandwidth I/O to compensate for limited memory
- In addition to other strengths, IRAM with serial lines provides high I/O bandwidth

Another Application: Decision Support (Conventional)
Sun 10000 (Oracle 8):
- TPC-D (1TB) leader
- SMP 64 CPUs, 64GB dram, 603 disks
  Disks, encl.    $2,348k
  DRAM            $2,328k
  Boards, encl.   $983k
  CPUs            $912k
  Cables, I/O     $139k
  Misc            $65k
  HW total        $6,775k
[Diagram: processors (Proc), crossbar (Xbar), and memory (Mem) boards, numbered 1 to 16, on 4 address buses and a data crossbar switch; bus bridges, numbered 1 to 23, fan out to SCSI strings of disks. Bandwidth labels: 12.4 GB/s, 2.6 GB/s, 6.0 GB/s.]

“Intelligent Disk”: Scalable Decision Support
- 1 IRAM/disk + shared nothing database
- 603 CPUs, 14GB dram, 603 disks
  Disks (market)      $840k
  IRAM (@$150)        $90k
  Disk encl., racks   $150k
  Switches/cables     $150k
  Misc                $60k
  Subtotal            $1,300k
  Markup 2X?          ~$2,600k
- ~1/3 price, 2X-5X perf.
[Diagram: disks, each paired with an IRAM, connected through a hierarchy of crossbars. Bandwidth labels: 75.0 GB/s, 6.0 GB/s.]
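The price comparison on this slide can be replayed from the itemized costs. The sketch below uses only figures quoted in the deck (in thousands of dollars); the $6,775k constant is the HW total from the conventional TPC-D slide, and the names are illustrative. The itemized entries sum to $1,290k, which the slide rounds to $1,300k.

```python
# Itemized "Intelligent Disk" hardware costs from the slide, in $k.
IDISK_COSTS_K = {
    "Disks (market)": 840,
    "IRAM (@$150)": 90,
    "Disk encl., racks": 150,
    "Switches/cables": 150,
    "Misc": 60,
}

# HW total quoted for the conventional TPC-D system, in $k.
CONVENTIONAL_HW_TOTAL_K = 6775


def idisk_price_k(costs, markup=2.0):
    """Subtotal times the slide's tentative 2X markup, in $k."""
    return sum(costs.values()) * markup


if __name__ == "__main__":
    subtotal = sum(IDISK_COSTS_K.values())
    price = idisk_price_k(IDISK_COSTS_K)
    print(f"IRAM cost check: 603 x $150 = ${603 * 150:,}")
    print(f"subtotal ${subtotal}k, with markup ${price:.0f}k")
    print(f"fraction of conventional HW total: {price / CONVENTIONAL_HW_TOTAL_K:.2f}")
```

The resulting fraction is about 0.38, which the slide summarizes as "~1/3 price".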

Testing in DRAM
- Importance of testing over time
  - Testing time affects time to qualification of new DRAM, time to First Customer Ship
  - Goal is to get 10% of market by being one of the first companies to FCS with good yield
  - Testing 10% to 15% of cost of early DRAM
- Built In Self Test of memory: BiST v. External tester? Vector Processor 10X v. Scalar Processor?
- System v. component may reduce testing cost

Operation & Instruction Count: RISC v. Vector Processor
  Spec92fp    Operations (M)           Instructions (M)
  Program     RISC   Vector   R / V    RISC   Vector   R / V
  swim256     115    95       1.1x     115    0.8      142x
  hydro2d     58     40       1.4x     58     0.8      71x
  nasa7       69     41       1.7x     69     2.2      31x
  su2cor      51     35       1.4x     51     1.8      29x
  tomcatv     15     10       1.4x     15     1.3      11x
  wave5       27     25       1.1x     27     7.2      4x
  mdljdp2     32     52       0.6x     32     15.8     2x

Vectors reduce ops by 1.2X, instructions by 20X!
from F. Quintana, University of Barcelona
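The R / V columns are the RISC count divided by the vector count for each program. A minimal sketch that recomputes them from the counts listed on the slide; because those counts are already rounded to millions, the recomputed ratios can differ slightly from the slide's own R / V column, and the slide does not say how its 1.2X and 20X summary figures were aggregated.

```python
PROGRAMS = ["swim256", "hydro2d", "nasa7", "su2cor", "tomcatv", "wave5", "mdljdp2"]

# Counts in millions, copied from the slide. On the RISC machine the
# operation and instruction counts are the same column of numbers.
RISC_M = [115, 58, 69, 51, 15, 27, 32]
VECTOR_OPS_M = [95, 40, 41, 35, 10, 25, 52]
VECTOR_INSTR_M = [0.8, 0.8, 2.2, 1.8, 1.3, 7.2, 15.8]


def ratios(risc, vector):
    """Per-program RISC / vector ratio."""
    return [r / v for r, v in zip(risc, vector)]


if __name__ == "__main__":
    ops = ratios(RISC_M, VECTOR_OPS_M)
    instr = ratios(RISC_M, VECTOR_INSTR_M)
    for name, o, i in zip(PROGRAMS, ops, instr):
        print(f"{name:8s} ops R/V = {o:4.1f}x   instr R/V = {i:6.1f}x")
```

The instruction ratios are far larger than the operation ratios because a single vector instruction stands in for many scalar operations.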

V-IRAM-1 Tentative Plan
- Phase 1: Feasibility stage (H1’98)
  - Test chip, CAD agreement, architecture defined
- Phase 2: Design Stage (H2’98)
  - Simulated design
- Phase 3: Layout & Verification (H2’99)
  - Tape-out
- Phase 4: Fabrication, Testing, and Demonstration (H1’00)
  - Functional integrated circuit
- First microprocessor ≈250M transistors!
