Professional Documents
Culture Documents
Harsh Sharangpani
Principal Engineer and IA-64 Microarchitecture Manager
Intel Corporation
IA-32
FPU
Control
Machine Characteristics
Frequency 800 MHz
Transistor Count 25.4M CPU; 295M L3
Process 0.18u CMOS, 6 metal layer
Package Organic Land Grid Array
Machine Width 6 insts/clock (4 ALU/MM, 2 Ld/St, 2 FP, 3 Br)
Registers 14 ported 128 GR & 128 FR; 64 Predicates
Speculation 32 entry ALAT, Exception Deferral
Branch Prediction Multilevel 4-stage Prediction Hierarchy
FP Compute Bandwidth 3.2 GFlops (DP/EP); 6.4 GFlops (SP)
Memory -> FP Bandwidth 4 DP (8 SP) operands/clock
Virtual Memory Support 64 entry ITLB, 32/96 2-level DTLB, VHPT
L2/L1 Cache Dual ported 96K Unified & 16KD; 16KI
L2/L1 Latency 6 / 2 clocks
L3 Cache 4MB, 4-way s.a., BW of 12.8 GB/sec;
System Bus 2.1 GB/sec; 4-way Glueless MP
Scalable to large (512+ proc) systems
®
128 GR &
Instruction 128 FR, 2 FMACs Three
Cache Register (4 for SSE) levels of
& Branch Remap cache:
Predictors &
2 LD/ST units L1, L2, L3
Stack
Engine
32 entry ALAT
WORD-LINE
EXPAND RENAME DECODE REGISTER READ
IPG FET ROT EXP REN WLD REG EXE DET WRB
buffer
Return Stack Buffer
Bypass
128 Entry Integer
Muxes
ALUs
Src
Register File
8R / 6W
Bypass
Predicate Muxes
Register To Branch Execution (x3)
File Read
To Retirement (x6)
I-Cmps
F-Cmps
®
Exception
Address TLB ALAT Check
Logic
& Exception
Memory Spec. Ld. Status (NaT)
Subsystem
Check Instruction
®
Efficient elimination of memory bottlenecks
12 Hot Chips, 15 August 2000
Itanium™ Processor Core
even
4Mbyte 128 entry
L2 82-bit
L3
Cache odd RF
Cache
2 DP 4 DP
Ops/clk Ops/clk
(2 x Fld-pair) 2 x 82-bit results
®
L3 Cache
Branch & Predicate
128 Integer Registers 128 FP Registers
Scoreboard, Predicate
Registers
NaTs,, Exceptions
ALAT
Units L1
,NaTs
ECC ECC
Bus Controller ECC
ECC