Central Processing Unit Architecture
· Architecture overview
· Machine organization
– von Neumann
· Speeding up CPU operations
– multiple registers
– pipelining
– superscalar and VLIW
· CISC vs. RISC
(6.2)
Computer Architecture
· Major components of a computer
– Central Processing Unit (CPU)
– memory
– peripheral devices
· Architecture is concerned with
– internal structures of each
– interconnections
» speed and width
– relative speeds of components
· Want maximum execution speed
– Balance is often critical issue
(6.3)
Computer Architecture (continued)
· CPU
– performs arithmetic and logical operations
– synchronous operation
– may consider instruction set architecture
» how machine looks to a programmer
– detailed hardware design
(6.4)
Computer Architecture (continued)
· Memory
– stores programs and data
– organized as
» bit
» byte = 8 bits (smallest addressable location)
» word = 4 bytes (typically; machine dependent)
– instructions consist of operation codes and addresses
[Diagram: instruction format. A memory word holds two instructions, each an op code followed by an address; bit positions 0, 8, 20, 28, and 39 mark the field boundaries.]
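Splitting an instruction word into its operation code and address fields can be sketched with bit shifts and masks. The 8-bit opcode and 12-bit address widths below are illustrative assumptions, not tied to any particular machine.

```python
# Sketch: decode an instruction word into op code and address fields.
# The 8-bit opcode / 12-bit address split is an illustrative assumption.
OPCODE_BITS = 8
ADDR_BITS = 12

def decode(word):
    """Split a 20-bit instruction word into (opcode, address)."""
    opcode = (word >> ADDR_BITS) & ((1 << OPCODE_BITS) - 1)
    address = word & ((1 << ADDR_BITS) - 1)
    return opcode, address

word = (0x1A << ADDR_BITS) | 0x3FF  # opcode 0x1A, address 0x3FF
print(decode(word))                  # (26, 1023)
```

A wider machine word simply holds more than one such instruction, which is why the fetch hardware reads a whole word and then picks out the left or right instruction.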
(6.7)
Simple Machine Organization (continued)
· ALU does arithmetic and logical comparisons
– AC = accumulator holds results
– MQ = memory-quotient holds second portion of long
results
– MBR = memory buffer register holds data while
operation executes
(6.8)
Simple Machine Organization (continued)
· Program control determines what computer does based on
instruction read from memory
– MAR = memory address register holds address of
memory cell to be read
– PC = program counter; address of next instruction to be
read
– IR = instruction register holds instruction being executed
– IBR = instruction buffer register; holds right half of
instruction read from memory
(6.9)
Simple Machine Organization (continued)
· Machine operates on fetch-execute cycle
· Fetch
– PC → MAR
– read M(MAR) into MBR
– copy left and right instructions into IR and IBR
· Execute
– address part of IR → MAR
– read M(MAR) into MBR
– execute opcode
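The fetch-execute cycle above can be sketched as a loop for an invented one-address accumulator machine; the opcodes, the tuple encoding of instructions, and the memory layout are all illustrative assumptions.

```python
# Minimal fetch-execute loop for an invented one-address accumulator
# machine. Opcodes and memory layout are illustrative only.
LOAD, ADD, STORE, HALT = 0, 1, 2, 3

def run(memory):
    pc, ac = 0, 0                    # program counter, accumulator
    while True:
        opcode, addr = memory[pc]    # fetch: M(PC) into the IR
        pc += 1                      # PC now points at next instruction
        if opcode == LOAD:           # execute the opcode
            ac = memory[addr]
        elif opcode == ADD:
            ac += memory[addr]
        elif opcode == STORE:
            memory[addr] = ac
        elif opcode == HALT:
            return memory

program = [
    (LOAD, 4), (ADD, 5), (STORE, 6), (HALT, 0),  # cells 0-3: code
    7, 35, 0,                                     # cells 4-6: data
]
print(run(program)[6])  # 42
```

Note that the PC is advanced during the fetch, so a branch instruction would simply overwrite it during the execute step.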
(6.10)
Simple Machine Organization (continued)
(6.11)
Architecture Families
· Before the mid-1960s, every new machine had a different
instruction set architecture
– programs from previous generation didn’t run on new
machine
– cost of replacing software became too large
· IBM System/360 created family concept
– single instruction set architecture
– wide range of price and performance with same software
· Performance improvements based on different detailed
implementations
– memory path width (1 byte to 8 bytes)
– faster, more complex CPU design
– greater I/O throughput and overlap
· “Software compatibility” now a major issue
– partially offset by high level language (HLL) software
(6.12)
Architecture Families
(6.13)
Multiple Register Machines
· Initially, machines had only a few registers
– 2 to 8 or 16 common
– registers more expensive than memory
· Most instructions operated between memory
locations
– results had to start from and end up in memory, so
programs needed fewer instructions
» although each instruction was more complex
– means smaller programs and (supposedly) faster
execution
» fewer instructions and data to move between memory
and ALU
· But registers are much faster than memory
– on the order of 30 times faster
(6.14)
Multiple Register Machines (continued)
· Also, many operands are reused within a short time
– waste time loading operand again the next time it’s
needed
· Depending on mix of instructions and operand use,
having many registers may lead to less traffic to
memory and faster execution
· Most modern machines use a multiple register
architecture
– maximum around 512; 32 integer plus 32 floating
point is a common configuration
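The payoff of operand reuse can be sketched by counting memory loads when the most recently used operands stay resident in registers. The simple FIFO replacement policy and the operand sequence below are illustrative assumptions, not a model of any real allocator.

```python
# Sketch: reusing operands held in registers avoids repeated memory
# loads. A simple FIFO of register-resident operands is assumed.
def memory_accesses(operand_sequence, num_registers):
    """Count loads needed when recent operands stay in registers."""
    resident, loads = [], 0
    for op in operand_sequence:
        if op not in resident:       # not in a register: must load
            loads += 1
            resident.append(op)
            if len(resident) > num_registers:
                resident.pop(0)      # evict the oldest operand
    return loads

seq = ["a", "b", "a", "c", "a", "b"]
print(memory_accesses(seq, 1), memory_accesses(seq, 4))  # 6 3
```

With one register every reuse of `a` or `b` forces a reload; with four registers the repeated operands are already resident, halving the memory traffic for this sequence.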
(6.15)
Pipelining
· One way to speed up CPU is to increase clock rate
– limitations on how fast clock can run to complete
instruction
· Another way is to execute more than one
instruction at one time
(6.16)
Pipelining
· Pipelining breaks instruction execution down into
several stages
– put registers between stages to “buffer” data and
control
– execute one instruction
– as the first instruction enters its second stage, start
the second instruction, and so on
– speedup same as number of stages as long as pipe is
full
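The "speedup equals number of stages" claim can be checked with the ideal timing formulas: with s stages and n instructions, a full pipeline finishes in n + s - 1 cycles instead of n * s (a sketch assuming no stalls or hazards).

```python
# Ideal pipeline timing: s stages, n instructions, no stalls assumed.
def cycles(n_instructions, stages, pipelined=True):
    """Clock cycles to finish the instruction stream."""
    if pipelined:
        return n_instructions + stages - 1   # fill the pipe once, then 1/cycle
    return n_instructions * stages           # each instruction runs alone

n, s = 100, 6
speedup = cycles(n, s, pipelined=False) / cycles(n, s)
print(round(speedup, 2))  # 5.71 -- approaches s as n grows
```

The speedup only reaches the stage count asymptotically, and any stall (branch, dependency, memory wait) pushes it further below that ideal.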
(6.17)
Pipelining (continued)
· Consider an example with 6 stages
– FI = fetch instruction
– DI = decode instruction
– CO = calculate location of operand
– FO = fetch operand
– EI = execute instruction
– WO = write operand (store result)
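For the six stages above, the cycle in which each instruction occupies each stage follows a simple pattern in an ideal pipeline: instruction i reaches stage s at cycle i + s. A sketch, assuming no stalls:

```python
# Cycle-by-cycle stage occupancy for the six-stage pipeline, assuming
# an ideal pipe: instruction i reaches stage index s at cycle i + s.
STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]

def schedule(n_instructions):
    """Map (instruction index, stage name) to its clock cycle."""
    return {(i, stage): i + s
            for i in range(n_instructions)
            for s, stage in enumerate(STAGES)}

sched = schedule(3)
print(sched[(0, "FI")], sched[(2, "WO")])  # 0 7
```

Three instructions finish in 8 cycles (cycles 0 through 7) instead of the 18 a non-pipelined machine would need.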
(6.18)
Pipelining Example
[Table: clock-by-clock issue schedule of an unrolled loop. Floating-point loads (LD F10,16(R1); LD F14,24(R1); LD F18,32(R1); LD F22,40(R1); LD F26,48(R1)) and adds (AD F4,F0,F2; AD F8,F6,F2; AD F12,F10,F2; AD F16,F14,F2) issue in parallel across the clock cycles, together with SB R1,R1,#4.]
(6.29)
Instruction Level Parallelism
· Success of superscalar and VLIW machines
depends on number of instructions that occur
together that can be issued in parallel
– no dependencies
– no branches
· Compilers can help create parallelism
· Speculation techniques try to overcome branch
problems
– assume branch is taken
– execute instructions but don’t let them store results
until status of branch is known
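Buffering speculative results until the branch resolves can be sketched as follows; the single-register state and commit/squash logic are illustrative assumptions, not a model of real speculation hardware.

```python
# Sketch: speculatively execute past a branch, buffering the result
# until the branch outcome is known, then commit or squash.
def speculate(predicted_taken, actually_taken, spec_result, state):
    buffered = spec_result           # executed, but not yet committed
    if predicted_taken == actually_taken:
        state["r1"] = buffered       # prediction correct: commit result
    # else: squash -- buffered result is discarded, state unchanged
    return state

print(speculate(True, True, 42, {"r1": 0}))   # {'r1': 42}
print(speculate(True, False, 42, {"r1": 0}))  # {'r1': 0}
```

The key property is that architectural state is untouched on a misprediction; only the buffered result is thrown away.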
(6.30)
CISC vs. RISC
· CISC = Complex Instruction Set Computer
· RISC = Reduced Instruction Set Computer
(6.31)
CISC vs. RISC (continued)
· Historically, machines tend to add features over
time
– instruction opcodes
» IBM 70X, 70X0 series went from 24 opcodes to 185 in
10 years
» over the same period, performance increased 30-fold
– addressing modes
– special purpose registers
· Motivations are to
– improve efficiency, since complex instructions can be
implemented in hardware and execute faster
– make life easier for compiler writers
– support more complex higher-level languages
(6.32)
CISC vs. RISC
· Examination of actual code indicated many of these
features were not used
· RISC advocates proposed
– simple, limited instruction set
– large number of general purpose registers
» and mostly register operations
– optimized instruction pipeline
· Benefits should include
– faster execution of instructions commonly used
– faster design and implementation
(6.33)
CISC vs. RISC
· Comparing some architectures