You are on page 1of 42

Chapter 8

CPU and Memory:


Design, Implementation, and Enhancement

The Architecture of Computer


Hardware and Systems Software:
An Information Technology Approach
3rd Edition, Irv Englander
John Wiley and Sons 2003
Wilson Wong, Bentley College
Linda Senne, Bentley College

CPU Architecture Overview

CISC Complex Instruction Set Computer


RISC Reduced Instruction Set Computer
CISC vs. RISC Comparisons
VLIW Very Long Instruction Word
EPIC Explicitly Parallel Instruction
Computer

Chapter 8: CPU and Memory:

8-2

CISC Architecture
Examples
Intel x86, IBM Z-Series Mainframes, older
CPU architectures

Characteristics
Few general purpose registers
Many addressing modes
Large number of specialized, complex
instructions
Instructions are of varying sizes
Chapter 8: CPU and Memory:

8-3

Limitations of CISC Architecture


Complex instructions are infrequently
used by programmers and compilers
Memory references, loads and stores,
are slow and account for a significant
fraction of all instructions
Procedure and function calls are a
major bottleneck
Passing arguments
Storing and retrieving values in registers
Chapter 8: CPU and Memory:

8-4

RISC Features
Examples
Power PC, Sun Sparc, Motorola 68000

Limited and simple instruction set


Fixed length, fixed format instruction words
Enable pipelining, parallel fetches and executions

Limited addressing modes


Reduce complicated hardware

Register-oriented instruction set


Reduce memory accesses

Large bank of registers


Reduce memory accesses
Efficient procedure calls

Chapter 8: CPU and Memory:

8-5

CISC vs. RISC Processing

Chapter 8: CPU and Memory:

8-6

Circular Register Buffer

Chapter 8: CPU and Memory:

8-7

Circular Register Buffer


- After Procedure Call

Chapter 8: CPU and Memory:

8-8

CISC vs. RISC Performance


Comparison
RISC Simpler instructions
more instructions
more memory accesses
RISC more bus traffic and
increased cache memory misses
More registers would improve CISC
performance but no space available for them
Modern CISC and RISC architectures are
becoming similar
Chapter 8: CPU and Memory:

8-9

VLIW Architecture
Transmeta Crusoe CPU
128-bit instruction bundle = molecule
4 32-bit atoms (atom = instruction)
Parallel processing of 4 instructions

64 general purpose registers


Code morphing layer
Translates instructions written for other CPUs into
molecules
Instructions are not written directly for the Crusoe
CPU
Chapter 8: CPU and Memory:

8-10

EPIC Architecture
Intel Itanium CPU
128-bit instruction bundle
3 41-bit instructions
5 bits to identify type of instructions in bundle

128 64-bit general purpose registers


128 82-bit floating point registers
Intel X86 instruction set included
Programmers and compilers follow guidelines
to ensure parallel execution of instructions

Chapter 8: CPU and Memory:

8-11

Paging
Managed by the operating system
Built into the hardware
Independent of application

Chapter 8: CPU and Memory:

8-12

Logical vs. Physical Addresses


Logical addresses are relative
locations of data, instructions and
branch target and are separate from
physical addresses
Logical addresses mapped to physical
addresses
Physical addresses do not need to be
consecutive
Chapter 8: CPU and Memory:

8-13

Logical vs. Physical Address

Chapter 8: CPU and Memory:

8-14

Page Address Layout

Chapter 8: CPU and Memory:

8-15

Page Translation Process

Chapter 8: CPU and Memory:

8-16

Logical vs. Physical Addresses


Spazio fisico non deve essere uguale a
quello logico!!!
Piu grande spazio logico rispetto a
quello fisico!
Pagine caricate da disco in memoria
quando servono ..
Di piu nel cap 15

Chapter 8: CPU and Memory:

8-17

Memory Enhancements
Memory is slow compared to CPU processing
speeds!
2Ghz CPU = 1 cycle in of a billionth of a second
70ns DRAM = 1 access in 70 millionth of a second

Methods to improvement memory accesses


Wide Path Memory Access

Retrieve multiple bytes instead of 1 byte at a time

Memory Interleaving

Partition memory into subsections, each with its own


address register and data register

Cache Memory
Chapter 8: CPU and Memory:

8-18

Memory Interleaving

Chapter 8: CPU and Memory:

8-19

Why Cache?
Accesso in memoria spreca cicli di
clock!
Accesso cache 2 nanosecondi: 1 access in
2 millionth of a second

Guadagno velocita
Es:
Trovare tempo accesso alle memorio
Dram, ddram sdram, sram,
Chapter 8: CPU and Memory:

8-20

Cache Memory
Blocks: 8 or 16 bytes
Tags: location in main memory (memoria
fisica!)
Cache controller
hardware that checks tags

Cache Line
Unit of transfer between storage and cache memory

Hit Ratio: ratio of hits out of total requests


Synchronizing cache and memory
Write through
Write back
Chapter 8: CPU and Memory:

8-21

Step-by-Step Use of Cache

Chapter 8: CPU and Memory:

8-22

Step-by-Step Use of Cache

Chapter 8: CPU and Memory:

8-23

Performance Advantages
Hit ratios of 90% common
50%+ improved execution speed
Locality of reference is why caching works
Most memory references confined to small region of
memory at any given time
Well-written program in small loop, procedure or
function
Data likely in array
Variables stored together
Chapter 8: CPU and Memory:

8-24

Two-level Caches
Why do the sizes of the caches have to be
different?

Chapter 8: CPU and Memory:

8-25

Cache sui multiprocessor


Snoopy bus

Chapter 8: CPU and Memory:

8-26

E la cache del disco?


Cache per accesso al disco memoria
tra la ram (memoria principale) e il
disco gestita dal device.
Controller-cache

Chapter 8: CPU and Memory:

8-27

Cache vs. Virtual Memory


Cache speeds up memory access
Virtual memory increases amount of
perceived storage
independence from the configuration and
capacity of the memory system
low cost per bit

Chapter 8: CPU and Memory:

8-28

Modern CPU Processing Methods

Timing Issues
Separate Fetch/Execute Units
Pipelining
Scalar Processing
Superscalar Processing

Chapter 8: CPU and Memory:

8-29

Timing Issues

Computer clock used for timing purposes


MHz million steps per second
GHz billion steps per second
Instructions can (and often) take more than one step
Data word width can require multiple steps

Chapter 8: CPU and Memory:

8-30

Separate Fetch-Execute Units


Fetch Unit
Instruction fetch unit
Instruction decode unit

Determine opcode
Identify type of instruction and operands

Several instructions are fetched in parallel and


held in a buffer until decoded and executed
IP Instruction Pointer register

Execute Unit
Receives instructions from the decode unit
Appropriate execution unit services the instruction
Chapter 8: CPU and Memory:

8-31

Alternative CPU Organization

Chapter 8: CPU and Memory:

8-32

Instruction Pipelining
Assembly-line technique to allow overlapping between fetch-execute cycles of sequences of instructions
Only one instruction is being executed to completion at a time
Scalar processing
Average instruction execution is approximately equal to the clock speed of the CPU

Problems from stalling


Instructions have different numbers of steps

Problems from branching

Chapter 8: CPU and Memory:

8-33

Branch Problem Solutions


Separate pipelines for both possibilities
Probabilistic approach
Requiring the following instruction to not
be dependent on the branch
Instruction Reordering (superscalar
processing)

Chapter 8: CPU and Memory:

8-34

Pipelining Example

Chapter 8: CPU and Memory:

8-35

Superscalar Processing
Process more than one instruction per
clock cycle
Separate fetch and execute cycles as
much as possible
Buffers for fetch and decode phases
Parallel execution units

Chapter 8: CPU and Memory:

8-36

Superscalar CPU Block Diagram

Chapter 8: CPU and Memory:

8-37

Scalar vs. Superscalar Processing

Chapter 8: CPU and Memory:

8-38

Superscalar Issues
Out-of-order processing dependencies
(hazards)
Data dependencies
Branch (flow) dependencies and speculative
execution
Parallel speculative execution or branch
prediction
Branch History Table
Register access conflicts
Logical registers
Chapter 8: CPU and Memory:

8-39

Hardware Implementation
Hardware operations are
implemented by logic gates
Advantages
Speed
RISC designs are simple and typically
implemented in hardware

Chapter 8: CPU and Memory:

8-40

Microprogrammed Implementation
Microcode are tiny programs stored in
ROM that replace CPU instructions
Advantages
More flexible
Easier to implement complex instructions
Can emulate other CPUs

Disadvantage
Requires more clock cycles
Chapter 8: CPU and Memory:

8-41

Copyright 2003 John Wiley & Sons


All rights reserved. Reproduction or translation of this
work beyond that permitted in Section 117 of the 1976
United States Copyright Act without express permission
of the copyright owner is unlawful. Request for further
information should be addressed to the permissions
Department, John Wiley & Songs, Inc. The purchaser
may make back-up copies for his/her own use only and
not for distribution or resale. The Publisher assumes no
responsibility for errors, omissions, or damages caused
by the use of these programs or from the use of the
information contained herein.

Chapter 8: CPU and Memory:

8-42