
Invitation to Computer Science

CHAPTER 5
KUAN-CHOU LAI

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 1


Objectives
• After studying this chapter, students will be able to
– Enumerate the characteristics of the Von Neumann
architecture
– Describe the components of a RAM system, including
how fetch and store operations work, and the use of
cache memory to speed up access time
– Explain why mass storage devices are important, and
how DASDs like hard drives or DVDs work
– Diagram the components of a typical ALU and illustrate
how the ALU data path operates

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 2


Objectives
– Describe the control unit’s Von Neumann cycle, and
explain how it implements a stored program
– List and explain the types of instructions and how they
are encoded
– Diagram the components of a typical Von Neumann
machine
– Show the sequence of steps in the fetch, decode, and
execute cycle to perform a typical instruction
– Describe non-Von Neumann parallel processing systems

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 3


Introduction
• Computer organization examines the computer as
– a collection of interacting “functional units”
– functional units may be built out of the circuits already
studied
• This chapter changes the level of abstraction
• Focus on functional units and computer organization
• A hierarchy of abstractions hides unneeded details
• Change focus from transistors, to gates, to circuits as
the basic unit

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 4


Introduction
• Higher level of abstraction assists in understanding by
reducing complexity
• Hierarchy of abstractions

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 5


Introduction
• Example of Abstraction

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 6


System Components
• Virtually every computer in use today is based on the
same theoretical model
• Von Neumann architecture (1946, John Von Neumann)

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 7


System Components
• Von Neumann architecture (1946, John Von Neumann)
– four functional units
• Arithmetic/Logic unit
• Memory
• Input / Output
• Control unit
– Sequential execution of instructions
• Instruction is fetched from memory to control unit →
decoded → executed
– Stored program concept
• Instructions to be executed are represented as binary values
and stored in memory
High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 8
System Components
• Memory and Cache
– Information is stored in, and fetched from, the memory
subsystem (in the binary number system)
– Memory
• Read Only Memory (ROM)
– The contents of locations in ROM cannot be
changed
• Random Access Memory (RAM)

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 9


System Components
• Random Access Memory (RAM)
– Organized into cells, each given a unique address
– Equal time to access any cell
– Cell values may be read and changed
– Maps addresses to memory locations
– Inherent in the idea of being able to access each location is the
ability to change the contents of each location
– Cache memory keeps values currently in use in faster memory
to speed access times
– Cell size/memory width is typically 8 bits (the number of bits
per cell)
– Maximum memory size/address space is 2^N, where N is the
length of an address
High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 10
System Components
• Memory types
– ROM (Read Only Memory)
– RAM (Random Access Memory)
• SRAM (Static RAM)
6 transistors per bit, no refresh needed, faster, more expensive
used in cache or video memory
• DRAM (Dynamic RAM)
1 transistor per bit, needs periodic refresh, slower, cheaper
used in main memory
– SDRAM (Synchronous DRAM)
clock is synchronized to the processor's external clock,
e.g., PC100 (100 MHz), PC133
– DDR4 SDRAM (Double Data Rate Synchronous DRAM)
High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 11
System Components

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 12


System Components

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 13


System Components
– Memory address
• An unsigned binary number, N bits long
• Address space is then 2^N cells (0 ~ 2^N-1)

• A 32-bit address → max. 2^32 cells = 4 GB address space


• Address / content

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 14


System Components

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 15


System Components
– Two basic memory operations
• Fetching
– Retrieve a value from memory
– Nondestructive fetch
• Storing
– Store a value into memory
– Destructive store
– Memory access time
• the time required to fetch/store a value
• Modern RAM access time is 5-10 nanoseconds

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 16


System Components
• Memory register
– Very fast memory location
– Given a name, not an address
– Serves some special purpose
– Modern computers have dozens or hundreds of registers
• Memory address register (MAR)
– holds memory address to access
• Memory data register (MDR)
– receives data from fetch, holds data to be stored
• Memory cells
– with decoders to select individual cells

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 17


System Components
• Memory system circuits
– Decoder
• converts the address in the MAR into a signal to one
specific memory cell
• one-dimensional versus two-dimensional memory
organization
– Fetch/store controller
• Like a traffic cop for MDR
• Takes in a signal that indicates fetch or store
• Routes data flow to/from memory cells and MDR

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 18


System Components
– Fetch operation
• The address of the desired memory cell is moved into
the MAR
• Fetch/store controller signals a “fetch,” accessing the
memory cell
• The value at the MAR’s location flows into the MDR
– Store operation
• The address of the cell where the value should go is
placed in the MAR
• The new value is placed in the MDR
• Fetch/store controller signals a “store,” copying the
MDR’s value into the desired cell
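• A minimal sketch of these two operations in Python (the memory list and
register names model the hardware; they are illustrative, not from the text):

    # A toy memory subsystem: a list of cells plus the MAR and MDR registers.
    memory = [0] * 16          # 16 cells, so a 4-bit address (N = 4)
    MAR = 0                    # Memory Address Register: which cell to access
    MDR = 0                    # Memory Data Register: data moving in or out

    def fetch(address):
        """Nondestructive fetch: copy a cell's value into the MDR."""
        global MAR, MDR
        MAR = address          # 1. address of the desired cell goes into the MAR
        MDR = memory[MAR]      # 2. controller signals "fetch"; value flows to MDR
        return MDR

    def store(address, value):
        """Destructive store: overwrite a cell with the MDR's value."""
        global MAR, MDR
        MAR = address          # 1. target address goes into the MAR
        MDR = value            # 2. new value is placed in the MDR
        memory[MAR] = MDR      # 3. controller signals "store"; MDR copied to cell

    store(2, 99)
    print(fetch(2))            # prints 99: the stored value comes back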
High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 19
System Components

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 20


System Components
• Operating procedure

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 21


System Components
• Decode the address in the MAR
– The memory unit translates the N-bit address stored in the
MAR into the set of signals needed to access that one
specific memory cell
– The address in the MAR is decoded using a decoder circuit
(Section 4.5, Fig. 4.29)

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 22


System Components
• Example
– N=4 → MAR contains 4 bits → 16 addressable cells →
a 4-to-16 decoder
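• A quick sketch of such a 4-to-16 decoder in Python (illustrative only):

    def decode(mar_bits):
        """4-to-16 decoder: exactly one of the 16 output lines is ON (1)."""
        lines = [0] * 16
        lines[int(mar_bits, 2)] = 1   # the binary address selects one line
        return lines

    print(decode("0010"))  # output line 2 is ON, enabling memory cell 2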

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 23


System Components
When MAR = 0010 → output line 0010 is ON →
enables the memory hardware of address 2 to
• copy the contents of location 2 to the MDR
(if it is doing a fetch)
• load the contents from the MDR
(if it is doing a store)
– The problem with the previous figure is scalability
• E.g., N=32 would need a decoder with 2^32 (4 billion) output lines
– Problem solving
• Two-dimensional organization
• Row selection line + column selection line
– Using 2 decoders, each with only 2^16 = 65,536 output lines
High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 24
System Components

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 25


Using two decoders

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 26


System Components
• Example
– 2^28 cells (256 MB), each cell 8 bits → about 2 billion bits
– The memory circuit for a single bit requires
• 3 gates (1 AND, 1 OR, 1 NOT) → 7 transistors
– 256 MB → about 6 billion gates / 14 billion transistors

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 27


System Components
• Cache Memory
– As processors became faster, they sat idle waiting for
data or instructions to arrive
– Processors were executing instructions so quickly that
memory access became a bottleneck
• Memory access is much slower than processing time

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 28


System Components
– Faster memory is too expensive to use for all memory
cells
– Locality principle
• Values close to recently accessed memory are more
likely to be accessed
• Load neighboring values into the cache and keep
recently used values there
– Once a value is used, it is likely to be used again
– A small, fast memory for just the values currently in use
speeds computing time → CACHE
• 5-10 times faster than RAM
• Two-level memory hierarchy

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 29


System Components
• When the processor needs a piece of information

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 30


System Components
– Example
• RAM’s access time = 15 ns
• Cache’s access time = 5 ns
• Cache hit ratio = 70%
– Cache hit rate: percentage of times values are
found in cache
→ average access time = (0.7 * 5) + 0.3 * (5 + 15) = 9.5 ns
• about a 37% reduction, from 15 ns to 9.5 ns
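• The same arithmetic as a small Python helper (the function name and
parameters are illustrative):

    def avg_access_time(hit_ratio, cache_ns, ram_ns):
        """Effective access time for a two-level memory hierarchy.
        A miss pays the cache lookup plus the RAM access."""
        return hit_ratio * cache_ns + (1 - hit_ratio) * (cache_ns + ram_ns)

    print(avg_access_time(0.70, 5, 15))   # 9.5 (ns)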

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 31


System Components
• Input / Output and Mass Storage
– I/O units connect the processor to the outside world:
• Humans: keyboard, monitor, etc.
• Data storage: hard drive, DVD, flash drive
• Other computers: network
– Volatile / Nonvolatile
• Volatile memory
Information disappears when the power is turned off
• Nonvolatile archival storage
Information does not disappear when the power is
turned off (for example, Disks, tapes)
High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 32
System Components
– Two types of input/output devices
• Human-readable form
– Keyboards, screens, printers
• Machine-readable form (mass storage systems)
– Floppy disks, hard disks, CD-ROM, DVD

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 33


System Components
• Random access memory

• Mass storage systems


– Direct access storage devices (DASDs)
Use their own addressing schemes to access data

– Sequential access storage devices (SASDs)

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 34


System Components
• Direct access storage devices (DASDs)
– The time needed to access information depends on its
physical location and the current state of the device
– Floppy disks, hard disks, CD-ROM, DVD
– Tracks
• concentric rings around the disk surface
– Sectors
• fixed-size segments of a track; each track holds a fixed
number of sectors
• the unit of retrieval; each sector contains an address and
a data block

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 35


System Components
• A single read/write head moves in or out to be positioned
over any track on the disk surface

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 36


System Components

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 37


System Components
• Access time = seek time + latency + transfer time
– Seek time
the time needed to position the r/w head over the
correct track
– Latency
the time for the beginning of the desired sector to
rotate under the r/w head
– Transfer time
the time for the entire sector to pass under the r/w
head and have its contents read into or written
from memory
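• A sketch of this formula in Python; the 7200-rpm rotation speed, 64 sectors
per track, and 9 ms seek time are made-up numbers for illustration:

    ROTATION_MS = 60_000 / 7200          # one full revolution, about 8.33 ms

    def access_time_ms(seek_ms, sectors_per_track=64):
        """Access time = seek time + latency + transfer time (milliseconds)."""
        latency = ROTATION_MS / 2                    # wait half a revolution on average
        transfer = ROTATION_MS / sectors_per_track   # one sector passes under the head
        return seek_ms + latency + transfer

    print(round(access_time_ms(9.0), 2))   # about 13.3 ms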

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 38


System Components
• Example

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 39


System Components

• Three cases: best, average, and worst case access times

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 40


System Components
• Other non-disk DASDs
– flash memory, optical

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 41


Disk Scheduling
• File systems must be accessed in an efficient manner
• As a computer deals with multiple processes over a
period of time, a list of requests to access the disk
builds up
• Disk scheduling
– The technique that the operating system uses to
determine which requests to satisfy first
• I/O is the slowest aspect of a general computer system
– dominated by seek time and latency

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 42


Disk Scheduling
• Example request queue: 26, 49, 91, 22, 61, 7, 62, 33, 35
• First-Come, First-Served (FCFS)
– Requests are serviced in the order they arrive, without
regard to the current position of the heads
• Shortest-seek-time-first (SSTF)
– Disk heads are moved the minimum amount possible to
satisfy a pending request
– 26, 22, 33, 35, 49, 61, 62, 91, 7
– risk of starvation: a distant request (here, track 7) keeps
getting postponed
• SCAN
– Disk heads continuously move in and out, servicing
requests as they are encountered
– 26, 22, 7, 33, 35, 49, 61, 62, 91
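• A minimal Python sketch of SSTF and SCAN; the starting head position of 26
is an assumption, chosen because it reproduces the service orders above:

    def sstf(requests, head):
        """Shortest-seek-time-first: always service the closest pending request."""
        pending, order = list(requests), []
        while pending:
            nearest = min(pending, key=lambda t: abs(t - head))
            pending.remove(nearest)
            order.append(nearest)
            head = nearest
        return order

    def scan(requests, head):
        """SCAN (elevator): sweep inward toward lower tracks, then reverse."""
        inward = sorted([t for t in requests if t <= head], reverse=True)
        outward = sorted([t for t in requests if t > head])
        return inward + outward

    queue = [26, 49, 91, 22, 61, 7, 62, 33, 35]
    print(sstf(queue, head=26))  # [26, 22, 33, 35, 49, 61, 62, 91, 7]
    print(scan(queue, head=26))  # [26, 22, 7, 33, 35, 49, 61, 62, 91]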
High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 43
Disk Scheduling
• SCAN Disk Scheduling works like an elevator
– An elevator is designed to visit floors that have people
waiting. In general, an elevator moves from one extreme
to the other (say, the top of the building to the bottom),
servicing requests as appropriate.
– The SCAN disk-scheduling algorithm works in a similar
way, except instead of moving up and down, the
read/write heads move in toward the spindle, then out
toward the platter edge, then back toward the spindle,
and so forth

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 44


System Components
• Sequential access storage devices (SASDs)
– Addresses no longer exist
– Data are searched sequentially
– Data are stored sequentially
– Mainly used for backup storage these days

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 45


System Components
• DASDs and SASDs are orders of magnitude slower than
RAM (microseconds or milliseconds)
– I/O is a bottleneck
– Solution: use an I/O controller
• I/O Controller manages data transfer with slow I/O
devices, freeing processor to do other work
• Controller sends an interrupt signal to processor
when I/O task is done
– Processor sends request and data, then goes on
with its work
– I/O controller interrupts processor when request
is complete
High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 46
System Components

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 47


System Components

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 48


System Components

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 49


System Components

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 50


System Components
• Arithmetic/Logic Unit (ALU)
– Where actual computations are performed
– Circuits for primitive operations
• Arithmetic (+, -, *, /)
• Comparison (compare equal, etc.)
• Logic (AND, OR, NOT)
– ALU + control unit → processor
– The ALU is made up of three parts
• Registers (super-fast, dedicated memory connected to
circuits)
• Interconnections between components
• ALU circuitry
Together, these components form the data path
High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 51
System Components
• Data path
– how information flows in the ALU
• from registers to circuits
• from circuits back to registers

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 52


System Components
– Register
• Storage cell that holds the operands of an arithmetic
operation and that, when the operation is complete,
holds its results
• Minor differences

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 53


System Components
• A typical ALU has 16, 32, or 64 registers to hold
temporary results
– E.g., (a/b)*(c-d)
– Storing the temporary result of (a/b) in RAM is
much slower than keeping it in a register

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 54


System Components

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 55


System Components
– ALU process
• Values for operations copied into ALU’s input
register locations
• All circuits compute results for those inputs
• Multiplexor selects the one desired result from all
values
• Result value copied to desired result register
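• A sketch of this "compute everything, then select" design in Python (the
operation names here are illustrative):

    def alu(a, b, select):
        """Every circuit computes its result; a multiplexor picks one output."""
        results = {                 # all circuits run on the same two inputs
            "ADD": a + b,
            "SUBTRACT": a - b,
            "AND": a & b,
            "OR": a | b,
        }
        return results[select]      # the multiplexor selects the desired result

    print(alu(6, 3, "SUBTRACT"))    # 3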

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 56


System Components
– Integrated +, -, ==, *, >, <, … functions
• Multiplexor control circuit

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 57


System Components
• First, move data from RAM
to registers
• The ALU computes
• Store temporary results in registers
• Write back to RAM

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 58


System Components
• How is the operation to perform chosen?
– Option 1: a decoder signals exactly one circuit to run
– Option 2: run all circuits; a multiplexor selects one output
from among them
• In practice, option 2 is usually chosen

• Information flow:
– Data come from outside the ALU into registers
– A signal tells the multiplexor which operation's result to select
– The result goes back to a register, and then outside

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 59


System Components

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 60


System Components
• Overall ALU organization

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 61


System Components
• Control Unit
– Von Neumann architecture
• The idea of a stored program:
programs are encoded in binary and stored in the
computer's memory as a sequence of machine
language instructions
– The control unit manages stored program execution

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 62


System Components
– Control unit
• fetches instructions from memory, decodes them, and
executes them
– Tasks of the control unit
• Fetch
– get from memory the next instruction to be executed
• Decode
– determine what is to be done
• Execute
– issue appropriate commands to the ALU, memory, and
I/O controllers
repeated over and over until a HALT (STOP, QUIT) instruction is reached
High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 63
System Components
• Machine Language Instructions
– Can be decoded and executed by the control unit
– Instructions are encoded with two parts:
• Operation code (op code)
– tells which operation to perform
– a unique unsigned-integer code assigned to each
machine language operation
• Address field(s)
– tell which memory addresses/registers to operate
on

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 64


System Components
– Example
op code for ADD is 00001001
memory addresses of X and Y are 99, 100 (in decimal
format)
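• A sketch of packing this instruction into bits in Python; the 16-bit
address-field width is an assumption for illustration:

    OP_ADD = 0b00001001                # op code for ADD, from the example

    def encode(op, addr1, addr2, addr_bits=16):
        """Pack an op code and two address fields into one integer."""
        return (op << 2 * addr_bits) | (addr1 << addr_bits) | addr2

    instr = encode(OP_ADD, 99, 100)    # ADD X, Y with X at 99 and Y at 100
    print(f"{instr:040b}")             # 8-bit op code, then two 16-bit addresses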

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 65


System Components
• Machine language:
– Binary strings that encode instructions
– Instructions can be carried out by hardware
– Sequences of instructions encode algorithms
• Instruction set:
– The set of all operations that can be executed by a
processor, as implemented by a particular chip
– Each kind of processor speaks a different language
• RISC (reduced instruction set computer)
• CISC (complex instruction set computer)

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 66


System Components
• RISC (reduced instruction set computer)
– small instruction set (as few as 30-50 instructions)
– each instruction is highly optimized
– hardware is easy to design; minimizes the amount of
circuitry (gates and transistors) needed to build a
processor
– leaves extra space on the chip to optimize the speed
of the instructions so they execute very quickly
– requires more instructions to solve a problem
– however, each instruction runs faster, so the
overall running time is less
High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 67
System Components
• CISC (complex instruction set computer)
– common from the 1970s through the 2000s
– large instruction set: a much larger number
(300-500) of very powerful instructions
– a single instruction can do a lot of work
– the instruction set provides all the features needed
when writing programs
– hardware is complex to design
» more complex, more expensive, more difficult
to build

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 68


System Components
• Modern hardware: some RISC and some CISC
• 4 basic classes of operations of a machine language
– Data transfer (move data from memory to register)
– Arithmetic/logic (add, AND, …)
– Compares (compare two values)
– Branches
• jump to a non-sequential instruction
• branching allows for conditional and loop forms
• e.g., JUMPLT a
if the previous comparison of A and B found A < B,
then jump to the instruction at address a
High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 69
System Components
– Data transfer
• Move values to and from memory and registers

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 70


System Components
– Arithmetic/logic
• Perform ALU operations that produce numeric
values, e.g., + - * / …

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 71


System Components
– Compares
• Compare two values and set an indicator in the
register
• Condition codes (status register or condition register)
– Special set of bits in the processor

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 72


System Components
– Branches
• Jump to a new memory address to continue
processing (not sequential execution)
• Branch instructions alter the normal sequential flow
of control depending on the condition codes

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 73


System Components

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 74


System Components
• Control Unit Registers And Circuits
– Parts of control unit
• Links to other subsystems
• Instruction decoder circuit
• Two special registers:
– Program Counter (PC)
» Stores the memory address of the next
instruction to be executed
– Instruction Register (IR)
» holds encoding of current instruction

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 75


System Components
– Instruction decoder circuit
• Decodes op code of instruction, and signals helper
circuits, one per instruction
– Helpers send addresses to proper circuits
– Helpers signal ALU, I/O controller, memory

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 76


System Components
• Control Unit Registers and Circuits

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 77


System Components
• Example

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 78


System Components
– When OP code = 000

– When OP code = 001

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 79


System Components
• Von Neumann Architecture
– Four components
• Arithmetic/Logic unit
• Memory
• Input/Output
• Control unit

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 80


System Components
– Subsystems connected by a bus
• Bus
– wires that permit data transfer among them
– At this level, ignore the details of circuits that perform
these tasks
• Abstraction!
– Computer repeats fetch-decode-execute cycle indefinitely
– The repetition of the fetch-decode-execute phase is called
the Von Neumann cycle

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 81


System Components

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 82


System Components
• Combine previous pieces: Von Neumann machine
• Fetch-Decode-Execute cycle
– machine repeats until HALT instruction or error
– also called Von Neumann cycle
• Fetch phase
– get the next instruction from memory into the IR
• Decode phase
– instruction decoder gets op code
• Execute phase
– different for each instruction

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 83


System Components

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 84


System Components
• Notation for computer’s behavior:

CON(A)      Contents of memory cell A
A -> B      Send value in register A to register B
            (special registers: PC, MAR, MDR, IR, ALU,
            R, GT, EQ, LT, +1)
FETCH       Initiate a memory fetch operation
STORE       Initiate a memory store operation
ADD         Instruct the ALU to select the output of the adder circuit
SUBTRACT    Instruct the ALU to select the output of the subtract circuit

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 85


System Components

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 86


System Components

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 87


System Components

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 88


System Components

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 89


System Components

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 90


System Components
LOAD X meaning CON(X) -> R
1. IRaddr -> MAR Send address X to MAR
2. FETCH Initiate Fetch, data to MDR
3. MDR -> R Move data in MDR to R

STORE X meaning R -> CON(X)


1. IRaddr -> MAR Send address X to MAR
2. R -> MDR Send data in R to MDR
3. STORE Initiate store of MDR to X

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 91


System Components
ADD X meaning R + CON(X) -> R
1. IRaddr -> MAR Send address X to MAR
2. FETCH Initiate Fetch, data to MDR
3. MDR -> ALU Send data in MDR to ALU
4. R -> ALU Send data in R to ALU
5. ADD Select ADD circuit as result
6. ALU -> R Copy selected result to R

JUMP X meaning get next instruction from X


1. IRaddr -> PC Send address X to PC

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 92


System Components
COMPARE X meaning:
if CON(X) > R then GT = 1 else 0
if CON(X) = R then EQ = 1 else 0
if CON(X) < R then LT = 1 else 0
1. IRaddr -> MAR Send address X to MAR
2. FETCH Initiate Fetch, data to MDR
3. MDR -> ALU Send data in MDR to ALU
4. R -> ALU Send data in R to ALU
5. SUBTRACT Evaluate CON(X) – R
Sets EQ, GT, and LT

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 93


System Components
JUMPGT X meaning:
if GT = 1 then jump to X
else continue to next instruction

1. IF GT = 1 THEN IRaddr -> PC
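• Putting the notation together: a compact Python sketch of the
fetch-decode-execute cycle for this toy machine (the memory layout and
sample program are invented for illustration):

    # Instructions are (op, address) pairs; cells 10-13 hold data.
    memory = {0: ("LOAD", 10), 1: ("ADD", 11), 2: ("COMPARE", 12),
              3: ("JUMPGT", 5), 4: ("STORE", 13), 5: ("HALT", 0),
              10: 20, 11: 30, 12: 100, 13: 0}
    PC, R, GT = 0, 0, 0                           # program counter, register R, GT flag

    while True:
        op, addr = memory[PC]                     # fetch (via MAR/MDR in real hardware)
        PC += 1                                   # PC now points to the next instruction
        if op == "LOAD":      R = memory[addr]      # CON(X) -> R
        elif op == "ADD":     R = R + memory[addr]  # R + CON(X) -> R
        elif op == "STORE":   memory[addr] = R      # R -> CON(X)
        elif op == "COMPARE": GT = 1 if memory[addr] > R else 0
        elif op == "JUMPGT":  PC = addr if GT == 1 else PC
        elif op == "HALT":    break

    print(R)   # 50: loaded 20, added 30; 100 > 50 set GT, so it jumped to HALT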

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 94


System Components

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 95


System Components
• Non-Von Neumann Architectures
– Speed is measured in MIPS (millions of instructions per second)
– Problems to be solved keep getting larger
– Computer chip speeds no longer increase exponentially
– Shrinking chip features puts gates closer together, making them faster
• the speed of light limits how fast signals travel through wires
• gates cannot be put much closer together
• heat production increases too fast
– Von Neumann bottleneck (physical limitations on speed)
• the inability of sequential machines (one instruction
at a time) to handle larger problems
– Parallel computing architectures can provide
improvements: multiple operations occur at the same time

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 96


System Components
An electronic signal cannot
travel faster than the speed
of light: 299,792,458 meters
per second

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 97


System Components
• Non-Von Neumann architectures:
– Other ways to organize computers
– Most are experimental/theoretical, EXCEPT parallel
processing
• Parallel processing:
– Use many processing units operating at the same time
– Supercomputers (in the past)
– Desktop multi-core machines (in the present)
– “The cloud” (in the future)

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 98


System Components
– Parallel processing
• SIMD / MIMD (SISD / MISD)
– SIMD architecture
• Single instruction/Multiple data
• Multiple processors running in parallel
• The processor contains one control unit but many ALUs
• Each ALU operates on its own data
• All ALUs execute the same operation at the same time
• Suitable for “vector” operations
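• A sketch of the SIMD idea in Python, with NumPy standing in for the
parallel ALUs:

    import numpy as np

    a = np.array([1, 2, 3, 4])
    b = np.array([10, 20, 30, 40])

    # One instruction ("add"), many data: every element pair is summed as if
    # each had its own ALU executing the same operation in lockstep.
    print(a + b)   # [11 22 33 44]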

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 99


System Components
• A SIMD Parallel Processing System

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 100


System Components
– MIMD architecture
• Multiple instruction/Multiple data
• Each processor performs its own operations on its
own data
• Multiple processors running in parallel
• Communication over interconnection network
• Off-the-shelf processors work fine
• Communication costs can slow performance
• Scalability
– It is possible to match the number of processors to
the size of the problem
• Key issue: Parallel algorithm

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 101


System Components
• Model of MIMD Parallel Processing

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 102


System Components
• Varieties of MIMD systems
– Special-purpose systems
• newer supercomputers
– Cluster computing
• standard machines communicating over LAN or WAN
– Grid computing (http://www2.twgrid.org/gridcafe/ )
• machines of varying power, over large distances/Internet
• Example: SETI project
– Cloud Computing
• http://en.wikipedia.org/wiki/Cloud_computing

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 103


System Components
• A hot research area: parallel algorithms
– needed to take advantage of all this processing power

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 104


System Components

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 105


System Components

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 106


Summary
• We must abstract in order to manage system
complexity
• Von Neumann architecture is standard for modern
computing
• Von Neumann machines have memory, I/O, ALU, and
control unit; programs are stored in memory;
execution is sequential unless the program says otherwise
• Memory is organized into addressable cells; data is
fetched and stored based on MAR and MDR; uses
decoder and fetch/store controller

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 107


Summary
• Mass data storage is nonvolatile; disks store and fetch
sectors of data stored in tracks
• I/O is slow, needs dedicated controller to free CPU
• ALU performs computations, moving data to/from
dedicated registers
• Control unit fetches, decodes, and executes
instructions; instructions are written in machine
language
• Parallel processing architectures can perform multiple
instructions at one time

High Performance Computing Laboratory@NTCU © Lai, Kuan-Chou 108
