You are on page 1of 29

Course Objectives

Review development of computer systems


Examine the operation of the major
building blocks of a computer system
Investigate performance enhancements for
EE 4504 each component
Computer Organization

Dr. N. J. Davis

Spring 1997

EE 4504 Section 1 1 EE 4504 Section 1 2

1
Objectives

Review historical development of


computer systems
Identify design levels for computer system
development
EE 4504 Discuss descriptive and design tools for
Computer Organization each design level
Compare and contrast various performance
metrics for computer systems
Section 1
Introduction to Computer Systems

EE 4504 Section 1 3 EE 4504 Section 1 4

2
Computer Architecture

Baer: “The design of the integrated system Common themes:


which provides a useful tool to the – Design / structure
programmer” – Art
Hayes: “The study of the structure, – System
behavior and design of computers” – Tool for programmer and application
– Interface
Abd-Alla: “The design of the system
specification at a general or subsystem Thus, computer architecture refers to those
level” attributes of the system that are visible to a
Foster: “The art of designing a machine programmer -- those attributes that have a
that will be a pleasure to work with” direct impact on the execution of a
program
Hennessy and Patterson: “The interface
– Instruction sets
between the hardware and the lowest level
– Data representations
software”
– Addressing
– I/O

EE 4504 Section 1 5 EE 4504 Section 1 6

3
Computer Organization What is a computer?

Synonymous with “architecture” in many Historically, a computer was a job title, not
uses and textbooks a piece of equipment!
We will use it to mean the underlying Requirements of a computer:
implementation of the architecture – Process data
Transparent to the programmer – Store data
– Move data between the computer and the
An architecture can have a number of
outside world
organizational implementations
– Control the operation of the above
– Control signals
– Technologies
– Device implementations

Figure 1.1 Functional view of a computer


EE 4504 Section 1 7 EE 4504 Section 1 8

4
History of Computers

Mechanical Era (1600s-1940s) – Charles Babbage (1822)


– Wilhelm Schickhard (1623) » Mathematician
» Astronomer and mathematician » “Father of modern computer”
» Automatically add, subtract, multiply, and » Wanted more accuracy in calculations
divide » Difference engine
– Blaise Pascal (1642) Government / science agreement
» Mathematician Automatic computation of math tables
» Mass produced first working machine (50 » Analytic engine
copies) Perform any math operation
» Could only add and subtract Punch cards
» Maintenance and labor problems Modern structure: I/O, storage, ALU
– Gottfried Liebniz (1673) Add in 1 second, multiply in 1 minute
» Mathematician and inventor » Both engines plagued by mechanical
» Improved on Pascal’s machine problems
» Add, subtract, multiply, and divide – George Boole (1847)
» Mathematical analysis of logic
» Investigation of laws of thought

EE 4504 Section 1 9 EE 4504 Section 1 10

5
– Herman Hollerith (1889)
» Modern day punched card machine Mechanical era summary
» Formed Tabulating Machine Company – Mechanical computers were designed to reduce
(became IBM) the time required for calculations and increase
accuracy of the results
» 1880 census took 5 years to tabulate
– Two drawbacks
» Tabulation estimates
» Speed of operation limited by the inertia of
1890: 7.5 years
moving parts (gears and pulleys)
1900: 10+ years
» Cumbersome, unreliable, and expensive
» Hollerith’s tabulating machine reduced the
7.5 year estimate to 2 months
– Konrad Zuse (1938)
» Built first working mechanical computer,
the Z1
» Binary machine
» German government decided not to pursue
development -- W.W.II already started
– Howard Aiken (1943)
» Designed the Harvard Mark I
» Implementation of Babbage’s machine
» Built by IBM
EE 4504 Section 1 11 EE 4504 Section 1 12

6
The Electronic Era

Generation 1 (1945 - 1958) – IAS (Institute for Advanced Studies)


– ENIAC » von Neumann and Goldstine
» Developed for calculating artillery firing » Took idea of ENIAC and developed
tables concept of storing a program in the memory
» Designed by Mauchly and Echert of the » This architecture came to be known as the
University of Pennsylvania “von Neumann” architecture and has been
» Generally regarded as the first electronic the basis for virtually every machine
computer designed since then
Colossus probably the first, but was » Features
classified until recently Data and instructions (programs) are
» BIG! stored in a single read-write memory
18,000 tubes Memory contents are addressable by
location, regardless of the content itself
70,000 resistors
Sequential execution
10,000 capacitors
– Lots of initial and long-term fighting over
6,000 switches patents, rights, credits, firsts, etc.
30 x 50 feet
140 kW of power
» Decimal number system used
» Programmed by manually setting switches
EE 4504 Section 1 13 EE 4504 Section 1 14

7
Summary of Generations
Generation 2 (1958 - 1964) Generation Example Hardware Software Performance
Machines
– Technology change
– Transistors
1 ENIAC, Vacuum tubes, Machine code, 2 Kb memory,
– High level languages UNIVAC I, magnetic stored 10 KIPS
IBM 700 drums programs
– Floating point arithmetic
Generation 3 (1964 - 1974) 2 IBM 7094 Transistors, High level 32 Kb
core memory languages memory, 200
– Introduction of integrated circuits KIPS

– Semiconductor memory 3 IBM 360 370, ICs, Timesharing, 2 Mb memory,


– Microprogramming PDP 11 semiconductor graphics, 5 MIPS
memory, structured
– Multiprogramming microprocesso programming
rs
Generation 4 (1974 - present) 4 IBM 3090, VLSI, Packaged 8 Mb memory,
Cray XMP, networkes, programs, 30 MIPS
– Large scale integration / VLSI IBM PC optical disks object-oriented
languages,
– Single board computers expert systems
5 Sun Sparc, ULSI, GaAs, Parallel 64 Mb
Generation 5 (? - ?) Intel Paragon parallel languages memory, 10
– VLSI / ULSI systems symbolic GFLOPS
processing, AI
– Computer communications networks
– Artificial intelligence
– Massively parallel machines
EE 4504 Section 1 15 EE 4504 Section 1 16

8
Applications that Drive Computer
Performance Trends in Computer Usage

Weather forecasting 4 levels of ascending sophistication


Oceanography
Seismic/petroleum exploration
Medical research and diagnosis
Aerodynamics and structure analysis
nuclear physics
Artificial intelligence
Military/defense
Socio-economics

Computer processing spaces [HwB84]


EE 4504 Section 1 17 EE 4504 Section 1 18

9
Four Levels of Computer
Description

Global system structure Register level


– Overall system structure is defined – Specify internal operation of processor-level
– Major components identified components at the word level
» Processors – Primitives:
» Control modules » Registers
» Memory modules » Counters
» Interconnection structure » Memories
– Mostly a static description -- “black box” » ALUs
approach » Clocks
Processor level » Combinational logic
– Architectural Features specified Gate level
» Interfaces – Specify operations at the individual bit level
» Instruction sets – Gates are primitive elements
» Data Representation – Very cumbersome to do manually (logic
– More detailed individual component minimization, etc.)
specification

EE 4504 Section 1 19 EE 4504 Section 1 20

10
Global Descriptive Tools

Flynn’s Taxonomy
– The most universally excepted method of
classifying computer systems
– Relies on a block diagram approach
– Published in the Proceedings of the IEEE in
1966 Instruction Data
Stream Stream
– Any computer can be placed in one of 4 broad Control Processor
categories Unit
» SISD: Single instruction stream, single
data stream
» SIMD: Single instruction stream, multiple
data streams
» MIMD: Multiple instruction streams,
multiple data streams
» MISD: Multiple instruction streams, single
data stream
SISD system architecture of [Fly66]

EE 4504 Section 1 21 EE 4504 Section 1 22

11
Data
Stream 0 Instruction Data
Processor 0 Stream 0
Control Stream 0
Processor 0
Unit 0
Data
Instruction Data
Stream 1
Stream 1 Stream 1
Processor 1 Control Processor 1
Unit 1
Instruction
Control Stream
Unit

Data Instruction Data


Stream N-1 Stream N-1 Stream N-1
Control Processor
Processor
Unit N-1 N-1
N-1

SISD system architecture of [Fly66]


MIMD system architecture of [Fly66]
EE 4504 Section 1 23 EE 4504 Section 1 24

12
– Advantages of Flynn
Instruction
» Universally accepted
Control Stream 0
Processor 0 » Compact Notation
Unit 0
» Easy to classify a system (?)
Instruction – Disadvantages of Flynn
Control Stream 1 » Very coarse-grain differentiation among
Processor 1
Unit 1 machine systems
Data » Comparison of different systems is limited
Stream » Interconnections, I/O, memory not
considered in the scheme
Other global level tools
– Tendency to rely on block diagrams and very
Instruction coarse performance measures
Stream N-1 » Processor-memory-switch notation of
Control Processor
Unit N-1 [BeN71] uses block diagrams with 7 basic
N-1
component types

MISD system architecture of [Fly66]


EE 4504 Section 1 25 EE 4504 Section 1 26

13
Processor Level Descriptive Tools

At this level, the operation of the global


level components and their interfaces must
be defined
Items to be specified include:
– Data formats
» Word lengths
» Instruction formats
» Data representation
– Memory accessing
– Instruction set and its operation
Specification takes the appearance of a
software program
– Permits direct simulation of the machine’s
operation
Typical tool is ISP -- Instruction Set
Processor

PMS description of IBM 370/155 [Bae80]


EE 4504 Section 1 27 EE 4504 Section 1 28

14
Register Level Descriptive Tools

Internal operation of each processor level


component is defined
Defines the actual hardware of the system
in terms of data flow (at the word (register)
level) and the associated control
mechanisms
Any number of “hardware design
languages” (CDLs, HDLs, and RTLs) can
be used
– An example is the RTL from the Mano machine
in EE 2504

ISP specification of Sigma 5 machine [Bae80]


EE 4504 Section 1 29 EE 4504 Section 1 30

15
RTL example of the “Mano Machine” [Man93] RTL example of the “Mano Machine” [Man93] (cont)
EE 4504 Section 1 31 EE 4504 Section 1 32

16
VHDL -- A Universal Design
Gate Level Descriptive Tools Tool

At the gate level, descriptive tools rely on Background


combinational and sequential design – DoD sponsored the VHSIC Hardware Design
techniques as in EE 2504 and EE 4505/6 Language (VHDL) program in the 1980s to
promote the rapid insertion of advanced
To specify a complete computer system at microelectronic components into operational
this level is a staggering task! systems -- cut design and development time
Current automated design tools have – Speed process by:
replaced the manual methods of state » Increasing communication among defense
tables, truth tables, etc. contractors
» Increasing efficiency of CAD/CAM
capabilities
» Improving functioning of multi-contractor
teams
Objectives
– Support hardware technologies and design
methodologies
– Support design styles and automation tools
– Support management of design data

EE 4504 Section 1 33 EE 4504 Section 1 34

17
Implementation Modeling computer hardware
– Main focus is on chip-level designs at the gate – A hardware component is represented using a
and transistor level “design entity”
– Designs supported hierarchical design – Entities consist of 2 parts
decomposition » Interface -- containing externally visible
– VHDL design specifications resemble information
“programs” -- very similar to Ada in style » Bodies -- Describing one or more
– Structured programming mechanisms enable implementations of the entity
top-down hierarchical decomposition of the
design process
– As a result of the programming nature of the
specification, simulation of a design is possible Interface
– VHDL has been used to span all design levels
and is now being used to specify full
multiprocessor systems from the global level all Body
the way down to actual implementation in
hardware
Design entity

EE 4504 Section 1 35 EE 4504 Section 1 36

18
Entity interface contains / defines
externally visible items X
SUM
– Ports Y Full adder
– Data parameters Cout
Cin
– Generic parameters
– Declarations and assertions
Entity body describes alternative
implementations of the entity
– Architectural architecture architecture_view of full_adder is view:
» Data flow approach where body statements
are executed in parallel block
» Function decomposition composed of other begin
entities SUM <= X xor Y xor Cin
– Behavioral Cout <= (X and Y) or (Y and Cin) or
(Cin and X)
» Control flow approach -- statements
end block;
executed sequentially
end architecture_view
» No information on structural decomposition

EE 4504 Section 1 37 EE 4504 Section 1 38

19
architecture behavior_view of full_adder is view: Architectural description can contain
block references to other entities
begin – Designs can use libraries of entities of
previously designed “parts” to speed the
process (X, Y, Cin) process
variable S: bit_vector (1 to 3):= X&Y&Cin;
variable NUM: integer range 0 to 3 :=0;
begin Interface
for i:= 1 to 3 loop
if S(i) = ‘1’ then Body
NUM := NUM+1;
end if;
Interface Interface Interface
end loop;
case NUM is
when 0 => Cout<=0; SUN<=0; Body Body Body
when 1 => Cout <=0; SUM<=1;
when 2 => Cout<=1; SUM<=0;
when 3=> Cout<=1; SUM<=1;
end process; Interface Interface
end block;
end behavior_view Body Body
EE 4504 Section 1 39 EE 4504 Section 1 40

20
Thus, the full adder could be defined as
consisting of interconnected XOR, AND,
and OR gates which are themselves
defined as entities in a library
Consider a simple computer system:

EE 4504 Section 1 41 EE 4504 Section 1 42

21
Computer Architecture
Performance Measures

Introduction – Which of the following increases throughput,


– In the last section, representative descriptive reduces response time, or both?
tools for each design level were presented » Faster clock cycle time
– We still have the problem of assessing one or » Multiple processors for separate tasks
more differing architectures in a dynamic sense » Parallel processing of scientific problems
-- how well (fast) will the machine work? – Recalling that architects design machines to run
– In this section, we will discuss common ways programs, improved performance is a total
of gauging a system's value in terms of its cost system process as embodied by Amdahl's Law:
and its performance » The performance improvement to be gained
– Observation: An increase in a machine's from using some faster mode of execution
performance is viewed in one of two is limited by the fraction of time the faster
(competing) ways: mode can be used
» Reduced response time to an individual job
» Increase in overall throughput

EE 4504 Section 1 43 EE 4504 Section 1 44

22
– Consider the problem of leaving West Virginia
for civilization in Virginia. Assume that you Vehicle Hrs for Speedup Hrs for Speedup
must walk out of the mountains and that portion Used in Trip in in VA Entire for Trip
of the trip takes 20 hours. Once in Virginia, 200 VA VA Trip
miles are traversed in one of several modes of Walk 50 1 70 1
travel. Bike 20 2.5 40 1.8
Honda 4 12.5 24 2.9
Compute the total travel time for each mode Ferrari 1.67 30 21.67 3.2
and the speedup compared to walking the entire Rocket 0.33 150 20.33 3.4
trip. Car

Modes:
– Moral 1: One must be careful as an architect in
» Walk at 4 MPH
assessing the impact of a particular
» Ride a bike at 10 MPH performance enhancement on the system as a
» Drive a Honda Excel at 50 MPH whole
» Drive a Ferrari at 120 MPH – Moral 2: When evaluating a machine's
» Drive a rocket car at 600 MPH performance, one must be very careful in the
use of narrowly defined test/benchmark data

EE 4504 Section 1 45 EE 4504 Section 1 46

23
Memory Bandwidth MIPS
– Memory bandwidth is the maximum rate in bits – Defined as millions of instructions per second
per second at which information can be InstructionCount ClockRate
MIPS = =
transferred to or from main memory ExecTime × 10 6 CPI × 106

– Imposes a basic limit on the processing power – In general, faster machines will have higher
of a system MIPS ratings and appear to have better
– Weakness is that it is not related in any way to performance
actual program execution – Advantage in use: easy to "understand," easy to
– Not one of the current "in" figures of merit market systems with this measure of
performance
– Problems:
» Rating of a machine is based on its
instruction set -- how do you compare
machines with very different instruction
sets? Apples and Oranges

EE 4504 Section 1 47 EE 4504 Section 1 48

24
– MIPS rating can vary on a single computer – Example 2: Impact of optimizing compiler
based on program being executed
– MIPS can vary inversely to performance! -- Assume the following program makeup:
increase in performance with a decrease in
MIPS rating Operation Freq. Clock Cycles
– Example 1: ALU 43% 1
Load 21% 2
Use of a floating point unit vs. S/W routines for Store 12% 2
floating point operations: Branch 24% 2

FPU uses less time and less instructions, S/W Assume a 20 ns clock, optimizing compiler
uses many simple integer instructions leading to eliminates 50% of all ALU operations
what seems like higher MIPS rating -- even
though it takes more time than the FPU to NOT Optimized:
perform the task Ave CPI = 0.43 x 1 + 0.21 x 2 + 0.12 x 2 +
0.24x 2 = 1.57
MIPS = 50 MHz / 1.57 x 106 = 31.8

Optimized:
Ave CPI = (.43/2x1 + .21x2 + .12x2 + .24x2) /
(1-.43/2) = 1.73
MIPS = 50 MHz / 1.73 x 106 = 28.9
EE 4504 Section 1 49 EE 4504 Section 1 50

25
– To attempt to standardize MIPS ratings across MFLOPS
machines, we now have native vs normalized – Millions of floating point operations per second
MIPS
– Useful in comparing performance of "scientific
» Use a reference machine to standardize the applications" machines
ratings
– Intended to provide a "fair" comparison
» VAX 11-780 1 MIP machine has been the between such machines since a flop is the same
standard for a number of years on all machines
– Problems
Time Re ference
Norm. MIPS = × MIPS.Re f
TimeNewMachine » While a flop is a flop, not all machines
implement the same set of flops -- some
operations are synthesized from more
primitive flops
» MFLOP rating varies with floating point
instruction mix -- the idea of fast vs. slow
floating point opns

EE 4504 Section 1 51 EE 4504 Section 1 52

26
Programs for performance analysis – Toy Benchmarks
– To accurately gauge system performance, » 10-100 lines of easily produced code with
applications programs must be considered known computational result
– Synthetic Benchmarks » Small, easy to write (compiler may not be
» Programs that attempt to match average produced yet) and evaluate
frequency of operations and operands over » Hennessy and Patterson assert best use is
a large program base for "beginning programming assignments"
» Don't do any real work since the benchmarks do not reflect
computer's performance for normal sized
» Whetstone -- based on 1970's Algol applications
programs
– Kernels
» Dhrystone -- Based on a composite of HLL
statements for 1980's -- targeted to test CPU » Small, key pieces of real programs
and compiler performance » Livermore loops and Linpack are good
» Designer can get around these benchmarks examples
» Tend to focus on a specific aspect of the
overall performance rather than the entire
system

EE 4504 Section 1 53 EE 4504 Section 1 54

27
– Real Programs Other considerations
» Actual applications with real I/O – COST!!!
» Best measure of a machine's total capability » Design cost
» Often, the hardest to judge based on limited » Purchase cost
access to machine -- have not bought it yet! » Components: component, direct, indirect
» Typical suite would include compilers, – Compatibility
word processors, math applications, etc.
– S/W availability
» SPEC (System Performance Evaluation
Cooperative) is a good example of – Maintenance
workstation benchmark (started by HP,
Sun, DEC, and MIPS)

EE 4504 Section 1 55 EE 4504 Section 1 56

28
Section Summary

This section has briefly presented a history


of the electronic computer
– Generations
– Technologies
– Applications
Surveyed
– Classification and descriptive levels of
computer design
– Associated design tools
Introduced performance measures for
computer evaluation and comparison

EE 4504 Section 1 57

29

You might also like