
+

Chapter 2
Computer Evolution and Performance
William Stallings : Computer Organization and Architecture, 9th Edition
+ 2

Objectives

Why should we study this chapter?

 How are computers developed? → generations
 What applications require the great power of computers?
 What are multicore, MICs (Many Integrated Cores), and GPGPUs (General-Purpose Graphics Processing Units)?
 How to assess computer performance?
+ 3

Objectives
After studying this chapter, you should be able to:
 Present an overview of the evolution of computer
technology from early digital computers to the
latest microprocessors.
 Understand the key performance issues that
relate to computer design.
 Explain the reasons for the move to multicore
organization, and understand the trade-off
between cache and processor resources on a
single chip.
+ 4

Contents

 Basics: Number Systems (optional)


 2.1- A Brief History of Computers
 2.2- Designing for Performance
 2.3- Multicore, MICs, and GPGPUs
 2.6- Performance Assessment
+ 5

Number Systems: Definition

Definition: base-10 (decimal), base-2 (binary), and base-16 (hexadecimal) systems

                     Decimal system       Binary system    Hexadecimal system
 Base number         10                   2                16
 Set of digits       { 0, 1, 2, …, 9 }    { 0, 1 }         { 0, 1, 2, …, 9, A, B, C, D, E, F }
 Basic operations    +, -, *, /           +, -, *, /       +, -, *, /

Review: a system is an assemblage of related parts in which there exists an operating mechanism. What are the elements of a number system?

Operating mechanism in number systems: give your explanation.


+ 6

Number Systems:
Representing a quantity
 Choose a system

 Use positional expansion
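
A minimal Python sketch of positional expansion (the digit strings and bases below are just illustrative examples):

    # Positional expansion: a digit string d_(n-1) ... d_1 d_0 in base b has the value
    # sum of d_i * b**i; the loop below accumulates that sum left to right.
    DIGITS = "0123456789ABCDEF"

    def value_of(digit_string, base):
        total = 0
        for d in digit_string.upper():
            total = total * base + DIGITS.index(d)   # shift one position, add the new digit
        return total

    print(value_of("100101", 2))   # 37
    print(value_of("25", 16))      # 37
    print(value_of("37", 10))      # 37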


+ 7

Number Systems: Counting

 Decimal count: 0, 1, 2, …, 9, 10, 11, 12 ,…

 Binary count: 0, 1, 10, 11, 100, …

 Hexadecimal count: 0, 1, …, 9, A, B, C, D, E, F, 10, 11, …



+ 8

Number Systems: Conversions


(Decimal → Binary/Hexa expansion)

37d = ?b = ?h
69d = ?b =?h
42d = ?b= ?h
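
A small Python sketch of the repeated-division method (37 is used as the worked example; the other exercises follow the same steps):

    # Decimal -> base b: divide by b repeatedly, collect the remainders,
    # and read them back in reverse order.
    def to_base(n, base):
        digits = "0123456789ABCDEF"
        out = ""
        while n > 0:
            out = digits[n % base] + out   # remainder becomes the next lower digit
            n //= base
        return out or "0"

    print(to_base(37, 2), to_base(37, 16))   # 100101 25
    # Python's built-ins agree: bin(37) -> '0b100101', hex(37) -> '0x25'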
+ 9

Number Systems: Conversions


(Decimal → Binary/Hexa expansion) …
+ 10

Number Systems: Conversions


(Binary ↔ Hexa expansion)

1001100b = ?h 11001110b = ? h
2AFh = ?b 49Ch= ?b
BF7h = ?b 7EAh = ?b
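
Because each hex digit corresponds to exactly 4 binary digits, the conversion is just a regrouping of bits; a short Python check (using the first exercise as the example):

    b = "1001100"
    print(format(int(b, 2), "X"))      # group as 0100 1100 -> 4C

    h = "2AF"
    print(format(int(h, 16), "b"))     # 2 A F -> 0010 1010 1111 (leading zeros dropped)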
+ 11

2.1- History of Computers

Each generation is defined by a key event / essential invention in hardware technology (e.g., the integrated circuit, IC).
+ 12
First Generation: Vacuum Tubes

 Basic technology: Vacuum tubes

 Building block: composition and operation of the vacuum tube (https://en.wikipedia.org/wiki/Vacuum_tube)

 Typical computers:
 ENIAC (Electronic Numerical Integrator And Computer)
 EDVAC (Electronic Discrete Variable Computer) and John Von
Neumann
 IAS computer (Princeton Institute for Advanced Studies)
 Commercial Computers: UNIVAC (Universal Automatic Computer)
 IBM Computers ( International Business Machines)
+ 13
First Generation: ENIAC Computer

(Read by yourself)
 Electronic Numerical Integrator And Computer
 Designed and constructed at the University of Pennsylvania
 Started in 1943 – completed in 1946, by John Mauchly and John Eckert

 World’s first general purpose electronic digital computer


 Army’s Ballistics Research Laboratory (BRL) needed a way to supply
trajectory tables for new weapons accurately and within a reasonable
time frame
 Was not finished in time to be used in the war effort

 Its first task was to perform a series of calculations that were used
to help determine the feasibility of the hydrogen bomb

 Continued to operate under BRL (Army's Ballistics Research Laboratory) management until 1955, when it was disassembled
+ 14
ENIAC: Characteristics

 Weighed 30 tons
 Occupied 1500 square feet of floor space
 Contained more than 18,000 vacuum tubes
 Power consumption of 140 kW
 Capable of 5000 additions per second
 Decimal rather than a binary machine
 Memory consisted of 20 accumulators, each capable of holding a 10-digit number
 Major drawback was the need for manual programming by setting switches and plugging/unplugging cables
+ 15

John von Neumann


EDVAC (Electronic Discrete Variable Computer)

 First publication of the idea was in 1945

 Stored program concept


 Attributed to ENIAC designers, most notably the
mathematician John von Neumann
 Program represented in a form suitable for storing in
memory alongside the data (program= data +
instructions)

 IAS computer
 Princeton Institute for Advanced Studies
 Prototype of all subsequent general-purpose computers
 Completed in 1952
16

Structure of von Neumann Machine

CA: Central Arithmetical part
CC: Central Control part
+ 17

IAS Memory Formats

 The memory of the IAS consists of 1000 storage locations (called words) of 40 bits each
 Both data and instructions are stored there
 Numbers are represented in binary form, and each instruction is a binary code
 (Figure: the data word format and the instruction word format; one word contains 2 instructions)
+ 18
Structure of IAS Computer
AC: Accumulator
MQ: Multiplier Quotient
MBR: Memory Buffer Register
IBR: Instruction Buffer Register
PC: program counter
IR: Instruction register
MAR: Memory Address Register
+ 19

Table 2.1: The IAS Instruction Set

+ 20
Run IAS Machine Code (a part of Exercise 2.7)

Hexadecimal code: 010FA210FB
IAS instructions are 20 bits long, so this 40-bit word holds two instructions.

Left instruction: 010FA
 Opcode: 01(h) → 0000 0001
 Address: 0FA
 Load the data in memory word 0FA into AC: AC = [0FA]

Right instruction: 210FB
 Opcode: 21(h) → 0010 0001
 Address: 0FB
 Store AC into memory word 0FB: [0FB] = AC

Net effect: [0FB] = [0FA] (e.g., if [0FA] holds 7, then AC becomes 7 and [0FB] becomes 7)
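
A minimal Python sketch that splits a 40-bit IAS word (written as 10 hex digits, as above) into its two 20-bit instructions; the field widths follow the IAS instruction format (8-bit opcode, 12-bit address):

    def decode_ias_word(hex_word):
        """Return ((opcode, address), (opcode, address)) for the left and right instructions."""
        assert len(hex_word) == 10          # 40 bits = 10 hex digits
        def split(instr):                   # 20-bit instruction = 8-bit opcode + 12-bit address
            value = int(instr, 16)
            return value >> 12, value & 0xFFF
        return split(hex_word[:5]), split(hex_word[5:])

    (lop, laddr), (rop, raddr) = decode_ias_word("010FA210FB")
    print(f"left : opcode {lop:02X}h, address {laddr:03X}h")    # 01h, 0FAh -> AC = [0FA]
    print(f"right: opcode {rop:02X}h, address {raddr:03X}h")    # 21h, 0FBh -> [0FB] = AC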
+ 21

Commercial Computers: UNIVAC


(Read by yourself)
 1947 – Eckert and Mauchly formed the Eckert-Mauchly
Computer Corporation to manufacture computers commercially

 UNIVAC I (Universal Automatic Computer)


 First successful commercial computer
 Was intended for both scientific and commercial applications
 Commissioned by the US Bureau of Census for 1950 calculations

 The Eckert-Mauchly Computer Corporation became part of the


UNIVAC division of the Sperry-Rand Corporation

 UNIVAC II – delivered in the late 1950’s


 Had greater memory capacity and higher performance

 Backward compatible
+ 22
IBM (Read by yourself)

 Was the major manufacturer of punched-card processing equipment
 Delivered its first electronic stored-program computer (701) in 1953
  Intended primarily for scientific applications
 Introduced 702 product in 1955
  Hardware features made it suitable to business applications
 Series of 700/7000 computers established IBM as the overwhelmingly dominant computer manufacturer
+ 23

Second Generation: Transistors


 Transistor = transfer + resistor (a device that can both transfer and resist electric current)
 Building block: composition and operation of the transistor
  More details: https://en.wikipedia.org/wiki/Transistor
 Its activity is similar to that of a vacuum tube
 Smaller, cheaper
 Dissipates less heat than a vacuum tube
 Is a solid state device made from silicon
 Was invented at Bell Labs in 1947
 It was not until the late 1950s that fully transistorized computers were commercially available
 Typical computers: IBM 700/7000 series
+ 24

Second Generation Computers

 Introduced:
  More complex arithmetic and logic units and control units
  The use of high-level programming languages
  Provision of system software which provided the ability to:
   load programs
   move data to peripherals
   use libraries to perform common computations
 Appearance of the Digital Equipment Corporation (DEC) in 1957
 PDP-1 (Programmed Data Processor) was DEC's first computer
 This began the mini-computer phenomenon that would become so prominent (leading) in the third generation
+ 25
Table 2.3: Example Members of the IBM 700/7000 Series


26

IBM 7094 Configuration
(Read by yourself)

Multiplexer: a circuit that centrally manages several devices.
Mag: magnetic
Drum: magnetic drum for storing data
27

Third Generation: Integrated Circuits (IC)

 1958 – the invention of the integrated circuit
 All components of a circuit are miniaturized to microscopic size, so they can all be packed onto a single chip
 Discrete component
  Single, self-contained transistor
  Manufactured separately, packaged in their own containers, and soldered or wired together onto masonite (like circuit boards)
  Manufacturing process was expensive and cumbersome (complex)

 The two most important members of the third generation were the IBM System/360 and the DEC PDP-8
+ 28
Microelectronics

+ 29
Integrated Circuits

 A computer consists of gates, memory cells, and interconnections among these elements
 The gates and memory cells are constructed of simple digital electronic components
 Data storage – provided by memory cells
 Data processing – provided by gates
 Data movement – the paths among components are used to move data from memory to memory and from memory through gates to memory
 Control – the paths among components can carry control signals
 Exploits the fact that such components as transistors, resistors, and conductors can be fabricated from a semiconductor such as silicon
 Many transistors can be produced at the same time on a single wafer (thin piece) of silicon
 Transistors can be connected with a process of metallization (covering with metal) to form circuits
More details: https://en.wikipedia.org/wiki/Silicon
+ 30
Wafer, Chip, and Gate Relationship

 Wafer: a thin piece of silicon (< 1 mm)
+ 31
Chip Growth

Figure 2.8: Growth in Transistor Count on Integrated Circuits (number of transistors vs. year; m = million, bn = billion)
+ 32
Moore's Law

 1965, Gordon Moore (co-founder of Intel)
 Observed that the number of transistors that could be put on a single chip was doubling every year
 The pace slowed to a doubling every 18 months in the 1970s but has sustained that rate ever since

 Consequences of Moore's law:
  The cost of computer logic and memory circuitry has fallen at a dramatic rate
  The electrical path length is shortened, increasing operating speed
  The computer becomes smaller and is more convenient to use in a variety of environments
  Reduction in power and cooling requirements
  Fewer interchip connections
+ 33

Table 2.4: Characteristics of the System/360 Family


34

Table 2.5: Evolution of the PDP-8


(Read by yourself)
PDP: Programmed Data Processor
Produced by Digital Equipment Corporation (DEC)
+ 35

DEC - PDP-8 Bus Structure


DEC: Digital Equipment Corporation
PDP: Programmed Data Processor

Omni (Latin) = for all


+ 36
Later Generations

 LSI: Large Scale Integration
 VLSI: Very Large Scale Integration
 ULSI: Ultra Large Scale Integration
 Semiconductor Memory
 Microprocessors
+ 37
Semiconductor Memory

 In 1970 Fairchild produced the first relatively capacious semiconductor memory
  Chip was about the size of a single core
  Could hold 256 bits of memory
  Non-destructive (reading did not erase the stored data)
  Much faster than core
 In 1974 the price per bit of semiconductor memory dropped below the price per bit of core memory
  There has been a continuing and rapid decline in memory cost accompanied by a corresponding increase in physical memory density
  Developments in memory and processor technologies changed the nature of computers in less than a decade
 Since 1970 semiconductor memory has been through 13 generations
  Each generation has provided four times the storage density of the previous generation, accompanied by declining cost per bit and declining access time
+ 38

Microprocessors
 The density of elements on processor chips continued to rise
 More and more elements were placed on each chip so that fewer
and fewer chips were needed to construct a single computer
processor

 1971 Intel developed 4004


 First chip to contain all of the components of a CPU on a single
chip
 Birth of microprocessor

 1972 Intel developed 8008


 First 8-bit microprocessor

 1974 Intel developed 8080


 First general purpose microprocessor
 Faster, has a richer instruction set, has a large addressing
capability
+ 39
Evolution of Intel Microprocessors

+ 40
Evolution of Intel Microprocessors (continued)
+ 41

2.2- Designing for Performance

Desktop applications that require the great power of


today’s microprocessor-based systems include

• Image processing

• Speech recognition

• Videoconferencing

• Multimedia authoring

• Voice and video annotation of files

• Simulation modeling
+ 42
Microprocessor Speed

Techniques built into contemporary (current) processors include:

 Pipelining – the processor moves data or instructions into a conceptual pipe with all stages of the pipe processing simultaneously
 Branch prediction – the processor looks ahead in the instruction code fetched from memory and predicts which branches, or groups of instructions, are likely to be processed next
 Data flow analysis – the processor analyzes which instructions are dependent on each other's results, or data, to create an optimized schedule of instructions
 Speculative execution – using branch prediction and data flow analysis, some processors speculatively execute instructions ahead of their actual appearance in the program execution, holding the results in temporary locations, keeping execution engines as busy as possible
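
A toy Python illustration of why pipelining helps (an idealized cycle count that ignores hazards, stalls, and branch mispredictions):

    # With k stages and n instructions, an ideal pipeline needs k + (n - 1) cycles,
    # versus n * k cycles when each instruction runs all stages before the next starts.
    def cycles_unpipelined(n, k):
        return n * k

    def cycles_pipelined(n, k):
        return k + (n - 1)      # k cycles to fill the pipe, then one result per cycle

    print(cycles_unpipelined(100, 5))   # 500 cycles
    print(cycles_pipelined(100, 5))     # 104 cycles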
+ 43
Performance Balance

 Adjust the organization and architecture to compensate for the mismatch among the capabilities of the various components

 Architectural examples include:
  Increase the number of bits that are retrieved at one time by making DRAMs "wider" rather than "deeper" and by using wide bus data paths
  Reduce the frequency of memory access by incorporating increasingly complex and efficient cache structures between the processor and main memory
  Change the DRAM interface to make it more efficient by including a cache or other buffering scheme on the DRAM chip
  Increase the interconnect bandwidth between processors and memory by using higher speed buses and a hierarchy of buses to buffer and structure data flow
+ 44
Typical I/O Device Data Rates
+ 45

Improvements in Chip
Organization and Architecture
 Increase hardware speed of processor
 Fundamentally due to shrinking logic gate size
 More gates, packed more tightly, increasing clock rate

 Propagation time for signals reduced

 Increase size and speed of caches


 Dedicating part of processor chip
 Cache access times drop significantly

 Change processor organization and architecture


 Increase effective speed of instruction execution
 Parallelism
+ 46

Problems with Clock Speed and Logic Density
 Power
 Power density increases with density of logic and clock speed
 Dissipating heat

 RC (Resistance and Capacitance) delay


 Speed at which electrons flow limited by resistance and
capacitance of metal wires connecting them
 Delay increases as RC product increases
 Wire interconnects thinner, increasing resistance
 Wires closer together, increasing capacitance

 Memory latency
  Memory speeds lag behind processor speeds
47
+ Processor Trends
+ 48

2.3- Multicore, MICs, and GPGPUs

 Multicore CPU: a CPU that has several cores running concurrently
 MIC: Many Integrated Core
 GPGPU: General-Purpose Graphics Processing Unit
+ 49
Multicore

 The use of multiple processors on the same chip provides the potential to increase performance without increasing the clock rate
 Strategy is to use two simpler processors on the chip rather than one more complex processor
 With two processors, larger caches are justified
 As caches became larger, it made performance sense to create two and then three levels of cache on a chip
+ 50
Many Integrated Core (MIC) and Graphics Processing Unit (GPU)

MIC:
 Leap (fast growth) in performance as well as the challenges in developing software to exploit such a large number of cores
 The multicore and MIC strategy involves a homogeneous (same kind) collection of general purpose processors on a single chip

GPU:
 Core designed to perform parallel operations on graphics data
 Traditionally found on a plug-in graphics card, it is used to encode and render 2D and 3D graphics as well as process video
 Used as vector processors for a variety of applications that require repetitive computations
+ 51
(Read by yourself)

2.4- The Evolution of The Intel x86 Architecture


2.5- Embedded Systems and the ARM

Some definitions:
 CISC: Complex Instruction Set Computer – the CPU is equipped with a large set of instructions
 RISC: Reduced Instruction Set Computer – the CPU is equipped with basic instructions only, based on the idea that a higher-level operation can be built from a few basic instructions
 ARM: Advanced RISC Machine
+ 52

2.6- Performance Assessment


Factors affecting computer performance:

 Factors:
  Clock speed and instructions per second
  Instruction execution rate
 Methods: benchmarks
 Some laws (read by yourself):
  Amdahl's Law
  Little's Law
+ 53

System Clock

- Digital devices need pulses to operate. Pulses are created by a clock generator (a hardware circuit using a crystal oscillator).
- The rate of pulses is known as the clock rate, or clock speed.
- The time between pulses is the cycle time.
- One increment, or pulse, of the clock is referred to as a clock
cycle, or a clock tick.
- Unit: cycles per second, Hertz (Hz)
- Operations performed by a processor, such as fetching an
instruction, decoding the instruction, performing an arithmetic
operation, and so on, are governed by a system clock.
⇒ High clock rate → high performance.
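
A quick numeric check of the relation between clock rate and cycle time (the 2 GHz figure is just an example):

    clock_rate = 2.0e9               # 2 GHz
    cycle_time = 1.0 / clock_rate    # seconds per clock cycle
    print(cycle_time * 1e9, "ns")    # 0.5 ns per cycle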
+ 54

Instruction Execution Rate

- Unit: MIPS (millions of instructions per second)
- Unit: MFLOPS (floating-point performance is expressed as millions of floating-point operations per second)
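
A minimal sketch of how a MIPS rate is usually computed from the clock rate and the average cycles per instruction (CPI); the 400 MHz and CPI = 1.6 figures are illustrative:

    # MIPS = instruction_count / (execution_time * 10**6) = clock_rate / (CPI * 10**6)
    clock_rate = 400e6     # Hz
    cpi = 1.6              # average clock cycles per instruction
    mips = clock_rate / (cpi * 1e6)
    print(mips)            # 250.0 MIPS
    # MFLOPS is defined the same way, but counts only floating-point operations.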
+ 55

Benchmark

-A test used to measure hardware or software performance.


-Benchmarks for hardware use programs that test the
capabilities of the equipment
- Benchmarks for software determine the efficiency,
accuracy, or speed of a program in performing a particular
task, such as recalculating data in a spreadsheet.
-The same data is used with each program tested, so the
resulting scores can be compared to see which programs
perform well and in what areas.
(MS Computer Dictionary)
+ 56
Benchmarks …

 For example, consider this high-level language statement:

A = B + C    /* assume all quantities in main memory */

 With a traditional instruction set architecture, referred to as a complex instruction set computer (CISC), this instruction can be compiled into one processor instruction:

add mem(B), mem(C), mem(A)

 On a typical RISC machine, the compilation would look something like this:

load mem(B), reg(1);
load mem(C), reg(2);
add reg(1), reg(2), reg(3);
store reg(3), mem(A)

 Note: the two code sequences may need the same amount of time when they execute on the two machines.
+ 57

Benchmark
-The design of fair benchmarks is something of an art,
because various combinations of hardware and software
can exhibit widely variable performance under different
conditions. Often, after a benchmark has become a
standard, developers try to optimize a product to run that
benchmark faster than similar products run it in order to
enhance sales (MS Computer Dictionary)
 Beginning in the late 1980s and early 1990s, industry
and academic interest shifted to measuring the
performance of systems using a set of benchmark
programs
+ 58

Desirable Benchmark Characteristics

1. It is written in a high-level language, making


it portable across different machines.
2. It is representative of a particular kind of
programming style, such as system
programming, numerical programming, or
commercial programming.
3. It can be measured easily.
4. It has wide distribution.
+ 59

System Performance Evaluation


Corporation (SPEC)
 Benchmark suite
A collection of programs, defined in a high-level
language
 Attempts to provide a representative test of a computer
in a particular application or system programming area

 SPEC
 An industry consortium
 Defines and maintains the best known collection of
benchmark suites
 Performance measurements are widely used for
comparison and research purposes
+ 60
SPEC CPU2006

 Best known SPEC benchmark suite
 Industry standard suite for processor intensive applications
 Appropriate for measuring performance for applications that spend most of their time doing computation rather than I/O
 Consists of 17 floating point programs written in C, C++, and Fortran and 12 integer programs written in C and C++
 Suite contains over 3 million lines of code
 Fifth generation of processor intensive suites from SPEC
+ 61
Amdahl's Law (Read by yourself)

 Gene Amdahl [AMDA67]
 Deals with the potential speedup of a program using multiple processors compared to a single processor
 Illustrates the problems facing industry in the development of multi-core machines
  Software must be adapted to a highly parallel execution environment to exploit the power of parallel processing
 Can be generalized to evaluate and design technical improvement in a computer system
+ 62
Amdahl's Law (Read by yourself)
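
As a reminder of the formula behind this slide: if a fraction f of a program's work can be spread over N processors and the rest stays serial, the speedup is 1 / ((1 - f) + f / N). A minimal Python sketch with illustrative numbers:

    def amdahl_speedup(f, n):
        return 1.0 / ((1.0 - f) + f / n)

    print(round(amdahl_speedup(0.9, 8), 2))   # ~4.71 for 90% parallel code on 8 processors
    # Even with unlimited processors the speedup here is capped at 1 / (1 - f) = 10.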
+ 63

Little’s Law (Read by yourself)


 The general setup is that we have a steady-state system to which items arrive at an average rate of λ items per unit time. The items stay in the system an average of W units of time. Finally, there is an average of L items in the system at any one time. Little's Law relates these three variables as L = λW (a numeric sketch follows this list).

 Fundamental and simple relation with broad applications

 Can be applied to almost any system that is statistically in steady state, and
in which there is no leakage

 Queuing system
 If server is idle an item is served immediately, otherwise an arriving item
joins a queue
 There can be a single queue for a single server or for multiple servers, or multiple queues with one being for each of multiple servers

 Average number of items in a queuing system equals the average rate at


which items arrive multiplied by the time that an item spends in the system
 Relationship requires very few assumptions
 Because of its simplicity and generality it is extremely useful
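
A numeric sketch of L = λW (the arrival rate and time-in-system below are made-up values):

    arrival_rate = 50.0               # lambda: items arriving per second
    time_in_system = 0.2              # W: average seconds an item spends in the system
    avg_items = arrival_rate * time_in_system   # L
    print(avg_items)                  # 10.0 items in the system on average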
+ 64
Questions (Use your notebook)

Building blocks: Composition and operating of vacuum tube/transistor

2.1 What is a stored program computer?

2.2 What are the four main components of any general-purpose computer?

2.3 At the integrated circuit level, what are the three principal constituents of a
computer system?

2.4 Explain Moore’s law.

2.5 List and explain the key characteristics of a computer family.

2.6 What is the key distinguishing feature of a microprocessor?

2.7- Refer to the table 2.1


+ 65
Summary
Chapter 2: Computer Evolution and Performance

 First generation computers
  Vacuum tubes
 Second generation computers
  Transistors
 Third generation computers
  Integrated circuits
 Performance designs
  Microprocessor speed
  Performance balance
  Chip organization and architecture
 Multi-core
 MICs
 GPGPUs
 Performance assessment
  Clock speed and instructions per second
  Benchmarks
  Amdahl's Law
  Little's Law