
+

KT14203

Computer Architecture and Organization

Presented by:
Dr. Mohd Hanafi Ahmad Hijazi
SKTM, UMS
Slides, with minor modifications, taken from
William Stallings Computer Organization and
Architecture, 9th Edition
+
Chapter 2
Computer Evolution and Performance
+
History of Computers
First Generation: Vacuum Tubes
 ENIAC
 Electronic Numerical Integrator And Computer
 Designed and constructed at the University of Pennsylvania
 Started in 1943 – completed in 1946
 By John Mauchly and John Eckert

 World’s first general purpose electronic digital computer


 Army’s Ballistics Research Laboratory (BRL) needed a way to supply trajectory
tables for new weapons accurately and within a reasonable time frame
 Was not finished in time to be used in the war effort

 Its first task was to perform a series of calculations that were used to
help determine the feasibility of the hydrogen bomb

 Continued to operate under BRL management until 1955, when it was disassembled
ENIAC
 Weighed 30 tons
 Occupied 1500 square feet of floor space
 Contained more than 18,000 vacuum tubes
 140 kW power consumption
 Capable of 5000 additions per second
 Decimal rather than binary machine
 Memory consisted of 20 accumulators, each capable of holding a 10 digit number
 Major drawback was the need for manual programming by setting switches and plugging/unplugging cables
ENIAC

Source: https://www.youtube.com/watch?v=ANRJsigryJw
+

ENIAC
Pictures taken from
http://en.wikipedia.org/wiki/ENIAC
+
John von Neumann
EDVAC (Electronic Discrete Variable Automatic Computer)

 First publication of the idea was in 1945

 Stored program concept


 Attributed to ENIAC designers, most notably the mathematician
John von Neumann (consultant of ENIAC project)
 Program represented in a form suitable for storing in memory
alongside the data

 IAS computer
 Princeton Institute for Advanced Studies
 Prototype of all subsequent general-purpose computers
 Completed in 1952
Structure of von Neumann Machine
+
IAS Memory Formats
 The memory of the IAS consists of 1000 storage locations (called words) of 40 bits each
 Both data and instructions are stored there
 Numbers are represented in binary form and each instruction is a binary code
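The word format can be made concrete with a short sketch. It assumes the usual textbook description of the IAS instruction word, which the slide text does not spell out: each 40-bit word holds two 20-bit instructions, and each instruction is an 8-bit opcode followed by a 12-bit address. The function names and the opcode values used in the test are illustrative only.

```python
# Sketch of the IAS instruction-word layout (assumed field widths, see text).
WORD_BITS = 40
OPCODE_BITS = 8
ADDRESS_BITS = 12
INSTR_BITS = OPCODE_BITS + ADDRESS_BITS  # 20 bits per instruction


def pack_instruction(opcode: int, address: int) -> int:
    """Pack one 20-bit instruction: 8-bit opcode, then 12-bit address."""
    if not 0 <= opcode < (1 << OPCODE_BITS):
        raise ValueError("opcode does not fit in 8 bits")
    if not 0 <= address < (1 << ADDRESS_BITS):
        raise ValueError("address does not fit in 12 bits")
    return (opcode << ADDRESS_BITS) | address


def pack_word(left: int, right: int) -> int:
    """Place the left-hand instruction in the high 20 bits of a 40-bit word."""
    return (left << INSTR_BITS) | right


def unpack_word(word: int):
    """Split a 40-bit word into two (opcode, address) pairs."""
    instr_mask = (1 << INSTR_BITS) - 1
    addr_mask = (1 << ADDRESS_BITS) - 1
    left = (word >> INSTR_BITS) & instr_mask
    right = word & instr_mask
    return ((left >> ADDRESS_BITS, left & addr_mask),
            (right >> ADDRESS_BITS, right & addr_mask))
```

Packing two instructions and unpacking the result returns the original opcode and address pairs, which is a quick way to check the field boundaries.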
+
Structure of IAS Computer
+ Registers
Memory buffer register (MBR)
• Contains a word to be stored in memory or sent to the I/O unit
• Or is used to receive a word from memory or from the I/O unit

Memory address register (MAR)
• Specifies the address in memory of the word to be written from or read into the MBR

Instruction register (IR)
• Contains the 8-bit opcode of the instruction being executed

Instruction buffer register (IBR)
• Employed to temporarily hold the right-hand instruction from a word in memory

Program counter (PC)
• Contains the address of the next instruction pair to be fetched from memory

Accumulator (AC) and multiplier quotient (MQ)
• Employed to temporarily hold operands and results of ALU operations
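How these registers cooperate during instruction fetch can be sketched as a toy model. This is a simplification under stated assumptions, not the full IAS control logic: it assumes two 20-bit instructions per 40-bit word (8-bit opcode, 12-bit address), ignores branches, and executes nothing. The class name `IASFetch` is hypothetical.

```python
class IASFetch:
    """Toy model of the IAS fetch step; instruction execution is not modelled."""

    def __init__(self, memory):
        self.memory = memory  # list of 40-bit words
        self.pc = 0           # address of the next instruction pair
        self.mar = 0          # memory address register
        self.mbr = 0          # memory buffer register
        self.ir = 0           # 8-bit opcode of the current instruction
        self.ibr = None       # buffered right-hand instruction, or None

    def fetch(self):
        """Load the next opcode into IR and its address field into MAR."""
        if self.ibr is not None:
            # The right-hand instruction is already buffered: no memory access.
            instr, self.ibr = self.ibr, None
            self.pc += 1      # this pair is done, point at the next word
        else:
            self.mar = self.pc
            self.mbr = self.memory[self.mar]
            instr = self.mbr >> 20         # left-hand instruction
            self.ibr = self.mbr & 0xFFFFF  # keep the right-hand one for later
        self.ir = instr >> 12              # 8-bit opcode
        self.mar = instr & 0xFFF           # 12-bit operand address
        return self.ir, self.mar
```

Only every second fetch touches memory, which is the purpose of the IBR: one memory read delivers two instructions.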
+

IAS
Operations
+

Table 2.1 The IAS Instruction Set


+
History of Computers
Second Generation: Transistors
 Smaller

 Cheaper

 Dissipates less heat than a vacuum tube

 Is a solid state device made from silicon

 Was invented at Bell Labs in 1947

 It was not until the late 1950s that fully transistorized computers were commercially available
Table 2.2
Computer Generations

+
Computer Generations
+
Second Generation Computers

 Introduced:
 More complex arithmetic and logic units and control units
 The use of high-level programming languages
 Provision of system software which provided the ability to:
 load programs
 move data to peripherals and libraries
 perform common computations

 Appearance of the Digital Equipment Corporation (DEC) in 1957

 PDP-1 was DEC’s first computer

 This began the mini-computer phenomenon that would become so prominent in the third generation
Table 2.3 Example Members of the IBM 700/7000 Series


History of Computers
Third Generation: Integrated Circuits

 1958 – the invention of the integrated circuit

 Discrete component
 Single, self-contained transistor
 Manufactured separately, packaged in their own containers, and
soldered or wired together onto masonite-like circuit boards
 Manufacturing process was expensive and cumbersome

 The two most important members of the third generation were the IBM System/360 and the DEC PDP-8
+
Microelectronics
+ Integrated Circuits
 Data storage – provided by memory cells
 Data processing – provided by gates
 Data movement – the paths among components are used to move data from memory to memory and from memory through gates to memory
 Control – the paths among components can carry control signals

 A computer consists of gates, memory cells, and interconnections among these elements
 The gates and memory cells are constructed of simple digital electronic components
 Exploits the fact that such components as transistors, resistors, and conductors can be fabricated from a semiconductor such as silicon
 Many transistors can be produced at the same time on a single wafer of silicon
 Transistors can be connected through a process of metallization to form circuits
+
Wafer, Chip, and Gate Relationship
+
Chip Growth
Moore’s Law
1965; Gordon Moore – co-founder of Intel

Observed number of transistors that could be put on a single chip was doubling every year

The pace slowed to a doubling every 18 months in the 1970s but has sustained that rate ever since

Consequences of Moore’s law:
 The cost of computer logic and memory circuitry has fallen at a dramatic rate
 The electrical path length is shortened, increasing operating speed
 Computer becomes smaller and is more convenient to use in a variety of environments
 Reduction in power and cooling requirements
 Fewer interchip connections
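The doubling rule can be turned into a quick projection. The sketch below is only an illustration: the function name, the starting count of 1000 transistors, and the assumption that the doubling period stays exactly constant are not from the slides.

```python
def projected_transistors(initial_count: float, years: float,
                          doubling_period_years: float = 1.5) -> float:
    """Project a transistor count assuming one doubling per fixed period."""
    if doubling_period_years <= 0:
        raise ValueError("doubling period must be positive")
    return initial_count * 2 ** (years / doubling_period_years)


# Doubling every 18 months: two doublings in three years.
three_year_count = projected_transistors(1000, years=3)

# Doubling every year (Moore's original observation): ten doublings in a decade.
ten_year_count = projected_transistors(1000, years=10, doubling_period_years=1.0)
```

With these inputs the first projection is four times the starting count and the second is 1024 times the starting count, which shows how strongly the result depends on the assumed doubling period.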
+
IBM System/360

 Announced in 1964

 Product line was incompatible with older IBM machines

 Was the success of the decade and cemented IBM as the overwhelmingly dominant computer vendor

 The architecture remains to this day the architecture of IBM’s mainframe computers

 Was the industry’s first planned family of computers


 Models were compatible in the sense that a program written for
one model should be capable of being executed by another
model in the series

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.


+ Family Characteristics
 Similar or identical instruction set
 Similar or identical operating system
 Increasing speed
 Increasing number of I/O ports
 Increasing memory size
 Increasing cost



+

Image taken from wikipedia


Table 2.5 Evolution of the PDP-8


+
DEC - PDP-8 Bus Structure
+ Later Generations
LSI: Large Scale Integration
VLSI: Very Large Scale Integration
ULSI: Ultra Large Scale Integration

Semiconductor Memory
Microprocessors
+ Semiconductor Memory
In 1970 Fairchild produced the first relatively capacious semiconductor memory
 Chip was about the size of a single core
 Could hold 256 bits of memory
 Non-destructive
 Much faster than core

In 1974 the price per bit of semiconductor memory dropped below the price per bit of core memory
 There has been a continuing and rapid decline in memory cost accompanied by a corresponding increase in physical memory density
 Developments in memory and processor technologies changed the nature of computers in less than a decade

Since 1970 semiconductor memory has been through 13 generations
 Each generation has provided four times the storage density of the previous generation, accompanied by declining cost per bit and declining access time
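The fourfold growth per generation compounds quickly. The following sketch assumes, as a labelled assumption that the slide does not state, that the first generation is a 1 Kbit chip; the function name is hypothetical.

```python
def bits_per_chip(generation: int, first_generation_bits: int = 1024) -> int:
    """Capacity of a memory chip, assuming a 4x density gain per generation.

    The 1 Kbit starting point is an assumption made for this illustration.
    """
    if generation < 1:
        raise ValueError("generation numbering starts at 1")
    return first_generation_bits * 4 ** (generation - 1)


# Capacity in bits for generations 1 through 13.
capacities = [bits_per_chip(g) for g in range(1, 14)]
```

Under that assumption, twelve fourfold steps take a 1 Kbit chip to 2^34 bits, i.e. 16 Gbit, at the thirteenth generation.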
+
Microprocessors
 The density of elements on processor chips continued to rise
 More and more elements were placed on each chip so that fewer
and fewer chips were needed to construct a single computer
processor

 1971 Intel developed 4004


 First chip to contain all of the components of a CPU on a single
chip
 Birth of microprocessor

 1972 Intel developed 8008


 First 8-bit microprocessor

 1974 Intel developed 8080


 First general purpose microprocessor
 Faster, has a richer instruction set, has a large addressing
capability
Evolution of Intel Microprocessors

a. 1970s Processors

b. 1980s Processors
Evolution of Intel Microprocessors

c. 1990s Processors

d. Recent Processors
+
The Evolution of the Intel x86
Architecture
 Two processor families are the Intel x86 and the ARM
architectures

 Current x86 offerings represent the results of decades of


design effort on complex instruction set computers (CISCs)

 An alternative approach to processor design is the reduced


instruction set computer (RISC)

 ARM architecture is used in a wide variety of embedded


systems and is one of the most powerful and best-designed
RISC-based systems on the market



Highlights of the Evolution of the
Intel Product Line:
8080
• World’s first general-purpose microprocessor
• 8-bit machine, 8-bit data path to memory
• Was used in the first personal computer (Altair)

8086
• A more powerful 16-bit machine
• Has an instruction cache, or queue, that prefetches a few instructions before they are executed
• The first appearance of the x86 architecture
• The 8088 was a variant of this processor and used in IBM’s first personal computer (securing the success of Intel)

80286
• Extension of the 8086 enabling addressing a 16-MB memory instead of just 1 MB

80386
• Intel’s first 32-bit machine
• First Intel processor to support multitasking

80486
• Introduced the use of much more sophisticated and powerful cache technology and sophisticated instruction pipelining
• Also offered a built-in math coprocessor



Highlights of the Evolution of the
Intel Product Line:
Pentium
• Intel introduced the use of superscalar techniques, which allow multiple instructions to execute in parallel

Pentium Pro
• Continued the move into superscalar organization with aggressive use of register renaming, branch
prediction, data flow analysis, and speculative execution

Pentium II
• Incorporated Intel MMX technology, which is designed specifically to process video, audio, and graphics
data efficiently

Pentium III
• Incorporated additional floating-point instructions
• Streaming SIMD Extensions (SSE)

Pentium 4
• Includes additional floating-point and other enhancements for multimedia

Core
• First Intel x86 microprocessor with a dual core

Core 2
• Extends the Core architecture to 64 bits
• Core 2 Quad provides four cores on a single chip
• More recent Core offerings have up to 10 cores per chip
• An important addition to the architecture was the Advanced Vector Extensions instruction set



+
Embedded Systems
 The use of electronics and software within a product

 Billions of computer systems are produced each year


that are embedded within larger devices

 Today many devices that use electric power have an


embedded computing system

 Often embedded systems are tightly coupled to their


environment
 This can give rise to real-time constraints imposed by the
need to interact with the environment
 Constraints such as required speeds of motion, required
precision of measurement, and required time durations,
dictate the timing of software operations
 If multiple activities must be managed simultaneously this
imposes more complex real-time constraints



Figure 1.14 Possible Organization of an Embedded System
(Blocks shown: processor, memory, custom logic, human interface, diagnostic port, A/D conversion, D/A conversion, sensors, actuators/indicators)



+
The Internet of Things (IoT)
 Term that refers to the expanding interconnection of smart devices, ranging
from appliances to tiny sensors

 Is primarily driven by deeply embedded devices

 Generations of deployment culminating in the IoT:


 Information technology (IT)
 PCs, servers, routers, firewalls, and so on, bought as IT devices by enterprise IT
people and primarily using wired connectivity
 Operational technology (OT)
 Machines/appliances with embedded IT built by non-IT companies, such as
medical machinery, SCADA, process control, and kiosks, bought as appliances by
enterprise OT people and primarily using wired connectivity
 Personal technology
 Smartphones, tablets, and eBook readers bought as IT devices by consumers
exclusively using wireless connectivity and often multiple forms of wireless
connectivity
 Sensor/actuator technology
 Single-purpose devices bought by consumers, IT, and OT people exclusively
using wireless connectivity, generally of a single form, as part of larger systems

 It is the fourth generation that is usually thought of as the IoT and it is marked
by the use of billions of embedded devices
+
Embedded Operating Systems

 There are two general approaches to developing an embedded operating system (OS):
 Take an existing OS and adapt it for the embedded application
 Design and implement an OS intended solely for embedded use

Application Processors versus Dedicated Processors

 Application processors
 Defined by the processor’s ability to execute complex operating systems
 General-purpose in nature
 An example is the smartphone – the embedded system is designed to support numerous apps and perform a wide variety of functions

 Dedicated processor
 Is dedicated to one or a small number of specific tasks required by the host device
 Because such an embedded system is dedicated to a specific task or tasks, the processor and associated components can be engineered to reduce size and cost



Figure 1.15 Typical Microcontroller Chip Elements
(Elements attached to the processor over the system bus: A/D converter for analog data acquisition, D/A converter for analog data transmission, serial I/O ports to send/receive data, parallel I/O ports for peripheral interfaces, RAM for temporary data, ROM for program and data, EEPROM for permanent data, TIMER for timing functions)


+
Deeply Embedded Systems
 Subset of embedded systems

 Has a processor whose behavior is difficult to observe both by the


programmer and the user

 Uses a microcontroller rather than a microprocessor

 Is not programmable once the program logic for the device has been
burned into ROM

 Has no interaction with a user

 Dedicated, single-purpose devices that detect something in the


environment, perform a basic level of processing, and then do
something with the results

 Often have wireless capability and appear in networked configurations,


such as networks of sensors deployed over a large area

 Typically have extreme resource constraints in terms of memory,


processor size, time, and power consumption



ARM

Refers to a processor architecture that has evolved from


RISC design principles and is used in embedded systems

Family of RISC-based microprocessors and microcontrollers


designed by ARM Holdings, Cambridge, England

Chips are high-speed processors that are known for their


small die size and low power requirements

Probably the most widely used embedded processor


architecture and indeed the most widely used processor
architecture of any kind in the world

Acorn RISC Machine/Advanced RISC Machine



+
ARM Products

Cortex-A/Cortex-A50

Cortex-R

Cortex-M
• Cortex-M0
• Cortex-M0+
• Cortex-M3
• Cortex-M4

Huawei Kirin 970 with Neural Processing Unit is based on ARM



Figure 1.16 Typical Microcontroller Chip Based on Cortex-M3
(Microcontroller chip blocks: security (hardware AES), analog interfaces (A/D and D/A converters), timers and triggers, parallel I/O ports, serial interfaces (USART, UART, low-energy UART, USB), energy management, clock management, and core and memory (Cortex-M3 processor, 64 kB flash memory, 64 kB SRAM, debug interface, DMA controller, memory protection unit), connected by a 32-bit bus and a peripheral bus)
(Cortex-M3 processor blocks: Cortex-M3 core with 32-bit ALU, hardware divider, 32-bit multiplier, control logic, Thumb decode, and instruction and data interfaces; NVIC; ETM; debug logic; DAP; memory protection unit; bus matrix with ICode interface and SRAM and peripheral interface)



+
Cloud Computing

 NIST defines cloud computing as:

“A model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction.”

 You get economies of scale, professional network


management, and professional security management

 The individual or company only needs to pay for the storage


capacity and services they need

 Cloud provider takes care of security



Cloud Networking
 Refers to the networks and network management functionality that must
be in place to enable cloud computing

 One example is the provisioning of high-performance and/or high-


reliability networking between the provider and subscriber

 The collection of network capabilities required to access a cloud,


including making use of specialized services over the Internet, linking
enterprise data center to a cloud, and using firewalls and other network
security devices at critical points to enforce access security policies

Cloud Storage
 Subset of cloud computing

 Consists of database storage and database applications hosted


remotely on cloud servers

 Enables small businesses and individual users to take advantage of data


storage that scales with their needs and to take advantage of a variety of
database applications without having to buy, maintain, and manage the
storage assets
+
Designing for Performance
 The cost of computer systems continues to drop dramatically, while the performance
and capacity of those systems continue to rise equally dramatically

 Today’s laptops have the computing power of an IBM mainframe from 10 or 15 years ago

 Processors are so inexpensive that we now have microprocessors we throw away

 Desktop applications that require the great power of today’s microprocessor-based


systems include:
 Image processing
 Three-dimensional rendering
 Speech recognition
 Videoconferencing
 Multimedia authoring
 Voice and video annotation of files
 Simulation modeling

 Businesses are relying on increasingly powerful servers to handle transaction and


database processing and to support massive client/server networks that have
replaced the huge mainframe computer centers of yesteryear

 Cloud service providers use massive high-performance banks of servers to


satisfy high-volume, high-transaction-rate applications for a broad spectrum of
clients
+
Microprocessor Speed
Techniques built into contemporary processors include:

Pipelining
• Processor moves data or instructions into a conceptual pipe with all stages of the pipe processing simultaneously

Branch prediction
• Processor looks ahead in the instruction code fetched from memory and predicts which branches, or groups of instructions, are likely to be processed next

Data flow analysis
• Processor analyzes which instructions are dependent on each other’s results, or data, to create an optimized schedule of instructions

Speculative execution
• Using branch prediction and data flow analysis, some processors speculatively execute instructions ahead of their actual appearance in the program execution, holding the results in temporary locations, keeping execution engines as busy as possible
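The benefit of having all pipeline stages busy at once can be estimated with a simple cycle count. This is an idealized sketch, not a model of any real processor: it assumes every stage takes one clock cycle and that there are no stalls, branches, or data hazards. The function names are hypothetical.

```python
def unpipelined_cycles(stages: int, instructions: int) -> int:
    """Each instruction passes through every stage before the next one starts."""
    return stages * instructions


def pipelined_cycles(stages: int, instructions: int) -> int:
    """The first instruction fills the pipe; each later one finishes a cycle apart."""
    if instructions == 0:
        return 0
    return stages + instructions - 1


def ideal_speedup(stages: int, instructions: int) -> float:
    """Ratio of unpipelined to pipelined cycle counts under the ideal assumptions."""
    return unpipelined_cycles(stages, instructions) / pipelined_cycles(stages, instructions)
```

For a long instruction stream the ideal speedup approaches the number of stages, which is why branch prediction and speculative execution matter: they keep the pipe from emptying.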
+
Performance Balance
 Adjust the organization and architecture to compensate for the mismatch among the capabilities of the various components

 Architectural examples include:
 Increase the number of bits that are retrieved at one time by making DRAMs “wider” rather than “deeper” and by using wide bus data paths
 Reduce the frequency of memory access by incorporating increasingly complex and efficient cache structures between the processor and main memory
 Change the DRAM interface to make it more efficient by including a cache or other buffering scheme on the DRAM chip
 Increase the interconnect bandwidth between processors and memory by using higher speed buses and a hierarchy of buses to buffer and structure data flow
Typical I/O Device Data Rates
+
Improvements in Chip
Organization and Architecture
 Increase hardware speed of processor
 Fundamentally due to shrinking logic gate size
 More gates, packed more tightly, increasing clock rate
 Propagation time for signals reduced

 Increase size and speed of caches


 Dedicating part of processor chip
 Cache access times drop significantly

 Change processor organization and architecture


 Increase effective speed of instruction execution
 Parallelism
Multicore
 The use of multiple processors on the same chip provides the potential to increase performance without increasing the clock rate
 Strategy is to use two simpler processors on the chip rather than one more complex processor
 With two processors larger caches are justified
 As caches became larger it made performance sense to create two and then three levels of cache on a chip
+
Many Integrated Core (MIC)
Graphics Processing Unit (GPU)

MIC
 Leap in performance as well as the challenges in developing software to exploit such a large number of cores
 The multicore and MIC strategy involves a homogeneous collection of general purpose processors on a single chip

GPU
 Core designed to perform parallel operations on graphics data
 Traditionally found on a plug-in graphics card, it is used to encode and render 2D and 3D graphics as well as process video
 Used as vector processors for a variety of applications that require repetitive computations
+ Overview

Intel x86 Architecture (CISC)
 Results of decades of design effort on complex instruction set computers (CISCs)
 Excellent example of CISC design
 Incorporates the sophisticated design principles once found only on mainframes and supercomputers
 In terms of market share Intel is ranked as the number one maker of microprocessors for non-embedded systems

ARM (RISC)
 An alternative approach to processor design is the reduced instruction set computer (RISC)
 The ARM architecture is used in a wide variety of embedded systems and is one of the most powerful and best designed RISC based systems on the market
+
System Clock

Constant signal waves

Digital voltage pulses

1 pulse = 1 clock cycle.


Processor speed is measured in cycles per second, or Hertz (Hz).
Processor speed does not tell the whole story about performance.
+ Table 2.9 Performance Factors and System Attributes

$I_c$ = instruction count, $p$ = number of processor cycles needed to decode and execute, $m$ = number of memory references, $k$ = ratio between memory and processor cycle time, $\tau$ = 1 / clock frequency
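These symbols combine in the textbook's refined execution-time formula, T = Ic × [p + (m × k)] × τ, which does not appear in the extracted slide text and is quoted here from the underlying chapter. The sketch below evaluates it; the function name and the sample values in the comment are hypothetical.

```python
def execution_time(ic: float, p: float, m: float, k: float, clock_hz: float) -> float:
    """Execution time in seconds: T = Ic * (p + m * k) * tau, with tau = 1 / f.

    ic: instruction count
    p:  processor cycles needed to decode and execute an instruction
    m:  memory references per instruction
    k:  ratio between memory cycle time and processor cycle time
    """
    if clock_hz <= 0:
        raise ValueError("clock frequency must be positive")
    tau = 1.0 / clock_hz
    return ic * (p + m * k) * tau


# Hypothetical workload: 1 million instructions, p = 2, m = 1, k = 4, 100 MHz clock.
t = execution_time(1_000_000, p=2, m=1, k=4, clock_hz=100e6)
```

Each instruction here costs 2 + 1 × 4 = 6 processor cycles, so the memory term dominates; that is the mismatch the Performance Balance techniques aim to reduce.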
+
MIPS Rate

 A common measure of performance for a processor is the rate at which instructions are executed, expressed as millions of instructions per second (MIPS).

 $\text{MIPS rate} = \dfrac{I_c}{T \times 10^6} = \dfrac{f}{CPI \times 10^6}$

 $T = I_c \times CPI \times \tau$ (T is measured in seconds)

 $CPI = \dfrac{\sum_{i=1}^{n} (CPI_i \times I_i)}{I_c}$, where n is the number of different types of instructions
+
MIPS Rate Example

 Given 2 million instructions executed on a 400 MHz processor, the instruction mix and the CPI for each instruction type are shown in the table below.

Instruction Type             CPI   Instruction Mix (%)
Arithmetic and logic          1    60
Load/store with cache hit     2    18
Branch                        4    12
Memory reference              8    10

 Overall CPI: 0.6 × 1 + 0.18 × 2 + 0.12 × 4 + 0.1 × 8 = 2.24

 MIPS rate: (400 × 10^6) / (2.24 × 10^6) ≈ 178
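The arithmetic in this example can be checked with a few lines of code. The sketch uses only the figures from the table; the variable and function names are illustrative, and the execution time is an extra quantity derived from T = Ic × CPI × τ.

```python
# (instruction type, CPI, fraction of the instruction mix) from the table above
MIX = [
    ("Arithmetic and logic", 1, 0.60),
    ("Load/store with cache hit", 2, 0.18),
    ("Branch", 4, 0.12),
    ("Memory reference", 8, 0.10),
]


def overall_cpi(mix) -> float:
    """Weighted average cycles per instruction."""
    return sum(cpi * fraction for _, cpi, fraction in mix)


def mips_rate(clock_hz: float, cpi: float) -> float:
    """MIPS rate = f / (CPI * 10^6)."""
    return clock_hz / (cpi * 1e6)


def execution_time(ic: float, cpi: float, clock_hz: float) -> float:
    """T = Ic * CPI * tau, with tau = 1 / f, in seconds."""
    return ic * cpi / clock_hz


cpi = overall_cpi(MIX)                      # about 2.24
mips = mips_rate(400e6, cpi)                # about 178.6
seconds = execution_time(2e6, cpi, 400e6)   # about 0.0112 s
```

The unrounded MIPS value is about 178.57, so the slide's figure of 178 is a truncation rather than a rounding.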


Benchmarks
For example, consider this high-level language statement:

A = B + C /* assume all quantities in main memory */

With a traditional instruction set architecture, referred to as a complex instruction set computer (CISC), this instruction can be compiled into one processor instruction:

add mem(B), mem(C), mem (A)

On a typical RISC machine, the compilation would look something like this:
load mem(B), reg(1);
load mem(C), reg(2);
add reg(1), reg(2), reg(3);
store reg(3), mem (A)
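The point of the comparison is that MIPS ratings can mislead: the RISC version executes four instructions where the CISC version executes one, yet both may complete the same high-level statement in about the same time. The sketch below makes that explicit. The 1 microsecond statement time is a made-up value chosen only to keep the numbers simple.

```python
def mips(instructions: int, time_us: float) -> float:
    """Instructions per microsecond, which equals millions of instructions per second."""
    return instructions / time_us


STATEMENT_TIME_US = 1.0   # assumed time to execute A = B + C on either machine

cisc_instructions = 1     # add mem(B), mem(C), mem(A)
risc_instructions = 4     # load, load, add, store

cisc_mips = mips(cisc_instructions, STATEMENT_TIME_US)
risc_mips = mips(risc_instructions, STATEMENT_TIME_US)
rating_ratio = risc_mips / cisc_mips
```

Under this assumption the RISC machine is rated at four times the MIPS of the CISC machine while doing the same amount of high-level work per second, which is why benchmark programs are preferred over raw instruction rates.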
+
Standard Performance Evaluation
Corporation (SPEC)
 Benchmark suite
 A collection of programs, defined in a high-level language
 Attempts to provide a representative test of a computer in a
particular application or system programming area

 SPEC
 An industry consortium
 Defines and maintains the best known collection of benchmark
suites
 Performance measurements are widely used for comparison and
research purposes
+ SPEC CPU2006
 Best known SPEC benchmark suite
 Industry standard suite for processor intensive applications
 Appropriate for measuring performance for applications that spend most of their time doing computation rather than I/O
 Consists of 17 floating point programs written in C, C++, and Fortran and 12 integer programs written in C and C++
 Suite contains over 3 million lines of code
 Fifth generation of processor intensive suites from SPEC
+ Amdahl’s Law
 Gene Amdahl [AMDA67]
 Deals with the potential speedup of a program using multiple processors compared to a single processor
 Illustrates the problems facing industry in the development of multi-core machines
 Software must be adapted to a highly parallel execution environment to exploit the power of parallel processing
 Can be generalized to evaluate and design technical improvement in a computer system
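Figure 2.3 Illustration of Amdahl’s Law
(A program with total time T consists of a serial part (1 – f)T and a parallelizable part fT. On N processors the parallelizable part shrinks to fT/N, giving a total time of [1 – f(1 – 1/N)]T = (1 – f)T + fT/N)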


+
Amdahl’s Law
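From the execution times in Figure 2.3, the speedup on N processors is T divided by (1 – f)T + fT/N, that is, 1 / ((1 – f) + f/N), where f is the fraction of the work that can be parallelized. A minimal sketch follows; the function names are illustrative.

```python
def amdahl_speedup(f: float, n: int) -> float:
    """Speedup on n processors when a fraction f of the work is parallelizable."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - f) + f / n)


def speedup_limit(f: float) -> float:
    """Upper bound on the speedup as n grows without limit: 1 / (1 - f)."""
    if not 0.0 <= f < 1.0:
        raise ValueError("f must be in [0, 1) for a finite limit")
    return 1.0 / (1.0 - f)


# Example: with 90% of the work parallelizable, even many cores cannot beat 10x.
speedups = {n: amdahl_speedup(0.9, n) for n in (2, 8, 64, 1024)}
```

The serial fraction sets the ceiling: once f/N becomes small compared with 1 – f, adding processors yields almost no further gain.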
+ Summary
Chapter 2: Computer Evolution and Performance
 First generation computers
 Vacuum tubes
 Second generation computers
 Transistors
 Third generation computers
 Integrated circuits
 Performance designs
 Microprocessor speed
 Performance balance
 Chip organization and architecture
 Multi-core
 MICs
 GPGPUs
 Evolution of the Intel x86
 Cloud computing
 Embedded systems
 Performance assessment
 Clock speed and instructions per second
 Benchmarks
 Amdahl’s Law
