Architecture History Types of Computers Structure Technology Trends Performance

CSC 213: Computer Architecture


Overview of Computer Architecture

October 15, 2021


Agenda

1 Architecture

2 History

3 Types of Computers

4 Structure

5 Technology Trends

6 Performance


What is Computer Architecture?

Patterson & Hennessy:

Computer architecture =
Instruction set architecture + Machine organization + Hardware

Instruction Set Architecture (ISA)
– WHAT the computer does (logical view)
Machine Organization
– HOW the ISA is implemented (physical view)
Micro-Architecture
– ISA implementation (logic-circuit level)


Instruction Set Architecture (ISA)

ISA, or simply architecture: the abstract interface between
the hardware and the lowest-level software. It encompasses
all the information necessary to write a machine language
program, including instructions, registers, memory access,
I/O, ...
– Enables implementations of varying cost and performance to
  run identical software
The combination of the basic instruction set (the ISA) and
the operating system interface is called the application binary
interface (ABI)
– ABI: the user portion of the instruction set plus the
  operating system interfaces used by application programmers
– Defines a standard for binary portability across computers


Organization and Hardware

Organization: high-level aspects of a computer’s design
– Principal components: memory, CPU, I/O, ...
– How components are interconnected
– How information flows between components
– e.g. AMD Opteron 64 and Intel Pentium 4: same ISA but
  different organizations
Hardware: detailed logic design and the packaging
technology of a computer
– e.g. Pentium 4 and Mobile Pentium 4: nearly identical
  organizations but different hardware details


Abstraction Layers

[Figure: stack of abstraction layers, top to bottom]
Application
Algorithm
Programming Language
Operating System/Virtual Machine
Instruction Set Architecture (ISA)
Microarchitecture
Gates/Register-Transfer Level (RTL)
Circuits
Devices
Physics

Figure annotations:
– Original domain of computer architecture (’50s-’80s)
– Domain of recent computer architecture (’90s)
– Reinvigoration of computer architecture, mid-2000s onward:
  parallel computing, security, ...; reliability, power, ...


Historic Perspective of Computers

1945 – ENIAC (Electronic Numerical Integrator and Computer)
– The first completely electronic, operational general-purpose
  machine, built using vacuum tubes
– Designed and built by Eckert and Mauchly at the University of
  Pennsylvania during 1943-45
– 30 tons, 72 square meters, 200 kW
– Performance: read in 120 cards per minute; addition took
  200 µs; division took 6 ms
– Application: ballistic calculations


Historic Perspective of Computers (2)

1947: William Shockley, John Bardeen and Walter Brattain of
Bell Laboratories invent the transistor.
1958: Jack Kilby and Robert Noyce invent the integrated
circuit. Kilby was awarded the Nobel Prize in Physics in 2000
for his work.
1960-1970: Mainframe computers are used primarily by large
organizations for critical applications and bulk data
processing, such as census, industry and consumer statistics
– Large: size, power consumption, cooling


Historic Perspective of Computers (3)


1970: Minicomputers, a class of smaller computers that
developed in the mid-1960s, sold for much less than
mainframe and mid-size computers from IBM and its direct
competitors.
Minicomputers initially focused on applications in
scientific laboratories, but rapidly branched out as the
technology of time-sharing (multiple users sharing a computer
interactively through independent terminals) became
widespread.
1971: Intel introduces the first microprocessor, the Intel 4004.
1973: Robert Metcalfe, a member of the research staff at
Xerox, develops Ethernet for connecting multiple computers
and other hardware.


Historic Perspective of Computers (4)

1977: The birth of the first personal computers (PCs), the
Apple computer series
1981: The first IBM personal computer is introduced
– It uses Microsoft’s MS-DOS operating system. It has an
Intel chip, two floppy disks and an optional color monitor.
1990: Tim Berners-Lee, a researcher at CERN, the
high-energy physics laboratory in Geneva, develops HyperText
Markup Language (HTML), giving rise to the World Wide
Web.


Historic Perspective of Computers (5)

1980s and 1990s: the introduction of many commercial
parallel computers with multiple processors.
Intel followed suit by introducing the first of its most popular
microprocessor family, the 80x86 series.
PCs from Compaq, Apple, IBM, Dell, and many others soon
became pervasive, and changed the face of computing.
In parallel computers, the number of processors in a single
machine ranged from several in a shared-memory computer to
hundreds of thousands in a massively parallel system.


Classes of Computers

Desktop
– Designed to deliver good performance to a single user at low
  cost; usually executes third-party software and incorporates
  a graphics display, a keyboard, and a mouse
Servers
– Used to run larger programs for multiple, simultaneous users;
  typically accessed only via a network, with a greater
  emphasis on dependability and (often) security. Examples: file
  servers, web servers, database servers


Classes of Computers (2)

Embedded
A computer inside another device used for running one
predetermined application
Supercomputers
A high performance, high cost class of servers with hundreds
to thousands of processors, terabytes of memory and petabytes
of storage that are used for high-end scientific and engineering
applications


Embedded Processor Characteristics

The largest class of computers, spanning the widest range of
applications and performance
Often have minimum performance requirements.
Often have stringent limitations on cost.
Often have stringent limitations on power consumption.
Often have low tolerance for failure.


Post PC Era


Components of a Computer

Same components for all kinds of computer: desktop, server,
embedded


Inside the Processor (CPU)

Datapath: performs operations on data
Control: sequences datapath, memory, ...
Cache memory: Small fast SRAM memory for immediate
access to data


Below Your Program

System software
Operating system – supervising program that interfaces the
user’s program with the hardware (e.g., Linux, MacOS,
Windows)
– Handles basic input and output operations
– Allocates storage and memory
– Provides for protected sharing among multiple applications
Compiler – translates programs written in a high-level language
(e.g., C, Java) into instructions that the hardware can execute


Translating your program

High-level language program (in C)
  ↓ Compiler
Assembly language program (for MIPS)
  ↓ Assembler
Binary machine language program (for MIPS)


Computer Progress

The rapid rate of improvement in computers has come from both:
Progress in computer technology
– Underpinned by Moore’s Law
Innovations in the organization and design of computers


Moore’s Law

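In 1965, Gordon Moore (who later co-founded Intel) predicted
that the number of transistors that can be integrated on a
single chip would double about every year; in 1975 he revised
this to about every two years. The often-quoted period of about
18 months is a popular variant of this prediction.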
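
As a rough illustration of what a fixed doubling period implies, the sketch below projects a transistor count forward in time. The function name, the starting count, and the ten-year horizon are assumptions chosen for illustration, not figures from the slides. The 1.5-year period corresponds to the 18 months quoted above; the 2-year period is included only for comparison.

```python
def projected_transistors(initial_count, years, doubling_period_years):
    """Project a transistor count, assuming one doubling per doubling period."""
    return initial_count * 2 ** (years / doubling_period_years)


if __name__ == "__main__":
    start = 1_000  # hypothetical starting transistor count
    for period in (1.5, 2.0):
        growth = projected_transistors(start, 10, period) / start
        print(f"doubling every {period} years: about {growth:.0f}x after 10 years")
```

Small changes in the doubling period compound into large differences over a decade, which is why the exact period matters when extrapolating.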


Moore’s Law (2)


Computer Progress - Processor performance


Computer Progress - Increased Clock Speed


Computer Progress - Memory Capacity

Electronics technology continues to evolve, leading to
increased capacity and performance


Processor-Memory Gap
Processor and memory performance are not improving at the same rate


Technology – dramatic change

Processor
– Logic capacity: about 30% per year
– Clock rate: about 20% per year until the early 2000s;
  has now practically come to a halt
Memory
– DRAM capacity: about 40% per year (2x every 2-3 years)
– Memory speed: about 10% per year
– Cost per bit: improves about 25% per year
Disk
– Capacity: about 40% per year (2x every 3 years)
Flash storage
– Capacity: about 60% per year (2x every 2 years)


Processor Performance

“Unmatched by any other industry”

John Crawford, Intel Fellow, 1993

Microprocessor performance growth in perspective:
Doubling every 18 months (1982-1996): total of 800X
– Cars travel at 70,000 km/h; get 7,000 km/l
– Air travel: L.A. to N.Y. in 22 seconds (MACH 800)
Doubling every 24 months (1970-1996): total of 9,000X
– Cars travel at 970,000 km/h; get 64,000 km/l
– Air travel: L.A. to N.Y. in 2 seconds (MACH 9,000)


Performance Motivation
It is often helpful to have some way by which to compare
systems
During the design of new systems
During purchasing to compare between products
...

Which one would you choose?

Name                INTEL CORE I7 4770K   AMD FX 9590
Number of cores     4                     8
Number of threads   8                     8
Frequency           3.5 GHz               4.7 GHz
Turbo frequency     3.9 GHz               5 GHz
Data width          64-bit                64-bit
TDP                 84 W                  220 W
Release             June 2013             July 2013


Performance Motivation (2)

It is often helpful to have some way by which to compare
systems
Example (from another field)

        Passengers   Speed (km/h)
Car     5            60
Bus     60           20

Which one has the best performance?


How do we define Performance?

Possible key metrics
– The time to perform a task
  (latency, response time, execution time, elapsed time)
– The number of tasks completed per unit time
  (throughput, bandwidth, execution rate)


Performance - Car/Bus Example

Measure of performance
Latency: time to finish a fixed task
Throughput: number of tasks in fixed time
Example: Move people from A → B, a distance of 10 km

        Passengers   Speed (km/h)
Car     5            60
Bus     60           20

What is the latency of the 1) Car, 2) Bus?
What is the throughput of the 1) Car, 2) Bus?


Comparing Performance
System A is n times faster than B:

\[ \text{Latency}(A) = \frac{\text{Latency}(B)}{n} \tag{1} \]

\[ \text{Throughput}(A) = \text{Throughput}(B) \times n \tag{2} \]

System A is X% faster than B:

\[ \text{Latency}(A) = \frac{\text{Latency}(B)}{1 + X/100} \tag{3} \]

\[ \text{Throughput}(A) = \text{Throughput}(B) \times (1 + X/100) \tag{4} \]
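
The two phrasings ("n times faster" and "X% faster") describe the same ratio, with n = 1 + X/100. The sketch below converts between them; the function names and the sample latencies are illustrative assumptions, not part of the original slides.

```python
def times_faster(latency_a, latency_b):
    """Return n such that A is n times faster than B: Latency(A) = Latency(B) / n."""
    return latency_b / latency_a


def percent_faster(latency_a, latency_b):
    """Return X such that A is X% faster than B, using n = 1 + X/100."""
    return (times_faster(latency_a, latency_b) - 1) * 100


if __name__ == "__main__":
    # Hypothetical latencies in seconds for two systems.
    latency_a, latency_b = 4.0, 6.0
    print(f"A is {times_faster(latency_a, latency_b):.2f} times faster than B")
    print(f"A is {percent_faster(latency_a, latency_b):.0f}% faster than B")
```

The same functions apply to throughput if the arguments are swapped, since higher throughput is better while lower latency is better.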


Comparing Performance - Car/Bus Example

Task: Move people from A → B, a distance of 10 km

        Passengers   Speed (km/h)   Latency (min)   Throughput (PPH)
Car     5            60             10              15
Bus     60           20             30              60

(PPH = passengers per hour)

Latency? Car is ____ times (and ____ %) faster than the bus
Throughput? Bus is ____ times (and ____ %) more efficient
than the car
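
The table values above can be reproduced with a short calculation. The sketch below assumes, as the 15 and 60 PPH figures imply, that each vehicle must make the return trip before carrying its next load; the function names are illustrative.

```python
def latency_minutes(distance_km, speed_kmh):
    """One-way travel time in minutes."""
    return distance_km * 60 / speed_kmh


def throughput_pph(passengers, distance_km, speed_kmh):
    """Passengers delivered per hour, assuming one load per round trip."""
    return passengers * speed_kmh / (2 * distance_km)


if __name__ == "__main__":
    distance = 10  # km, from the task statement
    car_latency = latency_minutes(distance, 60)
    bus_latency = latency_minutes(distance, 20)
    car_pph = throughput_pph(5, distance, 60)
    bus_pph = throughput_pph(60, distance, 20)

    latency_ratio = bus_latency / car_latency
    throughput_ratio = bus_pph / car_pph
    print(f"Car latency {car_latency:.0f} min, bus latency {bus_latency:.0f} min")
    print(f"Car is {latency_ratio:.0f}x ({(latency_ratio - 1) * 100:.0f}%) faster")
    print(f"Car {car_pph:.0f} PPH, bus {bus_pph:.0f} PPH")
    print(f"Bus moves {throughput_ratio:.0f}x ({(throughput_ratio - 1) * 100:.0f}%) more people per hour")
```

The car wins on latency while the bus wins on throughput, so the answer to "which is better" depends on which metric the task cares about.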


Response Time and Throughput

Response Time
Time between start and completion of a task, as observed by
the end user
Response Time = CPU Time + Waiting Time (I/O, OS scheduling, etc.)
Throughput
Number of tasks the machine can run in a given period of time


Response Time and Throughput (2)

Decreasing execution time improves throughput
– Example: using a faster version of a processor
– Less time to run a task => more tasks can be executed
Increasing throughput can also improve response time
– Example: increasing the number of processors in a multiprocessor
– More tasks can be executed in parallel
– Execution time of individual sequential tasks is not changed
– But less waiting time in the scheduling queue reduces response
  time


Defining Performance

For some program running on machine X:

\[ \text{Performance}_X = \frac{1}{\text{ExecutionTime}_X} \tag{5} \]


Relative Performance

X is n times faster than Y:

\[ \frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{ExecutionTime}_Y}{\text{ExecutionTime}_X} = n \tag{6} \]

Problem: machine A runs a program in 10 seconds while
machine B runs the same program in 15 seconds. What is
their relative performance?
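
A minimal sketch of how the problem above can be worked with equation (6); the function name is an illustrative choice.

```python
def relative_performance(exec_time_x, exec_time_y):
    """Return Performance_X / Performance_Y = ExecutionTime_Y / ExecutionTime_X."""
    return exec_time_y / exec_time_x


if __name__ == "__main__":
    time_a = 10.0  # seconds, machine A
    time_b = 15.0  # seconds, machine B
    n = relative_performance(time_a, time_b)
    print(f"Machine A is {n:.1f} times faster than machine B")
```

Note that the execution times appear in the opposite order to the performances, because performance is the reciprocal of execution time.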


Performance and Workloads

Latency(A) or Throughput(A) means nothing on its own;
these must be associated with some task: the workload
Workload: the set of tasks someone cares about
– Car/bus example: the task is to drive people 10 km
– Processor example: a processor executes some program


Performance and Workloads (2)

Benchmarks: standard workloads
– Used to compare performance across machines
– Either are, or are highly representative of, actual programs
  people run
Micro-benchmarks: non-standard non-workloads
– Tiny programs used to isolate certain aspects of performance
– Not representative of complex behaviors of real applications


SPEC Benchmarks
SPEC (Standard Performance Evaluation Corporation)
http://www.spec.org/
– Consortium that collects, standardizes, and distributes
  benchmarks
– Posts SPECmark results for different processors:
  one number that represents performance for the entire suite
– Benchmark suites for CPU, Java, I/O, Web, Mail, etc.
– Updated every few years
SPEC CPU 2006
– 12 “integer”: bzip2, gcc, perl, hmmer (genomics), h264, ...
– 17 “floating point”: wrf (weather), povray, sphinx3 (speech),
  ...
SPEC CPU 2017
– 2 “integer” suites: latency vs. throughput
– 2 “floating point” suites: latency vs. throughput
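
One common way to collapse per-benchmark results into a single number is the geometric mean of per-benchmark ratios (reference time divided by measured time), which is the approach SPEC CPU uses to summarize a suite. The sketch below illustrates the calculation only; the benchmark times are made-up values and `spec_style_score` is a hypothetical helper name, not a SPEC tool.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive numbers, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def spec_style_score(reference_times, measured_times):
    """Geometric mean of reference/measured time ratios (higher is better)."""
    ratios = [reference_times[name] / measured_times[name]
              for name in reference_times]
    return geometric_mean(ratios)


if __name__ == "__main__":
    # Hypothetical times in seconds; these are not real SPEC results.
    reference = {"bzip2": 1000.0, "gcc": 2000.0, "perl": 1500.0}
    measured = {"bzip2": 100.0, "gcc": 250.0, "perl": 150.0}
    print(f"suite score: {spec_style_score(reference, measured):.2f}")
```

A geometric mean is used rather than an arithmetic mean so that no single benchmark with a very large ratio dominates the summary.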


Measuring Execution Time

Real Elapsed Time
– Counts everything (waiting time, input/output, disk access,
  OS scheduling, ...)
– A useful number, but often not good for comparison purposes
CPU Time
– Time spent executing the program’s instructions (the lines of
  code that are “in” our program)
– Can be broken up into system time and user time
– Can be measured in seconds, or can be related to the number of
  CPU clock cycles
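
The difference between elapsed time and CPU time can be observed directly. The sketch below uses Python's standard `time.perf_counter` (wall-clock time) and `time.process_time` (user plus system CPU time of the current process); the helper names `measure` and `busy_work` and the workload sizes are assumptions for illustration.

```python
import time


def busy_work(n):
    """A CPU-bound loop: nearly all elapsed time is CPU time."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def measure(func, *args):
    """Return (elapsed_seconds, cpu_seconds) for one call of func(*args)."""
    wall_start = time.perf_counter()   # real elapsed (wall-clock) time
    cpu_start = time.process_time()    # user + system CPU time of this process
    func(*args)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


if __name__ == "__main__":
    wall, cpu = measure(busy_work, 500_000)
    print(f"compute: elapsed {wall:.3f} s, CPU {cpu:.3f} s")
    wall, cpu = measure(time.sleep, 0.2)  # waiting, not computing
    print(f"sleep:   elapsed {wall:.3f} s, CPU {cpu:.3f} s")
```

For the sleeping call the elapsed time is dominated by waiting, so CPU time stays close to zero; this is the waiting component that elapsed time counts and CPU time does not.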


Clock Cycles

The operation of computer hardware is governed by a
constant-rate clock

Clock period/cycle: duration of a clock cycle
Clock frequency/rate: cycles per second

\[ \text{Clock cycle} = \frac{1}{\text{Clock rate}} \]


Clock Cycles

We often use clock cycles to report CPU execution time

\[ \text{CPU Time} = \text{CPU cycles} \times \text{Clock cycle time} = \frac{\text{CPU cycles}}{\text{Clock rate}} \tag{7} \]
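
As a small worked example of equation (7), the sketch below converts a cycle count into seconds. The cycle count and the 2 GHz clock rate are hypothetical values, and the function names are illustrative.

```python
def clock_cycle_time(clock_rate_hz):
    """Duration of one clock cycle in seconds: 1 / clock rate."""
    return 1.0 / clock_rate_hz


def cpu_time(cpu_cycles, clock_rate_hz):
    """CPU time in seconds: CPU cycles / clock rate."""
    return cpu_cycles / clock_rate_hz


if __name__ == "__main__":
    cycles = 20_000_000_000      # hypothetical: 20 billion cycles
    clock_rate = 2_000_000_000   # hypothetical: 2 GHz
    print(f"cycle time: {clock_cycle_time(clock_rate) * 1e9:.2f} ns")
    print(f"CPU time:   {cpu_time(cycles, clock_rate):.1f} s")
```

Both forms of the equation give the same result: multiplying the cycle count by the cycle time is equivalent to dividing it by the clock rate.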


Improving Performance

To improve performance, we need to
– Reduce the number of clock cycles required by a program, or
– Reduce the clock cycle time (increase the clock rate)


Clock Cycles per Instruction (CPI)

Instructions take different numbers of cycles to execute, e.g.,
– Multiplication takes more time than addition
– Floating-point operations take longer than integer ones
– Accessing memory takes more time than accessing registers


Performance Equation

To execute, a given program will require ...
– Some number of machine instructions
– Some number of clock cycles
We can relate the CPU clock cycles of a program to the
instruction count:

\[ \text{CPU cycles} = \text{Instruction count} \times \text{CPI} \tag{8} \]

Performance Equation:

\[ \text{Execution Time} = \text{Instruction count} \times \text{CPI} \times \text{Cycle time} \tag{9} \]
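
The performance equation makes it possible to compare two implementations of the same ISA without running anything. The sketch below does this for two hypothetical machines; the instruction count, CPI values, and cycle times are assumed numbers chosen for illustration.

```python
def execution_time(instruction_count, cpi, cycle_time_s):
    """Execution time = instruction count x CPI x cycle time (equation 9)."""
    return instruction_count * cpi * cycle_time_s


if __name__ == "__main__":
    instructions = 1_000_000_000  # same program, same ISA, so same count

    # Hypothetical machine A: 250 ps cycle time, CPI of 2.0
    time_a = execution_time(instructions, 2.0, 250e-12)
    # Hypothetical machine B: 500 ps cycle time, CPI of 1.2
    time_b = execution_time(instructions, 1.2, 500e-12)

    print(f"A: {time_a:.2f} s, B: {time_b:.2f} s")
    print(f"A is {time_b / time_a:.2f} times faster than B")
```

Machine B needs fewer cycles per instruction, but its longer cycle time more than cancels that advantage, so no single factor of the equation decides the outcome on its own.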


Clock Cycles per Instruction (CPI)

CPI is an average number of clock cycles per instruction
– A way to compare two different implementations of the same
  ISA

Instruction class   A   B   C
CPI                 1   2   3


Determining the CPI

Different types of instructions have different CPI
Let CPI_i = clocks per instruction for class i of instructions and
let C_i = instruction count for class i of instructions

\[ \text{CPU cycles} = \sum_{i=1}^{n} \text{CPI}_i \times C_i \tag{10} \]

Designers often obtain CPI by a detailed simulation
Hardware counters are also used for operational CPUs
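
Equation (10) can be sketched as a weighted sum over instruction classes. The per-class CPI values below (A = 1, B = 2, C = 3) come from the earlier CPI table; the instruction counts for the two code sequences are hypothetical, as are the function names.

```python
def total_cycles(cpi_by_class, count_by_class):
    """CPU cycles = sum over classes of CPI_i x C_i (equation 10)."""
    return sum(cpi_by_class[c] * count_by_class[c] for c in count_by_class)


def average_cpi(cpi_by_class, count_by_class):
    """Average CPI = total cycles / total instruction count."""
    return total_cycles(cpi_by_class, count_by_class) / sum(count_by_class.values())


if __name__ == "__main__":
    cpi = {"A": 1, "B": 2, "C": 3}       # from the CPI table
    sequence_1 = {"A": 2, "B": 1, "C": 2}  # hypothetical instruction counts
    sequence_2 = {"A": 4, "B": 1, "C": 1}  # hypothetical instruction counts

    for name, seq in (("sequence 1", sequence_1), ("sequence 2", sequence_2)):
        print(f"{name}: {sum(seq.values())} instructions, "
              f"{total_cycles(cpi, seq)} cycles, CPI {average_cpi(cpi, seq):.2f}")
```

With these assumed counts, the second sequence executes more instructions yet needs fewer cycles, which shows why instruction count alone is not a reliable performance measure.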


What factors affect Execution time?

Execution Time = Instruction count × CPI × Cycle time

Depends on
– Algorithm: affects instruction count, possibly CPI
– Programming language: affects instruction count, CPI
– Compiler: affects instruction count, CPI
– Instruction set architecture: affects instruction count, CPI,
  cycle time
– Processor organization: affects CPI, cycle time


Instruction Rate as a Performance Measure

Instruction execution rate can also be used as a performance
metric
– Millions of instructions per second (MIPS)
– Millions of floating-point operations per second (MFLOPS)
– Faster machine implies larger MIPS or MFLOPS
– Heavily dependent on instruction set, compiler design,
  processor implementation, cache and memory hierarchy
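
The MIPS rating follows from the quantities already defined: instruction count divided by execution time (in units of a million), which is the same as clock rate divided by CPI. The sketch below shows both forms; the function names and the sample numbers are assumptions for illustration.

```python
def mips_from_time(instruction_count, execution_time_s):
    """MIPS = instruction count / (execution time x 10^6)."""
    return instruction_count / (execution_time_s * 1e6)


def mips_from_clock(clock_rate_hz, cpi):
    """Equivalent form: MIPS = clock rate / (CPI x 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


if __name__ == "__main__":
    # Hypothetical machine: 4 GHz clock, average CPI of 2.0
    clock_rate, cpi = 4e9, 2.0
    instructions = 1e9
    exec_time = instructions * cpi / clock_rate  # performance equation
    print(f"MIPS from time:  {mips_from_time(instructions, exec_time):.0f}")
    print(f"MIPS from clock: {mips_from_clock(clock_rate, cpi):.0f}")
```

Because the rating depends on CPI, which in turn depends on the instruction mix, the same machine can report different MIPS values for different programs.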


Performance Summary

Performance is specific to a particular program
– Any measure of performance should reflect execution time
– Total execution time is a consistent summary of performance
For a given ISA, performance improvements come from
– Increases in clock rate (without increasing the CPI)
– Improvements in processor organization that lower CPI
– Compiler enhancements that lower CPI and/or instruction
  count
– Algorithm/language choices that affect instruction count
