
Computer Architecture

Lecture 01 - Introduction

Pengju Ren
Institute of Artificial Intelligence and Robotics
Xi'an Jiaotong University

http://gr.xjtu.edu.cn/web/pengjuren
Course Administration
Instructor: Pengju Ren
TA: Gelin Fu (Ph.D. Candidate)
Lectures: Two 100-minute lectures a week
Textbook: Computer Architecture: A Quantitative Approach, 6th Edition (2019)
Prerequisite: Digital System Structure and Design
Preface

"The most beautiful thing we can experience is the mysterious. It is the source of all true art and science."
--- Albert Einstein, What I Believe, 1930
What is Computer Architecture

Application

    [Gap too large to bridge in one step]

Physics

In its broadest definition, computer architecture is the design of the abstraction/implementation layers that allow us to implement information processing applications efficiently using available manufacturing technologies.
What is Computer Architecture

Application
Algorithm
Programming Language
Operating System/Virtual Machines
Instruction Set Architecture (ISA)
Microarchitecture
Register-Transfer Level (RTL)
Gates
Circuits
Devices
Physics

(These abstraction layers bridge, one step at a time, a gap too large to bridge in one step.)
What is Computer Architecture

Application
Algorithm
Programming Language
Operating System/Virtual Machines
Instruction Set Architecture (ISA)     \
Microarchitecture                       >  This course
Register-Transfer Level (RTL)          /
Gates
Circuits
Devices
Physics
What is Computer Architecture

Application
Algorithm
Programming Language
Operating System/Virtual Machines
Instruction Set Architecture (ISA)
Microarchitecture
Register-Transfer Level (RTL)
Gates
Circuits
Devices
Physics

Application Requirements:
- Suggest how to improve architecture
- Provide revenue to fund development

Technology Constraints:
- Restrict what can be done efficiently
- New technologies make new architectures possible

Architecture provides feedback to guide application and technology research directions.
Computing Devices Then…

EDSAC, University of Cambridge, UK, 1949


Computing Devices Now

2 1
2 0
J T U
@X
n
j u Re
ng
Pe
Modern computing is as much about enhancing capabilities as about data processing!
Architecture continually changing

Applications suggest how to improve technology and provide revenue to fund development.
Improved technologies make new applications possible.

    Applications  <-->  Technology

Cost of software development makes compatibility a major force in the market.
Single-Thread (Sequential) Processor Performance

[Figure: growth of single-thread processor performance over time; the era of Multicore/ManyCore follows the RISC era. From Hennessy & Patterson, 2017]
Moore’s Law Scaling with Cores

2 1
2 0
J TU
@X
n
j u Re
ng
Pe

12
Global Semiconductor Market

2 1
2 0
J T U
@X
n
j u Re
ng
Pe
The global semiconductor market is estimated at $450 billion USD in revenue for 2020. Products using these semiconductors represent global revenues of $2 trillion USD, or around 3.5% of global gross domestic product (GDP).
(Source: ISSCC 2021, Feb.)
Advanced Technology Nodes Continue to Provide Value

2 1
2 0
J T U
@X
n
j u Re
ng
Pe
Steady progress in two-dimensional transistor scaling and a variety of device-enhancement techniques have sustained energy-efficiency improvements and device-density gains from one technology generation to the next.
(Source: ISSCC 2021, Feb.)
Upheaval in Computer Design
• For most of the last 50 years, Moore's Law ruled
  – Technology scaling allowed continual performance/energy improvements without changing the software model
• In the last decade, technology scaling slowed/stopped
  – Dennard (voltage) scaling over (supply voltage ~fixed)
  – Moore's Law (cost/transistor) over?
  – No competitive replacement for CMOS anytime soon
  – Energy efficiency constrains everything
• No "free lunch" for software developers; they must consider:
  – Parallel systems
  – Heterogeneous systems
Today’s Dominant Target Systems
• Mobile (smartphone/tablet)
  – >1 billion sold/year
  – Market dominated by ARM-ISA-compatible general-purpose processors in a system-on-a-chip (SoC)
  – Plus a sea of custom accelerators (radio, image, video, graphics, audio, motion, location, security, etc.)
• Warehouse-Scale Computers (WSCs)
  – 100,000s of cores per warehouse
  – Market dominated by x86-compatible server chips
  – Dedicated apps, plus cloud hosting of virtual machines
  – Now seeing increasing use of GPUs, FPGAs, and custom hardware to accelerate workloads
• Embedded computing
  – Wired/wireless network infrastructure, printers
  – Consumer TV/Music/Games/Automotive/Camera/MP3
  – Internet of Things!
Course Content

• Instruction Level Parallelism
  – Superscalar
  – Very Long Instruction Word (VLIW)
• Advanced Memory and Caches
• Data Level Parallelism
  – Vector Machine
  – GPU
• Thread Level Parallelism
  – Multithreading
  – Multiprocessor/Multicore/ManyCore
• Warehouse-Scale Computers (Request Level Parallelism)
• Domain-Specific Architectures (DNN Accelerator)

[Photo: Intel Nehalem Processor, Core i7]
Architecture vs. Microarchitecture

"Architecture"/Instruction Set Architecture:
  - Programmer-visible state (memory & registers)
  - Operations (instructions and how they work)
  - Execution semantics (interrupts)
  - Input/Output
  - Data types/sizes

Microarchitecture/Organization:
  - Tradeoffs on how to implement the ISA for some metric (speed, energy, cost)
  - Examples: pipeline depth, number of pipelines, cache size, silicon area, peak power, execution ordering, bus widths, ALU widths
Same Architecture, Different Microarchitecture

[Figure: processors implementing the same ISA with different microarchitectures]
Different Architecture, Different Microarchitecture

[Figure: processors with different ISAs and different microarchitectures]
Where do Operands come from and Where do Results Go?

[Diagram: a Processor containing an ALU, connected to MEMORY]
Where do Operands come from and Where do Results Go?

Four machine models: Stack, Accumulator, Reg-Mem, Reg-Reg

[Diagram: for each model, a Processor with an ALU and its operand storage, connected to MEMORY]
Where do Operands come from and Where do Results Go?

                               Stack   Accumulator   Reg-Mem   Reg-Reg
Number of explicitly
named operands                   0          1         2 or 3    2 or 3
Stack-Based Instruction Set Architecture (ISA)

• Burroughs B5000 (1960)
• Burroughs B6700
• HP 3000
• ICL 2900
• Symbolics 3600
• Inmos Transputer

Modern:
• Forth machines
• Java Virtual Machine
• Intel x87 Floating Point Unit
Evaluation of Expressions

[Figure sequence: step-by-step evaluation of an arithmetic expression on a stack machine, pushing operands and applying operators]
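A minimal sketch of the idea (in Python; illustrative, not from the slides): a stack machine evaluates a postfix expression by pushing operands, while each operator pops its arguments and pushes the result.

    # Sketch of stack-machine expression evaluation (postfix token stream).
    def eval_postfix(tokens, env):
        stack = []
        ops = {"+": lambda x, y: x + y, "-": lambda x, y: x - y,
               "*": lambda x, y: x * y, "/": lambda x, y: x / y}
        for tok in tokens:
            if tok in ops:
                y = stack.pop()               # pop the two topmost operands
                x = stack.pop()
                stack.append(ops[tok](x, y))  # push the result back
            else:
                stack.append(env[tok])        # "push" a variable's value
        return stack.pop()

    # a + b * c written in postfix is "a b c * +"
    print(eval_postfix("a b c * +".split(), {"a": 2, "b": 3, "c": 4}))  # 14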
Hardware Organization of the Stack

• Stack is part of the processor state
  – The stack must be bounded and small, ≈ the number of registers, not the size of main memory
• Conceptually, the stack is unbounded
  – A part of the stack is included in the processor state; the rest is kept in main memory
Stack Operations/Implicit Memory References

Suppose the top 2 elements of the stack are kept in registers and the rest is kept in memory. Then:

  Each push operation => 1 memory reference
  Each pop operation  => 1 memory reference

Better performance comes from keeping the top N elements in registers, so memory references are made only when the register stack overflows or underflows, as sketched below.
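A minimal sketch of that scheme (in Python; illustrative, not from the slides), where the top N stack elements live in registers and memory is touched only on overflow or underflow:

    # Register-backed stack: spill to memory on overflow, fill on underflow.
    class RegisterStack:
        def __init__(self, n_regs):
            self.regs = []        # top-of-stack elements held in registers
            self.memory = []      # spilled elements (bottom of the stack)
            self.n_regs = n_regs
            self.mem_refs = 0     # count of memory references

        def push(self, value):
            if len(self.regs) == self.n_regs:         # register stack full:
                self.memory.append(self.regs.pop(0))  # spill oldest element
                self.mem_refs += 1
            self.regs.append(value)

        def pop(self):
            if not self.regs and self.memory:         # register stack empty:
                self.regs.append(self.memory.pop())   # fill from memory
                self.mem_refs += 1
            return self.regs.pop()

    s = RegisterStack(n_regs=2)
    for v in (1, 2, 3, 4):    # four pushes overflow the 2-register stack twice
        s.push(v)
    print(s.mem_refs)         # 2 memory references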
Stack Size and Memory References

[Figure: instruction trace of an expression evaluation showing the implicit memory traffic, including four store and fetch operations]
Where do Operands come from and Where do Results Go?

C = A + B in each machine model:

  Stack       Accumulator   Reg-Mem         Reg-Reg
  ------      -----------   -------         -------
  Push A      Load A        Load R1, A      Load R1, A
  Push B      Add B         Add R3, R1, B   Load R2, B
  Add         Store C       Store R3, C     Add R3, R1, R2
  Pop C                                     Store R3, C
Classes of Instructions

• Data Transfer
  – LD, ST, MFC1, MTC1, MFC0, MTC0
• ALU
  – ADD, SUB, AND, OR, XOR, MUL, DIV, SLT, LUI
• Control Flow
  – BEQZ, JR, JAL, TRAP, ERET
• Floating Point
  – ADD.D, SUB.S, MUL.D, C.LT.D, CVT.S.W
• Multimedia (SIMD)
  – ADD.PS, SUB.PS, MUL.PS, C.LT.PS
• String
  – REP MOVSB (x86)
Addressing Modes: How to get Operands from Memory (MIPS)

[Table: MIPS addressing modes; ** some modes may not actually access memory]
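Since the table itself is not reproduced above, here is a hedged sketch (in Python; the helper and its names are hypothetical) modeling three basic MIPS data addressing modes, of which only displacement actually accesses memory:

    # Toy model of basic MIPS addressing modes.
    regs = {"$t0": 8, "$t1": 0}
    mem = {12: 99}                          # one word of "memory" at address 12

    def operand(mode, **kw):
        if mode == "register":              # value read straight from a register
            return regs[kw["reg"]]
        if mode == "immediate":             # value encoded in the instruction itself
            return kw["imm"]
        if mode == "displacement":          # effective address = base register + offset
            return mem[regs[kw["base"]] + kw["offset"]]
        raise ValueError(mode)

    print(operand("register", reg="$t0"))                  # 8
    print(operand("immediate", imm=5))                     # 5
    print(operand("displacement", base="$t0", offset=4))   # 99 (mem[8 + 4])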


ISA Encoding

[Figure: instruction-encoding formats]
Case Study: x86 (IA-32) Instruction Encoding

[Figure: the variable-length x86 instruction format]
RISC-V Instruction Encoding (1)

[Figure: the fixed 32-bit RISC-V base instruction formats]
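As a concrete taste of the fixed 32-bit format, here is a sketch (in Python, assuming the standard RV32I R-type field layout) that assembles an ADD instruction:

    # Encode a RISC-V R-type instruction (RV32I layout):
    # funct7[31:25] rs2[24:20] rs1[19:15] funct3[14:12] rd[11:7] opcode[6:0]
    def encode_r_type(funct7, rs2, rs1, funct3, rd, opcode):
        return ((funct7 << 25) | (rs2 << 20) | (rs1 << 15) |
                (funct3 << 12) | (rd << 7) | opcode)

    # add x3, x1, x2  ->  funct7=0000000, funct3=000, opcode=0110011
    insn = encode_r_type(0b0000000, rs2=2, rs1=1, funct3=0b000, rd=3,
                         opcode=0b0110011)
    print(f"{insn:032b}")   # 00000000001000001000000110110011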
RISC-V Instruction Encoding (2)

• New open-source, license-free ISA spec
• Supported by a growing shared software ecosystem
• Appropriate for all levels of computing systems, from microcontrollers to supercomputers
• 32-bit, 64-bit, and 128-bit variants (we're using 32-bit in class; the textbook uses 64-bit)

http://www-inst.eecs.berkeley.edu/~cs61c/fa18/img/riscvcard.pdf
Real-World Instruction Sets

[Table: real-world instruction sets and their characteristics]
Why the Diversity in ISAs?

Application Influenced ISA


•  Instructions for Applications
–  DSP instructions 2 1
•  Compiler Technology has improved 2 0
J TU
–  SPARC Register Windows no longer needed
@ X
–  Compiler can register allocate effectively
n
eISA
Technology Influenced
j u R
g
•  Storage is expensive, tight encoding important
n Set Computer
P e
•  Reduced Instruction
–  Remove instructions until whole computer fits on die
•  Multicore/Manycore
–  Transistors not turning into sequential performance

45
Recap

Application
Algorithm
Programming Language
Operating System/Virtual Machines
Instruction Set Architecture (ISA)
Microarchitecture
Register-Transfer Level (RTL)
Gates
Circuits
Devices
Physics

ISA vs. Microarchitecture

ISA characteristics:
• Machine Models
• Encoding
• Data Types
• Instructions
• Addressing Modes
And in conclusion …
• Computer Architecture >> ISAs and RTL
• Computer architecture is about the interaction of hardware and software, and the design of appropriate abstraction layers
• Computer architecture is shaped by technology and applications
  – History provides lessons for the future
• Computer science is at the crossroads from sequential to parallel computing
  – Salvation requires innovation in many fields, including computer architecture
• Read Chapter 1 & Appendix A (6th edition) for next time!
Next Lecture: RISC-V ISA, Datapath & Control
(ISA and Micro-Architecture)
Acknowledgements
• Some slides contain material developed and copyrighted by:
  – Arvind (MIT)
  – Krste Asanovic (MIT/UCB)
  – Joel Emer (Intel/MIT)
  – James Hoe (CMU)
  – David Patterson (UCB)
  – David Wentzlaff (Princeton University)
• MIT material derived from course 6.823
• UCB material derived from courses CS252 and CS 61C
Idealized Uniprocessor Model
Processor names variables:
  – Integers, floats, pointers, arrays, structures, etc.
  – These are really words, e.g., 64-bit doubles, 32-bit ints, bytes, etc.
Processor performs operations on those variables:
  – Arithmetic, logical operations, etc.
  – Only performs these operations on values in registers
Processor controls the order, as specified by the program:
  – Branches (if), loops (while), function calls, etc.
Idealized cost:
  – Each operation has roughly the same cost: add, multiply, etc.
Six Great Ideas in Computer Architecture
New School Machine Architecture

Software and hardware harness parallelism to achieve high performance, from the Warehouse-Scale Computer down to the Smart Phone:

• Parallel Requests: assigned to a computer, e.g., search "Cats"
• Parallel Threads: assigned to a core, e.g., lookup, ads
• Parallel Instructions: >1 instruction at one time, e.g., 5 pipelined instructions
• Parallel Data: >1 data item at one time, e.g., an add of 4 pairs of words: A0+B0, A1+B1, A2+B2, A3+B3
• Hardware descriptions: all gates working in parallel at the same time

[Diagram: the hardware hierarchy: Computer (Core … Core, Memory (Cache), Input/Output), Core (Instruction Unit(s), Functional Unit(s)), Main Memory, Logic Gates]
Great Idea #1: Abstraction
(Levels of Representation/Interpretation)

High Level Language Program (e.g., C):
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;

        | Compiler
        v

Assembly Language Program (e.g., RISC-V):
    lw   $t0, 0($2)
    lw   $t1, 4($2)
    sw   $t1, 0($2)
    sw   $t0, 4($2)

        | Assembler
        v

Machine Language Program (RISC-V):
    1000 1101 1110 0010 0000 0000 0000 0000
    1000 1110 0001 0000 0000 0000 0000 0100
    1010 1110 0001 0010 0000 0000 0000 0000
    1010 1101 1110 0010 0000 0000 0000 0100

        | Machine Interpretation
        v

Hardware Architecture Description (e.g., block diagrams)

        | Architecture Implementation
        v

Logic Circuit Description (Circuit Schematic Diagrams)

Anything can be represented as a number, i.e., data or instructions.
Great Idea #2: Moore's Law
(Technology-driven development)

[Figure: Moore's Law, transistor counts doubling roughly every two years]
Great Idea #3: Principle of Locality / Memory Hierarchy

[Figure: the memory hierarchy, from registers and caches to main memory and disk]
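To make the idea concrete, here is a toy sketch (in Python; the 8-block, 16-byte-block parameters are assumptions for illustration) of a direct-mapped cache, showing how the spatial locality of a sequential scan turns most accesses into hits:

    # Toy direct-mapped cache: locality makes a small cache effective.
    NUM_BLOCKS, BLOCK_SIZE = 8, 16
    cache = [None] * NUM_BLOCKS            # tag stored per cache block
    hits = misses = 0

    def access(addr):
        global hits, misses
        block = addr // BLOCK_SIZE
        index, tag = block % NUM_BLOCKS, block // NUM_BLOCKS
        if cache[index] == tag:
            hits += 1
        else:
            misses += 1
            cache[index] = tag             # fill the block on a miss

    for addr in range(0, 256, 4):          # sequential scan: spatial locality
        access(addr)
    print(hits, misses)                    # 48 hits, 16 misses (75% hit rate)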
Great Idea #4: Parallelism

[Figure]
[Portrait] Gene Amdahl, Computer Pioneer

Amdahl's Law
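The slide names the law without stating it; the standard formulation is that if a fraction f of the work is sped up by a factor s, the overall speedup is 1 / ((1 - f) + f/s). A quick sketch in Python:

    # Amdahl's Law: speedup is limited by the fraction left unimproved.
    def amdahl_speedup(f, s):
        return 1.0 / ((1.0 - f) + f / s)

    print(amdahl_speedup(0.9, 10))    # ~5.26x: 10x speedup on 90% of the work
    print(amdahl_speedup(0.9, 1e9))   # ~10x: capped by the serial 10%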
Great Idea #5: Performance Measurement and Improvement

• Matching the application to the underlying hardware to exploit:
  – Locality
  – Parallelism
  – Special hardware features, like specialized instructions (e.g., matrix manipulation)
• Latency
  – How long to set the problem up
  – How much faster it executes once it gets going
  – It is all about time to finish
Great Idea #6: Dependability via Redundancy

• Redundancy so that a failing piece doesn't make the whole system fail

      1+1=2           2 of 3 agree

    1+1=2   1+1=2   1+1=1   FAIL!

Increasing transistor density reduces the cost of redundancy.
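The "2 of 3 agree" scheme is majority voting over redundant units (triple modular redundancy); a minimal sketch in Python, not from the slides:

    # Triple modular redundancy: run three copies, take the majority answer.
    from collections import Counter

    def tmr_vote(results):
        value, count = Counter(results).most_common(1)[0]
        if count < 2:                  # no 2-of-3 majority: unrecoverable
            raise RuntimeError("FAIL: no 2-of-3 agreement")
        return value

    print(tmr_vote([2, 2, 1]))   # 2: the one faulty unit is outvoted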
Great Idea #6: Dependability via Redundancy

• Applies to everything from datacenters to storage to memory to instructors
  – Redundant datacenters, so an Internet service stays online even if one datacenter is lost
  – Redundant disks, so losing one disk does not lose data (Redundant Arrays of Independent Disks/RAID)
  – Redundant memory bits, so losing one bit does not lose data (Error Correcting Code/ECC Memory)
