Organization
Computer organization and design by
Hennessy and Patterson, 5th edition
Text book
Class timings
Lab timings
Office hours- All visits to my office should be
during office hours only. That time is
allocated for you. Rest of my time is for
other work.
Grading Policy
Final 50%
Mid term 20%
Quizzes = 10%
Class participation and behavior = 10%
Attentive in class
Following instructions
Being on time; latecomers will lose marks
Being active in class
No disruptions, talking or other insensitive
behavior
Assignments = 10%
Class participation Task 1
Buy the book and bring it to the
next class
Lose 1 mark every time you are not
present with the book, notes register,
and pen/pencil/ballpoint
Assignments: do not copy. Exams and
quizzes will be drawn from that material
Late submissions would be acceptable for
the trash bin
4-bit accumulator
architecture
8 µm pMOS
2,300 transistors
3 x 4 mm²
750kHz clock
8-16 cycles/inst.
Hardware
Team from IBM building PC
prototypes in 1979
Motorola 68000 chosen
initially, but 68000 was late
8088 is 8-bit bus version of
8086 => allows cheaper system
Estimated sales of 250,000
100,000,000s sold
[Figure labels: Hardware based; Hardware and software, ILP!]
Intel cancelled high performance uniprocessor,
joined IBM and Sun for multiple processors
Processor Technology Trends
Performance: 1.15x
Frequency: 1.05x
Power: 1.04x
2004 2010
Source: Micron University Symp.
What you will learn
How programs are translated into the machine language
And how the hardware executes them
Problem:
machine A runs a program in 20 seconds
machine B runs the same program in 25 seconds
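The comparison can be sketched in a few lines, assuming performance is defined as the reciprocal of execution time. The function name below is illustrative and not from the slides.

```python
def relative_performance(time_a: float, time_b: float) -> float:
    """Return how many times faster machine A is than machine B.

    Performance is taken as 1 / execution time, so
    perf_A / perf_B equals time_B / time_A.
    """
    return time_b / time_a


ratio = relative_performance(20.0, 25.0)
print(f"Machine A is {ratio:.2f} times faster than machine B")
```

Under that definition, machine A comes out 1.25 times faster than machine B.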
Clock Cycles
Instead of reporting execution time in seconds, we often
use cycles
\[ \frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}} \]
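As a small sketch of the relation between cycles and seconds: the cycle count and clock rate below are made-up illustrative values, not measurements from the slides.

```python
def execution_time(cycles: int, clock_rate_hz: float) -> float:
    """Seconds per program = cycles per program * seconds per cycle."""
    seconds_per_cycle = 1.0 / clock_rate_hz  # the clock period
    return cycles * seconds_per_cycle


# Hypothetical program: 10 billion cycles on a 2 GHz clock.
t = execution_time(10_000_000_000, 2e9)
print(f"Execution time: {t:.2f} s")
```

Halving the cycle count or doubling the clock rate would each halve the execution time in this model.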
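[Figure: clock signal over time, marking the clock period; each clock cycle covers data transfer and computation, then a state update]
[Figure: instructions (..., 4th, 5th, 6th, ...) laid out along a time axis]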
Why? hint: remember that these are machine instructions, not lines of C code
Different numbers of cycles for different instructions
Floating-point operations take longer than integer ones
Accessing memory takes more time than accessing registers
Example:
"Suppose a program runs in 100 seconds on a
machine, with
multiply operations responsible for 80 seconds of this
time. How much do we have to improve the speed of
multiplication if we want the program to run 4 times
faster?"
How about making it 5 times faster?
Amdahl's Law
1. Speed up = 4
2. Old execution time = 100
3. New execution time = 100/4 = 25
4. If 80 seconds is used by the affected part =>
5. Unaffected part = 100-80 = 20 sec
6. Execution time new = Execution time unaffected +
Execution time affected / Improvement
7. 25= 20 + 80/Improvement
8. Improvement = 16
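The steps above can be sketched as a short function. It also covers the follow-up question about a 5 times speedup. The function name and the choice to return None for an unreachable target are illustrative assumptions.

```python
from typing import Optional


def required_improvement(total: float, affected: float,
                         target_speedup: float) -> Optional[float]:
    """Factor by which the affected part must be sped up, or None if
    the target cannot be reached by improving that part alone."""
    unaffected = total - affected            # step 5
    new_total = total / target_speedup       # step 3
    budget = new_total - unaffected          # time left for the affected part
    if budget <= 0:
        return None                          # unaffected part alone is too slow
    return affected / budget                 # steps 7 and 8


print(required_improvement(100, 80, 4))  # 16.0
print(required_improvement(100, 80, 5))  # None
```

For a 5 times speedup the new execution time would have to be 20 seconds, which the unaffected part already consumes, so no finite improvement of multiplication is enough.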
Example: Speed up using parallel processors
Suppose an application is almost all parallel: 90%.
What is the speedup using 10, 100, and 1000 processors?
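\[ \text{Speedup} = \frac{1}{0.6 + \frac{0.4}{10}} \approx 1.56 \]
(Worked value shown for a fraction of 0.4 sped up by a factor of 10.)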
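A minimal sketch of Amdahl's Law for the question above, assuming the parallel 90% scales perfectly with the number of processors and the remaining 10% stays serial. The function name is illustrative.

```python
def amdahl_speedup(fraction_enhanced: float, factor: float) -> float:
    """Overall speedup when a fraction of the execution time
    is sped up by the given factor."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / factor)


for processors in (10, 100, 1000):
    speedup = amdahl_speedup(0.9, processors)
    print(f"{processors:>5} processors: speedup {speedup:.2f}")
```

Under these assumptions the speedups are about 5.26, 9.17, and 9.91, and the result can never exceed 10 because of the serial 10%.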
Example
Implementations of floating point square root vary
significantly in performance. Suppose FP square root
(FPSQR) is responsible for 20% of the execution time of
a critical benchmark on a machine. One proposal is to
add FPSQR hardware that will speed up this operation
by a factor of 10. The other alternative is just to try to
make all FP instructions run faster; FP instructions are
responsible for a total of 50% of the execution time.
The design team believes that they can make all FP
instructions run two times faster with the same effort
as required for the fast square root. Compare those two
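design alternatives.
\[ \text{Speedup}_{\text{FPSQR}} = \frac{1}{(1 - 0.2) + \frac{0.2}{10}} \approx 1.22 \]
\[ \text{Speedup}_{\text{FP}} = \frac{1}{(1 - 0.5) + \frac{0.5}{2}} \approx 1.33 \]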
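The two design alternatives can be compared with a short sketch using the fractions and factors stated in the example. The function and variable names are illustrative.

```python
def amdahl_speedup(fraction_enhanced: float, factor: float) -> float:
    """Overall speedup when a fraction of the execution time
    is sped up by the given factor."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / factor)


speedup_fpsqr = amdahl_speedup(0.2, 10)  # FPSQR hardware: 20% of time, 10x faster
speedup_fp = amdahl_speedup(0.5, 2)      # all FP instructions: 50% of time, 2x faster

print(f"FPSQR hardware: {speedup_fpsqr:.2f}")
print(f"Faster FP:      {speedup_fp:.2f}")
```

Improving all FP instructions gives the larger overall speedup because it affects a larger share of the execution time, even though its factor is smaller.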
Benchmarks
Performance best determined by running a real
application
Use programs typical of expected workload
Or, typical of expected class of applications
e.g., compilers/editors, scientific applications, graphics,
etc.
Small benchmarks
nice for architects and designers
easy to standardize
can be abused
SPEC (System Performance Evaluation Cooperative)
companies have agreed on a set of real programs and inputs
valuable indicator of performance (and compiler
technology)
can still be abused
Benchmark Games
An embarrassed Intel Corp. acknowledged Friday that a
bug in a software program known as a compiler had led
the company to overstate the speed of its microprocessor
chips on an industry benchmark by 10 percent. However,
industry analysts said the coding error ... was a sad
commentary on a common industry practice of
cheating on standardized performance tests ... The error
was pointed out to Intel two days ago by a competitor,
Motorola ... came in a test known as SPECint92 ... Intel
acknowledged that it had optimized its compiler to
improve its test scores. The company had also said that
it did not like the practice but felt compelled to make
the optimizations because its competitors were doing the
same thing ... At the heart of Intel's problem is the practice
of tuning compiler programs to recognize certain
computing problems in the test and then substituting
special handwritten pieces of code
Saturday, January 6, 1996, New York Times
SPEC CPU2000
Problems with Benchmarking
[Figure: compiler pipeline]
compile -> unoptimized IR
Constant folding (cfold)
Constant propagation (constprop)
Dead code elimination (dce)
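To make the three passes concrete, here is a toy sketch on straight-line code. The IR format (a list of `(dest, expr)` pairs) and all function names are assumptions made for illustration. Unlike the pipeline named above, this sketch interleaves propagation and folding in a single pass, and it is not how any particular compiler implements them.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(expr):
    """Constant folding: evaluate an operation whose operands are both literals."""
    if isinstance(expr, tuple):
        op, a, b = expr
        if isinstance(a, int) and isinstance(b, int):
            return OPS[op](a, b)
    return expr


def subst(operand, consts):
    """Replace a variable name with its known constant value, if any."""
    return consts.get(operand, operand) if isinstance(operand, str) else operand


def propagate_and_fold(program):
    """Constant propagation interleaved with folding, in program order."""
    consts, out = {}, []
    for dest, expr in program:
        if isinstance(expr, tuple):
            op, a, b = expr
            expr = (op, subst(a, consts), subst(b, consts))
        else:
            expr = subst(expr, consts)
        expr = fold(expr)
        if isinstance(expr, int):
            consts[dest] = expr      # dest now holds a known constant
        else:
            consts.pop(dest, None)   # dest is no longer a known constant
        out.append((dest, expr))
    return out


def eliminate_dead_code(program, live_out):
    """Dead code elimination: drop assignments whose result is never used."""
    live, kept = set(live_out), []
    for dest, expr in reversed(program):
        if dest in live:
            live.discard(dest)
            operands = expr[1:] if isinstance(expr, tuple) else (expr,)
            live.update(o for o in operands if isinstance(o, str))
            kept.append((dest, expr))
    kept.reverse()
    return kept


program = [
    ("a", ("*", 2, 3)),
    ("b", ("+", "a", 4)),
    ("c", ("*", "b", "x")),
    ("d", ("-", "a", 1)),   # never used afterwards
]
optimized = eliminate_dead_code(propagate_and_fold(program), live_out={"c"})
print(optimized)  # [('c', ('*', 10, 'x'))]
```

Four instructions shrink to one, which is why compiler optimization settings can change benchmark results so much.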
Compiler and Performance
Scary Stuff
Op        Frequency   Cycle Count
ALU       43%         1
Load      21%         1
Store     12%         2
Branch    24%         2
Improve the CPI of Store operations to 1 at the cost of slowing our
clock by 15%. Is this new design feasible?
\[ \text{CPI}_{\text{original}} = \sum_{i=1}^{n} \text{CPI}_i \times \frac{\text{IC}_i}{\text{Instruction\_Count}} \]
Example(Contd.)
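\[ \text{CPU}_{\text{time}} = \frac{\text{IC} \times \text{CPI}}{\text{Clock\_Rate}} \]
\[ \text{CPU}_{\text{time}} = \left( \sum_{i=1}^{n} \text{CPI}_i \times \text{IC}_i \right) \times \text{Clock\_Cycle\_Time} \]
\[ \text{overall\_CPI} = \frac{\sum_{i=1}^{n} \text{CPI}_i \times \text{IC}_i}{\text{Instruction\_Count}} = \sum_{i=1}^{n} \text{CPI}_i \times \frac{\text{IC}_i}{\text{Instruction\_Count}} \]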
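The Store example can be checked with a short sketch built on the instruction mix from the table. It assumes that "slowing the clock by 15%" means the clock cycle time grows by a factor of 1.15. Variable names are illustrative.

```python
# Instruction mix: op -> (frequency, cycles per instruction)
mix = {
    "ALU": (0.43, 1),
    "Load": (0.21, 1),
    "Store": (0.12, 2),
    "Branch": (0.24, 2),
}


def overall_cpi(mix):
    """Overall CPI = sum over classes of CPI_i * (IC_i / Instruction_Count)."""
    return sum(freq * cycles for freq, cycles in mix.values())


cpi_old = overall_cpi(mix)
cpi_new = overall_cpi(dict(mix, Store=(0.12, 1)))  # Store CPI reduced to 1

# CPU time per instruction, in units of the original clock cycle time.
time_old = cpi_old * 1.00
time_new = cpi_new * 1.15  # ASSUMPTION: cycle time is 15% longer

print(f"CPI old = {cpi_old:.2f}, CPI new = {cpi_new:.2f}")
print(f"time old = {time_old:.3f}, time new = {time_new:.3f}")
```

With these numbers the CPI drops from 1.36 to 1.24, but the longer cycle makes the new design slower overall (about 1.426 versus 1.36), so the change does not pay off under this assumption.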
Exercise
Suppose we have made the following measurements:
Frequency of FP operations = 25%
Average CPI of FP operations = 4.0
Average CPI of other instructions = 1.33
Frequency of FPSQR = 2%
CPI of FPSQR = 20
\[ \text{CPI}_{\text{overall for new FPSQR}} = \text{CPI}_{\text{original}} - \text{CPI}_{\text{saved on FPSQR}} = 2 - 0.36 = 1.64 \]
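The exercise figures can be reproduced with a short sketch. The new FPSQR CPI of 2 is an assumption: it is not stated in the measurements above but is the value consistent with the 0.36 saving (2% of 20 minus 2). Variable names are illustrative.

```python
freq_fp, cpi_fp = 0.25, 4.0         # FP operations
cpi_other = 1.33                    # all other instructions
freq_fpsqr, cpi_fpsqr = 0.02, 20.0  # FPSQR alone

# Original overall CPI from the instruction mix.
cpi_original = freq_fp * cpi_fp + (1 - freq_fp) * cpi_other

# ASSUMPTION: the enhanced design lowers the CPI of FPSQR to 2.
new_cpi_fpsqr = 2.0
cpi_saved = freq_fpsqr * (cpi_fpsqr - new_cpi_fpsqr)

cpi_new = cpi_original - cpi_saved
print(f"original CPI = {cpi_original:.4f}")
print(f"CPI saved    = {cpi_saved:.2f}")
print(f"new CPI      = {cpi_new:.4f}")
```

The unrounded original CPI is 1.9975, which the slide rounds to 2, so the sketch gives 1.6375 where the slide shows 1.64.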
                Hardware          Software
Advantages      Speed,            Flexibility,
                Consistency       easier/faster design,
                                  lower cost of errors
Disadvantages   Cost,             Slowness
                No flexibility
Remember
Performance is specific to a particular program
Total execution time is a consistent summary of performance