
Teaching Computer Architecture/Organisation using simulators

Herbert Grunbacher
Vienna University of Technology
Treitlstrasse 3/182-2, A-1040 Vienna, Austria
E-mail: hg@vlsivie.tuwien.ac.at

Abstract

Experience shows that many students, especially those with little hardware background, encounter difficulties in understanding the consequences and even the concepts of conventional instruction pipelining; superscalar instruction processing is even more complicated and harder to understand. It is particularly difficult to teach the concept of a pipeline statically. Therefore we developed software to simulate and dynamically visualize the processing of instructions by pipelined (superscalar) processors. Three simulators have been developed:

WinDLX is based on Hennessy/Patterson's DLX architecture and is modeled at the architecture level; therefore very little processor-internal information is given.

MIPSim is based on Patterson/Hennessy's MIPS processor book and is modeled at the computer organization level. Functional units like register file, pipeline registers and multiplexers are visible, and MIPSim displays the content and dynamic behavior of such units.

M10kSim is based on the MIPS R10000 architecture and models the instruction decode and dispatch unit, the branch unit, the instruction queues and the functional units (address calculation, both ALUs, floating-point adder, floating-point multiply/divide/square-root unit). Concepts like register renaming, branch history table, branch resume buffer and out-of-order execution can be explained easily using the simulator.

Teaching cache organization is an easier task; nevertheless, visualising cache activities helps in understanding the dynamics of a cache memory. Xcache is a simulator which displays the interactions between instruction memory and instruction cache, and between data memory and data cache, respectively.

The simulators are available for free downloading from http://www.vlsivie.tuwien.ac.at/CompArch

Introduction

Teaching the dynamics of pipelines and caches is rather difficult if done on a paper and pencil basis. In our experience students find it difficult to understand the principles and complications of pipelines and, to a lesser extent, of caches. To support teaching and give students an environment to experiment, we developed several pipeline simulators and a cache simulator.

My experience is that students appreciate using simulators and by using them get easily introduced to the subject. Based on the knowledge gained from using the simulators they are motivated to further study the subject using books.

Almost all of our students have their private PCs and most of them run Windows 95™. This was the main reason why we developed the simulators to run under MS Windows. It turned out that students particularly like to work at home and they are usually well prepared to ask questions in class.

WinDLX

WinDLX is an MS-Windows (16 bit) based pipeline simulator for the DLX processor as described in [1]. DLX is modeled at the architecture level; very little about the underlying computer organization is known at that level.

After loading a symbolic DLX assembler code, most of the information relevant to the CPU (pipeline, registers, I/O, memory, ...) can be viewed and modified while executing the code step-by-step or continuously. WinDLX offers statistics about pipeline behavior over time.

WinDLX works with several configurations: the structure (number of floating point functional units) and the latency of the floating point units can be changed. Forwarding can be enabled/disabled and the memory size can be modified. There is extensive online help available to explain the simulator and the internals of DLX.

"Register", "Code", "Pipeline", "Clock Cycle Diagram", "Statistics" and "Breakpoints" windows show internals of the pipeline. Further explanation is given below.
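The effect of the forwarding option on the stall statistics can be illustrated with a small sketch. The following Python fragment is not WinDLX code. It assumes a textbook five-stage integer pipeline (IF, ID, EX, MEM, WB) in which the register file is written in the first half of a cycle and read in the second half; the names Instr and count_raw_stalls are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Instr:
    text: str                      # assembler text, for display only
    dest: Optional[str]            # register written, or None
    srcs: Tuple[str, ...]          # registers read
    is_load: bool = False          # loads deliver their result after MEM


def count_raw_stalls(program, forwarding):
    """Return (stall_cycles, total_cycles) for an in-order 5-stage pipeline.

    Assumed model: an instruction decoded (ID) in cycle c is in EX in c+1,
    MEM in c+2 and WB in c+3.
    """
    producer = {}      # register -> (ID cycle of last writer, writer is a load)
    prev_id = 1        # so that the first instruction is decoded in cycle 2
    stalls = 0
    for ins in program:
        earliest = prev_id + 1            # ID cycle if nothing stalls
        id_cycle = earliest
        for reg in ins.srcs:
            if reg == "r0" or reg not in producer:
                continue                  # r0 is hardwired to zero in DLX
            p_id, p_load = producer[reg]
            if forwarding:
                # ALU result is forwarded EX->EX, load result MEM->EX
                ready = p_id + (2 if p_load else 1)
            else:
                # operand can only be read in the writer's WB cycle
                ready = p_id + 3
            id_cycle = max(id_cycle, ready)
        stalls += id_cycle - earliest
        if ins.dest is not None and ins.dest != "r0":
            producer[ins.dest] = (id_cycle, ins.is_load)
        prev_id = id_cycle
    total = prev_id + 3                   # last instruction leaves WB
    return stalls, total


program = [
    Instr("lw  r1,0(r2)", "r1", ("r2",), is_load=True),
    Instr("add r3,r1,r4", "r3", ("r1", "r4")),
    Instr("sub r5,r3,r1", "r5", ("r3", "r1")),
]
for fwd in (False, True):
    stalls, total = count_raw_stalls(program, fwd)
    print(f"forwarding={fwd}: RAW stalls: {stalls} "
          f"({100 * stalls / total:.2f} % of all cycles)")
```

Under these assumptions the three-instruction sequence needs 4 stall cycles in 11 cycles without forwarding and 1 stall cycle in 8 cycles with forwarding. Structural hazards and floating point latencies are deliberately left out.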
0-7803-4762-5/98/$10.00 © 1998 IEEE
1998 FIE Conference, 98CH36214
Figure 1. Main Window with open Code Window (other windows: Register, Pipeline, Clock Cycle Diagram, Statistics, Breakpoints)

Code Window

The code window displays a three column representation of the memory: the address (symbolic or in hex), the machine code in hex and the assembler command. Figure 1 shows the main simulation window with a code segment in the open Code Window. Color coding in the different simulation windows is consistent, e.g. WB (Write Back) is colored in blue. Double-clicking on instructions in any of the simulation windows displays pipeline status information in text form, giving details about internal registers, operations, stalling and forwarding status.

Pipeline Window

The pipeline window shows the inner structure of the DLX processor: the five pipeline stages and the floating point units (addition / subtraction, multiplication and division).

Clock Cycle Diagram Window

Figure 2 - the cycle diagram window - shows the timing behavior of the pipeline. The simulation shown is in the 4th cycle; the first command is in the MEM stage, the second in intEX and the fourth in IF. The third command, however, is denoted as "aborted". This is because the second command, jal, is an unconditional branch. This is known after the 3rd cycle, when jal has been decoded. During this cycle the command movi2fp (following after jal) has already been fetched, but the next executed command will be at another address. Therefore the execution of movi2fp must be aborted, leaving a "bubble" in the pipeline.

The branch address of jal is named "InputUnsigned". By clicking Memory/Symbols in the main window, the correspondence between the used symbols and the actual addresses is shown.

Figure 2. Clock Cycle Diagram (Instructions / Cycles; instructions shown: addi r1,r0,0x1000; jal InputUnsigned; movi2fp f10,r1; sw SaveR2(r0),r2)
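The situation in Figure 2 can be reproduced with a few lines of Python. This is only a sketch under simplifying assumptions, not the WinDLX implementation: every instruction advances one stage per cycle, a jump is recognized in its ID stage, and placing the label InputUnsigned directly after movi2fp is a hypothetical memory layout chosen to keep the example short.

```python
STAGES = ["IF", "ID", "intEX", "MEM", "WB"]


def simulate_fetch(program, labels, n_cycles):
    """Return one row (text, fetch_cycle, aborted) per fetched instruction.

    program: list of (assembler text, jump target label or None)
    labels:  label -> index into program
    """
    rows = []
    pc = 0
    pending_target = None      # target of a jump that is in ID this cycle
    for cycle in range(1, n_cycles + 1):
        if pc >= len(program):
            break
        text, target = program[pc]
        if pending_target is not None:
            # The jump is decoded in this cycle, so the instruction fetched
            # now lies on the wrong path: abort it and redirect the fetch.
            rows.append((text, cycle, True))
            pc = labels[pending_target]
            pending_target = None
        else:
            rows.append((text, cycle, False))
            pc += 1
            pending_target = target
    return rows


def stage_in_cycle(row, cycle):
    """Name of the stage a fetched instruction occupies in a given cycle."""
    _, fetch_cycle, aborted = row
    offset = cycle - fetch_cycle
    if aborted:
        return {0: "IF", 1: "aborted"}.get(offset, "")
    return STAGES[offset] if 0 <= offset < len(STAGES) else ""


def print_diagram(rows, n_cycles):
    header = "".join(f"{c:<9}" for c in range(1, n_cycles + 1))
    print(f"{'Instructions / Cycles':<24}{header}")
    for row in rows:
        cells = "".join(f"{stage_in_cycle(row, c):<9}"
                        for c in range(1, n_cycles + 1))
        print(f"{row[0]:<24}{cells}")


program = [
    ("addi r1,r0,0x1000", None),
    ("jal InputUnsigned", "InputUnsigned"),
    ("movi2fp f10,r1", None),
    ("sw SaveR2(r0),r2", None),     # assumed first instruction at the label
]
labels = {"InputUnsigned": 3}       # hypothetical address of the label

rows = simulate_fetch(program, labels, 4)
print_diagram(rows, 4)
```

In the fourth cycle the sketch places addi in MEM, jal in intEX and sw in IF, while movi2fp is marked as aborted, matching the description above.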


Breakpoint, Register and Statistics Window

Setting breakpoints stops the simulation at user defined points.

The register window shows all registers, not just the register file, and their content in hex.

The statistics window provides information about general aspects (e.g. number of simulation cycles), the hardware configuration used in the simulation, stalls and their causes, conditional branches, load/store instructions, floating point stage instructions and traps. Usually, the absolute count of events and the percentage are given, e.g. "RAW stalls: 17 (7.91 % of all cycles)".

The statistics window is very useful to compare the effects of changes in the pipeline configuration.

MIPSim

MIPSim is a pipeline simulator for the MIPS processor as described in [2]. MIPS is modeled at the computer organization level. Functional units like register files, pipeline registers, ALU, multiplexers, and data and control flow are visible.

The user can write small programs (currently only a subset of the MIPS instruction set is implemented) and watch the pipeline doing its work, modify the program and the content of data memory and register file 'on the fly' and go on simulating to see the effects.

At present MIPSim models a rather simple pipeline without hazard detection and forwarding units.

Assembler Program / Instruction Memory Content

The leftmost window in Figure 3 shows the program code. The program can be executed in single step or running mode. By setting the pointer (in essence the program counter) to a particular address, manual jumps in the program can be accomplished. Double clicking on the Instr. box opens a window in which the instruction memory content (the program) can be modified.

Data Memory Content

Double clicking on the Data box opens a window in which the data memory content can be modified (overwritten) interactively.

Modifying the content of instruction/data memory is very valuable for experimenting with the pipeline, e.g. to show data hazards.

Control / Data Flow Signals

After executing the program code, data path and control signals can be displayed by clicking on them. The instruction content of the different pipeline stages is displayed on top of each stage.

Extensive help as well as an introductory tutorial is available online.

M10kSim

The R10000 is a dynamic superscalar microprocessor which implements the 64-bit MIPS Instruction Set Architecture [3], [4]. It fetches and decodes four instructions per cycle and dynamically issues them to five fully-pipelined low-latency execution units. Instructions can be fetched and executed speculatively beyond branches. Instructions graduate in order upon completion. Although execution is aggressively out-of-order, the processor still provides sequential memory consistency and precise exception handling.

Model of the R10000

Our R10000 model concentrates on the most important issues of a superscalar architecture, and we wanted to have an easy to learn, not too complex user interface. The following parts of the processor are modelled:

Instruction decode and dispatch unit, responsible for instruction fetching, instruction decoding, register renaming and finally dispatching the instruction to the appropriate queues. The dispatcher works together with the branch unit when predicting the outcome of conditional branches. During this process they need to access the branch history table and the branch resume buffer, which therefore are also simulated. As soon as instructions are dispatched to the queues they are also given an entry in the active list, which is also part of our simulation.

All of the R10000's instruction queues, namely an address queue, an integer queue and a floating-point queue, are included in the simulation. To determine which operands are ready, they access the busy table, which is also simulated.

The remaining parts of the simulation are the five functional execution units: the address calculation unit, both ALUs, the floating-point adder unit and the floating-point multiply/divide/square-root unit.

Data is read from and written to memory, which can be viewed and modified during the simulation.

The memory is simplified and is assumed to be accessible without any delay. Exception handling is not implemented. The functional units simulate latencies and repeat rates correctly, but the internal pipeline structure is not visible as in MIPSim. Only a reduced set of instructions is implemented.
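How register renaming, the busy table and the active list interact can be sketched in a few lines of Python. The fragment below is an illustration and not taken from M10kSim; the class name Renamer, its methods and the register counts (32 logical and 64 physical integer registers) are assumptions made for the example.

```python
from collections import deque


class Renamer:
    """Minimal sketch of renaming with a free list, busy table and active list."""

    def __init__(self, n_logical=32, n_physical=64):
        # Initially logical register ri is held in physical register i.
        self.map = {f"r{i}": i for i in range(n_logical)}
        self.free = deque(range(n_logical, n_physical))
        self.busy = [False] * n_physical    # True: result not yet computed
        self.active = deque()               # (dest, new phys, old phys), program order

    def dispatch(self, dest, srcs):
        """Rename one instruction; return None if no physical register is free."""
        if not self.free:
            return None                     # dispatch would have to stall
        src_phys = [self.map[s] for s in srcs]
        ready = [not self.busy[p] for p in src_phys]
        new = self.free.popleft()
        old = self.map[dest]
        self.map[dest] = new                # later readers see the new mapping
        self.busy[new] = True
        self.active.append((dest, new, old))
        return new, src_phys, ready

    def complete(self, phys):
        """A functional unit has produced the value of a physical register."""
        self.busy[phys] = False

    def graduate(self):
        """Retire the oldest instruction if it has completed (in order)."""
        if not self.active:
            return False
        _, new, old = self.active[0]
        if self.busy[new]:
            return False                    # head of the active list not done
        self.active.popleft()
        self.free.append(old)               # the old mapping can be reused
        return True


r = Renamer()
print(r.dispatch("r1", ["r2", "r3"]))   # add r1,r2,r3
print(r.dispatch("r4", ["r1", "r5"]))   # sub r4,r1,r5 must wait for r1
print(r.dispatch("r1", ["r6", "r7"]))   # add r1,r6,r7 gets a fresh register
r.complete(34)                          # third instruction finishes first
print(r.graduate())                     # False: the first one is still busy
```

Because the second write to r1 receives its own physical register, it may complete before the first one without overwriting a value that the sub instruction still needs, while graduation remains in program order.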
The M10kSimulator

The left windows in Figure 4 show the assembler window and the active list window. The block diagram on the right side of the main screen shows the main components of the simulator.

During a simulation run, the instructions, represented as small balls in different colours, "wander" along the connections between these elements, thus demonstrating their flow through the superscalar instruction pipeline. When an instruction reaches a unit of the processor, such as a queue or a functional unit, its "ball" representation disappears and the unit takes over the display of the instruction.

Clicking on the Queue, Register and Data Cache boxes displays the content of the respective functional units.

Extensive help and an introduction to using the simulator are available online.

Xcache

XCache simulates and visualizes the behaviour of a cache on a step-by-step basis rather than performing statistical evaluations.

However, to enable advanced cache performance analysis, an interface to Mark D. Hill's cache simulator DINERO was incorporated.

XCache uses the same format for the memory reference patterns as DINERO. The user may specify cache parameters like associativity, size, etc., then load an input stream and watch the cache at work. Alternatively, it is possible to define the command-line parameters for DINERO, run a simulation with it and view the results.
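The step-by-step view of a cache can be sketched as follows. The fragment is an illustration, not XCache's code. It assumes the classic DINERO "din" trace format (one reference per line: a label followed by a hexadecimal address, with label 0 = data read, 1 = data write, 2 = instruction fetch) and LRU replacement; the names Cache and run_trace are hypothetical.

```python
from collections import OrderedDict


class Cache:
    """Set-associative cache with LRU replacement (tags only, no data)."""

    def __init__(self, size_bytes, block_bytes, assoc):
        self.block_bytes = block_bytes
        self.assoc = assoc
        self.n_sets = size_bytes // (block_bytes * assoc)
        # One ordered dict per set; the least recently used tag comes first.
        self.sets = [OrderedDict() for _ in range(self.n_sets)]

    def access(self, addr):
        block = addr // self.block_bytes
        index = block % self.n_sets
        tag = block // self.n_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)          # mark as most recently used
            return "hit"
        if len(lines) >= self.assoc:
            lines.popitem(last=False)       # evict the LRU line
        lines[tag] = True
        return "miss"


def run_trace(trace_lines, icache, dcache):
    """Step through a din-style trace; return (label, address, outcome) tuples."""
    steps = []
    for line in trace_lines:
        fields = line.split()
        if len(fields) < 2:
            continue                        # skip blank or malformed lines
        label, addr = int(fields[0]), int(fields[1], 16)
        cache = icache if label == 2 else dcache
        steps.append((label, addr, cache.access(addr)))
    return steps


trace = ["2 0", "0 1000", "2 4", "1 1004", "0 2000", "0 1000"]

for assoc in (1, 2):
    icache = Cache(size_bytes=64, block_bytes=16, assoc=1)
    dcache = Cache(size_bytes=64, block_bytes=16, assoc=assoc)
    print(f"data cache associativity {assoc}:")
    for label, addr, outcome in run_trace(trace, icache, dcache):
        print(f"  label {label}  addr 0x{addr:04x}  {outcome}")
```

With a direct-mapped data cache the last reference to address 0x1000 misses, because 0x2000 maps to the same set and has evicted it. With two-way associativity both blocks fit into the set and the reference hits.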

Figure 5. Xcache Main Window

Acknowledgment

The simulators have been developed as part of diploma theses at our department. Special thanks go to G. Raidl, M. Frigeri, G. Gridling, Ch. Fuss and J. Silhan.

Implementation

The simulators have been written in C++. We make the source code available provided that we get the modified sources in return and that the executables are made available for free to the public domain.

Summary

We have no quantitative measures of how much our simulators improved the teaching of computer architecture. But we do know that students spend more time getting familiar with pipelining than they spent using the paper and pencil approach. We also know that students come well prepared to the lab which accompanies the course and, even better, come with proposals on how to improve the pipeline. Another point which is attractive to students is that they can work at home, and this is also helpful to the University as it reduces the load on our computers.

References

[1] J.L. Hennessy / D.A. Patterson: Computer Architecture - A Quantitative Approach, Morgan Kaufmann Publishers, San Mateo, California, 1990
[2] D.A. Patterson / J.L. Hennessy: Computer Organization & Design - The Hardware/Software Interface, Morgan Kaufmann Publishers, San Mateo, California, 1994
[3] Presentation of the R10000, Hot Chips, August 17, 1995
[4] J. Heinrich et al.: MIPS R10000 Microprocessor User's Manual, Version 2.0, October 1996, MIPS Technologies, Mountain View, CA
[5] WinDLX, MIPSim, M10kSim, Xcache at http://www.vlsivie.tuwien.ac.at/CompArch