

Mariagiovanna Sami



Architecture: which definition?

- Abstract architecture – the functional specification of a computer.
- Concrete architecture – an implementation of an abstract architecture.
- Abstract architecture: a “black box” specification of a machine – can be seen:


Architecture definition (2)
- From the programmer’s point of view – we deal with a programming model, equivalent to description of the machine language.
- From the designer’s point of view – we deal with a hardware model (a black-box description for the designer: must include additional information, e.g., interface protocols etc.).
31/07/2013

Architecture definition (3)
- Usually, “architecture” denotes abstract architecture.
- “Concrete Architecture” is often called microarchitecture (term originally created for microprogrammed CPUs, extended more in general to the structural description in terms of functional units and interconnections).

Where do we start from?
- Background: the “Von Neumann paradigm” (and the Harvard alternative)
- Extension to a “reactive paradigm” – still V.N.!

An Architectural Paradigm:
- Composition of hardware and program execution mode.
- Does not include software, but implies the execution mode of object code!

The classical V.N. abstract architecture:
[Figure: block diagram with blocks labelled ALU, Memory, Control Unit, I/O and CPU]

Programming style: imperative, control-flow dominated
- One “address space” in memory – information is identified by its address.
- Machine instructions are stored sequentially: natural order of fetching and execution is by increasing address values, hence execution in the same sequential order.
- Variables are identified by “names” translated as addresses.

The Control Flow:
- The C.U. determines the address of the next instruction to be executed, as contained in the Program Counter (PC), and fetches it from memory.
- The C.U. decodes the instruction and controls its execution by proper commands to ALU and memory.
- Simultaneously, the address of the next instruction is computed: as a rule, the “next” instruction is immediately sequential to the one being executed (address computed by incrementing PC) unless otherwise explicitly stated by a control instruction.
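The control flow described above can be sketched as a small interpreter loop. This is only an illustration: the four-instruction set (LOAD, ADD, STORE, JNZ, HALT), the single accumulator and the function name `run` are assumptions made for the example, not part of the lecture material.

```python
def run(program, memory, max_steps=1000):
    """Execute `program`, a list of (opcode, operand) pairs, and return the accumulator.

    Hypothetical toy machine: one accumulator, data held in the list `memory`.
    """
    pc = 0    # Program Counter: address of the next instruction
    acc = 0   # accumulator register
    for _ in range(max_steps):
        opcode, operand = program[pc]   # fetch the instruction addressed by PC
        pc += 1                         # default: the next instruction is sequential
        if opcode == "LOAD":            # decode, then command "ALU" and memory
            acc = memory[operand]
        elif opcode == "ADD":
            acc += memory[operand]
        elif opcode == "STORE":
            memory[operand] = acc
        elif opcode == "JNZ":           # control instruction: overrides the sequential PC
            if acc != 0:
                pc = operand
        elif opcode == "HALT":
            return acc
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
    raise RuntimeError("step limit reached")


# Count down from 3 to 0: the JNZ at address 2 jumps back to address 1.
memory = [3, -1, 0]
program = [("LOAD", 0), ("ADD", 1), ("JNZ", 1), ("STORE", 2), ("HALT", None)]
print(run(program, memory))
```

Only the JNZ instruction changes the PC explicitly; every other instruction leaves the incremented value in place, which is the "sequential unless otherwise stated" rule.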

Control-dominated execution:
- Control is implicitly determined by ordering of instructions in the program or explicitly modified by jump/branch instructions.
- Execution is inherently sequential and serial.

The basic approach: C.U. the only active unit
- All transfers to/from memory controlled by C.U.
- I/O initiated by instructions in the program (“program-controlled I/O”, “polling”): C.U. activates transfer channels.
- All actions are de facto synchronized by execution of the program.
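Program-controlled I/O can be illustrated with a small simulation. It is a sketch under stated assumptions: time is counted in abstract steps, the device latches one request at a time, and the names `run_polled`, `events` and `poll_period` are invented for the example.

```python
def run_polled(events, poll_period, total_steps):
    """Simulate a program that polls one device every `poll_period` steps.

    events: dict mapping a time step to the value the device produces at that step.
    Returns a list of (raised_at, serviced_at, value) tuples.
    """
    pending = None    # (raised_at, value) latched by the device, or None
    serviced = []
    for step in range(total_steps):
        # The device raises an event; a second event while one is pending is lost.
        if step in events and pending is None:
            pending = (step, events[step])
        # The program reaches its polling instruction only every poll_period steps.
        if step % poll_period == 0 and pending is not None:
            raised_at, value = pending
            serviced.append((raised_at, step, value))
            pending = None
    return serviced


# An event raised at step 1 waits until the next poll at step 4.
print(run_polled({1: "key"}, poll_period=4, total_steps=10))
```

The delay between `raised_at` and `serviced_at` depends only on where the programmer placed the polling instructions, since the device itself cannot start any action.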

The Harvard variant...
- Basically, separates program and data memory:
[Figure: block diagram with blocks labelled ALU, Program Memory, Control Unit, I/O and Data Memory]

Performance Evaluation...
- Made with reference to a set of benchmark programs (often “synthetic”).
- For every instruction in the machine’s Instruction Set (IS) the total time required (fetch+execute) is known.
- Profiling (execution of the program with suitable sets of data) gives the dynamic sequence of instructions executed.

Performance Evaluation (2)
Total time required by execution of the program = sum of times required by all instructions in the dynamic sequence of execution.
(Instructions may have different latency depending on specific operations, necessity of accessing memory to read/write data, even length of instruction itself.)
“Performance optimization” through choice of “best” algorithm + less time-consuming instructions.
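The total-time rule can be written directly as a sum over the profiled instruction trace. The latency values and the trace below are made-up numbers for illustration, and `total_time` is a hypothetical helper name.

```python
from collections import Counter


def total_time(dynamic_sequence, latency):
    """Sum the (fetch + execute) time of every instruction in the dynamic sequence."""
    return sum(latency[op] for op in dynamic_sequence)


# Hypothetical per-instruction times, in clock cycles (memory accesses cost more).
latency = {"LOAD": 4, "ADD": 1, "JNZ": 2, "STORE": 4}

# Hypothetical dynamic sequence, as a profiler would report it for one data set.
trace = ["LOAD", "ADD", "JNZ", "ADD", "JNZ", "STORE"]

print(total_time(trace, latency))   # 4 + 1 + 2 + 1 + 2 + 4 = 14 cycles
print(Counter(trace))               # how often each instruction was executed
```

The per-instruction counts show where optimization pays off: replacing a frequently executed, slow instruction changes the sum more than tuning a rare one.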

(Some of) the bottlenecks
- Memory is slower than logic: larger (and less costly) memory = wider gap, but ever larger addressable memory space is requested! Technology dominates instruction latency and overall performances.
- Execution is totally serial – an instruction must be completed before its successor is fetched from memory.

Bottlenecks (2)
- If a “reactive” system is designed (typically, an application-specific or “embedded” system), an external “event” created by an I/O device is serviced only when the device is polled by the program – “real-time” only as good as the programmer can make it!

So, how to achieve better performances?
- Modify memory structure so that the programmer will see a very large addressable space – but the CPU will see a fast “equivalent” memory.
- Achieve better efficiency for execution of the instruction sequence.
- Allow servicing external events when events arise – in an asynchronous way with respect to program execution.

Starting from the bottom...
- “Servicing external events”? Solution born with the first “minicomputers” (early ’60s): interrupt (an external unit may initiate an action – execution of the servicing routine is then controlled by the C.U.)
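A minimal sketch of the interrupt idea follows. It assumes the common scheme in which the C.U. checks for a pending request after each instruction, saves the return address, runs the servicing routine and then resumes; the names `run_with_interrupts`, `requests` and `handler` are invented for the example.

```python
def run_with_interrupts(n_instructions, requests, handler):
    """Run a straight-line program of `n_instructions`, servicing interrupts.

    requests: dict mapping an instruction address to the event that an external
              unit raises while that instruction is executing.
    handler:  the servicing routine; called with the event, returns a trace entry.
    Returns the execution trace as a list of strings.
    """
    pc = 0
    trace = []
    while pc < n_instructions:
        trace.append(f"instr {pc}")     # fetch and execute one program instruction
        request = requests.get(pc)      # did an external unit raise a request meanwhile?
        pc += 1
        if request is not None:
            saved_pc = pc               # the C.U. saves the return address
            trace.append(handler(request))   # servicing routine runs under C.U. control
            pc = saved_pc               # resume the interrupted program
    return trace


print(run_with_interrupts(3, {1: "key"}, lambda event: f"service {event}"))
```

Unlike polling, the program contains no instruction that asks the device for its status: the external unit initiates the action, and the delay is bounded by the instruction in progress.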

Getting better efficiency for instruction execution?
- A first approach: create instructions capable of executing complex operations (object code more compact, one instruction fetched from memory executes actions previously performed by a sequence of instructions). Drawbacks: more complex C.U. (longer clock period), identification of “useful” complex instructions for general-purpose CPUs is difficult.

Complex instructions...
Still:
- The solution has been widely adopted – “CISC” machines a winning approach for a long time.
- May be very useful when specialized tasks are widely used (e.g., DSP or image processing) or for application-specific CPUs.

Getting better efficiency for instruction execution – the alternative
- Modify structure of CPU and execution paradigm to introduce parallelism – overcome the “serial execution” bottleneck.
- But... Which kind of parallelism? Parallelism has to be detected within the application – at which level?

What about the memory problem?
- Introduce a hierarchy of memories: small, fast (and costly) ones at the top (nearest the CPU), large, slow (and cheap) ones at the bottom.
- Allow a wider memory bandwidth – more than one unit of information at a time is transferred from memory to CPU (or between memories).
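The effect of a two-level hierarchy can be sketched with a toy model. Everything here is an assumption chosen for illustration: a direct-mapped cache holding one word per line, latencies of 1 and 10 time units, and the function name `access_time`.

```python
def access_time(addresses, cache_lines=4, t_cache=1, t_memory=10):
    """Total time to serve `addresses` through a small cache in front of main memory.

    Toy direct-mapped cache: address `a` can only live in line `a % cache_lines`.
    """
    cache = [None] * cache_lines   # which address each line currently holds
    total = 0
    for addr in addresses:
        line = addr % cache_lines
        if cache[line] == addr:
            total += t_cache               # hit: served by the fast memory
        else:
            total += t_cache + t_memory    # miss: look up, then go to the slow memory
            cache[line] = addr             # keep a copy for later accesses
    return total


reuse = [0, 1, 0, 1]
print(access_time(reuse))          # two misses (11 each) + two hits (1 each) = 24
print(len(reuse) * 10)             # 40 if every access went to the slow memory
```

The gain comes entirely from reuse of addresses: a sequence such as `[0, 4, 0, 4]`, whose addresses compete for the same line, misses on every access in this model.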

Memory (2)
In fact:
- Hierarchy: does not imply any assumption on mode of execution other than serial, requires extensions to hw structure controlling memory access.
- Larger bandwidth: meaningful only if some form of parallelism is adopted.

..).: virtual memory and its hw supports... Attention will be given to cache organization and performances: technological aspects are not discussed here (other courses. 24 31/07/2013 .g.What these lectures will be about:  Memory hierarchy: it is assumed that the basic points are already known (e.). the scope of cache memory.

What these lectures will be about (2):
- Parallelism: from “within the CPU” to “system-level”, taking into account characteristics of application-specific systems:
  - Pipelining
  - Instruction-Level Parallelism (ILP)
  - Multi-threading
  - Multi-processor systems.

Course organization
- Lectures
- Exercises
- Use of tools for architecture evaluation and design:
  - Analysis of an application’s behaviour given a fixed architecture
  - Design of a “specific” architecture for a given application.

Texts:
- Slides are available in the Master’s repository.
- Manuals of software tools: available in the repository.
- Suggested readings: a list will be circulated (books available in the Library, papers accessible via Internet or provided in hardcopy).