ARCHITECTURES - 1

Mariagiovanna Sami

31/07/2013


Architecture: which definition?

Abstract architecture – the functional specification of a computer.
Concrete architecture – an implementation of an abstract architecture.
An abstract architecture is a “black box” specification of a machine – it can be seen:


Architecture definition (2)
From the programmer’s point of view – we deal with a programming model, equivalent to a description of the machine language.
From the designer’s point of view – we deal with a hardware model (a black-box description for the designer: it must include additional information, e.g. interface protocols etc.).

Architecture definition (3)
Usually, “architecture” denotes the abstract architecture.
“Concrete architecture” is often called microarchitecture (a term originally created for microprogrammed CPUs, extended more in general to the structural description in terms of functional units and interconnections).

Where do we start from?
Background: the “Von Neumann paradigm” (and the Harvard alternative).
Extension to a “reactive paradigm” – still V.N.!

An Architectural Paradigm:
Composition of hardware and program execution mode.
Does not include software, but implies the execution mode of object code!

The classical V.N. abstract architecture:
[Block diagram: CPU, consisting of Control Unit and ALU, connected to Memory and I/O]

Programming style: imperative, control-flow dominated
One “address space” in memory – information is identified by its address. Variables are identified by “names” translated into addresses.
Machine instructions are stored sequentially: the natural order of fetching and execution is by increasing address values, hence execution in the same sequential order.
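A minimal C illustration of the first point (the variable name and the program are ours, purely for illustration): the “name” written by the programmer is translated by the toolchain into an address within the single address space.

    #include <stdio.h>

    int main(void) {
        int count = 42;                                    /* the programmer uses a "name" ...      */
        printf("value of count:   %d\n", count);
        printf("address of count: %p\n", (void *)&count);  /* ... translated into a memory address */
        return 0;
    }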

The Control Flow:
The C.U. determines the address of the next instruction to be executed, as contained in the Program Counter (PC), and fetches it from memory.
The C.U. decodes the instruction and controls its execution by proper commands to ALU and memory.
Simultaneously, the address of the next instruction is computed: as a rule, the “next” instruction is immediately sequential to the one being executed (address computed by incrementing the PC), unless otherwise explicitly stated by a control instruction.
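A minimal sketch of this fetch/decode/execute cycle, assuming a toy instruction set invented here only for illustration (it is not the slides’ machine): the PC is incremented by default and is overridden only by a control instruction.

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical three-instruction machine, for illustration only. */
    enum { OP_ADD, OP_JUMP, OP_HALT };

    typedef struct { uint8_t op; uint8_t arg; } Instr;

    int main(void) {
        Instr mem[] = {               /* program stored at increasing addresses  */
            { OP_ADD,  5 },
            { OP_ADD,  7 },
            { OP_JUMP, 4 },
            { OP_ADD,  100 },         /* skipped because of the jump             */
            { OP_HALT, 0 },
        };
        int pc = 0, acc = 0, running = 1;

        while (running) {
            Instr ir = mem[pc];       /* fetch: instruction addressed by the PC  */
            pc = pc + 1;              /* default: next instruction is sequential */
            switch (ir.op) {          /* decode and execute                      */
            case OP_ADD:  acc += ir.arg; break;
            case OP_JUMP: pc = ir.arg;   break;  /* control instruction overrides the PC */
            case OP_HALT: running = 0;   break;
            }
        }
        printf("acc = %d\n", acc);    /* prints 12 */
        return 0;
    }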

Control-dominated execution:
Control is implicitly determined by the ordering of instructions in the program, or explicitly modified by jump/branch instructions.
Execution is inherently sequential and serial.

The basic approach: the C.U. is the only active unit
All transfers to/from memory are controlled by the C.U.
I/O is initiated by instructions in the program (“program-controlled I/O”, “polling”): the C.U. activates the transfer channels.
All actions are de facto synchronized by execution of the program.
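A minimal sketch of program-controlled I/O in C, assuming a hypothetical memory-mapped device (the register addresses and the status-bit layout are assumptions, not taken from the slides): the program itself busy-waits until the device is ready.

    #include <stdint.h>

    /* Hypothetical memory-mapped device registers (addresses are assumptions). */
    #define DEV_STATUS (*(volatile uint8_t *)0x4000)   /* bit 0 = data ready */
    #define DEV_DATA   (*(volatile uint8_t *)0x4001)

    /* Program-controlled input: the program polls the device status register. */
    uint8_t read_byte_polling(void) {
        while ((DEV_STATUS & 0x01u) == 0) {
            /* busy-wait: nothing else happens until the device is ready */
        }
        return DEV_DATA;
    }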

The Harvard variant...
Basically, it separates program and data memory:
[Block diagram: CPU, consisting of Control Unit and ALU, connected to Program Memory, Data Memory and I/O]

Performance Evaluation...
Made with reference to a set of benchmark programs (often “synthetic”).
For every instruction in the machine’s Instruction Set (IS) the total time required (fetch + execute) is known.
Profiling (execution of the program with suitable sets of data) gives the dynamic sequence of instructions executed.

Performance Evaluation (2)
Total time required by execution of the program = sum of the times required by all instructions in the dynamic sequence of execution. (Instructions may have different latency depending on the specific operations, the necessity of accessing memory to read/write data, even the length of the instruction itself.)
“Performance optimization” through choice of the “best” algorithm + less time-consuming instructions.
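In formula form (the symbols are ours, introduced only to restate the sentence above): if instruction type i has known latency t_i and occurs n_i times in the dynamic sequence obtained by profiling, then

    T_{\text{exec}} = \sum_{i} n_i \, t_i

so “performance optimization” means reducing the dominant n_i t_i products, either through a better algorithm (smaller n_i) or through less time-consuming instructions (smaller t_i).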

(Some of) the bottlenecks
Memory is slower than logic: larger (and less costly) memory = wider gap, but an ever larger addressable memory space is requested! Technology dominates instruction latency and overall performance.
Execution is totally serial – an instruction must be completed before its successor is fetched from memory.

Bottlenecks (2)
If a “reactive” system is designed (typically, an application-specific or “embedded” system), an external “event” created by an I/O device is serviced only when the device is polled by the program – “real-time” is only as good as the programmer can make it!

So, how to achieve better performance?
Modify the memory structure so that the programmer sees a very large addressable space – but the CPU sees a fast “equivalent” memory.
Achieve better efficiency for execution of the instruction sequence.
Allow servicing external events when the events arise – in an asynchronous way with respect to program execution.

Starting from the bottom...
“Servicing external events”? A solution born with the first “minicomputers” (early ’60s): the interrupt (an external unit may initiate an action – execution of the servicing routine is then controlled by the C.U.).
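A minimal sketch contrasting the interrupt with the polling loop shown earlier (the ISR name, the register address and the way the routine is attached to the interrupt line are assumptions; the real mechanism is hardware- and OS-specific):

    #include <stdint.h>

    /* Hypothetical memory-mapped data register, as in the polling sketch. */
    #define DEV_DATA (*(volatile uint8_t *)0x4001)

    volatile uint8_t last_byte;
    volatile int     data_ready = 0;

    /* Hypothetical interrupt service routine: the device initiates it,
     * asynchronously with respect to the running program; the C.U. then
     * executes this servicing routine. */
    void device_isr(void) {
        last_byte  = DEV_DATA;
        data_ready = 1;
    }

    int main(void) {
        for (;;) {
            /* ... useful work here, no busy-waiting on the device ... */
            if (data_ready) {        /* the event has already been serviced */
                data_ready = 0;
                /* consume last_byte here */
            }
        }
    }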

Getting better efficiency for instruction execution?
A first approach: create instructions capable of executing complex operations (object code is more compact; one instruction fetched from memory executes actions previously performed by a sequence of instructions).
Drawbacks: a more complex C.U. (longer clock period); identification of “useful” complex instructions for general-purpose CPUs is difficult.
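A purely illustrative sketch of the idea (the mnemonics are hypothetical, not taken from the slides): one complex instruction replaces a short sequence of simpler ones, so a single fetch does the work of several.

    /* The same computation, as a sequence of simple instructions:
     *
     *   MUL r3, r1, r2      ; r3 = r1 * r2
     *   ADD r4, r4, r3      ; r4 = r4 + r3
     *
     * and as one hypothetical complex instruction:
     *
     *   MAC r4, r1, r2      ; r4 = r4 + r1 * r2   (single fetch, same effect)
     */
    long multiply_accumulate(long acc, long a, long b) {
        return acc + a * b;   /* what either encoding computes */
    }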

.:  The solution has been widely adopted– “CISC” machines a winning approach for a long time..g.  May be very useful when specialized tasks are widely used (e.Complex instructions Still. 31/07/2013 20 .. DSP or imageprocessing) or for application-specific CPUs.

Getting better efficiency for instruction execution – the alternative
Modify the structure of the CPU and the execution paradigm to introduce parallelism – overcome the “serial execution” bottleneck.
But... which kind of parallelism? Parallelism has to be detected within the application – at which level?

What about the memory problem?
Introduce a hierarchy of memories – large, slow (and cheap) ones at the bottom; fast, small (and costly) ones at the top (nearest the CPU).
Allow a wider memory bandwidth – more than one unit of information at a time is transferred from memory to the CPU (or between memories).
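A classical way to see why the hierarchy helps (the formula is the standard average-access-time relation, not taken from these slides): if a fraction h of the accesses is satisfied by the fast upper level, then

    t_{\text{avg}} = h \, t_{\text{fast}} + (1 - h) \, t_{\text{slow}}

With h close to 1, which locality of reference makes plausible, t_avg approaches t_fast even though most of the capacity sits in the slow, cheap levels.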

Memory (2)
In fact:
Hierarchy: does not imply any assumption on the mode of execution other than serial.
Larger bandwidth: meaningful only if some form of parallelism is adopted; it requires extensions to the hw structure controlling memory access.

What these lectures will be about:
Memory hierarchy: it is assumed that the basic points are already known (e.g. the scope of cache memory, virtual memory and its hw supports). Attention will be given to cache organization and performance; technological aspects are not discussed here (other courses...).

What these lectures will be about (2):
Parallelism: from “within the CPU” to “system-level”, taking into account the characteristics of application-specific systems:
- Pipelining
- Instruction-Level Parallelism (ILP)
- Multi-threading
- Multi-processor systems.

Course organization
Lectures
Exercises
Use of tools for architecture evaluation and design:
- Analysis of an application’s behaviour given a fixed architecture
- Design of a “specific” architecture for a given application.

Texts:
Slides are available in the Master’s repository.
Suggested readings: a list will be circulated (books available in the Library, papers accessible via Internet or provided in hardcopy).
Manuals of software tools: available in the repository.
