advanced computer architecture1

Jul 09, 2013

© Attribution Non-Commercial (BY-NC)

Pipeline Design

Instruction Pipeline Design

- Instruction Execution Phases
- Mechanism for Instruction Pipelining
- Dynamic Instruction Scheduling
- Branch Handling Techniques

- Computer Arithmetic Principles
- Static Arithmetic Pipelines
- Multifunctional Arithmetic Pipelines

Instruction Execution Phases

A typical instruction execution consists of a sequence of operations:
- Instruction Fetch (F)
- Decode (D)
- Operand Fetch or Issue (I)
- Execute, possibly in several stages (E)
- Write Back (W)

Each operation (F, D, I, E, W) may require one or more clock cycles. Ideally, these operations should be overlapped across successive instructions.

Example (assumptions):
- Load and store instructions take four cycles
- Add and multiply instructions take three cycles

Mechanism for Instruction Pipelining

Goal: achieve maximum parallelism in the pipeline by smoothing the instruction flow and minimizing idle cycles.

Mechanisms:
- Prefetch buffers
- Multiple functional units
- Internal data forwarding
- Hazard avoidance

Prefetch Buffers

- Used to match the instruction fetch rate to the pipeline consumption rate
- In a single memory access, a block of consecutive instructions is fetched into a prefetch buffer

Three types of prefetch buffers:
- Sequential buffers, used to store sequential instructions
- Target buffers, used to store branch target instructions
- Loop buffers, used to store the instructions of a loop

Multiple Functional Units

- At times, a specific pipeline stage becomes the bottleneck; it is identified by a large number of checkmarks in one row of the reservation table
- To resolve dependencies, reservation stations (RS) are used
- Each RS is uniquely identified by a tag, which is monitored by a tag unit (register tagging)
- Reservation stations help in conflict resolution and also serve as buffers

Internal Data Forwarding

Goal: replace memory access operations with register transfer operations.

Types:
- Store-load forwarding
- Load-load forwarding
- Store-store forwarding
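The three forwarding cases can be sketched as a small peephole pass over adjacent instruction pairs. This is only an illustration, not a hardware description: the tuple encoding of instructions, the function name `forward`, and the symbolic address `M` are assumptions introduced here, and the pass assumes that equal address names always refer to the same memory word.

```python
# Hypothetical instruction encoding (assumed for this sketch):
#   ("store", address, source_register)
#   ("load",  destination_register, address)
#   ("move",  destination_register, source_register)

def forward(program):
    """Apply store-load, load-load and store-store forwarding to adjacent pairs."""
    out = []
    for ins in program:
        prev = out[-1] if out else None
        if prev is not None:
            # Store-load: the load re-reads what was just stored,
            # so it becomes a register-to-register move.
            if prev[0] == "store" and ins[0] == "load" and prev[1] == ins[2]:
                out.append(("move", ins[1], prev[2]))
                continue
            # Load-load: the second load of the same word is replaced
            # by a move from the first load's destination register.
            if prev[0] == "load" and ins[0] == "load" and prev[2] == ins[2]:
                out.append(("move", ins[1], prev[1]))
                continue
            # Store-store: the first store is overwritten immediately,
            # so only the second store is kept.
            if prev[0] == "store" and ins[0] == "store" and prev[1] == ins[1]:
                out[-1] = ins
                continue
        out.append(ins)
    return out


print(forward([("store", "M", "R1"), ("load", "R2", "M")]))
```

In each case one memory access is removed: the first two cases turn a load into a move, and the third drops a store entirely.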

Hazard Avoidance

Reads and writes of shared variables by different instructions in the pipeline may lead to different results if the instructions are executed out of order.

Types:
- Read after Write (RAW) hazard
- Write after Write (WAW) hazard
- Write after Read (WAR) hazard
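The three hazard types can be checked by intersecting the read and write sets of an earlier and a later instruction. The sketch below assumes each instruction is described by a dict with `reads` and `writes` sets; that encoding and the name `find_hazards` are illustrative choices, not part of the slides.

```python
def find_hazards(first, second):
    """Return the hazard types between an earlier and a later instruction.

    Each instruction is a dict with 'reads' and 'writes' sets of
    variable names (an encoding assumed for this sketch).
    """
    hazards = set()
    if first["writes"] & second["reads"]:
        hazards.add("RAW")  # later instruction reads what the earlier one writes
    if first["writes"] & second["writes"]:
        hazards.add("WAW")  # both instructions write the same variable
    if first["reads"] & second["writes"]:
        hazards.add("WAR")  # later instruction overwrites what the earlier one reads
    return hazards


i1 = {"reads": {"b", "c"}, "writes": {"a"}}  # a = b + c
i2 = {"reads": {"a", "e"}, "writes": {"d"}}  # d = a * e
i3 = {"reads": {"f"}, "writes": {"b"}}       # b = f
i4 = {"reads": {"g"}, "writes": {"a"}}       # a = g

print(find_hazards(i1, i2))  # {'RAW'}
print(find_hazards(i1, i3))  # {'WAR'}
print(find_hazards(i1, i4))  # {'WAW'}
```

An empty result means the pair can be reordered safely as far as these three hazards are concerned.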

Dynamic Instruction Scheduling

Instruction Scheduling

Aim: to schedule instructions through an instruction pipeline.

Types of instruction scheduling:
- Static scheduling: supported by an optimizing compiler
- Dynamic scheduling: achieved by Tomasulo's register-tagging scheme or by a scoreboarding scheme

Static Scheduling

- Data dependencies in a sequence of instructions create interlocked relationships
- Interlocking can be resolved by the compiler by increasing the separation between interlocked instructions

Example: two independent load instructions can be moved ahead so that the spacing between them and the multiply instruction is increased.
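The effect of moving the loads ahead can be measured as the distance between each load and the first instruction that reads its result. The instruction sequence below is a hypothetical one made up for this sketch (the slide's own listing is not reproduced here), and the tuple encoding and function name are likewise assumptions.

```python
# Hypothetical encoding: (opcode, destination_register, source_registers)

def min_load_use_gap(program):
    """Smallest distance, in instruction slots, between a load and the
    first later instruction that reads the load's destination register."""
    gaps = []
    for i, (op, dest, _) in enumerate(program):
        if op != "load":
            continue
        for j in range(i + 1, len(program)):
            if dest in program[j][2]:
                gaps.append(j - i)
                break
    return min(gaps) if gaps else None


original = [
    ("add", "R0", ("R0", "R1")),
    ("move", "R1", ("R5",)),
    ("load", "R2", ()),
    ("load", "R3", ()),
    ("mul", "R2", ("R2", "R3")),
]

# The two loads do not depend on the add or the move, so a compiler
# may hoist them to the top without changing the result.
rescheduled = [
    ("load", "R2", ()),
    ("load", "R3", ()),
    ("add", "R0", ("R0", "R1")),
    ("move", "R1", ("R5",)),
    ("mul", "R2", ("R2", "R3")),
]

print(min_load_use_gap(original))     # 1
print(min_load_use_gap(rescheduled))  # 3
```

A larger gap gives each load more cycles to complete before the multiply needs its operand, which is what reduces the interlock stalls.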

Tomasulo's Algorithm

- Hardware-dependent scheme
- Data operands are held in reservation stations (RS) until dependencies are resolved
- Register tagging is used to allocate and deallocate registers
- All working registers are tagged

Scoreboarding

- Multiple functional units appear in multiple execution pipelines.
- Parallel units allow instructions to execute out of order with respect to the original program sequence.
- The processor has instruction buffers; instructions are issued regardless of the availability of their operands.
- A centralized control unit called the scoreboard keeps track of unavailable operands for instructions stored in the buffers.

Branch Handling Techniques

- Pipeline performance is limited by the presence of branch instructions in the program
- Various branch strategies are applied to minimize performance degradation
- To evaluate a branch strategy, two approaches can be followed:
  - Trace data approach
  - Analytical approach

Branching Illustrated

Ib: branch instruction
- Once the branch is decided as taken, all instructions fetched after Ib are flushed
- Subsequently, the instructions at the branch target are run

Effect of Branching

Nomenclature:

- Branch Taken: the action of fetching non-sequential (remote) instructions after a branch instruction
- Branch Target: the (remote) instruction to be executed after a branch is taken
- Delay Slot (b): the number of pipeline cycles consumed between branch taken and branch target

In general, 0 <= b <= k-1, where k is the number of pipeline stages.

Effect of Branching

When a branch is taken, all instructions following the branch instruction become useless and the pipeline is flushed, losing a number of cycles.

Let Ib be the branch instruction. A taken branch causes all instructions from I(b+1) through I(b+k-1) to be drained from the pipeline.

Let p be the probability that an instruction is a branch and q the probability that a branch is taken. The penalty, in terms of time, is

T_penalty = pqnbt

where n is the number of instructions, b is the number of pipeline cycles consumed (the delay slot), and t is the cycle time.

The effective execution time becomes

T_eff = kt + (n-1)t + pqnbt
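A short numeric sketch of the penalty formula may help. It assumes the effective execution time is the ideal pipeline time kt + (n-1)t plus the penalty term pqnbt; the function names and the sample values for n, k, p, q, and b are illustrative assumptions, not measurements.

```python
def branch_penalty(n, p, q, b, t):
    """Time lost to taken branches: T_penalty = p * q * n * b * t."""
    return p * q * n * b * t


def effective_time(n, k, p, q, b, t):
    """Ideal time for n instructions on a k-stage pipeline,
    plus the branch penalty (assumed form of the total)."""
    return k * t + (n - 1) * t + branch_penalty(n, p, q, b, t)


# Assumed sample values: 100 instructions, 5 stages, 20% branches,
# 60% of branches taken, worst-case delay slot b = k - 1, unit cycle time.
n, k, p, q, t = 100, 5, 0.2, 0.6, 1.0
b = k - 1

ideal = k * t + (n - 1) * t
total = effective_time(n, k, p, q, b, t)
print(f"ideal = {ideal:.1f}, with branches = {total:.1f}")
```

With these values the penalty is roughly 48 cycles on top of an ideal 104, so taken branches add close to half again to the execution time.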

Branch Prediction

A branch can be predicted using a static or a dynamic strategy.

Static Branch Strategy

- The probability of a branch being taken for a particular branch type can be used to predict the branch
- The probability may be obtained by collecting the frequency of branch taken and of branch types across a large number of program traces

Dynamic Branch Strategy

- Uses limited recent branch history to predict whether or not the branch will be taken the next time it occurs

Branch prediction buffer

- Used to store the branch history information in order to make the branch prediction
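One common way to keep "limited recent history" is a 2-bit saturating counter per branch. The slides do not name a specific scheme, so the counter, the class name `BranchPredictionBuffer`, the initial state, and the example branch address are all assumptions of this sketch.

```python
class BranchPredictionBuffer:
    """Maps a branch address to a 2-bit saturating counter (0..3).

    States 0 and 1 predict not taken; states 2 and 3 predict taken.
    Unseen branches start at 0 (an arbitrary choice for this sketch).
    """

    def __init__(self):
        self.counters = {}

    def predict(self, address):
        return self.counters.get(address, 0) >= 2

    def update(self, address, taken):
        state = self.counters.get(address, 0)
        # Move one step toward the observed outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
        self.counters[address] = state


def count_correct(outcomes, address=0x40):
    """Run one branch through the buffer and count correct predictions."""
    bpb = BranchPredictionBuffer()
    correct = 0
    for taken in outcomes:
        if bpb.predict(address) == taken:
            correct += 1
        bpb.update(address, taken)
    return correct


# A loop branch taken nine times and then falling through once.
print(count_correct([True] * 9 + [False]))  # 7
```

The two mispredictions at the start are the warm-up cost of the chosen initial state; the one at the end is the loop exit, which a short history cannot anticipate.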

Delayed Branches

- The branch penalty can be reduced by using a delayed branch
- The central idea is to delay the execution of the branch instruction to accommodate independent* instructions
- Delaying by d cycles allows a few useful instructions that are independent* of the branch instruction to be executed

* Execution of these instructions should be independent of the outcome of the branch instruction.

Computer Arithmetic Principles

- Arithmetic is performed with finite precision because memory words and registers have a fixed size
- Finite precision implies that data exceeding the limit is either truncated or rounded off

Types of arithmetic operations:
- Fixed-point operations: represented internally mostly using 2's complement
- Floating-point operations: represented internally mostly using the IEEE 754 standard
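Both effects of finite precision can be seen in a few lines. The sketch below shows how an 8-bit 2's complement word wraps when a result exceeds its range, and how a value is rounded when stored in IEEE 754 single precision. The helper names and the 8-bit word size are assumptions chosen for illustration.

```python
import struct


def wrap_twos_complement(value, bits=8):
    """Interpret the low `bits` bits of value as a 2's complement integer."""
    mask = (1 << bits) - 1
    value &= mask
    # If the sign bit is set, the word represents a negative number.
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def to_float32(value):
    """Round a Python float (double precision) to IEEE 754 single precision."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


print(wrap_twos_complement(127 + 1))  # -128: the result no longer fits in 8 bits
print(wrap_twos_complement(-1))       # -1: already representable
print(to_float32(0.1) == 0.1)         # False: 0.1 is rounded differently in 32 bits
```

The fixed-point case loses the high-order bits, while the floating-point case loses low-order bits of the fraction.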

Static Arithmetic Pipelines

Most arithmetic pipelines perform fixed functions; such a pipeline is also called a unifunctional pipeline.
- The ALU performs fixed-point operations using an integer unit
- Floating-point operations are performed using a separate unit (coprocessor)

- All arithmetic operations can be performed using basic add and shift operations
- Arithmetic and logical shifts can be performed with shift registers
- Addition can be done using a carry-propagation adder (CPA) or a carry-save adder (CSA)

CSAs and CPAs are used at different stages to design a pipeline for fixed-point multiplication.

Example: multiplication of two 8-bit numbers, producing a 16-bit result
- S1: generates eight partial products
- S2: two levels of CSAs take eight numbers and produce four
- S3: two CSAs convert four numbers into two
- S4: one CPA adds the two numbers to produce the final result

Source: Kai Hwang
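The four stages of the 8-bit multiplication example can be imitated in software. This is a functional sketch only: it models each carry-save adder as a bitwise operation on Python integers and ignores timing and latches, and the grouping of operands within each CSA level is one possible arrangement assumed here.

```python
def csa(a, b, c):
    """Carry-save adder: reduce three numbers to a sum word and a carry word
    such that a + b + c == sum_word + carry_word. No carry is propagated."""
    sum_word = a ^ b ^ c
    carry_word = ((a & b) | (b & c) | (a & c)) << 1
    return sum_word, carry_word


def multiply_8bit(x, y):
    """Multiply two 8-bit unsigned numbers using the S1..S4 stage structure."""
    # S1: eight shifted partial products, one per bit of the multiplier.
    pp = [(x << i) if (y >> i) & 1 else 0 for i in range(8)]

    # S2, first CSA level: eight numbers -> six.
    s1, c1 = csa(pp[0], pp[1], pp[2])
    s2, c2 = csa(pp[3], pp[4], pp[5])
    # S2, second CSA level: six numbers -> four.
    s3, c3 = csa(s1, c1, s2)
    s4, c4 = csa(c2, pp[6], pp[7])

    # S3: two CSAs in sequence, four numbers -> three -> two.
    s5, c5 = csa(s3, c3, s4)
    s6, c6 = csa(s5, c5, c4)

    # S4: a carry-propagation adder produces the 16-bit product.
    return (s6 + c6) & 0xFFFF


print(multiply_8bit(13, 11))    # 143
print(multiply_8bit(255, 255))  # 65025
```

Only the last stage performs a full carry-propagating addition, which is why the CSA stages can be kept short in a hardware pipeline.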

Multifunctional Arithmetic Pipelines

A multifunctional arithmetic pipeline performs many functions.

Types of multifunctional pipelines:
- Static pipeline: performs a single function at a given time and another function at some other time
- Dynamic pipeline: performs multiple functions at the same time; care needs to be taken in sharing the pipeline

Example: Advanced Scientific Computer

Key features:
- Four pipelined arithmetic units
- A large number of working registers in the processor, which controls the operations of the memory buffer units and arithmetic units
- The instruction processing unit (IPU) handles fetching and decoding of instructions

Pipeline Interconnections

Example: Advanced Scientific Computer
- The arithmetic pipeline has eight stages
- It is an example of a static multifunctional pipeline
- By changing the interconnections, different functions (fixed-point and floating-point) can be performed
