Department of Computer Science and Engineering
• Basic concepts
• Data hazards
• Instruction hazards
• Influence on instruction sets
• Datapath and control considerations
• Performance considerations
• Exception handling
The speed of execution of programs can be increased through many factors. The simplest solution is to use faster circuit technology to build the processor and the main memory. Another solution is to arrange the hardware in such a manner that more than one operation can be performed at the same time. Thereby the number of operations performed per second is increased, even though the elapsed time needed to perform any one operation is not changed.
Basic Concepts (Contd.)
Pipelining can be adopted in a computer. A processor executes a program by fetching and executing instructions, one after the other. Execution of a program consists of a sequence of fetch and execute steps, as shown in Figure 1.
Sequential Execution
Now consider a computer that has two separate hardware units, one for fetching instructions and another for executing them, as shown in Figure 2.
Hardware Organization
The instruction fetched by the fetch unit is deposited in an intermediate storage buffer, B1. This buffer is needed to enable the execution unit to execute the instruction while the fetch unit is fetching the next instruction. The computer is controlled by a clock whose period is such that the fetch and execute steps of any instruction can each be completed in one clock cycle (Figure 3).
Pipelined Execution
The processing of an instruction need not be divided into only two steps. For example, a pipelined processor may process each instruction in four steps, as follows:
F  Fetch: read the instruction from the memory.
D  Decode: decode the instruction and fetch the source operand(s).
E  Execute: perform the operation specified by the instruction.
W  Write: store the result in the destination location.
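To make the overlap concrete, the short Python sketch below (an illustration added here, not part of the original slides) computes the clock cycle in which each instruction occupies each of the four stages, assuming an ideal pipeline with no hazards or stalls.

# Minimal sketch of ideal 4-stage pipeline timing (no hazards, no stalls).
# Instruction i (0-based) occupies stage s (0=F, 1=D, 2=E, 3=W) in cycle i + s + 1.

STAGES = ["F", "D", "E", "W"]

def pipeline_schedule(num_instructions):
    """Return a dict mapping (instruction, stage) -> clock cycle."""
    return {(i, STAGES[s]): i + s + 1
            for i in range(num_instructions)
            for s in range(len(STAGES))}

if __name__ == "__main__":
    sched = pipeline_schedule(4)
    for i in range(4):
        row = "  ".join(f"{st}:{sched[(i, st)]}" for st in STAGES)
        print(f"I{i + 1}  {row}")
    # Total cycles for N instructions in a k-stage ideal pipeline: N + k - 1.
    print("Total cycles:", 4 + len(STAGES) - 1)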
Instruction Execution (Figure 4)
Hardware Organization (Figure 5)
Role of Cache Memory
Each stage in a pipeline is expected to complete its operation in one clock cycle. Hence, the clock period should be sufficiently long to complete the task being performed in any stage. If different units require different amounts of time, the clock period must allow the longest task to be completed. A unit that completes its task early is idle for the remainder of the clock period. Instruction fetch requires a main memory access, and the access time of main memory is much longer than one pipeline cycle. To avoid this problem, we use cache memory.
Pipeline Performance
Figure 6 shows an example in which the operation specified in instruction I2 requires three cycles to complete, while the others require only one cycle.
Effect of an execution taking more than one clock cycle
In the given example, the pipeline operation is said to have been stalled for two clock cycles. Any condition that causes the pipeline to stall is called a hazard. The different types of hazards that can occur are:
• Data hazard
• Control hazard (instruction hazard)
• Structural hazard
DATA HAZARD
A data hazard is a situation in which the pipeline is stalled because the data to be operated on are delayed for some reason.
Example of a data hazard: Assume that A = 5, and consider two operations in which the data used in the second instruction depend on the result of the first instruction.
Handling Data Hazards
• Operand forwarding
• Handling data hazards in software
• Side effects
Operand Forwarding
Consider the two instructions
Mul R2, R3, R4
Add R5, R4, R6
The result of the multiply operation is stored in register R4, and R4 is one of the source operands of the Add instruction. When the decode unit decodes the Add instruction in cycle 3, it realizes that R4 is a source operand. Hence, the decode step of I2 cannot be completed until the W step of instruction I1 has been completed, so the pipeline is stalled for 2 clock cycles. However, the data are available at the output of the ALU once the execute stage completes step E1. With operand forwarding, the result of instruction I1 is forwarded directly to step E2.
SRC1, SRC2, and RSLT are registers that constitute the interstage buffers needed in pipelined operation: SRC1 and SRC2 are part of buffer B2, and RSLT is part of buffer B3. The two multiplexers connected at the inputs to the ALU allow the data on the destination bus to be selected instead of the contents of either the SRC1 or SRC2 register.
The instructions are executed in the datapath, with the operations performed in each clock cycle as follows. After decoding the instruction and detecting the data dependency, a decision is made to use data forwarding. The operand not involved in the dependency, R2, is read and loaded in register SRC1 in clock cycle 3. In the next clock cycle, the product produced by instruction I1 is available in register RSLT, and because of the forwarding connection it can be used in step E2.
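As a rough check of why forwarding saves stall cycles, the sketch below models the Mul/Add pair in the four-stage pipeline; the cycle numbers are the conventional ones for this example, and the helper function is hypothetical, written only for this illustration.

# Hedged sketch: stall cycles for a dependent pair (Mul R2,R3,R4 then Add R5,R4,R6)
# in a 4-stage F/D/E/W pipeline, with I1 fetched in cycle 1 and I2 in cycle 2.

def stall_cycles(result_ready_cycle, consumer_needs_cycle):
    """Stall cycles = how long the consumer must wait past its normal cycle."""
    return max(0, result_ready_cycle + 1 - consumer_needs_cycle)

# Without forwarding: Add reads R4 in its Decode step (normally cycle 3),
# but R4 is only written back by Mul's Write step in cycle 4.
print(stall_cycles(result_ready_cycle=4, consumer_needs_cycle=3))   # 2

# With forwarding: Add uses R4 in its Execute step (normally cycle 4),
# and the value is already at the ALU output after Mul's Execute in cycle 3.
print(stall_cycles(result_ready_cycle=3, consumer_needs_cycle=4))   # 0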
Handling Data Hazards in Software
An alternative approach is to leave the task of detecting data dependencies and dealing with them entirely to the software. In this case, the compiler can introduce the two-cycle delay needed between instructions I1 and I2 by inserting NOP (No-operation) instructions between them. This illustrates the close link between the compiler and the hardware: the compiler must insert the NOP instructions to obtain a correct result. The compiler can also attempt to reorder instructions to perform useful tasks in the NOP slots, and thus achieve better performance.
Disadvantages: Inserting NOP instructions leads to larger code size. Moreover, a given processor architecture often has several hardware implementations; NOP instructions inserted to satisfy the requirements of one implementation may lead to reduced performance in a different implementation.
The data dependencies in the previous examples are easily detected because the register involved is named as the destination of instruction I1 and as a source of instruction I2. Sometimes an instruction also changes the contents of a register other than the named destination (e.g., with autoincrement or autodecrement addressing modes). All the precautions needed to handle data dependencies involving the destination location must also be applied to the registers affected by an autoincrement or autodecrement operation.
When a location other than the one explicitly named in an instruction as a destination operand is affected, the instruction is said to have a side effect. Stack instructions such as push and pop produce similar side effects, because they implicitly use the autoincrement and autodecrement addressing modes. Another possible side effect involves the condition code flags, which are used by instructions such as conditional branches and add-with-carry.
Consider a case in which registers R1 and R2 hold a double-precision integer number, and we wish to add to it another double-precision number held in registers R3 and R4. This may be accomplished by an Add instruction followed by an Add-with-carry instruction.
• An implicit dependency exists between these two instructions through the carry flag: the flag is set by the first instruction and used by the second instruction when it performs its operation.
• Instructions with side effects give rise to multiple data dependencies, which lead to a substantial increase in the complexity of the hardware or software needed to resolve them.
• Instructions designed for execution on pipelined hardware should therefore have few side effects.
Instruction Hazards
The pipeline may also be stalled because of a delay in the availability of an instruction.
• Unconditional branches
• Conditional branches and branch prediction
A branch instruction may also cause the pipeline to stall.
Instructions I1 to I3 are stored at successive memory addresses. Suppose I2 is a branch instruction whose target is Ik. In clock cycle 3, the fetch operation for instruction I3 is in progress at the same time that the branch instruction is being decoded and the target address is being computed. In clock cycle 4, the processor discards I3, which has been incorrectly fetched, and fetches instruction Ik instead. In the meantime, the hardware unit responsible for the Execute step must be told to do nothing during that clock period. Thus, the pipeline is stalled for one clock cycle.
The time lost as a result of a branch instruction is often referred to as the branch penalty. Reducing the branch penalty requires the branch target address to be computed earlier in the pipeline. The instruction fetch unit has dedicated hardware to identify a branch instruction and compute the branch target address as quickly as possible after an instruction is fetched. With this additional hardware, both of these tasks can be performed in step D2; in this case, the branch penalty is only one clock cycle.
Instruction Queue and Prefetching
A cache miss or a branch instruction stalls the pipeline for one or more clock cycles. To reduce these stalls, many processors employ sophisticated fetch units that can fetch instructions before they are needed and put them in a queue. A separate unit, called the dispatch unit, takes instructions from the front of the queue and sends them to the execution unit. The dispatch unit also performs the decoding function.
To be effective, the fetch unit must have sufficient decoding and processing capability to recognize and execute branch instructions. It attempts to keep the instruction queue filled at all times to reduce delays. When the pipeline stalls because of a data hazard, the fetch unit continues to fetch instructions and add them to the queue. Conversely, if there is a delay in fetching instructions because of a branch or a cache miss, the dispatch unit continues to issue instructions from the instruction queue.
• Assume that initially the queue contains one instruction. Every fetch operation adds one instruction to the queue, and every dispatch operation reduces the queue length by one.
• The queue length remains the same for the first 4 clock cycles.
• Instruction I1 introduces a 2-cycle stall; since space is available in the queue, the fetch unit continues to fetch instructions, and the queue length rises to 3 in clock cycle 6.
• Instruction I5 is a branch instruction; its target Ik is fetched in cycle 7, so I6 is discarded.
• The branch would normally cause a stall in clock cycle 7 as a result of discarding instruction I6, but instead I4 is dispatched from the queue to the decoding stage.
• The queue length drops to 1 in cycle 8.
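The bookkeeping in these bullets follows one simple rule: each fetch adds one instruction to the queue and each dispatch removes one. The sketch below replays that rule on a made-up sequence of per-cycle events; it is generic and not a transcription of the figure.

# Generic sketch of instruction-queue length tracking: +1 per fetch, -1 per dispatch.
# The per-cycle events below are illustrative, not taken from the figure.

def queue_lengths(initial, events):
    """events: list of (fetched, dispatched) pairs, one per clock cycle."""
    length, history = initial, []
    for fetched, dispatched in events:
        length += fetched - dispatched
        history.append(length)
    return history

# Example: queue starts with 1 instruction; during a 2-cycle stall the fetch unit
# keeps fetching while nothing is dispatched, so the queue grows.
example = [(1, 1), (1, 1), (1, 0), (1, 0), (1, 1), (1, 1)]
print(queue_lengths(1, example))   # [1, 1, 2, 3, 3, 3]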
The instructions I1, I2, I3, I4, and Ik complete execution in successive clock cycles. The branch instruction does not increase the overall execution time, because the instruction fetch unit has executed it concurrently with the execution of other instructions. This technique is referred to as branch folding.
Branch folding occurs only if, at the time a branch instruction is encountered, at least one instruction other than the branch is available in the queue. For this reason, it is desirable to arrange for the queue to be full most of the time, to ensure an adequate supply of instructions for processing. This is why the width of the connection between the fetch unit and the instruction cache allows reading more than one instruction in each clock cycle. Having an instruction queue is also beneficial in dealing with cache misses.
Conditional Branches and Branch Prediction
Branch instructions occur frequently; they represent about 20 percent of the dynamic instruction count of most programs. Techniques for dealing with them include:
• Delayed branch
• Branch prediction
• Dynamic branch prediction
Delayed Branch
The locations following a branch instruction are called branch delay slots. A technique called delayed branching can minimize the penalty incurred as a result of conditional branch instructions. The instructions in the delay slots are always fetched, so we arrange for them to be fully executed whether or not the branch is taken. The objective is to place useful instructions in these slots; if no useful instructions can be placed in the delay slots, they must be filled with NOP instructions.
The instructions are reordered as shown in the figure. The shift instruction is fetched while the branch instruction is being executed. After evaluating the branch condition, the processor fetches the instruction at LOOP or at NEXT, depending on whether the branch condition is true or false. In either case, it completes execution of the shift instruction. The pipeline is not interrupted and there are no idle cycles.
Branch Prediction
A simple form of branch prediction is to assume that the branch will not take place and to continue fetching instructions in sequential address order. Instruction execution then proceeds on a speculative basis: instructions are executed before the processor is certain that they are in the correct execution sequence, which is called speculative execution. Care must be taken that no processor registers or memory locations are updated until it is confirmed that these instructions should indeed be executed. If the branch decision indicates otherwise, the instructions and all their associated data in the execution unit must be purged, and the correct instructions fetched and executed.
An incorrectly predicted branch is shown in the figure, which shows a Compare instruction followed by a Branch>0 instruction. Branch prediction takes place in cycle 3, while instruction I3 is being fetched. The fetch unit predicts that the branch will not be taken, and it continues to fetch instruction I4 as I3 enters the decode stage. The results of the compare operation are available at the end of cycle 3; assuming they are forwarded immediately to the instruction fetch unit, the branch condition is evaluated in cycle 4. At this time, the instruction fetch unit realizes that the prediction was incorrect, and the two instructions in the execution pipe are purged. A new instruction, Ik, is fetched from the branch target address in clock cycle 5.
Dynamic Branch Prediction
The objective of branch prediction algorithms is to reduce the probability of making a wrong decision, and so avoid fetching instructions that eventually have to be discarded. In the simplest form, the execution history used in predicting the outcome of a given branch instruction is the result of the most recent execution of that instruction: the processor assumes that the next time the instruction is executed, the result is likely to be the same. The two states are:
• LT: Branch is likely to be taken
• LNT: Branch is likely not to be taken
Assume that the algorithm starts in state LNT. When the branch instruction is executed and the branch is taken, the machine moves to state LT; otherwise, it remains in state LNT. The next time the same instruction is encountered, the branch is predicted as taken if the state machine is in state LT; otherwise it is predicted as not taken. Consider a branch that controls a loop, and assume that there will be more than one pass through the loop. Once the loop is entered, this branch instruction will always yield the same result until the last pass through the loop is reached. In the last pass, the branch prediction will turn out to be incorrect, and the branch history state machine will be changed to the opposite state. Unfortunately, this means that the next time this same loop is entered, the state machine will lead to a wrong prediction. Better performance can be achieved by keeping more information about execution history.
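A minimal simulation of this one-bit (LT/LNT) scheme, written here only for illustration, shows the behaviour described above: a loop branch is mispredicted on the last pass and again when the loop is next entered.

# Sketch of the 1-bit (LT/LNT) dynamic predictor described above.

def run_1bit(outcomes, initial_taken=False):
    """outcomes: list of actual branch results (True = taken). Returns mispredictions."""
    predict_taken = initial_taken      # False corresponds to state LNT
    mispredictions = 0
    for taken in outcomes:
        if predict_taken != taken:
            mispredictions += 1
        predict_taken = taken          # the state simply follows the most recent outcome
    return mispredictions

# A loop branch over two visits of a 5-iteration loop:
# taken, taken, taken, taken, not-taken (repeated twice).
loop = [True] * 4 + [False]
print(run_1bit(loop * 2))   # 4: the very first pass, each loop exit, and the re-entry pass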
An algorithm that uses four states, thus requiring two bits of history information for each branch instruction, is shown in the figure. The four states are:
• ST: Strongly likely to be taken
• LT: Likely to be taken
• LNT: Likely not to be taken
• SNT: Strongly likely not to be taken
Assume that the state of the algorithm is initially set to LNT. After the branch instruction has been executed, if the branch is actually taken, the state is changed to ST; otherwise, it is changed to SNT. When a branch instruction is encountered, the instruction fetch unit predicts that the branch will be taken if the state is either LT or ST, and it begins to fetch instructions at the branch target address; otherwise, it continues to fetch instructions in sequential order. When in state SNT, the instruction fetch unit predicts that the branch will not be taken. If the branch is actually taken, that is, if the prediction is incorrect, the state changes to LNT. This means that the next time the same branch instruction is encountered, the instruction fetch unit will still predict that the branch will not be taken; only if the prediction is incorrect twice in a row will the state change to ST. Assume that the branch instruction is at the end of a loop and that the processor sets the initial state of the algorithm to LNT. During the first pass, the prediction will be wrong (not taken), and hence the state will be changed to ST.
When the loop is entered a second time, the prediction will be correct (branch taken). This will remain so until the last pass, at which time the state will change to LT. In all subsequent passes, the prediction will be correct except for the final pass through the loop.
Above, we assumed that the processor sets the initial state to LNT, in the absence of additional information about the nature of the branch instruction. One final modification corrects the misprediction at the time the loop is first entered: the processor sets the initial state of the algorithm to either LNT or LT, either by comparing addresses or by checking a prediction bit in the instruction. The information needed to set the initial state correctly can be provided by any of the static prediction schemes discussed earlier; for a branch at the end of a loop, the compiler would indicate that the branch should be predicted as taken, causing the initial state to be set to LT. With this modification, branch prediction will be correct all the time, except for the final pass through the loop.
The state information used in dynamic branch prediction algorithms may be kept by the processor in a variety of ways, for example recorded in a look-up table. It is then possible for two branch instructions to share the same table entry. This may lead to a branch being mispredicted, but it does not cause an error in execution; it only introduces a small delay in execution time.
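The four-state scheme can be written as a small transition table. The sketch below follows the transitions given in the description above; the transitions not spelled out in the text are filled in with conventional choices and are marked as assumptions in the comments.

# Sketch of the 4-state (ST/LT/LNT/SNT) branch predictor described above.
# Predict "taken" in states ST and LT, "not taken" in LNT and SNT.

NEXT_STATE = {
    # (state, branch_taken): next_state
    ("LNT", True):  "ST",   # from the text: taken while in LNT -> ST
    ("LNT", False): "SNT",  # from the text: not taken while in LNT -> SNT
    ("SNT", True):  "LNT",  # from the text: wrong once in SNT -> LNT
    ("SNT", False): "SNT",
    ("ST",  False): "LT",   # from the text: last loop pass moves ST -> LT
    ("ST",  True):  "ST",   # assumption: stay strongly taken
    ("LT",  True):  "ST",   # implied: a correct taken prediction strengthens to ST
    ("LT",  False): "LNT",  # assumption
}

def run_2bit(outcomes, state="LNT"):
    mispredictions = 0
    for taken in outcomes:
        predicted_taken = state in ("ST", "LT")
        if predicted_taken != taken:
            mispredictions += 1
        state = NEXT_STATE[(state, taken)]
    return mispredictions

loop = [True] * 4 + [False]        # one visit of a 5-iteration loop
print(run_2bit(loop * 2))           # 3: the first pass ever, plus one per loop exit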
Influence on Instruction Sets
• Addressing modes
• Condition codes

Addressing Modes
Addressing modes are intended to allow a variety of data structures to be accessed simply and efficiently. Useful addressing modes include index, indirect, autoincrement, and autodecrement, and many processors provide various combinations of these modes to increase the flexibility of their instruction sets. In choosing the addressing modes to be implemented in a pipelined processor, we must consider the effect of each addressing mode on instruction flow in the pipeline. Two important considerations are the side effects of modes such as autoincrement and autodecrement, and the extent to which complex addressing modes cause the pipeline to stall. We must also consider whether a given mode is likely to be used by compilers.
For example, consider how the instruction Load (X(R1)), R2 may be executed, and assume that the index offset X is given in the instruction word. After computing the address in cycle 3, the processor needs to access memory twice: first to read location X+[R1] in clock cycle 4, and then to read location [X+[R1]] in cycle 5. If R2 is a source operand in the next instruction, that instruction would be stalled for three clock cycles, which can be reduced to two cycles with operand forwarding.
To implement the same Load operation using only simple addressing modes requires several instructions; for example, on a computer that allows three operand addresses, we can use the sequence shown in the figure. The Add instruction performs the operation R2 ← X+[R1], and the two Load instructions fetch the address and then the operand from memory. This sequence of instructions takes exactly the same number of clock cycles as the original, single Load instruction.
Advantage of complex addressing modes: they reduce the number of instructions needed to perform a given task, and thereby reduce the program space needed in main memory.
Disadvantages: long execution times cause the pipeline to stall, thus reducing its effectiveness; they require more complex hardware to decode and execute; and they are not convenient for compilers to work with. Hence, in a pipelined processor, complex addressing modes that involve several accesses to memory do not lead to faster execution.
The instruction sets of modern processors are designed to take maximum advantage of pipelined hardware. The addressing modes used in modern processors often have the following features:
• Access to an operand does not require more than one access to the memory.
• Only load and store instructions access memory operands.
• The addressing modes used do not have side effects.
Three basic addressing modes that have these features are register, register indirect, and index. The first two require no address computation. In the index mode, the address can be computed in one cycle, whether the index value is given in the instruction or in a register. None of these modes has any side effects, with one possible exception.
Condition Codes
The condition code flags are stored in the processor status register. They are either set or cleared by many instructions, so that they can be tested by a subsequent conditional branch instruction to change the flow of program execution. An optimizing compiler for a pipelined processor attempts to reorder instructions to avoid stalling the pipeline when branches or data dependencies occur. In doing so, the compiler must ensure that reordering does not cause a change in the outcome of a computation. Consider the sequence of instructions in Figure 8.17a: the branch decision takes place in step E2 rather than D2, because it must await the result of the Compare instruction. The execution time of the branch instruction can be reduced by interchanging the Add and Compare instructions, as shown in Figure 8.17b.
This will delay the branch instruction by one cycle relative to the Compare instruction. As a result, by the time the branch instruction is being decoded, the result of the Compare instruction will be available and a correct branch decision will be made. Interchanging the Add and Compare instructions can be done only if the Add instruction does not affect the condition codes. These observations lead to two important conclusions:
• To provide flexibility in reordering instructions, the condition-code flags should be affected by as few instructions as possible.
• The compiler should be able to specify in which instructions of a program the condition codes are affected and in which they are not.
Datapath and Control Considerations
There are separate instruction and data caches that use separate address and data connections to the processor. This requires two versions of the MAR register: IMAR for accessing the instruction cache and DMAR for accessing the data cache. The PC is connected directly to the IMAR, so that the contents of the PC can be transferred to IMAR at the same time that an independent ALU operation is taking place. The data address in DMAR can be obtained directly from the register file or from the ALU, to support the register indirect and indexed addressing modes. Separate MDR registers are provided for read and write operations. Data can be transferred directly between these registers and the register file during load and store operations, without the need to pass through the ALU.
Buffer registers have been introduced at the inputs and output of the ALU; these are the registers SRC1, SRC2, and RSLT in Figure 8.7. Forwarding connections are not included in Figure 8.18; they may be added if desired. The instruction register has been replaced with an instruction queue, which is loaded from the instruction cache. The output of the instruction decoder is connected to the control signal pipeline, reflecting the need for buffering control signals and passing them from one stage to the next along with the instruction.
The following operations can be performed independently in the processor of Figure 8.18:
• Reading an instruction from the instruction cache
• Incrementing the PC
• Decoding an instruction
• Reading from or writing into the data cache
• Reading the contents of up to two registers from the register file
• Writing into one register in the register file
• Performing an ALU operation
Because these operations do not use shared resources, they can be performed simultaneously in any combination. For example, the following actions all happen during clock cycle 4:
• Write the result of instruction I1 into the register file
• Read the operands of instruction I2 from the register file
• Decode instruction I3
• Fetch instruction I4 and increment the PC
Performance Considerations
The execution time T is given by T = (N × S) / R, where
N – dynamic instruction count
S – average number of clock cycles it takes to fetch and execute one instruction
R – clock rate
The instruction throughput Ps is given by Ps = R / S.
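A quick worked check of these two formulas, using illustrative numbers rather than values taken from the slides:

# Worked check of T = (N * S) / R and Ps = R / S. Example numbers are illustrative only.

def execution_time(n_instructions, cycles_per_instruction, clock_rate_hz):
    return n_instructions * cycles_per_instruction / clock_rate_hz

def throughput(clock_rate_hz, cycles_per_instruction):
    return clock_rate_hz / cycles_per_instruction

N, S, R = 1_000_000, 1.0, 500e6          # 1M instructions, 1 cycle each, 500 MHz
print(f"T  = {execution_time(N, S, R) * 1e3:.2f} ms")     # 2.00 ms
print(f"Ps = {throughput(R, S) / 1e6:.0f} MIPS")           # 500 MIPS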
A four-stage pipeline may increase instruction throughput by a factor of four. In general, an n-stage pipeline has the potential to increase throughput n times. Thus, it would appear that the higher the value of n, the larger the performance gain. This leads to two questions:
• How much of this potential increase in instruction throughput can be realized in practice?
• What is a good value for n?
Any time a pipeline is stalled, the instruction throughput is reduced. Hence, the performance of a pipeline is highly influenced by factors such as branch and cache miss penalties.
Effect of Instruction Hazards
Consider a processor that uses the four-stage pipeline. The clock rate, and hence the time allocated to each step in the pipeline, is determined by the longest step. Let the delay through the ALU be the critical parameter; this is the time needed to add two integers. Thus, if the ALU delay is 2 ns, a clock of 500 MHz can be used. The on-chip instruction and data caches for this processor should also be designed to have an access time of 2 ns. Under ideal conditions, this pipelined processor will have an instruction throughput Pp given by Pp = R = 500 MIPS (million instructions per second).
Let TI be the time between two successive instruction completions. For sequential execution, TI = S. However, in the absence of hazards, a pipelined processor completes the execution of one instruction in each clock cycle, so TI = 1 clock cycle. A cache miss stalls the pipeline by an amount equal to the cache miss penalty for the instruction in which the miss occurs. A cache miss can occur for either an instruction or data. The cache miss penalty, Mp, in this system is computed to be 17 clock cycles.
Consider a computer that has a shared cache for instructions and data. The average increase in the value of TI as a result of cache misses is given by
δmiss = ((1 − hi) + d(1 − hd)) × Mp
where hi and hd are the hit ratios for instructions and data, respectively, and d is the percentage of instructions that refer to data operands in memory.
Example: Assume that 30 percent of the instructions access data in memory, with a 95 percent instruction hit rate and a 90 percent data hit rate. Then
δmiss = (0.05 + 0.3 × 0.1) × 17 = 1.36 cycles
Taking this delay into account, the processor's throughput would be
Pp = R/TI = R/(1 + δmiss) = 0.42R
Note that if R is expressed in MHz, the throughput is obtained directly in MIPS. For R = 500 MHz, Pp = 210 MIPS.
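The numbers in this example can be reproduced directly; the sketch below simply re-evaluates the δmiss formula with the stated hit rates and miss penalty.

# Reproduces the shared-cache example: hi = 0.95, hd = 0.90, d = 0.30, Mp = 17 cycles.

def delta_miss(hi, hd, d, mp):
    return ((1 - hi) + d * (1 - hd)) * mp

d_miss = delta_miss(hi=0.95, hd=0.90, d=0.30, mp=17)
R = 500  # MHz, so throughput comes out directly in MIPS
print(f"delta_miss = {d_miss:.2f} cycles")        # 1.36
print(f"Pp = {R / (1 + d_miss):.0f} MIPS")         # ~212, i.e. about 0.42R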
Let us compare this value to the throughput obtainable without pipelining. A processor that uses sequential execution requires 4 cycles per instruction, so its throughput would be
Ps = R/(4 + δmiss) = 0.19R
For R = 500 MHz, Ps = 95 MIPS. Clearly, pipelining leads to significantly higher throughput, but the performance gain of 0.42/0.19 = 2.2 is only slightly better than one-half the ideal case. Reducing the cache miss penalty is particularly worthwhile in a pipelined processor. This can be achieved by introducing a secondary cache between the primary, on-chip cache and the main memory. Assume that the time needed to transfer an 8-word block from the secondary cache is 10 ns.
Hence, a miss in the primary cache for which the required block is found in the secondary cache introduces a penalty, Ms, of 5 clock cycles. In the case of a miss in the secondary cache, the full 17-cycle penalty (Mp) is still incurred. Assuming a hit rate hs of 94 percent in the secondary cache, the average increase in TI is
δmiss = ((1 − hi) + d(1 − hd)) × (hs × Ms + (1 − hs) × Mp) = 0.46 cycle
The instruction throughput in this case is 0.68R, or 340 MIPS. An equivalent non-pipelined processor would have a throughput of 0.22R. Thus, pipelining provides a performance gain of 0.68/0.22 = 3.1.
Also, in a processor that uses an instruction queue, the cache miss penalty during instruction fetches may have a reduced effect, because the processor is able to dispatch instructions from the queue while the miss is being serviced. An optimizing compiler attempts to increase the distance between two instructions that create a dependency by placing other instructions between them whenever possible.
Number of Pipeline Stages
The fact that an n-stage pipeline may increase instruction throughput by a factor of n suggests that we should use a large number of stages. However, as the number of pipeline stages increases, so does the probability of the pipeline being stalled, because more instructions are being executed concurrently; dependencies between instructions and branch penalties may still cause the pipeline to stall. For these reasons, the gain from increasing the value of n begins to diminish, and the associated cost is not justified. Another important factor is the inherent delay in the basic operations performed by the processor.
The most important among these is the ALU delay. In many processors, the cycle time of the processor clock is chosen such that one ALU operation can be completed in one clock cycle, and other operations are divided into steps that take about the same time as an add operation. It is also possible to use a pipelined ALU; for example, the ALU of the Compaq Alpha 21064 processor consists of a two-stage pipeline, in which each stage completes its operation in 5 ns.
Many pipelined processors use four to six stages. Others divide instruction execution into smaller steps and use more pipeline stages and a faster clock. For example:
• The UltraSPARC II uses a 9-stage pipeline.
• The Pentium Pro uses a 12-stage pipeline.
• The Pentium 4 has a 20-stage pipeline and uses clock speeds in the range 1.3 to 1.5 GHz; for fast operations, there are two pipeline stages in one clock cycle.
Exception Handling
Exceptional situations are harder to handle in a pipelined CPU because the overlapping of instructions makes it more difficult to know whether an instruction can safely change the state of the CPU. In a pipelined CPU, an instruction is executed piece by piece and is not completed for several clock cycles. Unfortunately, other instructions in the pipeline can raise exceptions that may force the CPU to abort the instructions in the pipeline before they complete.
Types of Exceptions
• I/O device request
• Invoking an operating system service from a user program
• Tracing instruction execution
• Breakpoint
• Integer arithmetic overflow
• FP arithmetic anomaly
• Page fault
• Misaligned memory accesses
• Memory-protection violation
• Undefined or unimplemented instruction
• Hardware malfunctions
• Privilege violation
• Hardware and power failure
The requirements can be characterized along five semi-independent axes:
Synchronous vs. asynchronous: If the event occurs at the same place every time the program is executed with the same data and memory allocation, it is synchronous. With the exception of hardware malfunctions, asynchronous events are caused by devices external to the CPU and memory. Asynchronous events usually can be handled after the completion of the current instruction, which makes them easier to handle.
User requested vs. coerced: If the task directly asks for it, it is a user-requested event. User-requested exceptions are predictable and easier to handle; they usually can be handled after the completion of the current instruction.
User maskable vs. unmaskable: If an event can be masked or disabled by a user task, it is user maskable. The mask simply controls whether the hardware responds to the exception or not.
Within vs. between instructions: Exceptions occurring within instructions are synchronous. It is harder to deal with exceptions that occur within instructions.
Resume vs. terminate: If the program's execution continues after the interrupt, it is a resuming event; if the program's execution always stops after the interrupt, it is a terminating event. It is easier to implement exceptions that terminate program execution.
I/O device request: Errors that are related to an I/O request are usually indicated in the status data provided with the I/O interrupt.
Tracing instruction: The trace instruction is used to control the tracing of execution (e.g., trace all, trace methods, trace off, trace results) and is primarily used for debugging.
Page fault: A page fault is an interrupt (or exception) to the software, raised by the hardware when a program accesses a page that is mapped in the address space but not loaded in physical memory.
Hardware malfunction: This behavior can occur if a hardware component malfunctions, or if there are damaged or incompatible drivers installed.
• Check the computer BIOS/configuration: Verify that you have installed the latest revisions for your computer's BIOS or firmware configuration software. Go into the BIOS, set load Fail-safe defaults or BIOS defaults, disable any antivirus protection in the BIOS, and then set Plug and Play OS to No. Restart the computer to see whether the error messages persist.
• Check the memory: Remove any extra memory modules that are in the computer, leaving only the least amount that is required for the computer to start and run Windows.
• Check the adapters: Remove any adapters that are not required to start the computer and run Windows. In many cases, you can start your computer with only the drive subsystem controller and the video adapter.
• Check for updated drivers.
Arithmetic overflow: If you use fixed-precision datatypes (smallint, integer, bigint, decimal, and numeric), it is possible that the result of a calculation doesn't fit the datatype. Here's an example: if you multiply 9.12 by 8.11 (both numeric(18,2)), you would get 73.9632; the result is stored in a numeric(18,4) datatype. If Firebird were to store that into numeric(18,2), we would lose 0.0032. That doesn't look like much, but when you have complex calculations you can easily lose thousands (of dollars or euros). Try casting the values in complex expressions as double precision and see whether the error goes away. If it works and you don't care about being too precise, you can leave it at that. Otherwise you need to check every operation and calculate the result.
Breakpoint exception: You would not normally see this exception, as it is used by a debugger under the direction of a programmer to help solve or validate an issue in a program. The debugger can define a breakpoint event, and when such an event occurs, the processor issues this exception and the debugger regains control. It is fairly common to program a debugger break instruction in the code so that a debugger gets a chance to stop. If this exception appears unexpectedly, it sounds like your program is executing code in some DLL and that DLL has found something really wrong, like a destroyed heap; this is a DMA problem, and fixing it is probably going to be tough.
Memory protection violation:
Segmentation: Segmentation refers to dividing a computer's memory into segments.
Paging: In paging, the memory address space is divided into equal, small pieces, called pages. Using a virtual memory mechanism, each page can be made to reside in any location of physical memory, or be flagged as being protected. Virtual memory makes it possible to have a linear virtual memory address space and to use it to access blocks fragmented over the physical memory address space. A page table is used for mapping virtual memory to physical memory; it is usually invisible to the process. Page tables make it easier to allocate new memory, as each new page can be allocated from anywhere in physical memory.
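As a toy illustration of the page-table idea (purely illustrative; the page size and table contents are made up), the sketch below translates a virtual address to a physical address and raises a page fault when the page is not resident.

# Toy sketch of virtual-to-physical translation with a page table.

PAGE_SIZE = 4096

class PageFault(Exception):
    pass

# virtual page number -> physical frame number (None = mapped but not loaded in memory)
page_table = {0: 7, 1: 3, 2: None}

def translate(virtual_address):
    vpn, offset = divmod(virtual_address, PAGE_SIZE)
    frame = page_table.get(vpn)
    if frame is None:
        raise PageFault(f"page {vpn} not resident")   # the OS would load it from disk
    return frame * PAGE_SIZE + offset

print(hex(translate(0x1004)))   # page 1 -> frame 3, offset 0x4 => 0x3004
try:
    translate(2 * PAGE_SIZE)    # page 2 is mapped but not loaded -> page fault
except PageFault as e:
    print("page fault:", e)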
Stopping & Restarting Execution
The most difficult exceptions have two properties: they occur within instructions, and they must be restartable. When an exception occurs, the pipeline control can do the following:
• Force a trap instruction into the pipeline on the next IF.
• Until the trap is taken, turn off all writes for the faulting instruction and all the instructions that follow.
After the exception-handling routine in the operating system receives control, it saves the PC of the faulting instruction. When using delayed branching, it is impossible to recreate the state using a single PC, since the instructions in the pipeline may not be sequentially related.
Exceptions in MIPS
LD    IF  ID  EX  MEM  WB
DADD      IF  ID  EX   MEM  WB
This pair of instructions can cause a data page fault and an arithmetic exception at the same time.
Consider the instructions LD followed by DADD: the instruction in the position of the LD is instruction i, and the instruction in the position of the DADD is instruction i+1. The LD can get a data page fault, seen when the LD instruction is in MEM, and the DADD can get an instruction page fault, seen when the DADD instruction is in IF. Since we are implementing precise exceptions, the pipeline is required to handle the exception caused by the LD instruction first. This case can be handled by dealing with only the data page fault and restarting the execution; when the second exception occurs, it can be handled independently.
The MIPS approach:
• Hardware posts all exceptions caused by a given instruction in a status vector associated with the instruction.
• The exception status vector is carried along as the instruction goes down the pipeline.
• Once an exception indication is set in the exception status vector, any control signal that may cause a data value to be written is turned off.
• Upon entering the WB stage, the exception status vector is checked and the exceptions, if any, are handled according to the time they occurred.
• Allowing an instruction to continue execution until the WB stage is not a problem, since all write operations for that instruction will have been disallowed.
Notes:
• The MIPS machine design does not allow exceptions to occur at the WB stage; all write operations in the MIPS pipeline are in late stages.
• Machines that allow writing in early pipeline stages are difficult to handle, since exceptions can occur after the machine state has already been changed.