1. What are embedded computers? List their characteristics.
Embedded computers are computers lodged inside other devices, where the presence of
the computer is not immediately obvious. These devices range from everyday machines to handheld
digital devices. They span a wide range of processing power and cost.
2. Define Response time and Throughput.
Response time is the time between the start and the completion of an event; it is also referred to
as execution time or latency. Throughput is the total amount of work done in a given amount of time.
3. Mention the use of transaction processing benchmarks.
They measure the ability of a system to handle transactions, which consist of database accesses
and updates. Airline reservation systems and bank ATMs are examples of transaction processing (TP) systems.
4. State Amdahl's law.
Amdahl's law states that the performance improvement to be gained from using some faster
mode of execution is limited by the fraction of time the faster mode can be used.
5. What are toy benchmarks?
Toy benchmarks are typically between 10 and 100 lines of code and produce a result the user
already knows before running the toy program, e.g., Puzzle.
6. What is profile based static modeling?
In this technique, a dynamic execution profile of the program, which indicates how often each
instruction is executed, is maintained.
7. Suppose that we are considering an enhancement to the processor of a server
system used for web serving. The new CPU is 10 times faster on computation in the web
serving application than the original processor. Assuming that the original CPU is busy with
computation 40% of the time and is waiting for I/O 60% of the time, what is the overall speedup
gained by incorporating the enhancement?
Fraction enhanced = 0.4
Speedup enhanced = 10
Speedup overall = 1/(0.6 + 0.4/10) = 1/0.64 = 1.56
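The calculation above can be checked with a short sketch. The function name amdahl_speedup and its parameter names are illustrative choices, not part of the original question.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when only a fraction of execution time is accelerated."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    # The unenhanced fraction runs at the old speed; the rest is divided by the speedup.
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Values from question 7: 40% of the time is computation, made 10 times faster.
overall = amdahl_speedup(0.4, 10)
print(round(overall, 2))  # 1.56
```

Even an infinitely fast CPU could not push the overall speedup past 1/0.6, because the I/O wait is untouched by the enhancement.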
8. Explain the different types of locality.
Temporal locality states that recently accessed items are likely to be accessed in the near
future.
Spatial locality says that items whose addresses are near one another tend to be referenced
close together in time.
9. Name the addressing modes used for signal processing.
Modulo or circular addressing mode
Bit reverse addressing mode
10. Specify the CPU performance equation.
CPU time = Instruction Count x Clock cycle Time x cycles per instruction
11. Explain the hybrid approach for encoding an instruction set?
The hybrid approach reduces the variability in size and work of the variable architecture but
provides multiple instruction lengths to reduce code size.
12. What are the registers used in MIPS processors?
MIPS has 32 64-bit general purpose registers (GPRs), named R0, R1, ..., R31. GPRs are
sometimes called integer registers. There is also a set of 32 floating point registers (FPRs), named
F0, F1, ..., F31, which can hold 32 single precision values or 32 double precision values.
13. Explain the concept behind pipelining.
Pipelining is an implementation technique whereby multiple instructions are overlapped in
execution. It takes advantage of parallelism that exists among the actions needed to execute an
instruction.
14. Write about pipe stages and processor cycle.
Different steps of the pipeline complete different parts of different instructions in
parallel. Each of these steps is called a pipe stage or pipe segment. The time required to move
an instruction one step down the pipeline is called a processor cycle.
15. Explain pipeline hazard and mention the different hazards in pipeline.
Hazards are situations that prevent the next instruction in the instruction stream from
executing during its designated clock cycle. Hazards reduce the overall performance from the ideal
speedup gained by pipelining. The three classes of hazards are,
Structural hazards.
Data hazards.
Control hazards.
16. Explain the concept of forwarding.
Forwarding can be generalized to include passing a result directly to the functional unit that
requires it. The result is forwarded from the pipeline register corresponding to the output of one unit to
the input of another unit, rather than only from the output of a unit back to the input of the same unit.
17. Mention the different schemes to reduce pipeline branch penalties.

a. Freeze or flush the pipeline
b. Treat every branch as not taken
c. Treat every branch as taken
d. Delayed branch
18. Consider an unpipelined processor. Assume that it has a 1 ns clock cycle and that it uses 4
cycles for ALU operations and branches and 5 cycles for memory operations. Assume that the
relative frequencies of these operations are 40%, 20% and 40% respectively. Suppose that due
to clock skew and setup, pipelining the processor adds 0.2 ns of overhead to the clock. Ignoring
any latency impact, how much speedup in the instruction execution rate will we gain from a
pipeline?
The average instruction execution time on an unpipelined processor is
= Clock cycle x Average CPI = 1 ns x ((40% x 4) + (20% x 4) + (40% x 5)) = 4.4 ns.
The average instruction execution time on a pipelined processor is = 1 ns + 0.2 ns = 1.2 ns.
Speedup = Avg. instruction time unpipelined / Avg. instruction time pipelined = 4.4/1.2 = 3.7 times.
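The arithmetic in question 18 can be reproduced with a minimal sketch. The helper average_cpi and the variable names are illustrative assumptions; the numbers come from the question.

```python
def average_cpi(mix):
    """Weighted average cycles per instruction for (frequency, cycles) pairs."""
    return sum(frequency * cycles for frequency, cycles in mix)


clock_ns = 1.0
# (relative frequency, cycles): ALU operations, branches, memory operations
instruction_mix = [(0.4, 4), (0.2, 4), (0.4, 5)]

unpipelined_ns = clock_ns * average_cpi(instruction_mix)  # 4.4 ns per instruction
pipelined_ns = clock_ns + 0.2  # ideal CPI of 1 plus 0.2 ns clock overhead
speedup = unpipelined_ns / pipelined_ns

print(f"{speedup:.2f}")  # 3.67
```

The exact ratio is about 3.67, which the answer rounds to 3.7.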
19. Briefly explain the different conventions for ordering the bytes within a larger object?
Little endian byte order puts the byte whose address is x...x000 at the least significant
position in the double word. Big endian byte order puts the byte whose address is x...x000 at the
most significant position in the double word.
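A small sketch makes the two byte orders concrete. It uses Python's standard struct module to lay out one 64-bit double word both ways; the sample value is an arbitrary choice for illustration.

```python
import struct

# An arbitrary 64-bit value whose bytes are easy to tell apart.
value = 0x0102030405060708

little = struct.pack("<Q", value)  # little endian: lowest address holds the least significant byte
big = struct.pack(">Q", value)     # big endian: lowest address holds the most significant byte

print(little.hex())  # 0807060504030201
print(big.hex())     # 0102030405060708
```

Index 0 of each bytes object plays the role of the address ending in 000: it holds 0x08 in the little endian layout and 0x01 in the big endian layout.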
20. When do data hazards arise?
Data hazards arise when an instruction depends on the results of a previous instruction in a
way that is exposed by the overlapping of instructions in the pipeline.
1. Discuss the different ways in which instruction set architectures can be classified.
Stack architecture
Accumulator architecture
Register-memory architecture
Register-register architecture
2. Explain memory addressing and discuss the different addressing modes in instruction set architecture.
Byte ordering: little endian and big endian
Addressing modes: register, immediate, displacement, register indirect, indexed, direct or absolute, memory indirect, autoincrement, autodecrement, scaled
3. Explain the operands and operations used for media and signal processing.
Operands: fixed point, blocked floating point
Operations: partitioned add operation, SIMD (single instruction, multiple data), paired single operation
4. Explain with examples the various hazards in pipelining.
Data hazards, structural hazards, control hazards
5. Discuss data hazards in detail and explain the technique used to overcome them.
Data hazards, how they occur, with an example
Forwarding technique

1. List the various types of dependences.
Data dependence
Name dependence
Control Dependence
2. What is Instruction Level Parallelism?
Pipelining is used to overlap the execution of instructions and improve performance. This
potential overlap among instructions is called instruction level parallelism (ILP), since the instructions
can be evaluated in parallel.
3. Give an example of control dependence?
if p1 {s1;}
if p2 {s2;}
s1 is control dependent on p1, and s2 is control dependent on p2.
4. What is the limitation of the simple pipelining technique?
This technique uses in-order instruction issue and execution. Instructions are issued in
program order, and if an instruction is stalled in the pipeline, no later instructions can proceed.
5. Briefly explain the idea behind using reservation station?
A reservation station fetches and buffers an operand as soon as it is available, eliminating the need
to get the operand from a register.
6. Give an example for data dependence.
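Loop: L.D    F0,0(R1)    ; F0 = array element
      ADD.D  F4,F0,F2    ; depends on F0 from L.D
      S.D    F4,0(R1)    ; depends on F4 from ADD.D
      DADDUI R1,R1,#-8   ; decrement pointer
      BNE    R1,R2,Loop  ; depends on R1 from DADDUI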
7. Explain the idea behind dynamic scheduling?
In dynamic scheduling the hardware rearranges the instruction execution to reduce the stalls
while maintaining data flow and exception behavior.
8. Mention the advantages of using dynamic scheduling?
It enables handling some cases when dependences are unknown at compile time, and it
simplifies the compiler. It allows code that was compiled with one pipeline in mind to run efficiently on
a different pipeline.
9. What are the possibilities for imprecise exception?

The pipeline may have already completed instructions that are later in program order than
instruction causing exception. The pipeline may have not yet completed some instructions that are
earlier in program order than the instructions causing exception.
10. What are multilevel branch predictors?
These predictors use several levels of branch-prediction tables together with an algorithm for
choosing among the multiple predictors.
11. What are branch-target buffers?
To reduce the branch penalty we need to know from what address to fetch by end of IF
(instruction fetch). A branch prediction cache that stores the predicted address for the next instruction
after a branch is called a branch-target buffer or branch target cache.
12. Briefly explain the goal of multiple-issue processor?
The goal of multiple issue processors is to allow multiple instructions to issue in a clock
cycle. They come in two flavors: superscalar processors and VLIW processors.
13. What is speculation?
Speculation allows the execution of instructions before control dependences are resolved.
14. Mention the purpose of using Branch history table?
It is a small memory indexed by the lower portion of the address of the branch instruction.
The memory contains a bit that says whether the branch was recently taken or not.
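The behavior described above can be sketched as a 1-bit branch history table. The class name, the 4-bit index width, the 4-byte instruction alignment and the sample branch address are all assumptions made for illustration.

```python
class BranchHistoryTable:
    """1-bit predictor indexed by the low-order bits of the branch address."""

    def __init__(self, index_bits=4):
        self.mask = (1 << index_bits) - 1
        # One bit per entry: was the branch mapped here taken last time?
        self.taken = [False] * (1 << index_bits)

    def _index(self, branch_address):
        # Assumption: 4-byte aligned instructions, so the 2 lowest bits carry no information.
        return (branch_address >> 2) & self.mask

    def predict(self, branch_address):
        return self.taken[self._index(branch_address)]

    def update(self, branch_address, was_taken):
        self.taken[self._index(branch_address)] = was_taken


def count_mispredictions(table, branch_address, outcomes):
    """Predict each outcome, then record what the branch actually did."""
    mispredictions = 0
    for was_taken in outcomes:
        if table.predict(branch_address) != was_taken:
            mispredictions += 1
        table.update(branch_address, was_taken)
    return mispredictions


# A loop branch taken 9 times and then not taken, with the whole loop run twice.
outcomes = ([True] * 9 + [False]) * 2
misses = count_mispredictions(BranchHistoryTable(), 0x400, outcomes)
print(misses)  # 4
```

The 1-bit scheme mispredicts twice per loop execution: once on the first iteration and once on the loop exit.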
15. What are super scalar processors?
Superscalar processors issue varying number of instructions per clock and are either statically
scheduled or dynamically scheduled.
16. Mention the idea behind hardware-based speculation?
It combines three key ideas: dynamic branch prediction to choose which instruction to
execute, speculation to allow the execution of instructions before control dependences are resolved
and dynamic scheduling to deal with the scheduling of different combinations of basic blocks.
17. What are the fields in the ROB?
Instruction type
Destination field
Value field
Ready field
18. How many branch-selected entries are in a (2,2) predictor that has a total of 8K bits in the
prediction buffer?
2^2 x 2 x number of prediction entries selected by the branch = 8K
Hence, number of prediction entries selected by the branch = 1K
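The answer follows from the size relation for an (m,n) predictor: total bits = 2^m x n x number of entries selected by the branch address. A minimal sketch, with an illustrative function name:

```python
def entries_selected_by_branch(total_bits, m, n):
    """Entries selected by the branch address in an (m, n) correlating predictor.

    Uses: total_bits = 2**m * n * entries.
    """
    bits_per_entry = (2 ** m) * n
    if total_bits % bits_per_entry != 0:
        raise ValueError("total_bits is not a whole number of entries")
    return total_bits // bits_per_entry


entries = entries_selected_by_branch(8 * 1024, m=2, n=2)
print(entries)  # 1024
```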
19. What is the advantage of using instruction type field in ROB?
The instruction type field specifies whether the instruction is a branch, a store, or a register operation.

20. Mention the advantage of using tournament based predictors?

The advantage of a tournament predictor is its ability to select the right predictor for the right
branch.
1. What is instruction-level parallelism? Explain in detail the various dependences that arise in ILP.
Definition of ILP
Various dependences: data dependence, name dependence and control dependence
2. Discuss Tomasulo's algorithm for overcoming data hazards using dynamic scheduling.
Dynamic scheduling with an example
How data hazards are caused
Architecture
Reservation stations
3. Explain how to reduce branch cost with dynamic hardware prediction.
Basic branch prediction and prediction buffers
Correlating branch predictors
Tournament-based predictors
4. Explain how hardware-based speculation is used to overcome control dependence.
Ideas in hardware-based speculation: dynamic scheduling, speculation, dynamic branch prediction
Architecture and example
Reorder buffer
5. Explain the limitations of ILP.
Hardware model
Window size and maximum issue count
Realistic branch and jump prediction
Effects of finite registers
Effects of imperfect alias analysis
1. What is loop unrolling?
A simple scheme for increasing the number of instructions relative to the branch and overhead
instructions is loop unrolling. Unrolling simply replicates the loop body multiple times, adjusting the
loop termination code.
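The transformation can be sketched by hand on a simple loop that adds a scalar to every array element. Python is used only to show the shape of the rewritten loop; the function names and the unroll factor of 4 are illustrative, and the real benefit appears when a compiler applies this to machine code.

```python
def add_scalar_rolled(x, s):
    """Original loop: one element and one loop test per iteration."""
    for i in range(len(x)):
        x[i] += s


def add_scalar_unrolled4(x, s):
    """Loop body replicated 4 times, with adjusted termination code."""
    n = len(x)
    limit = n - n % 4  # largest multiple of 4 not exceeding n
    i = 0
    while i < limit:
        x[i] += s
        x[i + 1] += s
        x[i + 2] += s
        x[i + 3] += s
        i += 4  # one increment and one test per four elements
    while i < n:  # cleanup loop for the leftover elements
        x[i] += s
        i += 1


a = [float(v) for v in range(10)]
b = list(a)
add_scalar_rolled(a, 2.5)
add_scalar_unrolled4(b, 2.5)
print(a == b)  # True
```

The cleanup loop handles trip counts that are not a multiple of the unroll factor, which is the adjustment to the termination code mentioned above.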
2. When are static branch predictors used?
They are used in processors where the expectation is that the branch behavior is highly
predictable at compile time. Static predictors are also used to assist dynamic predictors.
3. Mention the different methods to predict branch behavior?
Predict the branch as taken
Predict on the basis of branch direction (either forward or backward)
Predict using profile information collected from earlier runs
4. Explain the VLIW approach?
VLIW processors use multiple, independent functional units. Rather than attempting to issue multiple,
independent instructions to the units, a VLIW packages the multiple operations into one very long
instruction.
5. Mention the techniques to compact the code size in instructions?

Using encoding techniques
Compressing the instructions in main memory and expanding them when
they are read into the cache or are decoded
6. Mention the advantage of using multiple issue processor?
They are less expensive. They have cache based memory system. More parallelism.
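7. What is a loop-carried dependence?
Dependence analysis focuses on determining whether data accesses in later iterations are dependent on data
values produced in earlier iterations; such a dependence is called a loop-carried dependence. For contrast,
the following loop has no loop-carried dependence, because each iteration reads and writes only its own element x[i]:
for(i=1000;i>0;i=i-1) x[i]=x[i]+s;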
8. Mention the tasks involved in finding dependences in instructions?
Good scheduling of code
Determining which loops might contain parallelism
Eliminating name dependences
9. Use the G.C.D test to determine whether dependence exists in the following loop:
for(i=1;i<=100;i=i+1) X[2*i+3]=X[2*i]*5.0;
Solution: a=2,b=3,c=2,d=0 GCD(a,c)=2 and d-b=-3 Since 2 does not divide -3, no
dependence is possible.
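The test in question 9 can be sketched as a small function. It assumes a store to index a*i+b and a load from index c*i+d, as in the solution above; the function name is illustrative. The GCD test is conservative: a True result only means a dependence cannot be ruled out.

```python
from math import gcd


def gcd_test_may_depend(a, b, c, d):
    """Return False when the GCD test proves no dependence is possible.

    The store goes to index a*i + b and the load reads index c*i + d.
    A dependence requires GCD(c, a) to divide (d - b).
    """
    g = gcd(a, c)
    if g == 0:
        # Both indices are constants, so they conflict only if they are equal.
        return b == d
    return (d - b) % g == 0


# Question 9: X[2*i+3] = X[2*i] * 5.0
print(gcd_test_may_depend(2, 3, 2, 0))  # False
```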
10. What is software pipelining?
Software pipelining is a technique for reorganizing loops such that each iteration in the
software pipelined code is made from instruction chosen from different iterations of the original loop.
11. What is global code scheduling?
Global code scheduling aims to compact a code fragment with internal control structure into the
shortest possible sequence that preserves the data and control dependences. Finding the shortest possible
sequence means finding the shortest sequence for the critical path.
12. What is trace?
Trace selection tries to find a likely sequence of basic blocks whose operations will be put
together into a smaller number of instructions; this sequence is called trace.
13. Mention the steps followed in trace scheduling?
Trace selection
Trace compaction
14. What is superblock?
Superblocks are formed by a process similar to that used for traces, but are a form of extended
basic block, which are restricted to a single entry point but allow multiple exits.
15. Mention the advantages of predicated instructions?
Remove control dependence
Maintain data flow enforced by branch
Reduce overhead of global code scheduling

16. Mention the limitations of predicated instructions?

They are useful only when the predicate can be evaluated early. Predicated instructions may
have a speed penalty.
17. What is poison bit?
Poison bits are a set of status bits that are attached to the result registers written by a
speculated instruction when the instruction causes an exception. The poison bits cause a fault when a
normal instruction attempts to use the register.
18. What are the disadvantages of supporting speculation in hardware?
Complexity
Additional hardware resources required
19. Mention the methods for preserving exception behavior?
Ignoring exceptions
Using instructions that never raise exceptions
Using poison bits
Using hardware buffers
20. What is an instruction group?
It is a sequence of consecutive instructions with no register data dependence among them. All
the instructions in the group could be executed in parallel. An instruction group can be arbitrarily
long.
1. Explain loop unrolling with an example.
Loop unrolling technique
Example
2. Discuss the VLIW approach.
Basic idea of the VLIW approach
Limitations and problems: technical problems and logistical problems
3. Explain the different techniques to exploit and expose more parallelism using compiler support.
Loop-level parallelism
Software pipelining
Global code scheduling
Trace scheduling and superblocks
4. Explain the hardware support for exposing more parallelism at compile time.
Conditional or predicated instructions
Compiler speculation with hardware support
5. Differentiate hardware and software speculation mechanisms.
Control flow
Exception conditions
1. What is cache miss and cache hit?
When the CPU finds a requested data item in the cache, it is called a cache hit. When the
CPU does not find a data item it needs in the cache, a cache miss occurs.

2. What is write through and write back cache?

Write through: the information is written to both the block in the cache and the block in the
lower level memory. Write back: the information is written only to the block in the cache. The
modified cache block is written to main memory only when it is replaced.
3. What is miss rate and miss penalty?
Miss rate is the fraction of cache accesses that result in a miss. Miss penalty is the additional
time, usually counted in clock cycles, needed to service a miss by fetching the block from the lower level of memory.
4. Give the equation for average memory access time?
Average memory access time= Hit time + Miss rate x Miss penalty
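As a quick illustration, the equation can be evaluated directly. The numbers below (1-cycle hit time, 5% miss rate, 100-cycle miss penalty) are made-up sample values, not figures from this document.

```python
def average_memory_access_time(hit_time, miss_rate, miss_penalty):
    """AMAT = hit time + miss rate x miss penalty (all times in the same unit)."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time + miss_rate * miss_penalty


# Hypothetical cache: times are in clock cycles.
amat = average_memory_access_time(hit_time=1, miss_rate=0.05, miss_penalty=100)
print(amat)
```

With these sample values, misses contribute five times as much to the average as the hit time does, which is why reducing miss rate and miss penalty matters.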
5. What is striping?
Spreading data over multiple disks is called striping, which automatically forces
accesses to several disks.
6. Mention the problems with disk arrays?
As the number of devices increases, dependability decreases.
Without redundancy, a disk array becomes unusable after a single failure.
7. What is hot spare?
Hot spares are extra disks that are not used in normal operation. When failure occurs, an idle
hot spare is pressed into service. Thus, hot spares reduce the MTTR.
8. What is mirroring?
Disks in the configuration are mirrored or copied to another disk. With this arrangement data
on the failed disks can be replaced by reading it from the other mirrored disks.
9. Mention the drawbacks with mirroring?
Writing to the disks is slower
Since the disks are not synchronized, seek times will differ
Imposes a 50% space penalty, hence expensive
10. Mention the I/O performance measures.
Diversity
Capacity
Response time
Throughput
Interference of I/O with CPU execution
11. What is transaction time?
The sum of entry time, response time and think time is called transaction time.
12. State Little's law.
Little's law relates three quantities: the average number of tasks in the system, the average
arrival rate of new tasks, and the average time to perform a task.

13. Give the equation for mean number of tasks in the system?
Mean number of tasks in the system = Arrival rate x Mean response time.
14. What is server utilization?
Server utilization is the mean number of tasks being serviced, which equals the arrival rate divided
by the service rate: Server utilization = Arrival rate / Service rate. The value must be between 0 and 1;
otherwise there would be more tasks arriving than could be serviced.
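The relations in questions 12 to 14 can be tried out with a short sketch. The function names and the sample rates (40 requests per second arriving, 50 per second serviced, 0.5 s mean response time) are illustrative assumptions.

```python
def mean_tasks_in_system(arrival_rate, mean_response_time):
    """Little's law: mean number of tasks = arrival rate x mean response time."""
    return arrival_rate * mean_response_time


def server_utilization(arrival_rate, service_rate):
    """Utilization = arrival rate / service rate; must lie between 0 and 1."""
    utilization = arrival_rate / service_rate
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("more tasks arrive than the server can service")
    return utilization


# Hypothetical disk: 40 I/O requests arrive per second, it can service 50 per second.
tasks = mean_tasks_in_system(arrival_rate=40, mean_response_time=0.5)
utilization = server_utilization(arrival_rate=40, service_rate=50)
print(tasks)        # 20.0
print(utilization)  # 0.8
```

Both functions require consistent units: if the arrival rate is per second, the response time must be in seconds and the service rate per second.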
15. What are the steps to design an I/O system?
Naive cost-performance design and evaluation
Availability of naive design
Response time
Realistic cost-performance, design and evaluation
Realistic design for availability and its evaluation.
16. Briefly discuss about classification of buses?
I/O buses: these buses are lengthy and have many types of devices connected to them.
CPU-memory buses: these are short and generally of high speed.
17. Explain about bus transactions?
Read transaction: transfers data from memory
Write transaction: writes data to memory
18. What is the bus master?
Bus masters are devices that can initiate a read or write transaction, e.g., the CPU is always a
bus master. The bus can have many masters when there are multiple CPUs and when I/O
devices can initiate bus transactions.
19. Mention the advantage of using bus master?
It offers higher bandwidth by using packets, as opposed to holding the bus for the full transaction.
21. What is a split transaction?
The idea is to split the bus transaction into a request and a reply, so that the bus can be used in
the time between the request and the reply.
1. Explain the different techniques to reduce cache miss penalty.
Multilevel caches
Critical word first and early restart
Giving priority to read misses over writes
Merging write buffers
Victim caches
2. Explain the different techniques to reduce miss rate.
Larger block size
Larger caches
Higher associativity
Way prediction and pseudoassociative caches
Compiler optimizations
3. Discuss how main memory is organized to improve performance.
Wider main memory
Simple interleaved memory
Independent memory banks
4. Explain the various levels of RAID.
No redundancy
Mirroring
Bit-interleaved parity
Block-interleaved parity
P+Q redundancy
5. Explain the various ways to measure I/O performance.
Throughput versus response time
A little queuing theory
1. What are multiprocessors? Mention the categories of multiprocessors?
Multiprocessors are used to increase performance and improve availability. The different
categories are SISD, SIMD, MISD, MIMD
2. What are threads?
These are multiple processes executing a single program and sharing the code and most of
their address space. When multiple processes share code and data in this way, they are often called
threads.
3. What is the cache coherence problem?
Two different processors have two different values for the same location.
4. What are the protocols to maintain coherence?
Directory based protocol
Snooping protocol
5. What are the ways to maintain coherence using snooping protocol?
Write Invalidate protocol
Write update or write broadcast protocol
6. What is write invalidate and write update?
Write invalidate ensures that a processor has exclusive access to a data item before it writes that
item. This exclusive access ensures that no other readable or writable copies of the item exist when
the write occurs. Write update updates all cached copies of a data item when that item is written.
7. What are the disadvantages of using symmetric shared memory?
Compiler mechanisms are very limited
Larger latency for remote memory access
Fetching multiple words in a single cache block will increase the cost

8. Mention the information in the directory?

It keeps the state of each block that is cached. It keeps track of which caches have copies of
the block.
9. What are the operations that a directory-based protocol handles?
Handling a read miss
Handling a write to a shared, clean cache block
10. What are the states of cache block?
Shared, Uncached, Exclusive
11. What are the uses of having a bit vector?
When a block is shared, the bit vector indicates whether each processor has a copy of the
block. When a block is in the exclusive state, the bit vector keeps track of the owner of the block.
12. When do we say that a cache block is exclusive?
When exactly one processor has the copy of the cached block, and it has written the block.
The processor is called the owner of the block.
13. Explain the types of nodes between which messages can be sent in a directory-based protocol?
Local node: the node where the request originates.
Home node: the node where the memory location and directory entry of the address reside.
Remote node: a third node that holds a copy of the block.
14. What is consistency?
Consistency defines the order in which a processor must observe the data writes of another processor.
15. Mention the models that are used for consistency?
Sequential consistency
Relaxed consistency model
16. What is sequential consistency?
It requires that the result of any execution be the same, as if the memory accesses executed
by each processor were kept in order and the accesses among different processors were interleaved.
17. What is relaxed consistency model?
Relaxed consistency models allow reads and writes to be executed out of order. The three
sets of orderings that can be relaxed are:
W->R ordering
W->W ordering
R->W and R->R ordering
18. What is multi threading?
Multithreading allows multiple threads to share the functional units of a single processor in
an overlapping fashion.

19. What is fine grained multithreading?

It switches between threads on each instruction, causing the execution of multiple threads to
be interleaved.
20. What is coarse grained multithreading?
It switches threads only on costly stalls. Thus it is much less likely to slow down the execution of an
individual thread.
1. Explain the snooping protocol with a state diagram.
Basic schemes for enforcing cache coherence
Cache coherence definition
Snooping protocol
An example protocol with state diagram
2. Explain the directory based protocol with a state diagram.
Directory based cache coherence protocol basics
Example directory protocol
3. Define synchronization and explain the different mechanisms employed for synchronization among processors.
Basic hardware primitives
Implementing locks using coherence
Barrier synchronization
Hardware and software implementation
4. Discuss the different models for memory consistency.
Sequential consistency model
Relaxed consistency model
5. How is multithreading used to exploit thread level parallelism within a processor?
Multithreading definition
Fine grained multithreading
Coarse grained multithreading
Simultaneous multithreading