1. Define computer architecture

Computer architecture describes the fundamental operation of the individual units of a computer. It is a
specification detailing how a set of software and hardware technology standards interact to form a
computer system or platform.

2. Define computer H/W

Computer H/W is the electronic circuitry and electromechanical equipment that constitute the computer.
It is the collection of all the parts you can physically touch.

3. What is meant by cache memory?

A cache is a memory that is smaller and faster than main memory and that is interposed between the CPU
and main memory. The cache acts as a buffer for recently used memory locations.

4. What is IO mapped input output?

In I/O-mapped I/O, a memory-reference instruction activates the READ or WRITE control line and does not
affect the I/O devices. Separate I/O instructions are required to activate the READ IO and WRITE IO lines,
which cause a word to be transferred between the addressed I/O port and the CPU.

5. Specify the types of DMA transfer techniques

- Single transfer mode
- Block transfer mode
- Demand transfer mode
- Cascade mode

6. Differentiate between RISC and CISC

RISC                                             | CISC
Simple instructions take one cycle per operation | Complex instructions take multiple cycles per operation
Few instructions and addressing modes            | Many instructions and addressing modes
Fixed-format instructions                        | Variable-format instructions
Complexity in the compiler                       | Complexity in the microprogram
Highly pipelined                                 | Not pipelined

7. What are embedded computers?

Embedded computers are computers lodged in other devices, where the presence of a computer is not
immediately obvious. These devices range from everyday machines to handheld digital devices. They
have a wide range of processing power and cost.

8. Define CPI

CPI (clock cycles per instruction) is the average number of clock cycles each instruction takes to
execute.

CPI = CPU clock cycles / Instruction count

9. Mention different schemes to reduce pipeline penalties

- Freeze or flush the pipeline
- Treat every branch as not taken
- Treat every branch as taken
- Delayed branch

10. What is ILP?

Pipelining is used to overlap the execution of instructions and improve performance. This potential
overlap among instructions is called instruction-level parallelism (ILP), since the instructions can be
evaluated in parallel.

11. List the advantages of dynamic scheduling

- It handles cases where dependences are unknown at compile time, e.g. because they may involve a
  memory reference
- It simplifies the compiler

12. What is loop level parallelism?

The simplest and most common way to increase ILP is to exploit the parallelism among the iterations of a
loop. This type of parallelism is often called loop-level parallelism.

13. What is the limitation of VLIW?

- A very smart compiler is needed
- Loop unrolling increases code size
- Unfilled slots waste bits
- A cache miss stalls the whole pipeline

14. What is superscalar processor?

A superscalar processor issues multiple instructions per cycle. It may be:

- Statically scheduled
- Dynamically scheduled

15. What is locality of reference?

Many instructions in localized areas of a program are executed repeatedly during some time period, while
the remainder of the program is accessed relatively infrequently. This behavior is called locality of
reference.

16. Why is refreshing circuit needed?

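DRAM cells store data as charge on capacitors, and this charge leaks away over time. To maintain the
contents of a DRAM, each row of cells must be accessed periodically, typically once every 2-16 ms, and a
refresh circuit does this automatically. All cells in the selected row are read and refreshed during both
read and write operations.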

17. What is the function of control unit?

The memory, arithmetic and logic, and input and output units store and process information and perform
input and output operations. The operation of these units must be coordinated in some way; this is the
task of the control unit. It is effectively the nerve center that sends control signals to the other units.

18. What is interrupt?

An interrupt is an event that causes the execution of one program to be suspended and another program to
be executed.

19. Explain pipeline stages and processor cycle

In a pipeline, different steps of different instructions are completed in parallel in different parts of the
hardware. Each of these steps is called a pipe stage. The time required to move an instruction one step
down the pipeline is called a processor cycle.

20. Define latency

Latency refers to the amount of time it takes to transfer a word of data to or from the memory. For block
transfers, it denotes the time it takes to transfer the first word of data. This time is usually longer
than the time needed to transfer each subsequent word of a block.

21. Specify CPU performance equation

CPU time = Instruction Count * Clock Cycle Time * Cycles per Instruction
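
As a rough illustration, the CPU performance equation and the CPI definition can be evaluated directly. The function names and the workload figures below are hypothetical, chosen only to show the arithmetic.

```python
def average_cpi(cpu_clock_cycles, instruction_count):
    """CPI = CPU clock cycles / instruction count."""
    return cpu_clock_cycles / instruction_count


def cpu_time(instruction_count, cpi, clock_cycle_time):
    """CPU time (seconds) = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * clock_cycle_time


# Assumed workload: 2 million instructions needing 3 million cycles,
# on a 1 GHz clock (clock cycle time of 1 ns).
cpi = average_cpi(3_000_000, 2_000_000)
seconds = cpu_time(2_000_000, cpi, 1e-9)
print(cpi, seconds)
```

With these assumed figures the CPI is 1.5 and the CPU time is about 3 ms. Halving any one of the three factors halves the CPU time.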

22. Define computer organization

Computer organization refers to the operational units and their interconnections that realize the
architectural specifications. Examples are hardware details transparent to the programmer, such as control
signals and the interfaces between the computer, peripherals and memory.

23. What is speculation?

Hardware-based speculation follows the predicted flow of data values to choose when to execute
instructions. This method of executing programs is essentially a data flow execution: operations execute
as soon as their operands are available.

24. Differentiate between Pipelining and ILP

Pipelining                                                | ILP
Single functional unit                                    | Multiple functional units
Instructions are issued sequentially                      | Instructions are issued in parallel
Throughput increased by overlapping instruction execution | Instructions are not overlapped but executed in parallel in multiple functional units
Very little extra hardware required to implement          | Multiple functional units within the CPU are required

25. What is dynamic scheduling?

In dynamic scheduling, the hardware rearranges instruction execution to reduce stalls, allowing
instructions behind a stalled instruction to proceed.

26. What is ideal pipeline?

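The ideal pipeline CPI is a measure of the maximum performance attainable by the implementation.

Pipeline CPI = Ideal pipeline CPI + Structural stalls + Data hazard stalls + Control stalls

By reducing each term on the right-hand side, the overall pipeline CPI is minimized and the instructions
per clock (IPC) increase.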

27. What are the types of multiple issue processors?

- Statically scheduled superscalar processor
- VLIW (Very Long Instruction Word) processor
- Dynamically scheduled superscalar processor

28. List out various dependences

- Data dependence
- Name dependence
- Control dependence

29. What is loop unrolling?

Loop unrolling is a simple scheme for increasing the number of instructions relative to the branch and
overhead instructions. It simply replicates the loop body multiple times, adjusting the loop termination
code.
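
A minimal sketch of the idea, assuming an unroll factor of 4 (an arbitrary choice) and hypothetical function names. The unrolled version performs one loop test per four elements, and a cleanup loop handles the leftovers when the length is not a multiple of 4.

```python
def add_scalar_rolled(x, s):
    """Original loop: one element and one loop test per iteration."""
    for i in range(len(x)):
        x[i] += s


def add_scalar_unrolled(x, s):
    """Same loop with the body replicated 4 times per iteration."""
    n = len(x)
    limit = n - n % 4              # adjusted loop termination
    for i in range(0, limit, 4):
        x[i] += s
        x[i + 1] += s
        x[i + 2] += s
        x[i + 3] += s
    for i in range(limit, n):      # cleanup loop for leftover elements
        x[i] += s
```

In Python this only illustrates the transformation. The performance benefit appears in compiled code, where the replicated bodies give the scheduler more independent instructions to overlap.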

30. What is a vector input?

A vector processor is a central processing unit (CPU) that implements an instruction set containing
instructions that operate on one-dimensional arrays of data called vectors. A vector input is a set of
inputs provided to a system in order to test that system.

29. What is a vector processor?

Vector processors are machines built to handle large scientific and engineering calculations. Their
performance derives from a heavily pipelined architecture, which operations on vectors and matrices can
exploit effectively.

30. Define vector mask register

The vector mask register provides conditional execution of each element operation in a vector
instruction. Vector mask control uses a Boolean vector, usually produced by a test instruction, to control
which elements a vector instruction operates on.
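
The following sketch models conditional per-element execution in plain Python. The function name and the choice of a divide-by-nonzero test are assumptions made for illustration; real vector hardware holds the mask in a dedicated register.

```python
def masked_vector_divide(a, b):
    """Compute a[i] / b[i] only for elements whose mask bit is set."""
    mask = [bi != 0 for bi in b]     # Boolean vector produced by a test
    result = list(a)                 # masked-off elements keep their old value
    for i, enabled in enumerate(mask):
        if enabled:
            result[i] = a[i] / b[i]
    return result


print(masked_vector_divide([8, 3, 9], [2, 0, 3]))
```

The middle element is left unchanged because its mask bit is 0, so the division by zero is never attempted.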

31. What is strip mining?

Strip mining is the generation of code such that each vector operation is done for a size less than or
equal to the maximum vector length.
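
A small sketch of strip mining, assuming a maximum vector length of 64 for a hypothetical machine. `vector_add` stands in for a single vector instruction and `strip_mined_add` is an illustrative name.

```python
MVL = 64  # assumed maximum vector length


def vector_add(a, b):
    """Stand-in for one vector instruction; handles at most MVL elements."""
    assert len(a) <= MVL
    return [x + y for x, y in zip(a, b)]


def strip_mined_add(a, b):
    """Add two equal-length arrays in strips of at most MVL elements."""
    result = []
    for start in range(0, len(a), MVL):
        end = min(start + MVL, len(a))
        result.extend(vector_add(a[start:end], b[start:end]))
    return result
```

Here the short strip comes last. Some code generators instead handle the odd-sized strip first, and either order satisfies the definition.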

32. Define CUDA thread and thread block

- A CUDA thread is composed of SIMD instructions
- A thread is associated with each data element
- Threads are organized into blocks

33. What is loop carry dependency?

Analysis of loop-level parallelism determines whether data accesses in later iterations depend on data
values produced in earlier iterations. Such a dependence is called a loop-carried dependence.
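
To make the distinction concrete, here is a sketch with two hypothetical loops. In the first, every iteration uses only its own inputs, so the iterations could run in parallel. In the second, each iteration needs the value produced by the previous one.

```python
def independent_iterations(a, b):
    """No loop-carried dependence: c[i] uses only a[i] and b[i]."""
    return [a[i] + b[i] for i in range(len(a))]


def prefix_sum(a):
    """Loop-carried dependence: s[i] needs s[i - 1] from the prior iteration."""
    s = list(a)
    for i in range(1, len(s)):
        s[i] = s[i - 1] + a[i]
    return s
```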

34. Define function of streaming SIMD Extension (SSE)

SSE is a SIMD instruction set extension designed by Intel. The goal of these extensions has been to
accelerate carefully written libraries rather than to have the compiler generate the instructions.

35. Name the hardware scheduler used in GPU

Thread Block Scheduler: determines the number of thread blocks needed for the loop and keeps assigning
them to different multithreaded SIMD processors until the loop is completed.

Thread Scheduler: determines which threads of SIMD instructions are ready to run and then sends them off
to a dispatch unit to be run on the multithreaded SIMD processor.

36. What is SIMT in GPU?

GPU uses a single instruction multiple thread model where individual scalar instruction streams for each
CUDA thread are grouped together for SIMD execution on hardware.

37. What is multiprocessor and list its categories?

A multiprocessor is used to increase performance and improve availability. The different categories are

38. List parameters to optimize performance of vector architecture

- Increase memory bandwidth
- Strip mining
- Vector chaining
- Multiple parallel lanes or pipes

39. What are threads?

Threads arise when multiple processors execute a single program and share the code and most of their
address space. When multiple processes share code and data in this way, they are called threads.

40. List ways to maintain coherence using snooping protocol

- Write invalidate protocol
- Write update (write broadcast) protocol

41. What is consistency?

Consistency specifies the order in which a processor must observe the data writes of another processor.

42. What is sequential consistency?

It requires that the result of any execution be the same as if the memory accesses executed by each
processor were kept in order and the accesses among different processors were arbitrarily interleaved.
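
One way to see what this allows is to enumerate every interleaving of two short programs. The sketch below is an illustrative model only: the two programs, the variable names and the helper functions are all hypothetical. Processor P1 writes x and then reads y into r1, while P2 writes y and then reads x into r2, with x and y initially 0.

```python
def interleavings(p, q):
    """Yield every merge of p and q that keeps each list's own order."""
    if not p or not q:
        yield list(p) + list(q)
        return
    for rest in interleavings(p[1:], q):
        yield [p[0]] + rest
    for rest in interleavings(p, q[1:]):
        yield [q[0]] + rest


# Each operation is (kind, variable, value-or-register).
P1 = [("write", "x", 1), ("read", "y", "r1")]
P2 = [("write", "y", 1), ("read", "x", "r2")]


def outcomes():
    """Set of (r1, r2) results possible under sequential consistency."""
    results = set()
    for order in interleavings(P1, P2):
        memory = {"x": 0, "y": 0}
        regs = {}
        for kind, var, arg in order:
            if kind == "write":
                memory[var] = arg
            else:
                regs[arg] = memory[var]
        results.add((regs["r1"], regs["r2"]))
    return results
```

Across the six possible interleavings, the outcome r1 = 0 and r2 = 0 never occurs, because at least one write always precedes the other processor's read.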

43. What is fine grained multithreading?

It switches between threads on each instruction, causing the execution of multiple threads to be
interleaved.

44. What is coarse grained multithreading?

It switches threads only on costly stalls. Thus it is much less likely to slow down the execution of an
individual thread.

45. What are the challenges in parallel processing?

- The first challenge is the limited parallelism available in programs.
- The second major challenge is the high cost of communication, that is, the large latency of remote
  access in a parallel processor.

46. Which protocol is more suited for distributed shared memory architecture?

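The protocols that maintain coherence for multiple processors are called cache coherence protocols. There
are two classes of protocols, which use different techniques to track the sharing status:

- Directory based
- Snooping

Directory-based protocols are better suited to distributed shared memory, because the sharing status is
kept in a directory rather than relying on broadcast to every cache.

47. What is cache miss and cache hit?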

When the CPU finds a requested data item in the cache, it is called a cache hit. When the CPU does not
find the data item it needs in the cache, it is called a cache miss.

48. What is miss rate and miss penalty?

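Miss rate is the fraction of cache accesses that result in a miss. Miss penalty is the additional time, in
clock cycles, needed to service a miss. The total stall time depends on the number of misses and the
clock cycles per miss.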
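
As a hedged illustration, the sketch below counts hits and misses for a tiny direct-mapped cache and derives the miss rate. The cache geometry (4 lines of 4-word blocks), the function name and the access pattern are all assumptions chosen to keep the numbers easy to follow.

```python
def simulate_direct_mapped(addresses, num_lines=4, block_size=4):
    """Return (hits, misses) for a direct-mapped cache of the given size."""
    tags = [None] * num_lines          # one stored tag per cache line
    hits = misses = 0
    for addr in addresses:
        block = addr // block_size     # block containing this address
        index = block % num_lines      # cache line the block maps to
        tag = block // num_lines       # identifies which block is resident
        if tags[index] == tag:
            hits += 1
        else:
            misses += 1
            tags[index] = tag          # fetch the block on a miss
    return hits, misses


# Sweep addresses 0..15 twice: only the first touch of each block misses.
hits, misses = simulate_direct_mapped(list(range(16)) * 2)
miss_rate = misses / (hits + misses)
print(hits, misses, miss_rate)
```

For this access pattern there are 4 misses in 32 accesses, a miss rate of 0.125. The 16 words fit in the cache, so the second sweep hits every time.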

49. What is striping?

Spreading data over multiple disks is called striping, which automatically forces accesses to several
disks.

50. What is hot spare?

Hot spares are extra disks that are not used in normal operation. When a failure occurs, an idle hot spare
is pressed into service. Thus hot spares reduce the MTTR (mean time to repair).

51. What is server utilization?

Server utilization is the mean number of tasks being serviced divided by the service rate:

Server utilization = Arrival rate / Service rate

The value should be between 0 and 1; otherwise more tasks would arrive than could be serviced.
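
A short sketch of the calculation, with a check for the valid range. The function name and the example rates are hypothetical.

```python
def server_utilization(arrival_rate, service_rate):
    """Utilization = arrival rate / service rate (same units, e.g. tasks/s)."""
    utilization = arrival_rate / service_rate
    if not 0 <= utilization <= 1:
        raise ValueError("tasks arrive faster than they can be serviced")
    return utilization


# Assumed example: 40 I/O requests arrive per second, and the disk can
# service 50 requests per second.
print(server_utilization(40, 50))
```

With these assumed rates the server is busy 80% of the time.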

52. What is bus master?

Bus masters are devices that can initiate read or write transactions. E.g. the CPU is always a bus master.
The bus can have many masters when there are multiple CPUs and when I/O devices can initiate bus
transactions.

53. What are the basic cache optimizations?

- Reduce the miss rate
  - Larger block size
  - Bigger cache
  - Higher associativity
- Reduce the miss penalty

54. What is sequential interleaving?

A simple mapping that works well is to spread the addresses of the blocks sequentially across the banks.
This is called sequential interleaving.
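
Under this mapping, the bank holding a block is simply the block address modulo the number of banks. The sketch assumes 4 banks and uses a hypothetical function name.

```python
def bank_of(block_address, num_banks=4):
    """Bank that holds a block under sequential interleaving."""
    return block_address % num_banks


# Consecutive blocks land in consecutive banks and wrap around.
print([bank_of(b) for b in range(8)])
```

Consecutive block addresses therefore fall in different banks, so a sequential access stream can keep all banks busy at once.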

55. What is non blocking cache?

A non-blocking cache allows the data cache to continue to supply cache hits during a miss. It requires a
CPU that supports out-of-order execution.

56. How is memory latency measured?

Memory latency is traditionally quoted using two measures

- Access time: the time between when a read is requested and when the desired word arrives.
- Cycle time: the minimum time between requests to memory.

57. What does RAID stand for?

RAID stands for Redundant Array of Inexpensive Disks.