
SREE SASTHA INSTITUTE OF ENGINEERING AND TECHNOLOGY

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


ACADEMIC YEAR (2017-18 EVEN SEM)
CYCLE TEST - I - KEY ANSWERS
Sub Name: Multi-Core Architectures & Programming Date : 22.01.2018 (A.N)
Sub Code : CS6801 Duration : 1hr 45 min
Semester : VIII Max marks : 75
PART – A (25 * 2 = 50 MARKS)
1. Define Von Neumann Architecture. (CO-1)
The “classical” von Neumann architecture consists of main memory, a central processing unit (CPU) or processor or core, and an
interconnection between the memory and the CPU. Main memory consists of a collection of locations, each of which is capable of
storing both instructions and data.
2. What is Multi-core system? (CO-1)
Multiple, relatively simple, complete processors on a single chip. Such integrated circuits are called multicore processors.
3. Give the Flynn’s taxonomy. (CO-1)
SIMD, MIMD, MISD, SISD
4. What is task parallelism and data parallelism? (CO-1)
In task-parallelism, we partition the various tasks carried out in solving the problem among the cores. In data-parallelism, we partition
the data used in solving the problem among the cores, and each core carries out more or less similar operations on its part of the data.
5. What is a parallel program? (CO-1)
A parallel program is one that can make use of multiple cores.
6. What are the three different extensions to C Language? (CO-1)
Three different extensions to C: the Message-Passing Interface or MPI, POSIX threads or Pthreads, and OpenMP.
7. What is OpenMP and MPI? (CO-1)
OpenMP is a relatively high-level extension to C designed for programming shared-memory systems and MPI was designed for
programming distributed-memory systems.
8. What are Shared Memory Systems? (CO-1)
In a shared-memory system a collection of autonomous processors is connected to a memory system via an interconnection network,
and each processor can access each memory location. In a shared memory system, the processors usually communicate implicitly by
accessing shared data structures.
9. What are Distributed Memory Systems? (CO-1)
In a distributed-memory system, each processor is paired with its own private memory, and the processor-memory pairs
communicate over an interconnection network. So in distributed-memory systems the processors usually communicate explicitly by
sending messages or by using special functions that provide access to the memory of another processor.
10. Draw UMA and NUMA Architecture. (CO-1)
(Diagrams omitted.) In a UMA (uniform memory access) system, the time to access all memory locations is the same for all cores; in a NUMA (nonuniform memory access) system, a core can access the memory directly connected to it faster than memory that must be reached through another chip.
11. What is Vector Processor? (CO-1)
A vector processor can operate on arrays or vectors of data, while conventional CPUs operate on individual data elements or scalars.
12. List the characteristics of vector processors? (CO-1)
Vector registers
Vectorized and pipelined functional units.
Vector instructions
Interleaved memory
Strided memory access
Hardware Scatter/Gather
13. What are graphical processing Units? (CO-1)
Real-time graphics application programming interfaces, or APIs, use points, lines, and triangles to internally represent the surface of an
object. They use a graphics processing pipeline to convert the internal representation into an array of pixels that can be sent to a
computer screen.
14. Define Latency and Bandwidth. (CO-1)
The latency is the time that elapses between the source’s beginning to transmit the data and the destination’s starting to receive the first
byte. The bandwidth is the rate at which the destination receives data after it has started to receive the first byte.
15. Define the types of interconnects. (CO-1)
Switch interconnects
Crossbar interconnects
Direct interconnect – hypercube
Indirect interconnect – crossbar, omega network
Toroidal mesh
Ring interconnect
Fully connected network
16. Define bandwidth and bisection bandwidth? (CO-1)
The bandwidth of a link is the rate at which it can transmit data. It’s usually given in megabits or megabytes per second. Bisection
bandwidth is often used as a measure of network quality. It’s similar to bisection width. However, instead of counting the number of
links joining the halves, it sums the bandwidth of the links.
17. Define Cache Coherence Problem. (CO-1)
There is no mechanism for ensuring that when the caches of multiple processors store the same variable, an update by one processor to
the cached variable is “seen” by the other processors. That is, that the cached value stored by the other processors is also updated. This
is called the cache coherence problem.
18. What are the two main approaches to cache coherence? (CO-1)
There are two main approaches to ensuring cache coherence: snooping cache coherence and directory-based cache coherence.
19. Describe working of snooping cache coherence. (CO-1)
The idea behind snooping comes from bus-based systems: When the cores share a bus, any signal transmitted on the bus can be “seen”
by all the cores connected to the bus. Thus, when core 0 updates the copy of x stored in its cache, if it also broadcasts this information
across the bus, and if core 1 is “snooping” the bus, it will see that x has been updated and it can mark its copy of x as invalid.
20. Describe Directory based coherence. (CO-1)
Directory-based cache coherence protocols attempt to solve this problem through the use of a data structure called a directory. The
directory stores the status of each cache line. Typically, this data structure is distributed; in our example, each core/memory pair might
be responsible for storing the part of the structure that specifies the status of the cache lines in its local memory.
21. Define False Sharing. (CO-1)
If n is large, we would expect that a large percentage of the assignments y[i] += f(i,j) will access main memory—in spite of the fact
that core 0 and core 1 never access each others’ elements of y. This is called false sharing, because the system is behaving as if the
elements of y were being shared by the cores. Note that false sharing does not cause incorrect results.
22. Define speedup and linear speedup. (CO-1)
If we call the serial run-time Tserial and our parallel run-time Tparallel, we define the speedup of a parallel program to be S = Tserial / Tparallel. The best we can hope for is Tserial / Tparallel = p; when this happens, we say that our parallel program has linear speedup (S = p), which is unusual. Furthermore, as p increases, we expect S to become a smaller and smaller fraction of the ideal, linear speedup p.
23. State Amdahl’s law. (CO-1)
Slatency(s) = 1 / ((1 - p) + p / s)
where
Slatency is the theoretical speedup in latency of the execution of the whole task;
s is the speedup in latency of the execution of the part of the task that benefits from the improved resources of the system;
p is the proportion of the execution time of the whole task that the part benefiting from the improved resources originally occupied (before the improvement).
24. Define scalability. (CO-1)
If we can find a corresponding rate of increase in the problem size so that the program always has efficiency E, then the program is
scalable.
25. List the outline of steps in building Parallel programs. (CO-1)
Partitioning
Communication
Agglomeration or Aggregation
Mapping

MULTIPLE-CHOICE QUESTIONS (25 * 1 = 25 MARKS)


1. In a direct-mapped cache of eight words, addresses 1 (00001 in binary) and 29 (11101 in binary) map to locations
a) 0 (000) and 5 (101) b) 1 (001) and 4 (100) c) 1 (001) and 5 (101) d) 1 (001) and 6 (110)
Ans: c
2. The levels between the CPU and main memory are given the name of
a) Hit time b)Miss rate c)Locality in time d)Cache
Ans:d
3. A queue holding data while the data are waiting to be written to memory is known as
a)Read buffer b)Queue buffer c)Write buffer d)Data buffer
Ans: c
4. A cache having 64 blocks and a block size of 16 bytes will have the block number for address 1200 map to
a)75 modulo 64 b)75 modulo 60 c)70 modulo 64 d)72 modulo 64
Ans: a
5. With a cache block of 4 words, a one-word-wide bank of DRAMs, and a miss penalty of 65 cycles, the number of bytes transferred per bus-clock cycle for
a single miss will be
a)0.2 b)0.25 c)0.75 d)1.5
Ans: b
6. The relation PerformanceX = 1 / Execution TimeX shows that
a) Performance is increased when execution time is decreased b) Performance is increased when execution time is increased c) Performance is
decreased when execution time is decreased d) None
Ans: a
7. A processor with a clock cycle of 0.25 ns will have a clock rate of
a)2GHz b)3GHz c)4GHz d)8GHz
Ans: c
8. If computer A executes a program in 10 seconds and computer B runs the same in 15 seconds, how much faster is computer A than computer B?
a)1.4 times b)1.5 times c)1 time d)5.1 times
Ans: b
9. For two computers X and Y, if the performance of computer X is greater than the performance of computer Y, we have
a) PerformanceX = 2 PerformanceY b) PerformanceX = PerformanceY c) PerformanceX < PerformanceY d) PerformanceX > PerformanceY
Ans: d
10. The total amount of work done in a given time is referred to as
a) Response time b) Execution time c) Throughput d) Delay time
Ans: c
11. A processor with a clock cycle of 0.25 ns will have a clock rate of
a)2GHz b)3GHz c)4GHz d)8GHz
Ans: c
12. The most straightforward model used for memory consistency is called
a)Sequential consistency b)Random consistency c)Remote node d)Host node
Ans: a
13. A programming model that allows for a more efficient implementation is to assume that programs are
a)Remote b)Synchronized c)Atomic d)Shared
Ans: b
14. Multiplication of the decimal floating-point numbers 1.110 and 9.200 yields
a) 11.212000 b) 10.212011 c) 10.212000 d) 10.112000
Ans: c
15. The instruction multu $s2, $s3 performs multiplication of
a)Signed numbers b)Unsigned numbers c)Integers d)Whole number
Ans: b
16. The overflow exception detected by the MIPS architecture is also known as an
a)Interrupt b)Error c)Unconditional flow d)Trap
Ans: a
17. Two main types of branch instructions are
a)conditional branch b)unconditional branch c)logical branch d)both a and b
Ans: d
18. By default counters are incremented by
a)1 b)2 c)3 d)4
Ans: a
19. The memory that is called read/write memory is
a)ROM b)EPROM c)RAM d)Registers
Ans: c
20. Secondary storage memory is basically
a)volatile memory b)non volatile memory c)backup memory d)impact memory
Ans: b
21. A piece of data that can be transferred between CPU and backup storage is called
a)block b)stream c)cartridge d)gap
Ans: a
22. In fixed-head discs, the sum of rotational delay and transfer time equals
a)access time b)delay time c)processing time d)storage time
Ans: a
23. Devices which are used to receive data from central processing unit are classified as
a)output/input devices b)digital devices c)signaled devices d)output devices
Ans: d
24. The counter that holds the address of the next instruction to be fetched is called
a)sequence control register b)program counter c)temporary register d)both A and B
Ans: d
25. An alternative to a snooping-based coherence protocol is called a
a)Memory protocol b)Directory protocol c)Register protocol d)None of above
Ans: b

Faculty-in-charge HOD
