
CS 405

COMPUTER SYSTEM ARCHITECTURE
SUFYAN P
Assistant Professor
sufyan@meaec.edu.in
Computer Science and Engineering
MEA Engineering College, Perinthalmanna



TEXT BOOK:
K. Hwang and N. Jotwani, Advanced Computer Architecture: Parallelism, Scalability, Programmability, TMH, 2010.


Introduction to advanced computer architecture
❖ Computer Organization:
• It refers to the operational units and their interconnections
that realize the architectural specifications.
• It describes the function and design of the various units of a
digital computer that store and process information

❖ Computer hardware:
• Consists of electronic circuits, displays, magnetic and optical
storage media, electromechanical equipment and
communication facilities.
❖ Computer Architecture:
• It is concerned with the structure and behavior of the
computer.
• It includes the information formats, the instruction set and
techniques for addressing memory.



Introduction to advanced computer architecture
Syllabus
• Basic concepts of parallel computer models
• SIMD computers

• Multiprocessors and multi-computers
• Cache Coherence Protocols
• Multicomputers
• Pipelining computers and Multithreading



MODULE-1



CONTENTS
• Parallel computer models
o Evolution of Computer Architecture
o System Attributes to performance.


• Amdahl's law
• Multiprocessors and Multicomputers
• Multivector and SIMD computers
• Architectural development tracks
• Conditions of parallelism



Evolution of computer architecture



INTRODUCTION
• Study of computer architecture involves
both
o Hardware organization
o Programming
• The evolution of computer architecture started
with the von Neumann architecture
o Built as a sequential machine
o Executing scalar data
• Major advancements came from the following
techniques
o Look-ahead technique
o Parallelism & pipelining
o Flynn’s classification
o Parallel / vector computers



Look-ahead Technique
• Introduced for enabling instruction prefetching
• Used to overlap I/E operations

o I/E➔ instruction fetch and execute

• Enables functional parallelism


o Different functions are distributed & performed concurrently
by processes or threads across different processors



Look-ahead Technique[2]

[Figure: overlapping of instruction fetch and execute (I/E) operations]


Flynn’s classification
• Classification is based on
o Instruction streams
o Data streams

• The classifications are:
o SISD (single instruction stream over single data stream)
• E.g.: conventional sequential machines
o SIMD (single instruction stream over multiple data streams)
• E.g.: vector computers
o MIMD (multiple instruction streams over multiple data streams)
• E.g.: parallel computers
o MISD (multiple instruction streams over single data stream)
• E.g.: special-purpose computers



Pipelining
• Pipelining is a technique where multiple instructions are
overlapped during execution.
• The pipeline is divided into stages, and these stages are
connected with one another to form a pipe-like structure.
• Instructions enter from one end and exit from another
end.
• Pipelining increases the overall instruction throughput.
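As a rough illustration (not from the slides), the ideal timing of a k-stage pipeline can be sketched in a few lines of Python; the stage count, instruction count and cycle time below are assumed values:

# Ideal k-stage pipeline timing with one-cycle stages.
# Non-pipelined: each instruction occupies all k stages alone.
# Pipelined: first result after k cycles, then one result per cycle.
def pipeline_times(n, k, tau):
    t_serial = n * k * tau
    t_pipelined = (k + n - 1) * tau
    return t_serial, t_pipelined, t_serial / t_pipelined

t_s, t_p, speedup = pipeline_times(n=100, k=5, tau=1e-9)
print(t_s, t_p, round(speedup, 2))   # speedup approaches k for large n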



Parallel computers
• Computations are carried out simultaneously
• 2 classes
o Shared memory multiprocessors

o Message passing multicomputers

• Distinction lies in
o Memory sharing
o Interprocessor communication



Shared memory multiprocessor
• Processors in the multiprocessor system use a common
memory
• They communicate using shared variables



Message passing multicomputer
• Each computer node in the multicomputer system has a
local memory

• Interprocessor communication is done via message passing



Vector processor
• It is a processor whose instructions operate on
vector data
• 2 families of vector processors
o Memory to memory
o Register to register

• Memory to memory architecture
o supports pipelined flow of vector operands directly from memory to
pipelines & then back to the memory

• Register to register architecture
o Uses registers to interface between memory & pipelines





System attributes to performance
• Ideal performance of a system demands perfect
matching between
o machine capability
o program behavior
• Machine capability can be enhanced via
o Better h/w technology
o Innovative architectural features
o Efficient resource management
• Factors affecting program behavior
o Algorithm design
o Data structures
o Language efficiency
o Programmer skill
o Compiler technology



Performance factors
• Cycle time τ
• Clock rate f
o f= 1/τ

• CPI (cycles per instruction)
• Instruction count Ic
• Processor cycles p
• Memory cycles m
• Ratio between memory cycle & processor cycle k



• Cycle time
o Time taken to complete one clock cycle

• Clock rate
o Inverse of cycle time

• Instruction count
o No. of machine instructions to be executed in a program

• CPI (cycles per instruction)
o No. of cycles taken to execute one instruction



CPU time (T)
• CPU time is the time needed to execute a program
• It depends on following factors
o Ic
o CPI
o Cycle time

• T = Ic * CPI * τ ………(1)

• Instruction execution involves a cycle of events like


o Instruction fetch ➔ memory access
o Decode ➔ carried out by CPU
o Operand fetch ➔ memory access
o Execution ➔ carried out by CPU
o Store result ➔ memory access
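A minimal sketch of Eq. (1) in Python, reusing the numbers that are worked out in Problem 1 further below:

# CPU time from Eq. (1): T = Ic * CPI * tau
Ic = 100_000     # instruction count
CPI = 1.55       # average cycles per instruction
f = 40e6         # 40 MHz clock rate
tau = 1 / f      # cycle time in seconds

T = Ic * CPI * tau
print(f"T = {T * 1e3:.3f} ms")   # T = 3.875 ms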



CPI (cycles per instruction)
• It can be divided into 2 component terms
o Processor cycles p
o Memory cycles m

• Eq. (1) can then be rewritten as
o T = Ic * (p + m*k) * τ ………(2)
• p ➔ no. of processor cycles needed for instruction decode & execution
• m ➔ no. of memory references needed
• k ➔ ratio between memory cycle and processor cycle
• τ ➔ processor cycle time
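A small sketch of Eq. (2); the values of p, m and k below are made up purely for illustration:

# T = Ic * (p + m*k) * tau, Eq. (2)
Ic = 100_000    # instructions in the program
p = 1.2         # processor cycles per instruction (assumed)
m = 0.4         # memory references per instruction (assumed)
k = 4           # memory-cycle / processor-cycle ratio (assumed)
tau = 25e-9     # processor cycle time (40 MHz clock)

T = Ic * (p + m * k) * tau
print(f"effective CPI = {p + m * k:.1f}, T = {T * 1e3:.1f} ms")   # 2.8, 7.0 ms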



• Memory cycle
o Time needed to complete one memory reference
o Denoted as m

o k ➔ depends on
• speed of the cache
• memory technology
• processor-memory interconnection scheme



• C ➔ total no. of clock cycles needed to execute a program (n
instructions)
• CPI ➔ no. of clock cycles needed to execute a single instruction
• CPI = C / Ic ………(3)
• Eq. (1) can be rewritten as follows:
T = Ic * CPI * τ ➔ T = Ic * (C/Ic) * τ
➔ T = C * τ ………(4)
➔ T = C / f ………(5)



System attributes
• 5 performance factors (Ic, p, m, k, τ) are
influenced by 4 system attributes
o Instruction set architecture
o Compiler technology
o CPU implementation & control
o Cache and memory hierarchy

• The instruction-set architecture affects the program
length (Ic) and the processor cycles needed (p).
• The compiler technology affects Ic, p, and
the memory reference count (m).
• The CPU implementation & control determine the total
processor time needed (p * τ).
• The cache and memory hierarchy affect the memory access
latency (k * τ).



MIPS rate (million instructions per
second)
• MIPS rate is based on the following factors
o Clock rate f
o Instruction count Ic
o CPI of given machine
• MIPS rate = Ic / (T * 10^6) ………(6)
• In terms of Eq. (1), the above can be rewritten as
MIPS rate = Ic / (Ic * CPI * τ * 10^6)
          = f / (CPI * 10^6) ………(7)
          = (f * Ic) / (C * 10^6) ………(8)
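The three equivalent forms, Eqs. (6) to (8), can be checked numerically; this sketch uses the Problem 1 figures from further below:

# MIPS rate three ways; all three expressions agree.
f, Ic, C = 40e6, 100_000, 155_000
CPI = C / Ic
T = C / f                           # Eq. (5)

mips_6 = Ic / (T * 1e6)             # Eq. (6)
mips_7 = f / (CPI * 1e6)            # Eq. (7)
mips_8 = (f * Ic) / (C * 1e6)       # Eq. (8)
print(round(mips_6, 1), round(mips_7, 1), round(mips_8, 1))   # 25.8 25.8 25.8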



Throughput rate
• CPU throughput Wp
• It is the measure of how many programs can be executed
per second, based on the MIPS rate & average program
length

• Wp = f / (Ic * CPI) ………(9)
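A one-line check of Eq. (9), with the same illustrative numbers as above:

# Wp = f / (Ic * CPI): programs executed per second
f, Ic, CPI = 40e6, 100_000, 1.55
print(f"Wp = {f / (Ic * CPI):.1f} programs/s")   # Wp = 258.1 programs/s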



Problem 1

[Problem statement not reproduced. From the solution below: a program of
Ic = 100000 instructions runs on a 40 MHz processor, and its instruction
types contribute 45000, 64000, 30000 and 16000 clock cycles respectively.]

Solution


• Total no. of cycles required to execute the complete
program
➔ 45000 + 64000 + 30000 + 16000
➔ 155000 cycles

• C=155000 cycles

• Effective CPI = C / Ic
➔ 155000 / 100000
➔ CPI = 1.55



• MIPS rate = f / (CPI * 10^6)
= (40 * 10^6) / (1.55 * 10^6)
= 25.8 MIPS



• Given f = 40 MHz ➔ τ = 1/40 μs = 0.025 μs
• T = Ic * CPI * τ
• = 100000 * 1.55 * 0.025 μs
• = 3875 μs
• = 3.875 ms
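The whole Problem 1 calculation can be verified end to end with a short script (the cycle counts are the ones given above):

# Problem 1 check: effective CPI, MIPS rate, and CPU time.
f = 40e6                                  # 40 MHz
Ic = 100_000                              # instruction count
C = 45_000 + 64_000 + 30_000 + 16_000     # 155000 total cycles

CPI = C / Ic                              # 1.55
mips = f / (CPI * 1e6)                    # ~25.8
T = Ic * CPI / f                          # seconds
print(CPI, round(mips, 1), f"{T * 1e3:.3f} ms")   # 1.55 25.8 3.875 ms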



Problem 2

[Problem statement and solution not reproduced in this extract.]


Floating point operations per second
• Most computer applications use floating point
operations
• For those applications, performance is measured using
FLOPS
• FLOPS ➔ no. of floating-point operations per second



Implicit parallelism
o In this approach, conventional languages like C, C++, or Fortran
are used to write the source program
o The sequentially coded source program is translated into parallel
object code
o This is done by a parallelizing compiler
o The compiler must be able to detect parallelism



Explicit parallelism
• More effort is needed from the programmer
• The source program is developed using parallel dialects
of C, C++, or Fortran
• Parallelism is explicitly specified in the source program
• The compiler need not detect parallelism
• This reduces the burden on the compiler
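As a loose illustration (not from the text), explicitly parallel code makes the decomposition the programmer's job; here is a hypothetical 4-way split of a summation using Python's multiprocessing module:

# Explicit parallelism: the programmer, not the compiler,
# partitions the work across 4 worker processes.
from multiprocessing import Pool

def partial_sum(chunk):
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    chunks = [data[i::4] for i in range(4)]   # explicit 4-way decomposition
    with Pool(processes=4) as pool:
        print(sum(pool.map(partial_sum, chunks)))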





MULTIPROCESSORS & MULTICOMPUTERS



Introduction
• Parallel computers are divided into 2
o Shared memory multiprocessors
o Message passing multicomputers

• Their difference is based on memory
o One has shared common memory
o The other has unshared distributed memory



Shared memory multiprocessor
• 3 models
o UMA model (uniform memory access)
o NUMA model (non-uniform memory access)

o COMA model (cache only memory architecture)

• These models differ in how the memory &
peripheral resources are shared or distributed



UMA model
• Physical memory is uniformly shared by all the
processors
• All processors have equal access time
• Peripherals are also shared

• Due to this high degree of resource sharing,
multiprocessors are also called tightly coupled
systems
• Communication & synchronization b/w processors are
done via shared variables
• System interconnection is done using
o Bus
o Crossbar switch
o Multistage network



UMA model

[Figure: UMA multiprocessor model]


UMA model

• UMA is sometimes called CC-UMA (Cache Coherent UMA).
• Cache coherent means that if one processor updates a
location in shared memory, all the other processors
know about the update


Advantages
• Suitable for general purpose & time sharing
applications by multiple users

• Speeds up the execution of a single large program
in time-critical applications



Symmetric vs. asymmetric multiprocessor systems
• Symmetric multiprocessor system
o All processors have equal access to all peripheral devices
o All processors are equally capable of running executive
programs such as the OS kernel and I/O routines

• Asymmetric multiprocessor system
o Only one or a subset of processors is executive-capable
o The master processor (MP) can execute the OS and I/O routines
o The remaining processors have no I/O capability
o These remaining processors are called attached processors
(AP)
o APs execute user code under the supervision of the master
processor

NUMA model
• It is a shared memory system in which the access time
varies with the location of the memory word
• There are two NUMA models

o Shared local memory model
o Hierarchical cluster model



Shared local memory model
• Shared memory is physically distributed to all
processors
• These are called local memories

o Collection of local memories forms a global address space
o This is accessible by all processors

• Access variations
o Access to the local memory attached to a processor is faster
o Access to remote memory attached to other processors takes a
longer time
• This is due to delays in the interconnection n/w



Hierarchical cluster model
• Processors are divided into several clusters
• Each cluster itself is a UMA or NUMA multiprocessor
• Clusters are connected to global shared memory
modules (GSM)

o All clusters have equal access to global memory
• All processors belonging to the same cluster uniformly
access the cluster shared-memory modules (CSM)
• The access time to CSM is shorter than to GSM
• Access rights to inter-cluster memories can be specified
in various ways



COMA model
• It is a multiprocessor using only cache memory
• Special case of NUMA
• Distributed main memories are converted to caches

• No memory hierarchy
• All caches form a global address space
• Remote cache access is assisted by distributed cache
directories



Representative multiprocessors

[Table of representative multiprocessor systems not reproduced.]


Message passing multicomputer
• Also called a distributed-memory multicomputer
• The system consists of multiple computers known as
nodes
• Nodes are interconnected by a message passing n/w
• Each node is an autonomous computer consisting of
o Processor
o Local memory
o Attached disks or I/O peripherals



Message passing multicomputer

[Figure: message passing multicomputer model]


Message passing multicomputer
• The message passing n/w provides point-to-point static
connections among the nodes
• All local memories are private & are accessible only by
the local processors
• Therefore these machines are also called no-remote-memory-
access machines (NORMA)
• Inter-node communications are carried out by message
passing



Advantages
• Scalability
• Fault tolerance
• Suitable for certain applications




Representative multicomputers

[Table of representative multicomputer systems not reproduced.]


MULTIVECTOR AND SIMD COMPUTERS



Introduction
• In this section we introduce supercomputers and parallel
processors for vector processing and data parallelism
• Supercomputers are classified as
o Vector supercomputers
• Use powerful processors equipped with vector hardware
o SIMD supercomputers
• Provide massive data parallelism



Supercomputers

[Figure not reproduced.]


Vector supercomputers
• A vector computer is built on top of a scalar processor
o i.e., a vector processor is attached to the scalar processor

• The host computer loads the program & data into the main
memory
• The scalar control unit decodes all the instructions
• If the decoded instruction is a scalar operation or a
program control operation,
o it is directly executed by the scalar processor
o Execution is done using the scalar functional pipelines



• If the decoded instruction is a vector operation,
o it is sent to the vector control unit

• Vector control unit
o It supervises the flow of vector data b/w the main memory & the
vector functional pipelines
o It coordinates the vector data flow



Vector processor models
• Register to register architecture
o Vector registers are used to hold the following
• Vector operands
• Intermediate vector results
• Final vector results
o Vector functional pipelines receive operands from
these registers & put the results back into these registers
o All vector registers are programmable
o Each vector register has a component counter
• It keeps track of component registers used in
successive pipeline cycles



Vector processor models
• Memory to memory architecture

• A vector stream unit is used instead of vector registers
• Vector operands & results are directly retrieved from
and stored into the main memory
• E.g.: Cyber 205



Examples of vector supercomputers

[Table of example vector supercomputers not reproduced.]


SIMD Supercomputers
• Computers with multiple processing elements (PE)
• They perform the same operation on multiple data
points simultaneously
• The operational model of an SIMD computer is specified
by the 5-tuple
• M = (N, C, I, M, R)



SIMD Supercomputers

[Figure: operational model of SIMD computers not reproduced.]


• N ➔ no. of processing elements (PEs) in the machine
• C ➔ set of instructions directly executed by the control unit (CU)
• I ➔ set of instructions broadcast by the CU to all PEs for
parallel execution
o This includes arithmetic, logic, and data routing operations
executed by each PE over the data within that PE
• M ➔ set of masking schemes
o Each mask partitions the set of PEs into enabled & disabled
subsets
• R ➔ set of data routing functions
o Used in the interconnection n/w for inter-PE
communications
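A loose sketch (not from the text) of how the 5-tuple plays out: the CU broadcasts one instruction from I, and only the PEs enabled by a mask from M apply it to their local data:

# N = 8 PEs, each holding one local operand.
N = 8
data = [float(i) for i in range(N)]
mask = [i % 2 == 0 for i in range(N)]   # one masking scheme: even PEs enabled

def broadcast(op, data, mask):
    # Each enabled PE applies the broadcast operation to its own data.
    return [op(x) if enabled else x for x, enabled in zip(data, mask)]

print(broadcast(lambda x: x + 10, data, mask))
# [10.0, 1.0, 12.0, 3.0, 14.0, 5.0, 16.0, 7.0]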



Examples of SIMD supercomputers

[Table of example SIMD supercomputers not reproduced.]


Amdahl’s law[1]
• It is named after computer scientist Gene Amdahl (a
computer architect from IBM and Amdahl Corporation).
It is also known as Amdahl's argument.
• It is a formula which gives the theoretical speedup in
latency of the execution of a task at a fixed workload that
can be expected of a system whose resources are
improved.
• In other words, it is a formula used to find the maximum
improvement possible by just improving a particular
part of a system.
• It is often used in parallel computing to predict the
theoretical speedup when using multiple processors.



Amdahl’s law[2]
• Speed-up

• Speedup is defined as the ratio of performance for the
entire task using the enhancement to the performance for
the entire task without using the enhancement
• OR
• Speedup can be defined as the ratio of the execution time
for the entire task without using the enhancement to the
execution time for the entire task using the
enhancement.



Amdahl’s law[3]
• If Pe is the performance for the entire task using the
enhancement when possible,
• Pw is the performance for the entire task without using the
enhancement,
• Ew is the execution time for the entire task without using
the enhancement, and
• Ee is the execution time for the entire task using the
enhancement when possible, then

• Speedup = Pe / Pw
or
Speedup = Ew / Ee



Amdahl’s law[4]
• Amdahl’s law uses two factors to find speedup from
some enhancement –

• Fraction enhanced
• Speedup enhanced



Amdahl’s law[5]
• Fraction enhanced –
• The fraction of the computation time in the original computer
that can be converted to take advantage of the enhancement.

• For example, if 10 seconds of the execution time of a program
that takes 40 seconds in total can use an enhancement, the
fraction is 10/40. This value is the Fraction enhanced.

• Fraction enhanced is always less than 1. (<1)



Amdahl’s law[6]
• Speedup enhanced –
• The improvement gained by the enhanced execution
mode; that is, how much faster the task would run if the
enhanced mode were used for the entire program.
• For example – If the enhanced mode takes, say 3 seconds
for a portion of the program, while it is 6 seconds in the
original mode, the improvement is 6/3. This value is
Speedup enhanced.
• Speedup Enhanced is always greater than 1. (>1)



Amdahl’s law[7]

KTUStudents.in

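A small sketch of the formula, reusing the numbers from the two examples above (fraction = 10/40, speedup enhanced = 6/3):

def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    # Overall speedup of a fixed workload under Amdahl's law.
    return 1 / ((1 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

print(amdahl_speedup(10 / 40, 6 / 3))   # ~1.14: the whole task runs ~14% faster
print(amdahl_speedup(0.95, 1e9))        # bounded by 1 / (1 - 0.95) = 20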


THANK YOU



CONDITIONS OF PARALLELISM



Introduction
• To execute several program segments in parallel, each
segment has to be independent of the other segments
• Dependency is the main challenge of parallelism
• A dependence graph shows the relations between program
statements
o Nodes of dependence graph➔ program statements
o Directed edges with labels➔ relations among the
statements



Types of dependences
• Data dependence

• Control dependence

• Resource dependence



DATA DEPENDENCE
• Ordering relationships b/w statements are
indicated by data dependences
• Types

o Flow dependence
o Anti-dependence
o Output dependence
o I/O dependence
o Unknown dependence



Flow dependence
• A statement S2 is flow dependent on statement S1,
• if an execution path exists from S1 to S2, and if at least
one output of S1 is fed as an input to S2
• Denoted as S1➔S2
o E.g.: consider the following instructions
• S1: LOAD R1, A
• S2: ADD R2, R1

• S2 is flow dependent on S1
o because the o/p of S1 is fed as i/p to S2
o i.e., variable A is loaded into register R1, which S2 then reads



Anti-dependence
• Statement S2 is anti-dependent on statement S1 if
• S2 follows S1 in program order and the o/p of S2 overlaps
the i/p of S1
• Denoted using a directed arrow crossed with a bar
o E.g.: consider the following statements
o S2: ADD R2, R1
o S3: MOVE R1, R3

• S3 is anti-dependent on S2 since the o/p of S3 overlaps the
i/p of S2
• i.e., there is a conflict on the register content of R1



Output dependence
• Two statements are output dependent if they produce
the same output variable
• Denoted as S1 o➔ S2
• E.g.:
• S1: LOAD R1, A
• S3: MOVE R1, R3

• S3 is output dependent on S1 because they both modify the
same register R1



I/O dependence
• I/O dependence occurs if the same file is referenced by
both I/O statements
• READ and WRITE are the I/O statements
• E.g.:
• S1: READ(4), A(I)
• S2: PROCESS
• S3: WRITE(4), B(I)
• S4: CLOSE(4)

• S1 and S3 are I/O dependent since both access the
same file (file 4)
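The three data-dependence tests above can be summarized in a few lines; this sketch (not from the text) classifies the dependences between two statements from their read and write sets:

def dependences(s1_reads, s1_writes, s2_reads, s2_writes):
    # S2 follows S1 in program order.
    found = []
    if s1_writes & s2_reads:
        found.append("flow: S1 -> S2")
    if s1_reads & s2_writes:
        found.append("anti: S1 -/-> S2")
    if s1_writes & s2_writes:
        found.append("output: S1 o-> S2")
    return found

# S1: LOAD R1, A   followed by   S2: ADD R2, R1
print(dependences({"A"}, {"R1"}, {"R1", "R2"}, {"R2"}))   # ['flow: S1 -> S2']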
Control dependence
• Implies that the order of execution of statements cannot be
determined before runtime
• Conditional statements will not be resolved until
runtime

• Conditional branching may eliminate or introduce data
dependencies
• Control dependency prohibits parallelism
• Solution
o Compiler techniques
o Hardware branch prediction techniques



Resource dependence
• It is concerned with the conflicts in using shared
resources by parallel events
• Shared resources are
o Integer units
o Floating point units

o Registers
o Memory areas

• When the conflicting resource is an ALU ➔ ALU dependence
• If the conflict involves workplace storage ➔ storage
dependence



Hardware and software parallelism
• To implement parallelism we require
o Hardware support
o Software support

• Joint efforts from hardware designers & software
programmers are required to exploit parallelism



Hardware parallelism
• This is a type of parallelism defined by machine
architecture & hardware multiplicity
• It is a function of cost & performance tradeoffs
• Indicates peak performance of processor resources
• One way to characterize parallelism in a processor:
o No. of instruction issues per machine cycle
• If a processor issues k inst per cycle➔ k-issue
processor
• If a processor issues 1 inst per cycle➔ one-issue
processor
o Eg: conventional pipelined processor
• E.g.: the Intel i960CA is a three-issue machine
Software Parallelism
• This type of parallelism is revealed in the program profile or
program flow graph
o Flow graph displays the simultaneously executable operations

• Software parallelism is a function of

o Algorithm
o Programming style
o Program design

• Types of software parallelism
o Control parallelism
o Data parallelism



Control Parallelism
• Allows 2 or more operations to be performed
simultaneously
• Control parallelism is achieved using
o pipelining

o Multiple functional units

• Limitations
o Length of pipeline
o Multiplicity of functional units

• Pipelining & functional parallelism are handled by h/w
• Programmers need not take any special effort to invoke
them



Data Parallelism
• Same operation is performed over many data elements
by many processors simultaneously
• Offers highest potential for concurrency
• Practiced in SIMD & MIMD modes
• Data parallel code is easier to write & debug than control
parallel code
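A loose illustration using NumPy: one logical operation is applied across all data elements at once (the data-parallel, SIMD-like style), equivalent to an explicit element-by-element loop:

import numpy as np

a = np.arange(8, dtype=np.float64)
b = 2 * np.arange(8, dtype=np.float64)

c = a + b                                 # one operation over all elements
loop = [a[i] + b[i] for i in range(8)]    # equivalent explicit loop
print(c.tolist() == loop)                 # True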



Role of compilers
• Compiler techniques are used to improve performance
• Early compilers which exploited parallelism are:
o CDC STACKLIB
o Cray CFT

• Features included in existing compilers to improve
parallelism are:
o Loop transformation
o s/w pipelining

• To exploit parallelism to the fullest,
o design the compiler & h/w jointly at the same time

