School of Computing
Vel Tech Rangarajan Dr. Sagunthala R&D Institute of
Science and Technology
Course Outcomes
[Table: CO No. | Course Outcome | K-Level]
and Project Management (SEPM)
Cache memory is a special, very high-speed memory. The cache is a smaller and
faster memory that stores copies of the data from frequently used main memory
locations.
A CPU contains several independent caches, which store instructions and data.
The most important use of cache memory is to reduce the average time to
access data from the main memory.
Cache memory is faster than main memory.
Cache memory, also called CPU memory, can be accessed more quickly than
regular RAM.
This memory is typically integrated directly with the CPU chip or placed on a
separate chip that has a separate bus interconnect with the CPU.
[Figure: the CPU references the cache first, with main memory behind it]
In the above figure, you can see that the CPU wants to read or fetch data or an
instruction. First, it accesses the cache memory, since the cache is nearest to it and
provides very fast access. If the required data or instruction is found there, it is
fetched; this situation is known as a cache hit. If the required data or instruction is
not found in the cache memory, the situation is known as a cache miss.
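The hit/miss flow described above can be sketched in a few lines (a minimal illustration; the dictionaries standing in for the cache and main memory are hypothetical, not real hardware):

```python
# Minimal sketch of the cache hit/miss flow described above.
# `cache` and `main_memory` are plain dictionaries standing in for hardware.
def access(cache, main_memory, address):
    """Return (value, hit) for an address, filling the cache on a miss."""
    if address in cache:              # cache hit: served from the fast memory
        return cache[address], True
    value = main_memory[address]      # cache miss: go to main memory
    cache[address] = value            # keep a copy for subsequent accesses
    return value, False
```

The first access to an address misses and copies the data into the cache; a repeated access to the same address then hits.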
THE LOCALITY PRINCIPLE AND CACHING
1. Direct Mapping
• The simplest technique, known as direct mapping, maps each
block of main memory into only one possible cache line; that is,
direct mapping assigns each memory block to one specific line in
the cache.
• The cache stores the tag field, while the line and word fields are
derived from the memory address itself. Direct mapping's
performance is directly proportional to the hit ratio.
DIRECT MAPPED CACHES
• The cache logic interprets these s bits as a tag of s-r bits (the
most significant portion) and a line field of r bits. This latter
field identifies one of the m = 2^r lines of the cache and serves
as the index bits in direct mapping.
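The tag/line split above can be illustrated with a short sketch (the bit widths s and r are parameters, matching the notation above):

```python
def split_address(addr, s, r):
    """Split an s-bit address into an (s-r)-bit tag and an r-bit line index,
    as in direct mapping (the cache has m = 2**r lines)."""
    line = addr & ((1 << r) - 1)              # low r bits: line (index) field
    tag = (addr >> r) & ((1 << (s - r)) - 1)  # remaining s-r bits: tag field
    return tag, line
```

For example, with s = 8 and r = 3, the address 0b10110101 splits into tag 0b10110 and line 0b101.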
DIRECT MAPPING STRUCTURE
ASSOCIATIVE MAPPING
• In associative mapping, the word id bits are used to identify which
word in the block is needed, and the tag becomes all of the
remaining bits. This enables the placement of any block at any
line of the cache memory.
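Unlike direct mapping, an associative lookup must compare the tag against every line. A sequential sketch follows (real hardware performs all comparisons in parallel; the list-of-pairs representation is illustrative):

```python
def associative_lookup(cache_lines, tag):
    """Fully associative search: compare the tag against every cache line.
    `cache_lines` is a list of (tag, block) pairs; returns the block or None."""
    for line_tag, block in cache_lines:
        if line_tag == tag:
            return block    # hit: a matching tag was found in some line
    return None             # miss: no line holds this tag
```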
• = 2.25 + 28.25 = 30.5 ns
• Note: AMAT can also be calculated as Hit Time + (Miss Rate * Miss
Penalty)
PERFORMANCE IMPROVEMENT TECHNIQUES
• Example 2: Calculate AMAT when Hit Time is 0.9 ns, Miss Rate is
0.04, and Miss Penalty is 80 ns.
• Solution:
• Average Memory Access Time (AMAT) = Hit Time + (Miss Rate *
Miss Penalty)
• Given: Hit Time = 0.9 ns, Miss Rate = 0.04, Miss Penalty = 80 ns
• AMAT = 0.9 + (0.04 * 80) = 0.9 + 3.2 = 4.1 ns
• Hence, reducing Hit Time, Miss Rate, or Miss Penalty reduces the
AMAT, which in turn ensures better performance of the cache.
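The AMAT formula used in the examples above is a one-liner:

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average Memory Access Time = Hit Time + (Miss Rate * Miss Penalty)."""
    return hit_time + miss_rate * miss_penalty
```

With the values of Example 2, amat(0.9, 0.04, 80) gives 4.1 ns.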
DRAM ORGANIZATION:
• Dynamic random access memory (DRAM) is a type of semiconductor
memory that is typically used for the data or program code needed by a
computer processor to function.
Random Access
• In this method, any location of the memory can be
accessed randomly, like accessing an element of an array.
Physical locations are independent in this access method.
ACCESS TECHNIQUES
• Direct Access
• In this method, individual blocks or records have a unique address based
on physical location. Access is accomplished by direct access to reach a
general vicinity plus sequential searching, counting, or waiting to reach
the final destination.
• This method is a combination of the above two access methods. The
access time depends on both the memory organization and the
characteristics of the storage technology. The access is semi-random
or direct.
• Associative Access
Each tile contains a subset of the overall computation and can be processed
independently or in parallel with other tiles.
Here are the key components and characteristics of a typical tile processor
architecture:
1. Grid Structure: The PEs are arranged in a grid structure, often referred to as
a mesh or a 2D array. The grid structure allows for easy interconnection and
communication between neighboring PEs.
2. Local Memory: Each PE has its own local memory, which is used to store
data and instructions specific to that PE. This local memory provides
low-latency access and enables fast data sharing between neighboring PEs.
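The neighbor relation of the grid structure can be sketched as follows (the coordinate scheme and bounds are illustrative, assuming a mesh with no wrap-around links):

```python
def mesh_neighbors(x, y, width, height):
    """Return the coordinates of the PEs adjacent to tile (x, y)
    in a width x height 2D mesh with no wrap-around links."""
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(nx, ny) for nx, ny in candidates
            if 0 <= nx < width and 0 <= ny < height]
```

A corner tile has only two neighbors, while an interior tile has four.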
2. Routers: Routers form the backbone of a NoC and are responsible for routing
packets of data across the network. They receive packets from input channels and
determine the appropriate path to forward the packets to the desired destination.
3. Topologies: NoCs can have various network topologies, determining the structure
and connectivity of the communication channels. Common topologies include
mesh, torus, ring, star, and tree, each with its own advantages and trade-offs in
terms of scalability, latency, and power consumption.
Routing Mechanism: The routing mechanism determines how data packets are
forwarded from the input ports to the desired output ports. NoC routers
employ routing algorithms or tables to make routing decisions based on packet
headers, destination addresses, and network conditions. Common routing
algorithms include deterministic routing, adaptive routing, and dimension-
ordered routing.
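Dimension-ordered routing, named above, can be sketched for a 2D mesh as XY routing: the packet first travels fully along the X dimension, then along Y (a hypothetical hop-by-hop sketch, not a hardware implementation):

```python
def xy_route(src, dst):
    """Dimension-ordered (XY) routing in a 2D mesh: the packet travels
    along X until it matches the destination column, then along Y.
    Deterministic, and deadlock-free in a mesh."""
    (x, y), (tx, ty) = src, dst
    path = [(x, y)]
    while x != tx:                    # first dimension: X
        x += 1 if tx > x else -1
        path.append((x, y))
    while y != ty:                    # second dimension: Y
        y += 1 if ty > y else -1
        path.append((x, y))
    return path
```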
Crossbar Switch or Interconnect: The interconnect or crossbar switch connects
the input ports to the output ports, allowing for the exchange of data packets. It
enables the routing and switching of packets between different ports based on
the routing decisions made by the router. The interconnect is a critical
component that should have sufficient bandwidth to accommodate the traffic
demands of the NoC.
Arbitration: In situations where multiple input ports contend for the same
output port, arbitration mechanisms resolve conflicts and determine the order
of packet transmission. Different arbitration algorithms can be employed, such
as round-robin, prioritized, or fairness-based arbitration, to ensure fair and
efficient resource allocation.
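A round-robin arbiter, one of the schemes named above, can be sketched as follows (the class and port encoding are illustrative):

```python
class RoundRobinArbiter:
    """Round-robin arbitration among n input ports contending for one
    output: the search for the next grant starts just after the last
    granted port, so no requesting port is starved."""
    def __init__(self, n):
        self.n = n
        self.last = n - 1             # so that port 0 is checked first

    def grant(self, requests):
        """requests: list of n booleans; returns the granted port or None."""
        for i in range(1, self.n + 1):
            port = (self.last + i) % self.n
            if requests[port]:
                self.last = port
                return port
        return None                   # no port is requesting
```

With ports 0 and 1 both requesting continuously, grants alternate between them rather than always favoring port 0.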
Department of Computer Science and Engineering 1-86
NoC Router – Architecture and Design
NoCs are a paradigm for on-chip communication that replace traditional bus-
based communication architectures with a network of interconnected
communication nodes.
Methodologies Used in NoC Design:
3. Flow Control: Flow control mechanisms regulate the flow of data in the NoC
to prevent congestion and ensure reliable communication. Techniques like
credit-based flow control, wormhole routing, or virtual channel allocation are
commonly used to manage the data flow and handle contention.
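Credit-based flow control, named above, can be sketched as a counter of free downstream buffer slots (the class and method names are illustrative):

```python
class CreditLink:
    """Credit-based flow control: the sender holds one credit per free
    buffer slot at the downstream router and may only send while
    credits remain, which prevents buffer overflow and congestion."""
    def __init__(self, buffer_slots):
        self.credits = buffer_slots

    def try_send(self):
        if self.credits == 0:
            return False              # downstream buffer full: stall
        self.credits -= 1             # one credit consumed per flit sent
        return True

    def credit_return(self):
        self.credits += 1             # downstream router freed a slot
```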
5. Minimal Routing: Minimal routing aims to find the shortest path between
the source and destination routers. It selects the next hop that minimizes the
distance to the destination, considering the topology of the NoC. Minimal
routing algorithms are often used in combination with other routing
techniques to optimize performance.
2. Virtual Channel Flow Control: Virtual channel flow control assigns multiple
virtual channels to each physical link in the NoC. Each virtual channel
maintains its own buffers and control signals. It enables packets from different
traffic flows to be multiplexed onto the same physical link, reducing
congestion and providing isolation between different traffic classes.
3. Wormhole Routing: In wormhole routing, packets are divided into small flits
(flow control digits) and sent through the network as soon as the first flit arrives
at a router. The subsequent flits follow the same path, allowing pipelining of
packet transmission. Wormhole routing reduces latency but requires careful
management of flow control to avoid deadlocks and head-of-line blocking.
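The division of a packet into head, body, and tail flits can be sketched as follows (the flit size and the string payload are illustrative stand-ins):

```python
def to_flits(payload, flit_size):
    """Split a packet payload into flits as in wormhole routing: a head flit
    (which carries the routing information), body flits, and a tail flit."""
    chunks = [payload[i:i + flit_size]
              for i in range(0, len(payload), flit_size)]
    if len(chunks) == 1:
        return [("head_tail", chunks[0])]      # single-flit packet
    kinds = ["head"] + ["body"] * (len(chunks) - 2) + ["tail"]
    return list(zip(kinds, chunks))
```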
4. Store-and-Forward: Store-and-forward is a simple flow control technique
where the entire packet is received at a router before being forwarded to the next
router. This approach ensures that the packet is error-free and reduces the
chances of congestion. However, it introduces higher latency compared to other
flow control techniques.
5. Cut-Through Routing: Cut-through routing is a variant of wormhole routing
where flits are forwarded through the network as soon as they arrive at a router,
without waiting for the complete packet. This technique reduces latency but
requires additional mechanisms to handle flow control and ensure data integrity.
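The latency difference between store-and-forward and cut-through forwarding can be seen in an idealized cycle count, assuming one flit crosses one hop per cycle and there is no contention (a back-of-the-envelope model, not a cycle-accurate one):

```python
def store_and_forward_cycles(hops, flits):
    """Each router buffers the whole packet before forwarding it,
    so the full packet time is paid at every hop."""
    return hops * flits

def cut_through_cycles(hops, flits):
    """Flits pipeline behind the head flit: the head takes `hops` cycles
    and the remaining flits stream out one per cycle behind it."""
    return hops + flits - 1
```

For an 8-flit packet over 4 hops this gives 32 cycles versus 11 cycles, illustrating why cut-through and wormhole routing reduce latency.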