
DATA COMMUNICATION AND COMPUTER

NETWORKS

PROJECT REPORT
NETWORK ON CHIP

Submitted To: Dr. Muhammad Kaleem Ullah

Submitted By: Registration #:


Amna Shafa Irfan CIIT/SP12-BET-006/ISB
Ayisha Nisar CIIT/SP12-BET-011/ISB
Hifza Sajid CIIT/SP12-BET-029/ISB
Maleeha Altaf CIIT/SP11-BET-041/ISB
Marryam Nawaz CIIT/SP12-BET-043/ISB

Class: BET-6A
Date: 09/01/2015
ABSTRACT
This report presents a basic Network on Chip (NoC) structure. A scalable NoC
system consisting of 5-port routers in a 16-core mesh has been designed,
analyzed and implemented, and its performance has been evaluated in terms of
the latency of a deterministic routing algorithm. All of the work has been done
in Xilinx software using the Verilog language.
TABLE OF CONTENTS
1) Introduction
 System on Chip
 Problem Background
 Objective

2) Background
 On Chip vs. Off Chip Design
 On Chip Interconnection Types
o Point to Point Interconnection
o Single Bus Interconnection
o Hierarchical/Segmented Bus Interconnection
o Crossbar Interconnection

3) Network on Chip
 Basic Concept
 Features of NoC
o Links
o Routers
o Network Adaptor(NA) or Network Interface(NI)
 Front End
 Back End
 Performance Evaluation
o Quantitative Terms
 Network Equilibrium
 Bandwidth
 Throughput
 latency
 Area
 Power
o Qualitative Terms
 Quality of Service
(i) Best-Effort (BE) NoCs
(ii) Guaranteed Services (GS) NoCs
 Load Balancing
 Reconfigurability
(i) Static Part
(ii) Reconfigurable Part
 Fault tolerance
 Problems Addressed by NoC
 Network Abstraction
 Network Architectures
 NoC Building blocks
o Topologies
o Routing Algorithms
o Routing Mechanisms
o Switching
o Flow Control
 Handshake Flow Control
 Credit Flow Control
o Network Interface
o Router Architecture

4) Design
 Routing Algorithm
 Deadlock
 Livelock

5) Applications
 Telecommunication Systems and NoC

6) Conclusion

7) Future work

8) References

9) Appendix
 Main Code
 Test Bench
1) INTRODUCTION
With the expansion of multi-core technology, researchers in the field have exposed critical
scalability and bandwidth limitations of customary bus-based interconnect architectures and
point-to-point links. This major flaw has led Network-on-Chip (NoC) interconnects to become an
increasingly popular solution. To explain the origins of the NoC, the evolution of processing core
technology will be highlighted along with the ensuing problems, which will lead into a discussion
of why NoCs are considered a viable solution to accommodate the ever-growing communication
demands of future multi-core systems.

 System on Chip (SoC)


The sustained increase in transistor count due to Moore's Law resulted in the need for
System-on-Chip (SoC) design. An SoC integrates several Intellectual Property (IP) cores as a
complete system on a single chip, where IP cores can be embedded processors, memory blocks,
input-output elements, dedicated hardware cores, etc. SoC design is becoming more intricate
over time due to the increasing number of IP cores that can be integrated on a single chip. The
number of IP cores integrated onto a single chip is mainly limited by current IP core interfacing
technology. In a bus-based platform, the performance of an SoC design decreases as the number
of interconnections between cores increases, which makes reliable inter-IP on-chip
communication difficult. Hence, a new design approach for SoCs is essential to maintain system
scalability without performance degradation, permitting the SoC designer to reuse IPs and
communication infrastructure.

Figure: Traditional SoC using a shared bus

 Problem Background
Integrated circuits are prone to errors both in fabrication and in utilization; this equates to
potential profit loss for companies and expensive costs for consumers.
o Fabrication Failures

A continued reduction in chip feature sizes has made the fabrication process increasingly
difficult to perfect, since process variations have become magnified, leading to greater defect
rates and significantly poorer yield. The integrated circuit (IC) manufacturing process is an
extremely delicate process that involves several stages, each of which must achieve minimal
contamination. Working at the scale of tens of nanometers has lowered the precision of control
and can result in imprecise impurity deposition and non-uniform fields, leading to transistor
malfunction. A report by the ITRS predicts the percentage variation in 22nm technology to be
about three times greater than in 45-65nm technology (18% versus 5%).

o Electro-Migration

Operation time causes an elemental ageing phenomenon in chips named electro-migration (EM).
The EM process gradually displaces metal ions in wires as current flows, causing the wires to
thin and increasing their local resistance. Ultimately, the resistance becomes too large and
results in EM link failure. With present-day chip designs housing a greater number of
interconnections per chip, EM failures are becoming more probable.

Redundant hardware has been added to designs to cope with failures. It was shown that
including extra wires and cores in the initial design allowed repair both during manufacture and
post-production. It was revealed that with one spare wire per link, the percentage of chips ready
for market could be increased by up to 10%. Even though redundancy does improve yield, the
increased manufacturing cost of the extra area overhead can outweigh the yield savings.
A software-based solution which uses packet flooding algorithms was also investigated. In these
algorithms, injected packets are sent to all neighbours with a probability 'p' and are dropped with
probability '1-p'. The value of 'p' must be carefully selected to achieve near-flooding performance
while minimizing the transmission of redundant copies. Ultimately, these solutions are
probabilistic, so packets are not guaranteed to reach the destination. The final verdict on
flooding methods is that they are limited to low injection rates due to a large communication
overhead.

 Objective
Research Network on Chip and implement a 16-core 2D mesh using a simple XY routing
algorithm.
2) BACKGROUND
This section places NoCs into context by detailing the characteristics of existing interconnects
and leads onto why a new paradigm is essential for sustainability.

 On-Chip vs. Off-Chip Design


On-chip design is sensitive to cost in terms of area and power. Wires are relatively cheap and
latency is critical. Traffic may be known a priori, so the design can be specialized at design time.

In an off-chip design, the cost is in the links. Latency is more tolerable, and traffic and
applications are unknown and change at runtime. Designs adhere to networking standards.

 On-Chip Interconnection Types

o Point-to-Point

It represents a direct link between exactly two interfaces.

o Single Bus

Traditional bus architectures form broadcast mediums in which a single shared backbone
enables intra-chip communication. Nevertheless, there is often a need for dedicated point-to-
point links for critical signals which, when scaled, can result in an exponential increase in wiring
and prolonged design cycles. Therefore, wiring can contribute to excessive power consumption
and extremely poor scalability. Moreover, with multiple processors there are significant
increases in latency due to contention.

Figure: A bus architecture depicting the system bottleneck due to a single-point arbitration scheme.

These designs are only efficient when accommodating fewer than five cores.

o Hierarchical Bus / Segmented Bus

To alleviate the arbitration bottleneck, hierarchical bus structures were adopted to partition
communication domains into layers that operate at different frequencies. This also optimizes
power usage, because buses at the bottom of the hierarchy function at low frequencies and
connect high-latency modules. Examples of advanced bus architectures are the ARM AMBA and
the IBM CoreConnect, which are widely used today. Effective consideration of power is
important because an increasing number of components means an increased capacitive load.
Overall, optimization is limited since bus architectures are broadcast networks and so data must
reach all bus-connected modules.
o Crossbar

Buses were developed further by incorporating crossbars to form a switch fabric. This overcame
the problem of limited parallelism by enabling concurrent transactions to occur through
emulation of point-to-point interconnections. Although this facilitates bandwidth improvements,
the number of links increases sharply with rising core count, adding to the ever-restricting area
overhead and power consumption: the complexity of a crossbar scales as the square of the
number of ports. Crossbars can also suffer from cross-chip wire delays that limit clock
frequencies due to large RC delays and quadratic scaling with distance; this eventually forces
cross-chip communication latencies up to tens of cycles in sub-hundred-nanometer technology.
These conclusions show that, although crossbars overcome limitations of the bus design, they
ultimately have poor scalability.

Buses and crossbars have been the trusted communication providers for multiprocessor
technology since their introduction, but with new generations of complex designs appearing
there is a requirement for a future sustainable interconnection framework. A change that has
been commonly accepted is the transition towards a shared and segmented global
communication architecture, since it can provide flexibility and scalability. This structure is
described as a data-routing network: a NoC.

3) NETWORK ON CHIP
 Basic Concept
As the number of IP modules in Systems-on-Chip (SoCs) increases, bus-based interconnection
architectures may prevent these systems from meeting the performance required by many
applications. For systems with rigorous parallel communication requirements, buses may not
provide the required bandwidth, latency, and power consumption. A solution to such a
communication bottleneck is the use of an embedded switching network, called a Network-on-
Chip (NoC), to interconnect the IP modules in SoCs. The NoC design space is significantly larger
than that of a bus-based solution, as different routing and arbitration strategies can be
implemented. In addition, NoCs have an innate redundancy that helps tolerate faults and deal
with communication bottlenecks. A NoC is also a platform for system integration, debugging and
testing.

Figure: A closer look at NoC hardware

 Features of NoC
A NoC has the following three building blocks:

o Links
The first and most important building block is the links, which physically connect the nodes and
actually implement the communication.

o Routers
The second building block is the router, which implements the communication protocol (the
decentralized logic behind the communication protocol). The router basically receives packets
from the shared links and, according to the address carried in each packet, forwards the packet
either to the core attached to it or to another shared link. The protocol itself consists of a set of
policies defined during the design (and implemented within the router) to handle common
situations during the transmission of a packet, such as two or more packets arriving at the same
time or disputing the same channel, avoiding deadlock and livelock situations, reducing the
communication latency, increasing the throughput, etc.
The design and implementation of a router requires the definition of a set of policies to deal
with packet collision, the routing itself, and so on. A NoC router is composed of a number of
input ports (connected to shared NoC channels), a number of output ports (connected to
possibly other shared channels), a switching matrix connecting the input ports to the output
ports, and a local port to access the IP core connected to this router. We use the terms router
and switch as synonyms, but the term switch can also mean the internal switch matrix that
actually connects the router inputs to its outputs.
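
As an illustration of this structure, the sketch below shows the port-level interface of a 5-port mesh router (north, south, east, west, local) of the kind used in this project. The signal names, the valid/ready handshake and the 8-bit flit width are our own assumptions for the sketch, not the code of the appendix.

// Hypothetical port-level skeleton of a 5-port mesh router (N, S, E, W, local).
// Signal names and the 8-bit flit width are illustrative assumptions.
module router_5port #(parameter FLIT_W = 8) (
    input                  clk, rst,
    // five input ports: flit data plus a valid/ready handshake per port
    input  [5*FLIT_W-1:0]  in_flit,    // {local, west, east, south, north}
    input  [4:0]           in_valid,
    output [4:0]           in_ready,
    // five output ports, mirrored
    output [5*FLIT_W-1:0]  out_flit,
    output [4:0]           out_valid,
    input  [4:0]           out_ready
);
    // Internally: input buffers -> routing logic -> switch allocator -> switch matrix.
    // Bodies are omitted; only the interface is sketched here.
endmodule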

o Network Adaptor (NA) or Network Interface (NI)


The last building block is the network adapter (NA) or network interface (NI). This block makes
the logical connection between the IP cores and the network, since each IP may have a distinct
interface protocol with respect to the network. This block is important because it allows the
separation between computation and communication, which in turn allows the cores and the
communication infrastructure to be reused independently of each other.
The adapter can be divided into two parts:

 Front End
The front end handles the core requests and is ideally unaware of the NoC. This part is
usually implemented as a socket.

 Back End
The back end part handles the network protocol (assembles and disassembles the packet,
reorders buffers, implements synchronization protocols, helps the router in terms of
storage, etc).
 Performance Evaluation
The performance of a NoC can be evaluated on the basis of quantitative and qualitative terms
which are briefly discussed as follows:

o Quantitative Terms

 Network Equilibrium

The network is initialized with vacant buffers and idle resources for ease of design purpose.
Initial packets injected into the network will therefore experience minimal contention and exit
with low latency. After a certain time period, buffers begin to populate and increase overall
packet latency. Eventually, the initialization effect diminishes and the network reaches steady
state. A similar phenomenon occurs at the end of the simulation where the last remaining
packets exit the network and endure less traffic because packet injection stop this is known as
the drain phase. Latency measurements within these periods introduce systematic errors
therefore, to minimize their influence; measurements will be taken during steady state
operation. To do this, generated packets will be flagged with a ‘measure’ bit once the warm-up
phase has passed, the test-bench sink will assess this and only record measurements when it is
logic high. Testing will be monitored to ensure packets injections continue long enough for all
the measurement packets to reach their destinations. This continual injection is required up
until the very last measurement packet so that it experiences similar background traffic
interactions as the previous packets did throughout the test.
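
As a minimal sketch of the measurement scheme described above, the module below accumulates latency only for packets whose 'measure' flag is set; the flag, the injection timestamp carried with the packet and all signal names are assumptions made for illustration.

// Hypothetical latency sink: records only packets flagged with a 'measure' bit.
module latency_sink #(parameter TS_W = 16) (
    input              clk,
    input              pkt_valid,         // a packet arrives this cycle
    input              pkt_measure,       // flag set after the warm-up phase
    input  [TS_W-1:0]  pkt_inject_time,   // timestamp carried in the packet header
    input  [TS_W-1:0]  now,               // free-running cycle counter
    output reg [31:0]  total_latency,
    output reg [31:0]  measured_pkts
);
    initial begin
        total_latency = 0;
        measured_pkts = 0;
    end
    always @(posedge clk) begin
        if (pkt_valid && pkt_measure) begin
            total_latency <= total_latency + (now - pkt_inject_time);
            measured_pkts <= measured_pkts + 1;
        end
    end
    // Average latency is total_latency / measured_pkts at the end of the run.
endmodule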

 Bandwidth

The bandwidth refers to the maximum rate of data propagation once a message is in the
network. The unit of measure for bandwidth is bits per second (bps), and it usually considers the
whole packet, including the bits of the header, payload and tail.

 Throughput

Throughput is defined as the maximum traffic accepted by the network, that is, the maximum
amount of information delivered per time unit. Throughput is measured in messages per second
or messages per clock cycle. One can obtain a normalized throughput (independent of the size of
the messages and of the network) by dividing it by the size of the messages and by the size of the
network. As a result, the unit of the normalized throughput is bits per node per clock cycle (or
per second).
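
As a worked example under these definitions (the symbols below are ours, not from the report): assuming M messages of B bits each are delivered by an N-node network over C clock cycles, the normalized throughput is

\theta_{norm} = \frac{M \cdot B}{N \cdot C} \ \text{bits per node per cycle}

For instance, 3200 messages of 64 bits delivered by a 16-node mesh in 10000 cycles give 3200*64 / (16*10000) = 1.28 bits per node per cycle.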

 Latency

Latency is the time elapsed between the beginning of the transmission of a message (or packet)
and its complete reception at the target node. Latency is measured in time units and is mostly
used as a comparison basis among different design choices. In this case, latency can also be
expressed in terms of simulator clock cycles. Normally, the latency of a single packet is not
meaningful, so one uses the average latency to evaluate the network performance. On the other
hand, when some messages present a much higher latency than the average, this may be
important; therefore the standard deviation may be an interesting measure as well.
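
These quantities can be written out as follows (a sketch using our own symbols): with P measured packets and L_i the latency of packet i,

\bar{L} = \frac{1}{P}\sum_{i=1}^{P} L_i, \qquad \sigma_L = \sqrt{\frac{1}{P}\sum_{i=1}^{P}\left(L_i - \bar{L}\right)^2}

where \bar{L} is the average latency used for comparisons and \sigma_L flags cases in which a few packets suffer much higher latency than the average.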

 Area
As the non-segmented bus operates at a very low frequency and has no parallelism, it has to be
made excessively wide in order to provide the same effective bandwidth as the NoC. As a result,
its width grows and its length grows, so that its total area cost increases. Its average link
frequency is also slower than in the NoC (longer links with higher capacitance). The link length
grows and, since the link width is asymptotically one, its total area also grows.

 Power
The power dissipated by all architectures is proportional to the product of operating frequency
and total wire length. The NoC has an advantage even assuming a uniform traffic distribution
and assuming that load capacitance depends only on the interconnect (ignoring the capacitance
of system module ports). Moreover, it is clear that non-uniform, mostly-local traffic favours the
NoC, as does the inclusion of input port capacitance. In more advanced VLSI technology
generations the capacitance and delay of long interconnect wires become even more dominant.
As technology improves, the NoC is the only communication architecture whose links become
shorter and less vulnerable to delays and noise.

o Qualitative Terms

 Quality of Service

Quality of service (QoS) is defined "as service quantification provided by the network to the
demanding core". Quality of service is also about the predictability of the communication
behaviour. One must first define the required services and a quantification measure. Typically,
services are defined based on the performance metrics of the network (low latency, high
throughput, low power, etc.). Then, one must define implementation mechanisms to meet the
cores' demands using the network services.

There are two types of NoCs with respect to QoS:

(i) Best-Effort (BE) NoCs

These NoCs offer no commitment. For most NoCs this means that only completion of the
communication is ensured. Best-effort NoCs tend to present a better utilization of the
available resources.

(ii) Guaranteed Services (GS) NoCs

These NoCs ensure that certain service requirements can be met. Commitment can be
defined at several levels, such as correctness of the result, completion of the transaction,
bounds on performance, and so on.
Due to the characteristics of on-chip networks and the systems based on them, service
guarantees and predictability are usually given through hardware implementations, as
opposed to the statistical guarantees of macro-networks. In order to give hard guarantees,
one must use specific routing schemes (such as virtual channels, adaptive routing,
fault-tolerant router and link structures, etc.) to ensure the required levels of performance
and reliability.
In guaranteed-services NoCs, cost and complexity grow with the complexity of the system
and the QoS requirements.

 Load Balancing

Interconnection networks-on-chip (NoCs) are rapidly replacing other forms of interconnect in
chip multiprocessors and system-on-chip designs. Existing interconnection networks use either
oblivious or adaptive routing algorithms to determine the route taken by a packet to its
destination. Despite its rather higher implementation complexity, adaptive routing enjoys
better fault-tolerance characteristics, increases network throughput, and decreases latency
compared to oblivious policies when faced with non-uniform or bursty traffic. However,
adaptive routing can hurt performance by disturbing any inherent global load balance through
greedy local decisions.

 Reconfigurability

The communication infrastructure of the proposed architecture is based on the Network-on-
Chip (NoC) paradigm in order to exploit a 2-layered approach, in which the computational layer
is completely decoupled from the communication layer. The proposed reconfigurable
architecture mainly consists of two different parts:

(i) Static Part

The static part consists of all the computational elements and the network interfaces.
Computational elements can be further divided into two categories. The first one consists of
masters, which are the active components of the system, such as microprocessors, that can
initiate new transactions on the network; these components are connected to the
communication infrastructure. The second one consists of slaves, such as memories, which
act in a passive mode by receiving and answering transactions coming from the active
elements.

(ii) Reconfigurable Part

The reconfigurable part is composed of all the reconfigurable elements, used to adapt the
structure of the system implemented on the FPGA at run time. These elements can be either
computational components or elements used to update the communication infrastructure.
Network interfaces toward the communication infrastructure can implement bridges
between the On-chip Peripheral Bus (OPB), Processor Local Bus (PLB) or Open Core
Protocol (OCP) and the network protocol. The only part of the network interfaces (both
initiator and target network interfaces) that has to be modified at run time is the routing
tables, which are used to dynamically change the routing of packets on the network. Thus,
all the network interfaces have been placed in the static part of the system and the routing
tables have been deployed on BRAM blocks.

 Fault Tolerance

As technology scales, fault tolerance is becoming a key concern in on-chip communication. Two
different flooding algorithms and a random-walk algorithm have been investigated. The
flood-based fault-tolerant algorithms have an exceedingly high communication overhead,
whereas the redundant random-walk algorithm offers significantly reduced overhead while
maintaining useful levels of fault tolerance. Comparing the implementation costs of these
algorithms, both in terms of area and of energy consumption, shows that the flooding algorithms
consume an order of magnitude more energy per message transmitted. Fault tolerance offers
many advantages:

(i) Avoids costly packet retransmissions


(ii) Avoids catastrophic data loss
(iii) Can increase chip yield
(iv) Allows higher speed operation

 Problems Addressed by NoC

 Global interconnect design problem:


a) NoC and Global wire delay
 Long wire delay is dominated by resistance
 Addition of repeaters is required
 Repeaters become latches (with clock frequency scaling)
 Latches evolve to NoC routers

b) Wire design for NoC

NoC links:

 Regular

 Point-to-point (no fan out tree)

 Can use transmission-line layout

 Well-defined current return path

c) NoC Scalability

d) NoC and Communication Reliability


e) NoC and GALS
 Modules in NoC use different clocks
 May use different supply voltages
 NoC can handle synchronization
 NoC design may be asynchronous
 No waste of power when the links and routers are idle

 System integration

Productivity problem

 NoC eliminates ad-hoc global wire engineering

 NoC separates computation from communication

 NoC is a complete platform for system integration, debugging and testing

 Multicore Processors

Key to power-efficient computing

 Uniprocessors cannot provide power-efficient performance growth

 Interconnect dominates dynamic power

 Global wire delay doesn’t scale

 ILP is limited

 Power-efficiency requires many parallel local computations

 Multicore chip

 Thread-Level Parallelism (TLP)

 Network is a natural choice for Multicore

 Network Abstraction

An abstraction layer or abstraction level is a way of hiding the implementation details
of a particular set of functionality, allowing the separation of concerns to facilitate
interoperability and platform independence.
 Software layers

o Application Layer

Includes both the Application and Presentation layers of the OSI model

Issues at the Application Layer

• Minimizing the end-to-end latency

• Making network communication “transparent”

• Support for different traffic types

o Network & transport layers

 Network topology: e.g. crossbar, ring, mesh, torus, fat tree, …
 Switching: circuit/packet switching (SAF, VCT), wormhole
 Addressing: logical/physical, source/destination, flow, transaction
 Routing: static/dynamic, distributed/source, deadlock avoidance
 Quality of Service: e.g. guaranteed-throughput, best-effort
 Congestion control, end-to-end flow control

Issues at Transport Layer

 Message-to-packet overhead (and vice versa)


 Breaking up and re-sequencing
 Error Correction cost
 Addressing techniques
 End-to-end flow control
 Message buffer sizing
 Circuit-switching-like properties
 QoS (Network setup cost, BW reservation)

Issues at Network Layer

 Need specialized technique for “fast and dedicated routing”


 Combination of deterministic and non-deterministic routing?
 Topology
 Link-to-link Flow Control
 Packet buffer sizing
 Virtual channels (deadlock avoidance)
 Back pressure (Congestion look-ahead)?
 Routing Mechanism
 Address resolution
 Bandwidth reservation

o Data link layer

 Flow control

 Handling of contention

 Correction of transmission errors

Issues at Data link layer


 Fast flexible links
 Virtual channels
 Split channels for multi-flit-per-cycle communication
 Dedicated Address / Control lines

o Physical layer

 Wires, drivers, receivers, repeaters, signaling, circuits


 Network Architectures

Network architecture is the design of a communications network. It is a framework for
the specification of a network's physical components and their functional organization
and configuration, its operational principles and procedures, as well as the data formats
used in its operation. In telecommunication, the specification of a network architecture
may also include a detailed description of products and services delivered via a
communications network, as well as detailed rate and billing structures under which
services are compensated. The network architecture of the Internet is predominantly
expressed by its use of the Internet Protocol Suite, rather than a specific model for
interconnecting networks or nodes in the network, or the usage of specific types of
hardware links. Network architecture provides a detailed overview of a network. It is
used to classify all the network layers step by step in logical form by describing each
step in detail. It is also based on the complete working definitions of the protocols. The
architecture is emphasized in a distributed computing environment and its complexity
cannot be understood without a framework. Therefore there is a need to develop
applications or methods to lay out an overview of a network.

 NoC Building blocks

o Topologies

The need for a good topology arises as complexity increases: with an improved topology,
complexity decreases and power efficiency increases. The design space of a NoC is very
large, and topology choice is part of it.
Network-on-Chip topologies are categorized into the following basic types:
(i) Ring
(ii) Spidergon
(iii) 2D Mesh
(iv) 2D Torus
Customizing the network topology and the physical placement of components yields better
performance and power than a regular-pattern network. Buffers and links of a NoC consume
nearly 75% of the total NoC power, so there is significant benefit in optimizing the buffer
size, link length and bandwidth of a NoC design. Generally, determining the optimal topology
for any given application has no known theoretical solution, although the synthesis of
customized architectures is desirable for improved performance.
o Routing Algorithms

This is another important design choice, with a very complex algorithm taxonomy. The
routing algorithm determines the path(s) from source to destination.

Types based on where routing decisions are taken


(i) Source routing
(ii) Distributed routing
Types based on how path is defined
(i) Deterministic routing
(ii) Adaptive routing

Routing schemes differ in their delivery semantics:

 Unicast delivers a message to a single specific node. Unicast is the dominant form of
message delivery on the Internet.

 Broadcast delivers a message to all nodes in the network.


 Multicast delivers a message to a group of nodes that have expressed interest in
receiving the message.
 Anycast delivers a message to anyone out of a group of nodes, typically the one nearest
to the source.
 Geocast delivers a message to a geographic area.

Routing algorithms can be classified by type. Key differentiators include these:

(i) Static versus dynamic


(ii) Single-path versus multipath
(iii) Flat versus hierarchical
(iv) Host-intelligent versus router-intelligent
(v) Intra-domain versus Inter-domain
(vi) Link-state versus distance vector
o Routing Mechanisms

Implementation of algorithm can be as a finite state machine or via the look up table
which contains the entry for each IP in the network and the contents can be dynamically
changed by OS at run time. This allows us to customize the routing for each IP at the link
level. The drawback arises that is of course the required memory space grows
proportionally with the number of IPs. In comparison with the finite state machine that
has a fixed size independent of the number of IPs but has no flexibility. Each time a task
that produces a change in the communication between the IPs is started, the OS
computes a new optimum distribution of the communication channels and the look up
tables changes accordingly.
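
A minimal sketch of such a look-up-table stage is shown below, assuming a 16-IP network in which the OS (or a configuration master) can rewrite one table entry per cycle; the port names and the output-port encoding are our own assumptions.

// Hypothetical look-up-table routing stage for a 16-IP network.
module route_lut (
    input            clk,
    // run-time update port: lets the OS rewrite one entry per cycle
    input            wr_en,
    input  [3:0]     wr_dest,     // destination IP whose entry is changed
    input  [2:0]     wr_port,     // new output port for that destination
    // lookup port used by the router datapath
    input  [3:0]     dest_id,     // destination carried in the packet header
    output reg [2:0] out_port     // 0:N 1:S 2:E 3:W 4:local (assumed coding)
);
    reg [2:0] table_mem [0:15];   // one entry per IP core

    always @(posedge clk) begin
        if (wr_en)
            table_mem[wr_dest] <= wr_port;
        out_port <= table_mem[dest_id];  // registered lookup, one-cycle latency
    end
endmodule

A finite-state-machine implementation would replace table_mem with fixed combinational logic: smaller and of constant size, but with no way to change the routing at run time.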

o Switching

Switching is relatively simple; it is the same for most routing protocols. In most cases, a host
determines that it must send a packet to another host. Having acquired a router's address by
some means, the source host sends a packet addressed specifically to the router's physical
(Media Access Control [MAC]-layer) address, this time with the protocol (network-layer)
address of the destination host. As it examines the packet's destination protocol address, the
router determines that it either knows or does not know how to forward the packet to the
next hop. If the router does not know how to forward the packet, it typically drops the packet.
If the router knows how to forward the packet, however, it changes the destination physical
address to that of the next hop and transmits the packet. The next hop may be the ultimate
destination host. If not, the next hop is usually another router, which executes the same
switching decision process. As the packet moves through the internetwork, its physical address
changes, but its protocol address remains constant. The preceding discussion describes
switching between a source and a destination end system. The International Organization for
Standardization (ISO) has developed a hierarchical terminology that is useful in describing this
process. Using this terminology, network devices without the capability to forward packets
between subnetworks are called end systems (ESs), whereas network devices with these
capabilities are called intermediate systems (ISs). ISs are further divided into those that can
communicate within routing domains (intra-domain ISs) and those that communicate both
within and between routing domains (inter-domain ISs). A routing domain generally is
considered a portion of an internetwork under common administrative authority that is
regulated by a particular set of administrative guidelines. Routing domains are also called
autonomous systems. With certain protocols, routing domains can be divided into routing
areas, but intra-domain routing protocols are still used for switching both within and
between areas.

 Flow Control

Flow control is the mechanism that lets the transmitter know whether the receiver is able to
accept data (the receiver's buffers may be full, or some momentary problem may prevent
data reception).

Types of flow control

 Handshake flow control


 Credit based flow control

In handshake flow control, when a router A wishes to transmit data to a neighbouring router B,
router A activates the tx signal and puts the data on data_out. When router B perceives the
activated rx signal, it stores the data present on data_in and activates the ack_rx signal,
indicating reception of the data. After that, router A can transmit the subsequent data. This
protocol is asynchronous and each flit consumes at least two clock cycles.
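
The sender half of this protocol might look like the sketch below; the tx and ack_rx names follow the description above, while the remaining signals and the flit width are assumptions made for illustration.

// Hypothetical sender half of the handshake flow control described above:
// present the flit, raise tx, wait for ack_rx, then accept the next flit.
module handshake_tx #(parameter W = 8) (
    input              clk, rst,
    input      [W-1:0] flit_in,
    input              flit_avail,   // a new flit is ready to be sent
    output reg [W-1:0] data_out,
    output reg         tx,           // "data valid" toward the neighbouring router
    input              ack_rx        // acknowledgement from the receiver
);
    always @(posedge clk) begin
        if (rst)
            tx <= 1'b0;
        else if (!tx && flit_avail) begin
            data_out <= flit_in;      // present the flit and assert tx
            tx       <= 1'b1;
        end else if (tx && ack_rx)
            tx <= 1'b0;               // handshake complete: at least two cycles per flit
    end
endmodule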

In credit-based flow control, when a router A wishes to transmit data to the neighbouring
router B, router A verifies whether the channel selected for the transmission has credits, i.e.
whether the credit_i signal is activated. If so, router A puts the data on data_out and activates
the tx signal. When router B perceives the activated rx signal, it stores the data present on
data_in and decreases the number of credits. Router A can transmit data while the credit_i
signal is activated. The number of credits is incremented when a data item is consumed by
router B, i.e. when router B transmits it to the core or to a neighbouring router. In this protocol,
the data transmission is synchronized through the clock_tx signal and each data item consumes
at least one clock cycle.
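
A sender-side credit counter for this scheme might be sketched as follows; the credit_i and tx names follow the description above, while the other signals and the buffer depth are assumptions made for illustration.

// Hypothetical credit counter on the sender side of the credit-based scheme:
// transmit only while credits remain; a returned credit replenishes the count.
module credit_tx #(parameter DEPTH = 4) (
    input  clk, rst,
    input  flit_avail,       // router A has a flit ready to send
    input  credit_return,    // pulse from router B when it frees a buffer slot
    output credit_i,         // "may transmit" indication toward the datapath
    output tx                // a flit is actually sent this cycle
);
    reg [2:0] credits;       // wide enough for DEPTH up to 7

    assign credit_i = (credits != 0);
    assign tx       = flit_avail && credit_i;

    always @(posedge clk) begin
        if (rst)
            credits <= DEPTH;            // router B's buffer starts empty
        else if (tx && !credit_return)
            credits <= credits - 1;      // sent a flit, consumed a credit
        else if (!tx && credit_return)
            credits <= credits + 1;      // downstream consumed a flit
        // tx && credit_return: one credit out, one back, no net change
    end
endmodule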

 Network Interface

The network interface is designed as a bridge between an OCP interface and a NoC
switching fabric. Its purposes are the synchronization between OCP and network timing,
(de-)packetization, the computation of routing information (stored in a Look-Up Table,
LUT) and flit buffering to improve performance. Different bridges exist between
communication initiators and the network (network interface initiator) and between
communication targets and the network (network interface target). Each NI is split into
two sub-modules: one for the request and one for the response channel. These sub-
modules are loosely coupled: whenever a transaction requiring a response is processed
by the request channel, the response channel is notified; whenever the response is
received, the request channel is unblocked. The request path of the NI is built around
two registers: one holds the transaction header (refreshed once per OCP transaction),
while the second holds the payload (refreshed at each OCP burst beat). A set of flits
encodes the header register, followed by multiple sets of flits encoding a snapshot of
the payload register after each new burst beat. Header and payload content are never
allowed to mix, and padding is used where needed. Routing information is attached to
the header flit of a packet by checking the transaction address against a LUT. The length
of this field depends on the maximum switch radix and the maximum number of hops in
the specific network instance at hand. The NI performs clock-domain crossing; however,
in order to keep the architecture simple, the ratio between network and core clock
frequencies needs to be an integer divisor.

 Router Architecture

Depending on the topology and on the number of IPs attached to them, routers have a
variable number of input and output ports. Separating the input ports from the output
ports allows a flexible network design. Each router provides fixed-length input FIFO
buffers to accommodate the local processing core and the neighbouring routers. No
output buffering is provided. Write-asserted flits are sampled every clock cycle and
written into the input buffers. Routing logic produces output-port requests for each buffer
depending on the routing information contained within the head-of-line flit; these requests
are sent to a switch allocator, which configures the switch fabric to connect the input
buffers to the output ports. A cyclic iterative priority arbiter employing round-robin
scheduling is used for decision making.
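
A minimal sketch of such a round-robin arbiter for the five input buffers is given below; the rotating-priority search is one common way to implement it and is not claimed to match the exact allocator used in the appendix code.

// Hypothetical 5-way round-robin arbiter: grant the first requester found
// when scanning cyclically from just after the previous winner.
module rr_arbiter5 (
    input            clk, rst,
    input      [4:0] req,       // one request line per input buffer
    output reg [4:0] grant      // one-hot grant back to the buffers
);
    reg [2:0] last;             // index of the most recent winner
    reg [2:0] idx;
    reg       found;
    integer   i;

    always @(posedge clk) begin
        if (rst) begin
            grant <= 5'b0;
            last  <= 3'd4;      // so that port 0 has highest priority first
        end else begin
            grant <= 5'b0;
            found  = 1'b0;
            for (i = 1; i <= 5; i = i + 1) begin
                idx = (last + i) % 5;
                if (req[idx] && !found) begin
                    grant[idx] <= 1'b1;   // grant the first requester found
                    last       <= idx;
                    found       = 1'b1;
                end
            end
        end
    end
endmodule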
4) DESIGN

 Requirements
High-performance interconnect
 High throughput, low latency, low power, small area
Complex functionality
 Support for virtual channels
 QoS
Synchronization
 Reliability, high throughput, low latency

 Routing Algorithm
Dimension-ordered routing (DOR-XY) algorithms are the most commonly used routing
algorithms for on-chip networks due to their simplicity of implementation in hardware. A
deterministic XY dimension-ordered routing (DOR-XY) algorithm, which is also a shortest-path
algorithm, is implemented here; it operates by first routing packets along the x-direction until
they reach the destination x-address and then likewise for the y-dimension. This means that
packets always traverse the same path between a particular pair of source and destination
nodes. A variant of the deterministic DOR-XY is 'XYX', which uses both DOR-XY and DOR-YX.
This algorithm works by forcing source nodes to inject redundant copies of every message, with
a header flag bit to indicate whether the message is the original or the copy. Original messages
are routed using XY while copies follow YX routing. Overall, the results show that the traffic
distribution is close to uniform. The problem with this technique is that the added redundancy
increases the amount of traffic on the network and limits the maximum injection rate for
acceptable performance.
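
A minimal combinational sketch of the XY decision at one router of the 16-core (4x4) mesh is shown below; the 2-bit coordinates match the design, while the output-port encoding is an assumed convention rather than the one in the appendix code.

// Hypothetical XY dimension-ordered routing decision: route along x until the
// column matches the destination, then along y, then eject to the local core.
module xy_route (
    input      [1:0] cur_x,  cur_y,    // this router's coordinates
    input      [1:0] dest_x, dest_y,   // destination carried in the header flit
    output reg [2:0] out_port          // 0:E 1:W 2:N 3:S 4:local (assumed coding)
);
    always @(*) begin
        if      (dest_x > cur_x) out_port = 3'd0;  // go east
        else if (dest_x < cur_x) out_port = 3'd1;  // go west
        else if (dest_y > cur_y) out_port = 3'd2;  // x matches: go north
        else if (dest_y < cur_y) out_port = 3'd3;  // go south
        else                     out_port = 3'd4;  // arrived: local port
    end
endmodule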
 Deadlock

Deadlock occurs when a set of agents holding resources is waiting on another set of resources
such that a cycle of waiting agents is formed. Most networks are designed to avoid deadlock,
but it is also possible to recover from deadlock by detecting and breaking cyclic wait-for
relationships. Deadlock is more of an issue for routing algorithms that do not allow misrouting
of data. When such data reaches a node or link it cannot traverse, it stops making progress and
is usually dropped. Because of finite buffers and random failures, deadlock is usually an issue to
be considered within routing algorithms.

 Livelock

Livelock occurs when a packet is not able to make progress in the network and is never
delivered to its destination. Unlike a deadlocked packet, though, a livelocked packet continues
to move through the network. Livelock tends to occur most frequently when data is allowed to
be misrouted, or routed non-minimally. More specifically, livelock becomes a reality when data
is forever in transit towards its destination and continues to move within the network, but does
not reach the destination within finite time. Usually this issue comes about because of
over-utilized channels or link failures. One common scheme for tackling livelock within routing
algorithms is to restrict paths: by using only minimal paths, or placing strong restrictions on
non-minimal paths, livelock becomes less of an issue.

5) APPLICATIONS
The on-chip network must be designed for the application the chip will be used for, and there is a
growing integration of various application cores within a single chip. Hence, the network needs to
be flexible and robust so that it can support the diverse types and volumes of traffic generated by
different cores.

 Telecommunication systems and NOC


The trend nowadays is to integrate telecommunication systems on complex MCSoCs:

a. Network processors
b. Multimedia hubs
c. Base-band telecom circuits

Therefore there is a great need for NoCs in telecommunications.

Other applications are as follows:


 FlexNoC interconnect fabric IP, deployed in market-leading smartphone and tablet mobile
devices
 Asynchronous NoC using an elastic channel protocol
 Nokia's future mobile sets that can provide real-time applications

6) CONCLUSION
From our research, we conclude that on-chip networks have now become a promising solution
not only for parallel applications but also for data-intensive applications, for the following
reasons:
 Implementation and optimization of layers is independent.
 They give simplified customization per application.
 They support multiple topologies and options for different parts of the network.
 They simplify feature development, interface interoperability, and scalability.

7) FUTURE WORK

In the future, many-core processors will require high-performance yet energy-efficient on-chip
networks to provide a communication substrate for the increasing number of cores. Recent
advances in silicon nanophotonics create new opportunities for on-chip networks. To efficiently
exploit the benefits of nanophotonics, researchers have proposed Firefly, a hybrid, hierarchical
network architecture. Firefly consists of clusters of nodes that are connected using conventional
electrical signalling, while the inter-cluster communication is done using nanophotonics,
exploiting the benefits of electrical signalling for short, local communication while
nanophotonics is used only for global communication to realize an efficient on-chip network.

8) REFERENCES
http://en.wikipedia.org/wiki/Network_on_a_chip
http://www6.in.tum.de/pub/Main/TeachingWs2013MSE/bolotin_NoC_costs.pdf
http://www.ann.ece.ufl.edu/courses/eel6935_10spr/papers/Reconfigurable_Network-on_Chip_Architecture.pdf
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.156.5414
http://webee.technion.ac.il/bolotin/papers/QNoC-Dec2003.pdf
http://gram.eng.uci.edu/comp.arch/lab/NoCOverview.htm
http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=6121922&url=http%3A%2F%2Fieeexplo
re.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D6121922
https://www.ee.ucl.ac.uk/~pwatts/Danny_Ly_Final.pdf
http://www.arteris.com/Arteris_mobility_pr_4_june_2013
http://pop-art.inrialpes.fr/~girault/Fmgals07/Final/03-gebhardt.pdf
http://www.altera.com/literature/wp/wp-01149-noc-qsys.pdf
B. Ahmad, Ahmet T. Erdogan, and Sami Khawam, “Architecture of a Dynamically Reconfigurable
NoC for Adaptive Reconfigurable MPSoC”, Proceedings of the First NASA/ESA Conference on
Adaptive Hardware and Systems, IEEE Computer Society, 2006.
Tobias Bjerregaard and Shankar Mahadevan, “A survey of research and practice of Network on
Chip”, ACM Computing Surveys, Vol. 38, 2006.
William James Dally and Brian Towles, “Principles and Practices of Interconnection Networks”,
Morgan Kaufmann Publishers, San Francisco, 2004.
Rickard Holsmark and Magnus Hugberg, “Modelling and Prototyping of Network on Chip”,
Master of Science Thesis, Ingenjorshogskolan, 2002.
Nikolay Kavaldijev and Gerard J. M. Smit, “A survey of efficient On-Chip communications for
SoC”, Department of EEMCS, University of Twente, Netherland, 2003.
Shashi Kumar, Axel Jantsch, Juha-Pekka Soininen, Martti Forsell, Mikael Millberg, Johnny Oberg,
Kari Tiensyrja, and Ahmed Hemani, “A Network on Chip Architecture and Design Methodology”,
in Proceedings of IEEE Computer Society Annual Symposium on VLSI,2002.
José C. Prats Ortiz, “Design of components for a NoC-Based MPSoC Platform: Adding a shared
memory node to the mNoC”, Master of Science Thesis, Department of Electrical Engineering,
Eindhoven University of Technology, 2005.

G. Blake, R.G. Dreslinski, and T. Mudge, “A Survey of Multicore Processors,” IEEE Signal
Processing Magazine, vol. 26, no. 6, pp. 26-37, Nov. 2009.

L. Karam, I. AlKamal, A. Gatherer, G. Frantz, D. Anderson, and B. Evans, “Trends in Multicore


DSP Platforms,” IEEE Signal Processing Magazine, vol. 26, no. 6, pp. 38-49, Nov. 2009.

R. Kamal and N. Yadav, “NoC and Bus Architecture: A Comparison,” International Journal of
Engineering Science and Technology, vol. 4, no. 4, pp. 1438-1442, Apr. 2012.

N.E. Jerger and L. Peh, “On Chip Networks (Synthesis Lectures on Computer Architecture),”
Morgan and Claypool Publishers, 2009.
M. Hübner, “Multiprocessor System-On-Chip: Hardware Design and Tool Integration,” Springer,
2011.

S. Pasricha and N. Dutt. “On-Chip Communication Architectures: System on Chip Interconnect,”


Morgan Kaufmann, 2008.

Jerraya and W. Wolf, “Multiprocessor Systems-on-Chips,” Morgan Kaufmann, 2004.

H. Iwai. “Roadmap for 22 nm and beyond,” Microelectronic Engineering, vol. 86, no. 7-9, pp.
1520-1528, July-Sept. 2009

W. Group. International technology roadmap for semiconductors. 2011 Edition. [Online]


Available at: http://www.itrs.net/Links/2011ITRS/Home2011.htm [Accessed: 18 December
2012].
Cidon and I. Keidar, “Zooming in on network-on-chip architectures,” SIROCCO Proceedings of
the 16th international conference on Structural Information and Communication Complexity, pp.
1, 2009.

K. S. Hassan, A. Reza, M. Reshadi, “Yield modeling and Yield-aware Mapping for Application
Specific Networks-on-Chip,” NorChip, pp. 1-4, Nov. 2011.

W. Group. International technology roadmap for semiconductors. 2007 Edition. [Online]


Available at: http://www.itrs.net/links/2007ITRS/Home2007.htm [Accessed: 10 February 2012].

H. Haznedar, M. Gall, V. Zolotov, P. S. Ku, C. Oh and R. Panda, “Impact of Stress-Induced


Backflow on Full-Chip Electromigration Risk Assessment,” IEEE Trans. Computer-Aided
Design of Integrated Circuits and Systems, vol. 25, no. 6, pp. 1038-1046, Jun. 2006.

S. Shamshiri and K.Cheng, “Modeling Yield, Cost, and Quality of a Spare-Enhanced Multicore
Chip,” IEEE Trans. Computers, vol. 60, no. 9, pp. 1246-1259, Sept. 2011.

M. Pirretti, G. M. Link, R. R. Brooks, N. Vijaykrishnan, M. T. Kandemir and M. J. Irwin, “Fault


tolerant algorithms for network-on-chip interconnect,” VLSI Proceedings. IEEE Computer society
Annual Symposium, pp. 46-51, Feb. 2004.

M. Behrouzian Nejad, A. Mehranzadeh, M. Hoodgar, “Performance of Input and Output


Selection Techniques on Routing Efficiency in Network-on-Chip,” International Journal of
Computer Science and Information Security, vol. 9, no. 9, pp. 125-130, 2011.

T. Bjerregaard and S. Mahadevan, “A Survey of Research and Practices of Network-on-Chip,”


ACM Computing Surveys, vol. 38, no. 1, p. article No. 1, 2006.

W. Wolf, A. A. Jerraya, and G. Martin, “Multiprocessor System-on-Chip (MPSoC) Technology,”


IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, vol. 27, no. 10, pp.
1701-1713, Oct. 2008.

W. J. Dally and B. Towles, “Route packets, not wires: On-chip interconnection


Networks,”Design Automation Conference, Proceedings of the 38th Conference, pp. 684–689,
Jun.2001.

W. J. Dally and B.P. Towles, “Principles and Practices of Interconnection Networks,” Morgan
Kaufmann, 2004.

S. Chen, Y. Lan, W.Tsai, “Reconfigurable Networks-On-Chip,” Springer, 2012.

Jantsch and H. Tenhunen, “Networks on Chip,” Springer, 2003.

D. L. Nicholls, “Network-on-Chip,” Mar. 2012

P. Lotfi-Kamran, A. M. Rahmani, M. Daneshtalab, A. Afzali-Kusha and Z. Navabi, “EDXY - A


low cost congestion-aware routing algorithm for network-on-chips,” Journal of Systems
Architecture: the EUROMICRO Journal, vol. 56, no. 7, pp. 256-264, Jul. 2010.
9) APPENDIX
 Main Code
`timescale 1ns / 1ps
////////////////////////////////////////////////////////////////////
//////////////
// Company:
// Engineer:
//
// Create Date: 01:07:33 12/22/2014
// Design Name:
// Module Name: NOC
// Project Name:
// Target Devices:
// Tool versions:
// Description:
//
// Dependencies:
//
// Revision:
// Revision 0.01 - File Created
// Additional Comments:
//
////////////////////////////////////////////////////////////////////
//////////////
module NOC(
input datain,
input clk,
input r_w,
output reg dout,
input [1:0] xaddr,
input [1:0] yaddr
);
reg [3:0] buff [0:3]; // one storage bit per node of the 4x4 mesh (x selects the word, y the bit)
reg [5:0] inbuff;
reg datain1;
reg r_w1;
reg [1:0] xaddr1;
reg [1:0] yaddr1;
reg [1:0] cx,cy;
initial begin
inbuff = 0;
xaddr1 = 0;
yaddr1 = 0;
cx = 1;
cy = 1;
// clear the node storage once at start-up
buff[0] = 0; buff[1] = 0; buff[2] = 0; buff[3] = 0;
end
// Register the incoming request, then unpack it one cycle later so the
// routing logic below works on stable, registered values.
always@(posedge clk)
begin
inbuff <= {yaddr,xaddr,datain,r_w};
yaddr1 <= inbuff[5:4];
xaddr1 <= inbuff[3:2];
datain1 <= inbuff[1];
r_w1 <= inbuff[0];
end
// Dimension-ordered routing: move the current position (cx,cy) one hop per
// clock, first along the y-dimension and then along x. Once the destination
// node is reached, perform the read (r_w1 = 1) or write (r_w1 = 0).
always@(posedge clk)
begin
if(yaddr1 != cy)
begin
if(cy > yaddr1)
cy <= cy-1;
else
cy <= cy+1;
end
else
begin
if(cx < xaddr1)
cx <= cx+1;
else if(cx > xaddr1)
cx <= cx-1;
else
begin
if(r_w1)
dout <= buff[cx][cy];    // read the bit stored at node (cx,cy)
else
buff[cx][cy] <= datain1; // write the incoming bit at node (cx,cy)
end
end
end

endmodule

 Test Bench
`timescale 1ns / 1ps

////////////////////////////////////////////////////////////////////
////////////
// Company:
// Engineer:
//
// Create Date: 19:38:36 12/25/2014
// Design Name: NOC
// Module Name: C:/Users/admin/FPGA/NOC/tb.v
// Project Name: NOC
// Target Device:
// Tool versions:
// Description:
//
// Verilog Test Fixture created by ISE for module: NOC
//
// Dependencies:
//
// Revision:
// Revision 0.01 - File Created
// Additional Comments:
//
////////////////////////////////////////////////////////////////////
////////////

module tb;

// Inputs
reg datain;
reg clk;
reg r_w;
reg [1:0] xaddr;
reg [1:0] yaddr;

// Outputs
wire dout;

// Instantiate the Unit Under Test (UUT)


NOC uut (
.datain(datain),
.clk(clk),
.r_w(r_w),
.dout(dout),
.xaddr(xaddr),
.yaddr(yaddr)
);
initial begin
#100
clk =0;
forever #5 clk = ~clk;
end
initial begin

datain = 0;

r_w = 0;
xaddr = 2;
yaddr = 2;

// Wait 200 ns for initialization to finish
#200;

datain=1;

#100
datain =1;
xaddr =1;
yaddr =1;
#100
r_w =1;
#50
r_w =1;
xaddr=0;
yaddr=0;

// Add stimulus here

end

endmodule
