NETWORKS
PROJECT REPORT
NETWORK ON CHIP
Class: Bet-6A
Date: 09/01/2015
ABSTRACT
This report presents a basic Network-on-Chip (NoC) structure. A scalable NoC
system consisting of 5-port routers in a 16-core mesh has been designed,
analyzed and implemented to evaluate the latency of a deterministic routing
algorithm. All the work has been done in Xilinx software using the Verilog
language.
TABLE OF CONTENTS
1) Introduction
System on Chip
Problem Background
Objective
2) Background
On Chip vs. Off Chip Design
On Chip Interconnection Types
o Point to Point Interconnection
o Single Bus Interconnection
o Hierarchical/Segmented Bus Interconnection
o Crossbar Interconnection
3) Network on Chip
Basic Concept
Features of NoC
o Links
o Routers
o Network Adaptor (NA) or Network Interface (NI)
Front End
Back End
Performance Evaluation
o Quantitative Terms
Network Equilibrium
Bandwidth
Throughput
Latency
Area
Power
o Qualitative Terms
Quality of Service
(i) Best-Effort (BE) NoCs
(ii) Guaranteed Services (GS) NoCs
Load Balancing
Reconfigurability
(i) Static Part
(ii) Reconfigurable Part
Fault tolerance
Problems Addressed by NoC
Network Abstraction
Network Architectures
NoC Building blocks
o Topologies
o Routing Algorithms
o Routing Mechanisms
o Switching
o Flow Control
Handshake Flow Control
Credit Flow Control
o Network Interface
o Router Architecture
4) Design
Routing Algorithm
Deadlock
Livelock
5) Applications
Telecommunication Systems and NoC
6) Conclusion
7) Future work
8) References
9) Appendix
Main Code
Test Bench
1) INTRODUCTION
With the expansion of multi-core technology, researchers in the field have exposed critical
scalability issues and bandwidth limitations of customary bus-based interconnect
architectures and point-to-point links. This major flaw has led Network-on-Chip (NoC)
interconnects to become an increasingly popular solution. To understand the origins of the
NoC, the evolution of processing-core technology is first highlighted, along with the ensuing
problems, leading to a discussion of why NoCs are considered a viable solution to
accommodate the ever-growing communication demands of future multi-core systems.
Problem Background
Integrated circuits are prone to errors both in fabrication and in use; this equates to
potential profit loss for companies and high costs for consumers.
o Fabrication Failures
A continued reduction in chip feature sizes has made the fabrication process increasingly
difficult to perfect, since process variations have become magnified, leading to greater defect
rates and significantly poorer yield. The integrated circuit (IC) manufacturing process is an
extremely delicate process that involves several stages, each of which must keep
contamination to a minimum. Working at the scale of tens of nanometers has lowered the
precision of control and can result in imprecise impurity deposition and non-uniform fields,
leading to transistor malfunction. A report by the ITRS predicts the percentage variation in
22 nm technology to be roughly three times greater than in 45-65 nm technology: 18%
compared to 5%.
o Electro-Migration
Operation time takes its toll on chips through an elemental ageing phenomenon named
electro-migration (EM). The EM process gradually displaces metal ions in wires as current
flows, thinning the wires and increasing their local resistance. Ultimately, the resistance
becomes too large to sustain the current, resulting in EM link failure. With present-day chip
designs housing a greater number of interconnections per chip, EM failures are becoming
more probable.
Redundant hardware has been added to designs to cope with failures. It was shown that
including extra wires and cores in the initial design allowed repair both during
manufacture and post-production. It was revealed that with one spare wire per link, the
percentage of chips ready for market could be increased by up to 10%. Even though redundancy
does improve yield, the increased manufacturing cost of the extra area overhead can outweigh
the yield savings.
A software-based solution that uses packet-flooding algorithms has also been investigated. In
these algorithms, injected packets are forwarded to all neighbors with a probability ‘p’ and are
dropped with probability ‘1-p’. The value of ‘p’ must be carefully selected to achieve both near-
flooding performance and minimal transmission of redundant copies. Ultimately, these
solutions are probabilistic, so packets are not guaranteed to reach the destination. The final
verdict on flooding methods is that they are limited to low injection rates due to a large
communication overhead.
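As a sketch of how such p-flooding behaves, the following Python fragment (illustrative only; the report's own work is in Verilog, and the function and mesh-builder names here are invented for the example) forwards each packet copy to a neighbor with probability p:

```python
import random

def mesh4x4_neighbors():
    """Adjacency of a 4x4 2D mesh, keyed by (x, y) node coordinates."""
    nbrs = {}
    for x in range(4):
        for y in range(4):
            n = []
            if x > 0: n.append((x - 1, y))
            if x < 3: n.append((x + 1, y))
            if y > 0: n.append((x, y - 1))
            if y < 3: n.append((x, y + 1))
            nbrs[(x, y)] = n
    return nbrs

def probabilistic_flood(neighbors, src, dst, p, rng, max_hops=64):
    """Each copy of the packet is forwarded to a neighbor with probability p
    and dropped with probability 1 - p; delivery is not guaranteed."""
    frontier = [(src, 0)]
    seen = {src}
    while frontier:
        node, hops = frontier.pop()
        if node == dst:
            return True          # some copy reached the destination
        if hops < max_hops:
            for nxt in neighbors[node]:
                if nxt not in seen and rng.random() < p:
                    seen.add(nxt)
                    frontier.append((nxt, hops + 1))
    return False                 # every copy was dropped en route
```

With p = 1 this degenerates to full flooding and always delivers; with p = 0 nothing is ever forwarded, which illustrates why delivery under this scheme is only probabilistic.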
Objective
Research Network on Chip and implement a 16-core 2D mesh using a simple XY routing
algorithm.
2) BACKGROUND
This section places NoCs into context by detailing the characteristics of existing interconnects
and leads onto why a new paradigm is essential for sustainability.
In an off-chip design, the cost lies in the links, latency is tolerable, and the traffic and
applications are unknown and change at run time; such designs must adhere to networking
standards.
o Point-to-Point
o Single Bus
Traditional bus architectures form broadcast mediums in which a single shared backbone
enables intra-chip communication. Nevertheless, there is often a need for dedicated point-to-
point links for critical signals which, when scaled, can result in an exponential increase in wiring
and prolonged design cycles. Wiring therefore contributes to excessive power consumption
and extremely poor scalability. Moreover, with multiple processors there are significant
increases in latency due to contention.
A bus architecture exhibits a system bottleneck due to its single-point arbitration scheme.
These designs are only efficient when accommodating fewer than five cores.
Buses were developed further by incorporating crossbars to form a switch fabric. This overcame
the lack of parallelism by enabling concurrent transactions through emulation of point-to-point
interconnections. Although this yields bandwidth improvements, the number of links grows
rapidly with rising core count, adding to the ever-restricting area overhead and power
consumption: the complexity of a crossbar scales as the square of the number of ports.
Crossbars can also suffer from cross-chip wire delays that limit clock frequencies, since RC
delay scales quadratically with distance; this eventually forces cross-chip communication
latencies up to tens of cycles in sub-hundred-nanometer technology. These observations show
that, although crossbars overcome limitations of the bus design, they ultimately have poor
scalability.
Buses and crossbars have been the trusted communication providers for multiprocessor
technology since their introduction, but with new generations of complex designs appearing
there is a requirement for a sustainable future interconnection framework. A commonly
accepted change is the transition towards a shared, segmented global communication
architecture, since it can provide flexibility and scalability. This structure is described as a
data-routing network: a NoC.
3) NETWORK ON CHIP
Basic Concept
As the number of IP modules in Systems-on-Chip (SoCs) increases, bus-based interconnection
architectures may prevent these systems from meeting the performance required by many
applications. For systems with rigorous parallel-communication requirements, buses may not
provide the required bandwidth, latency, and power consumption. A solution to such a
communication bottleneck is to use an embedded switching network, called a Network-on-Chip
(NoC), to interconnect the IP modules in SoCs. The NoC design space is significantly larger than
that of a bus-based solution, as different routing and arbitration strategies can be
implemented. In addition, NoCs have an innate redundancy that helps tolerate faults and deal
with communication bottlenecks. A NoC is also a platform for system integration, debugging
and testing.
Features of NoC
A NoC has the following three features:
o Links
The first and most important one is the links that physically connect the nodes and actually
implement the communication.
o Routers
The second block is the router, which implements the communication protocol (the
decentralized logic behind the communication protocol). The router basically receives packets
from the shared links and, according to the address informed in each packet, it forwards the
packet to the core attached to it or to another shared link. The protocol itself consists of a set of
policies defined during the design (and implemented within the router) to handle common
situations during the transmission of a packet, such as having two or more packets arriving at
the same time or disputing the same channel, avoiding deadlock and livelock situations,
reducing the communication latency, increasing the throughput, etc.
The design and implementation of a router requires the definition of a set of policies to deal
with packet collision, the routing itself, and so on. A NoC router is composed of a number of
input ports (connected to shared NoC channels), a number of output ports (connected to
possibly other shared channels), a switching matrix connecting the input ports to the output
ports, and a local port to access the IP core connected to this router. We use the terms router
and switch as synonyms, although the term switch can also mean the internal switching matrix
that actually connects the router inputs to its outputs.
Front End
The front end handles the core requests and is ideally unaware of the NoC. This part is
usually implemented as a socket.
Back End
The back end part handles the network protocol (assembles and disassembles the packet,
reorders buffers, implements synchronization protocols, helps the router in terms of
storage, etc).
Performance Evaluation
The performance of a NoC can be evaluated on the basis of quantitative and qualitative terms
which are briefly discussed as follows:
o Quantitative Terms
Network Equilibrium
For ease of design, the network is initialized with vacant buffers and idle resources.
Initial packets injected into the network will therefore experience minimal contention and exit
with low latency. After a certain time period, buffers begin to populate and overall packet
latency increases. Eventually, the initialization effect diminishes and the network reaches
steady state. A similar phenomenon occurs at the end of the simulation, where the last
remaining packets exit the network and endure less traffic because packet injection has
stopped; this is known as the drain phase. Latency measurements within these periods
introduce systematic errors; therefore, to minimize their influence, measurements will be
taken during steady-state operation. To do this, generated packets will be flagged with a
‘measure’ bit once the warm-up phase has passed; the test-bench sink will assess this bit and
only record measurements when it is logic high. Testing will be monitored to ensure packet
injection continues long enough for all the measurement packets to reach their destinations.
This continual injection is required up until the very last measurement packet, so that it
experiences similar background traffic interactions as the previous packets did throughout the
test.
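The warm-up/drain measurement scheme can be sketched as follows (a hypothetical Python model, not the actual Verilog test bench; packets are represented here as (injection cycle, ejection cycle) pairs):

```python
def steady_state_latencies(packets, warmup_end, drain_start):
    """Only packets injected inside the measurement window carry the
    'measure' flag; the sink records latency only for flagged packets."""
    samples = []
    for inject_cycle, eject_cycle in packets:
        # The measure bit is decided at injection time, not at ejection time.
        measure = warmup_end <= inject_cycle < drain_start
        if measure:
            samples.append(eject_cycle - inject_cycle)
    return samples
```

Packets injected before warmup_end or after drain_start still load the network with background traffic, but their (unrepresentative) latencies never enter the statistics.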
Bandwidth
The bandwidth refers to the maximum rate of data propagation once a message is in the
network. The unit of measure for bandwidth is bit per second (bps) and it usually considers the
whole packet, including the bits of the header, payload and tail.
Throughput
Throughput is defined as the maximum traffic accepted by the network, that is, the maximum
amount of information delivered per time unit. The throughput measure is messages per second
or messages per clock cycle. One can have a normalized throughput (independently from the
size of the messages and of the network) by dividing it by the size of the messages and by the
size of the network. As a result, the unit of the normalized throughput is bits per node per clock
cycle (or per second).
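The normalization just described can be written out as a small helper (illustrative Python; the function name and parameters are invented for the example):

```python
def normalized_throughput(messages_delivered, bits_per_message, num_nodes, cycles):
    """Raw throughput (messages per clock cycle), scaled by message size and
    divided by network size, giving bits per node per clock cycle."""
    messages_per_cycle = messages_delivered / cycles
    return messages_per_cycle * bits_per_message / num_nodes
```

For example, 1600 delivered 32-bit messages over 1000 cycles in a 16-node mesh normalizes to 3.2 bits per node per cycle, which can be compared directly across networks of different sizes.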
Latency
Latency is the time elapsed between the beginning of the transmission of a message (or packet)
and its complete reception at the target node. Latency is measured in time units and mostly
used as comparison basis among different design choices. In this case, latency can also be
expressed in terms of simulator clock cycles. Normally, the latency of a single packet is not
meaningful and one uses the average latency to evaluate the network performance. On the
other hand, when some messages present a much higher latency than the average, this may be
important. Therefore the standard deviation may be an interesting measure as well.
Area
Since a non-segmented bus operates at a very low frequency and has no parallelism, it has to
be made excessively wide in order to provide the same effective bandwidth as the NoC. As a
result, both its width and its length grow, so its total area cost increases. In a segmented bus,
the average link frequency is slower than in the NoC (longer links with higher capacitance); the
link length grows and, since the link width is asymptotically one, its total area also grows.
Power
Power dissipated by all architectures is proportional to the product of operating frequency and
total wire length. The NoC has an advantage assuming a uniform traffic distribution and
assuming that load capacitance depends only on the interconnect (ignoring the capacitance of
system module ports). Non-uniform, mostly-local traffic favors the NoC even further, as does
the inclusion of input-port capacitance. In more advanced VLSI technology generations, the
capacitance and delay of long interconnect wires become even more dominant. As the
technology improves, the NoC is the only communication architecture whose links become
shorter and less vulnerable to delays and noise.
Qualitative Terms
Quality of Service
Quality of service (QoS) is defined “as service quantification that is provided by the network to
the demanding core”. Quality of service is also about predictability of the communication
behavior. One must first define the required services and a quantification measure. Typically,
services are defined based on the performance metrics of the network (low latency, high
throughput, low power, etc.). Then, one must define implementation mechanisms to meet the
core’s demands using the network services.
(i) Best-Effort (BE) NoCs
These NoCs offer no commitment. For most NoCs this means that only completion of the
communication is ensured. Best-effort NoCs tend to present a better utilization of the
available resources.
(ii) Guaranteed Services (GS) NoCs
These NoCs ensure that certain service requirements can be met. Commitment can
be defined at several levels, such as correctness of the result, completion of the transaction,
bounds on performance, and so on.
Due to the characteristics of the on-chip networks and the systems based on those
structures, service guarantees and predictability are usually given through hardware
implementations, as opposed to statistical guarantees of macro-networks. In order to give
hard guarantees, one must use specific routing schemes (such as virtual channels, adaptive
routing, fault-tolerant router and link structures, etc.) to ensure the required levels of
performance and reliability.
In guaranteed-services NoCs, cost and complexity grow with the complexity of the system
and the QoS requirements.
Load Balancing
Reconfigurability
The static part consists of all the computational elements and the network interfaces.
Computational elements can be further divided into two categories. The first consists of
masters, the active components of the system, such as microprocessors, that can initiate
new transactions on the network; these components are connected to the communication
infrastructure. The second consists of slaves, such as memories, which act in a passive
mode by receiving and answering transactions coming from active elements.
The reconfigurable part is composed of all the reconfigurable elements, used to adapt the
structure of the system implemented on the FPGA at run time. These elements can be
either computational components or elements used to update the communication
infrastructure. Network interfaces toward the communication infrastructure can implement
bridges between the On-chip Peripheral Bus (OPB), Processor Local Bus (PLB) or Open Core
Protocol (OCP) and the network protocol. The only part of the network interfaces (both
initiator and target) that has to be modified at run time is the routing tables, which are
used to dynamically change the routing of packets on the network. Thus, all the network
interfaces have been placed in the static part of the system and the routing tables have been
deployed in BRAM blocks.
o Fault Tolerance
As technology scales, fault tolerance is becoming a key concern in on-chip communication. Two
different flooding algorithms and a random-walk algorithm have been investigated. The flood-
based fault-tolerant algorithms have an exceedingly high communication overhead, while the
redundant random-walk algorithm offers significantly reduced overhead while maintaining
useful levels of fault tolerance. Comparing the implementation costs of these algorithms, in
terms of both area and energy consumption, shows that the flooding algorithms
consume an order of magnitude more energy per message transmitted.
Problems Addressed by NoC
The NoC approach offers many advantages:
o NoC links are regular
o NoC scalability
o System integration (easing the productivity problem)
o Multicore processors and multicore chips, where ILP is limited
Network Abstraction
o Application Layer
Flow control
Handling of contention
o Physical layer
o Routing Algorithms
Routing is another important design choice, and its algorithm taxonomy is very
complex. The routing algorithm determines the path(s) a packet takes from source to
destination.
Unicast delivers a message to a single specific node. Unicast is the dominant form of
message delivery on the Internet.
The routing algorithm can be implemented as a finite state machine or via a look-up table
that contains an entry for each IP in the network and whose contents can be dynamically
changed by the OS at run time. This allows the routing for each IP to be customized at the
link level. The drawback, of course, is that the required memory space grows
proportionally with the number of IPs, whereas a finite state machine has a fixed size
independent of the number of IPs but no flexibility. Each time a task that changes the
communication between the IPs is started, the OS computes a new optimum distribution
of the communication channels and the look-up tables change accordingly.
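A minimal model of such an OS-managed look-up table might look like this (Python sketch; the class and method names are invented for illustration):

```python
class RoutingLUT:
    """Table-based routing: one output-port entry per destination IP; the OS
    can rewrite entries at run time. Memory grows linearly with the number
    of IPs, unlike a fixed-function finite state machine."""

    def __init__(self, num_ips, default_port=0):
        self.table = [default_port] * num_ips  # one entry per IP core

    def set_route(self, dest_ip, out_port):
        """Run-time update, e.g. after the OS recomputes channel allocation."""
        self.table[dest_ip] = out_port

    def lookup(self, dest_ip):
        return self.table[dest_ip]
```

The trade-off named above is visible in the constructor: the table costs one entry per IP, but any single route can be changed without touching the router logic.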
o Switching
Switching algorithms are relatively simple and are the same for most routing protocols. In
most cases, a host determines that it must send a packet to another host. Having
acquired a router's address by some means, the source host sends a packet addressed
specifically to the router's physical (Media Access Control [MAC]-layer) address, but
with the protocol (network-layer) address of the destination host. On examining the
packet's destination protocol address, the router determines that it either knows or
does not know how to forward the packet to the next hop. If the router does not know
how to forward the packet, it typically drops the packet. If the router knows how to
forward the packet, however, it changes the destination physical address to that of the
next hop and transmits the packet. The next hop may be the ultimate destination host;
if not, it is usually another router, which executes the same switching decision process.
As the packet moves through the internetwork, its physical address
changes, but its protocol address remains constant. The preceding discussion describes
switching between a source and a destination end system. The International
Organization for Standardization (ISO) has developed a hierarchical terminology that is
useful in describing this process. Using this terminology, network devices without the
capability to forward packets between subnetworks are called end systems (ESs),
whereas network devices with these capabilities are called intermediate systems (ISs).
ISs are further divided into those that can communicate within routing domains (intra-domain
ISs) and those that communicate both within and between routing domains
(inter-domain ISs). A routing domain generally is considered a portion of an
internetwork under common administrative authority that is regulated by a particular
set of administrative guidelines. Routing domains are also called autonomous systems.
With certain protocols, routing domains can be divided into routing areas, but intra-domain
routing protocols are still used for switching both within and between areas.
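The hop-by-hop decision described above can be sketched as follows (illustrative Python; the dictionary-based routing tables and field names are assumptions of the example):

```python
def forward(packet, routing_tables, max_hops=16):
    """Hop-by-hop switching sketch: at each router the next-hop physical
    address is rewritten while the destination protocol address stays
    constant; a router with no matching route drops the packet."""
    node = packet["src"]
    for _ in range(max_hops):
        if node == packet["dst_protocol"]:
            return packet                    # reached the end system
        next_hop = routing_tables.get(node, {}).get(packet["dst_protocol"])
        if next_hop is None:
            return None                      # router cannot forward: drop
        packet["dst_physical"] = next_hop    # physical address changes per hop
        node = next_hop
    return None
```

Note that `dst_protocol` never changes during the walk, matching the observation that only the physical address is rewritten at each intermediate system.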
Flow Control
Flow control is the mechanism that lets the transmitter know whether the receiver
is able to accept data (the receiver's buffers may be full, or some momentary
problem may disable data reception).
In handshake flow control, when router A wishes to transmit data to a neighboring
router B, router A activates the tx signal and puts the data on data_out. When router
B perceives the activated rx signal, it stores the data present on data_in and
activates the ack_rx signal, indicating reception. After that, router A can transmit
the subsequent flit. This protocol is asynchronous, and each flit consumes at least
two clock cycles.
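A toy model of this protocol (Python, invented names; the real signals are Verilog wires) makes the two-cycle-per-flit cost explicit:

```python
def handshake_transfer(flits):
    """Handshake flow-control sketch: A raises tx and drives the data; B
    latches it and raises ack_rx; only then may A send the next flit.
    Each flit therefore costs at least two clock cycles."""
    cycles = 0
    received = []
    for flit in flits:
        cycles += 1            # cycle 1: A asserts tx, flit on data_out
        received.append(flit)  # B samples data_in
        cycles += 1            # cycle 2: B asserts ack_rx, releasing A
    return received, cycles
```
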
In credit-based flow control, when router A wishes to transmit data to a neighboring
router B, router A verifies whether the channel selected for the transmission has
credits, i.e., whether the credit_i signal is active. If so, router A puts the data on the
data_out signal and activates the tx signal. When router B perceives the activated rx
signal, it stores the data present on the data_in signal and decrements the number of
credits. Router A can keep transmitting while the credit_i signal is active. The number
of credits is incremented when a flit is consumed by router B, i.e., when router B
forwards it to the core or to a neighboring router. In this protocol, data transmission is
synchronized by the clock_tx signal, and each flit consumes at least one clock cycle.
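The credit mechanism can be sketched as a small model (Python; `buffer_depth` stands in for the number of receiver buffer slots, and the names are invented for the example):

```python
class CreditChannel:
    """Credit-based flow-control sketch: A may transmit while credits > 0;
    each send consumes a credit, and B returns a credit when it drains a
    flit from its buffer onward. One flit per clock cycle at best."""

    def __init__(self, buffer_depth):
        self.credits = buffer_depth  # initial credits = receiver buffer slots
        self.buffer = []

    def send(self, flit):
        if self.credits == 0:
            return False             # transmitter must stall: no credit
        self.credits -= 1
        self.buffer.append(flit)     # flit lands in B's input buffer
        return True

    def consume(self):
        flit = self.buffer.pop(0)    # B forwards the flit to core/neighbor
        self.credits += 1            # credit returned to A
        return flit
```

Running the model shows the stall behavior: once the receiver buffer is full, sends fail until B consumes a flit and a credit flows back.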
Network Interface
The network interface is designed as a bridge between an OCP interface and a NOC
switching fabric. Its purposes are the synchronization between OCP and network timing,
(de-) packetization, the computation of routing information (stored in a Look-Up Table,
LUT) and flit buffering to improve performance. Differentiated bridges exist between
communication initiators and the network (network interface initiator) and between
communication targets and the network (network interface target). Each NI is split into
two sub-modules: one for the request and one for the response channel. These sub-
modules are loosely coupled. Whenever a transaction requiring a response is processed
by the request channel, the response channel is notified; whenever the response is
received, the request channel is unblocked. The request path of the NI is built around
two registers: one holds the transaction header (refreshed once per OCP transaction),
while the second holds the payload (refreshed at each OCP burst beat). A set of flits
encodes the header register, followed by multiple sets of flits encoding a snapshot of
the payload register after each new burst beat. Header and payload content is never
allowed to mix, and padding is used when necessary. Routing information is attached to
the header flit of a packet by checking the transaction address against a LUT. The length
of this field depends on the maximum switch radix and the maximum number of hops in
the specific network instance at hand. The NI performs clock-domain crossing; however,
in order to keep the architecture simple, the ratio between network and core clock
frequencies needs to be an integer.
Router Architecture
Depending on topology and on the number of IPs attached to them, the routers have a
variable number of input and output ports. Separating the input ports from the output
ports allows a flexible network design. Each router provides fixed length input FIFO
buffers to accommodate the local processing core and the neighboring routers. No
output buffering is provided. Write asserted flits are sampled every clock cycle and are
written into input buffers. Routing logic produces output port requests for each buffer
depending on the routing information contained within the head-of-line flit and are sent
to a switch allocator, which configures the switch fabric to connect the input buffers to
the output ports. A cyclic iterative priority arbiter employing round-robin scheduling is
used for decision making.
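The round-robin policy used by the switch allocator can be sketched as follows (Python; illustrative only — the actual arbiter is combinational hardware, and these names are invented):

```python
class RoundRobinArbiter:
    """Cyclic round-robin arbiter sketch: the port just after the previous
    winner gets highest priority on the next grant, so no requesting port
    can be starved indefinitely."""

    def __init__(self, num_ports):
        self.num_ports = num_ports
        self.last = num_ports - 1    # so port 0 has top priority first

    def grant(self, requests):
        """requests: list of truthy request flags, one per input port."""
        for i in range(1, self.num_ports + 1):
            port = (self.last + i) % self.num_ports
            if requests[port]:
                self.last = port     # rotate priority past the winner
                return port
        return None                  # no port requested this cycle
```

With a persistent request pattern the grants rotate over all requesters rather than repeatedly favoring the lowest-numbered port.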
4) DESIGN
Requirements
High-performance interconnect
o High throughput, low latency, low power, small area
Complex functionality
o Support for virtual channels
o QoS
o Synchronization
o Reliability with high throughput and low latency
Routing Algorithm
Dimension-ordered (DOR-XY) routing algorithms are the most commonly used routing
algorithms for on-chip networks due to their simplicity of implementation in hardware. A
deterministic XY dimension-ordered routing (DOR-XY) algorithm, which is also a shortest-path
algorithm, is implemented; it operates by first routing packets along the x-direction until they
reach the destination x-address and then likewise for the y-dimension. This means that packets
always traverse the same path between a particular pair of source and destination nodes. A
variant of the deterministic DOR-XY is ‘XYX’, which uses both DOR-XY and DOR-YX. This
algorithm works by forcing source nodes to inject redundant copies of every message, with a
header flag bit to indicate whether the message is the original or the copy. Original messages
are routed using XY while copies follow YX routing. Overall, the results show that the traffic
distribution is close to uniform. The problem with this technique is that the added redundancy
increases the amount of traffic on the network and limits the maximum injection rate for
acceptable performance.
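The DOR-XY path computation can be sketched as follows (Python; the report's router implements this in Verilog, and the function name here is invented). The YX variant used for redundant copies simply swaps the two loops:

```python
def xy_route(src, dst):
    """Deterministic DOR-XY routing for a 2D mesh: travel along x until the
    x-coordinate matches the destination, then along y. Returns the full
    list of nodes visited, source and destination included."""
    x, y = src
    dx, dy = dst
    path = [src]
    while x != dx:
        x += 1 if dx > x else -1   # x-dimension first
        path.append((x, y))
    while y != dy:
        y += 1 if dy > y else -1   # then y-dimension
        path.append((x, y))
    return path
```

Because the x-loop always completes before the y-loop starts, every (src, dst) pair maps to exactly one path, which is what makes the algorithm deterministic (and, with this dimension ordering, deadlock-free on a mesh).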
Deadlock
Deadlock occurs when a set of agents holding resources are waiting on another set of resources
such that a cycle of waiting agents is formed. Most networks are designed to avoid deadlock,
but it is also possible to recover from deadlock by detecting and breaking cyclic wait-for
relationships. Deadlock is more of an issue on routing algorithms that do not allow for
misrouting of data. When this data reaches a node or link it cannot traverse through, it stops its
progress, and is usually dropped. Because of finite buffers, and random failures, deadlock is
usually an issue to be considered within routing algorithms.
Livelock
Livelock occurs when a packet is not able to make progress in the network and is never
delivered to its destination. Unlike Deadlock, though, a livelock packet continues to move
through the network. Livelock tends to occur most frequently when data is allowed to be
misrouted, or routed non-minimally. More specifically, livelock becomes a reality when data is
forever in transit towards its destination, and continues to move within the network, but does
not reach the destination within finite time. Usually this issue comes about because of
overutilized channels or link failures. There are two schemes for tackling livelock within
routing algorithms: using only minimal paths, or placing strict restrictions on non-minimal
paths; either way, livelock becomes much less of an issue.
5) APPLICATIONS
Since the on-chip network must be designed for the application the chip is used for, and there
is a growing integration of various application cores within a single chip, the network must be
flexible and robust enough to support the diverse types and volumes of traffic generated by
different cores. Typical applications include:
a. Network processors,
b. Multimedia hubs
6) CONCLUSION
From our research, we conclude that on-chip networks have become a promising solution
not only for parallel applications but also for data-intensive applications, for the following
reasons:
Implementation and optimization of layers is independent.
They simplify customization per application.
They support multiple topologies and options for different parts of the network.
They simplify feature development, interface interoperability, and scalability.
7) FUTURE WORK
8) REFERENCES
http://en.wikipedia.org/wiki/Network_on_a_chip
http://www6.in.tum.de/pub/Main/TeachingWs2013MSE/bolotin_NoC_costs.pdf
http://www.ann.ece.ufl.edu/courses/eel6935_10spr/papers/Reconfigurable_Network-
on_Chip_Architecture.pdf
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.156.5414
http://webee.technion.ac.il/bolotin/papers/QNoC-Dec2003.pdf
http://gram.eng.uci.edu/comp.arch/lab/NoCOverview.htm
http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=6121922&url=http%3A%2F%2Fieeexplo
re.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D6121922
https://www.ee.ucl.ac.uk/~pwatts/Danny_Ly_Final.pdf
http://www.arteris.com/Arteris_mobility_pr_4_june_2013
http://pop-art.inrialpes.fr/~girault/Fmgals07/Final/03-gebhardt.pdf
http://www.altera.com/literature/wp/wp-01149-noc-qsys.pdf
B. Ahmad, Ahmet T. Erdogan, and Sami Khawam, “Architecture of a Dynamically Reconfigurable
NoC for Adaptive Reconfigurable MPSoC”, Proceedings of the First NASA/ESA Conference on
Adaptive Hardware and Systems, IEEE Computer Society, 2006.
Tobias Bjerregaard and Shankar Mahadevan, “A survey of research and practice of Network on
Chip”, ACM Computing Surveys, Vol. 38, 2006.
William James Dally and Brian Towles, “Principles and Practices of Interconnection Networks”,
Morgan Kaufmann Publishers, San Francisco, 2004.
Rickard Holsmark and Magnus Hugberg, “Modelling and Prototyping of Network on Chip”,
Master of Science Thesis, Ingenjorshogskolan, 2002.
Nikolay Kavaldijev and Gerard J. M. Smit, “A survey of efficient On-Chip communications for
SoC”, Department of EEMCS, University of Twente, The Netherlands, 2003.
Shashi Kumar, Axel Jantsch, Juha-Pekka Soininen, Martti Forsell, Mikael Millberg, Johnny Oberg,
Kari Tiensyrja, and Ahmed Hemani, “A Network on Chip Architecture and Design Methodology”,
in Proceedings of IEEE Computer Society Annual Symposium on VLSI,2002.
José C. Prats Ortiz, “Design of components for a NoC-Based MPSoC Platform: Adding a shared
memory node to the mNoC”, Master of Science Thesis, Department of Electrical Engineering,
Eindhoven University of Technology, 2005.
G. Blake, R.G. Dreslinski, and T. Mudge, “A Survey of Multicore Processors,” IEEE Signal
Processing Magazine, vol. 26, no. 6, pp. 26-37, Nov. 2009.
R. Kamal and N. Yadav, “NoC and Bus Architecture: A Comparison,” International Journal of
Engineering Science and Technology, vol. 4, no. 4, pp. 1438-1442, Apr. 2012.
N.E. Jerger and L. Peh, “On Chip Networks (Synthesis Lectures on Computer Architecture),”
Morgan and Claypool Publishers, 2009.
M. Hübner, “Multiprocessor System-On-Chip: Hardware Design and Tool Integration,” Springer,
2011.
H. Iwai. “Roadmap for 22 nm and beyond,” Microelectronic Engineering, vol. 86, no. 7-9, pp.
1520-1528, July-Sept. 2009
K. S. Hassan, A. Reza, M. Reshadi, “Yield Modeling and Yield-aware Mapping for Application
Specific Networks-on-Chip,” NorChip, pp. 1-4, Nov. 2011.
S. Shamshiri and K.Cheng, “Modeling Yield, Cost, and Quality of a Spare-Enhanced Multicore
Chip,” IEEE Trans. Computers, vol. 60, no. 9, pp. 1246-1259, Sept. 2011.
W. J. Dally and B.P. Towles, “Principles and Practices of Interconnection Networks,” Morgan
Kaufmann, 2004.
9) APPENDIX
Main Code
        cx <= cx - 1;
      else if (cx == xaddr1)
      begin
        if (r_w1)
          dout <= buff[cx][cy];
        else
          buff[cx][cy] <= datain1;
      end
    end
  end
endmodule
Test Bench
`timescale 1ns / 1ps
////////////////////////////////////////////////////////////////////////////
// Company:
// Engineer:
//
// Create Date: 19:38:36 12/25/2014
// Design Name: NOC
// Module Name: C:/Users/admin/FPGA/NOC/tb.v
// Project Name: NOC
// Target Device:
// Tool versions:
// Description:
//
// Verilog Test Fixture created by ISE for module: NOC
//
// Dependencies:
//
// Revision:
// Revision 0.01 - File Created
// Additional Comments:
//
////////////////////////////////////////////////////////////////////////////
module tb;
  // Inputs
  reg datain;
  reg clk;
  reg r_w;
  reg [1:0] xaddr;
  reg [1:0] yaddr;
  // Outputs
  wire dout;

  // Instantiate the Unit Under Test
  // (port names assumed to match the signal names above)
  NOC uut (
    .datain(datain),
    .clk(clk),
    .r_w(r_w),
    .xaddr(xaddr),
    .yaddr(yaddr),
    .dout(dout)
  );

  // Free-running clock
  always #5 clk = ~clk;

  initial begin
    // Initialize inputs
    clk = 0;
    datain = 0;
    r_w = 0;
    xaddr = 2;
    yaddr = 2;
    // Write a 1 into node (2,2)
    datain = 1;
    #100;
    // Write a 1 into node (1,1)
    datain = 1;
    xaddr = 1;
    yaddr = 1;
    #100;
    // Read back from node (1,1)
    r_w = 1;
    #50;
    // Read back from node (0,0)
    r_w = 1;
    xaddr = 0;
    yaddr = 0;
  end
endmodule