Fat-Pyramid-NOC and Fat-Stack-NOC: New Frameworks Network-On-Chip Architectures

Journal of Computing, eISSN 2151-9617, http://www.JournalofComputing.org
Published by: Journal of Computing on May 13, 2012
Copyright:Attribution Non-commercial

 
 
Reza Kourdy, Department of Computer Engineering, Islamic Azad University, Khorramabad Branch, Iran
Mohammad Reza Nouri rad, Department of Computer Engineering, Islamic Azad University, Khorramabad Branch, Iran
Abstract
Network-on-Chip (NoC) has emerged as a very promising paradigm for designing scalable communication architectures for Systems on Chips (SoCs). This paper proposes a general framework for the design and simulation of network-on-chip-based pyramid architectures such as Fat-Pyramid-NOC and Fat-Stack-NOC. Several parameters in the design space are investigated, namely network topology, degree of parallelism, and scalability. Emulation is necessary to evaluate and validate the performance of the NoC system.
Index Terms
Network-on-Chip (NoC), Systems on Chip (SoC), Field-Programmable Gate Array (FPGA), processing element (PE).
1 INTRODUCTION
Latest applications ported to embedded systems (e.g., scalable video rendering, communication protocols) demand large computation power while still respecting other critical embedded design constraints, such as short time-to-market, low energy consumption, or reduced implementation size.

Thus, embedded systems are complex Systems-on-Chip (SoCs) that consist of a large number of components, such as processing elements, storage devices, and even reconfigurable devices such as Field-Programmable Gate Arrays (FPGAs), which enhance the flexibility of the final SoCs so that they can be used in different environments [1], [2].

Nevertheless, one of the most critical areas of MPSoC design is the definition of a suitable interconnect subsystem for all these SoC components, due to architectural and physical scalability concerns [3]. In fact, traditional shared-bus interconnects are relatively easy to design, but do not scale well for the latest and forthcoming SoC consumer platforms.

In order to cope with the large communication demands of such SoCs, the use of modular and scalable Networks-on-Chips (NoCs) has been proposed [3]. Designing custom-tailored NoC interconnects that satisfy the performance and design constraints of the SoC for all the different combinations of possible executed applications is then a key goal for achieving optimal commercial products [4], [5].

However, as general-purpose processor cores are used to run software tasks of different applications in SoCs, the communication between the cores cannot be precharacterized and fully optimized, since the application processes can be mapped differently to the cores, typically with the support of the compiler. Thus, to provide predictable performance of the NoC, the bandwidth capacity of the different links must be sufficient to support the peak rate of traffic on the links for the possible different mappings of the tasks onto the final SoC. Otherwise, the network might experience traffic congestion, the latency of the traffic streams would grow, and the interconnect performance would become unacceptable, which must be avoided in consumer devices.

As a result, NoC designs that guarantee worst-case bandwidth conditions of SoC operation with multiple concurrent applications often lead to over-sized topologies and links during regular operation of the SoC. In this context, the development of new methods and frameworks that increase the runtime versatility of initially static NoC designs, so that they adapt to the different working conditions created by the changing set of applications at each moment, is an important research area in the NoC domain.

Networks on Chips (NoCs) have been proposed as a promising solution to complex on-chip communication problems. However, many challenging research problems remain unsolved at all levels of design abstraction, such as design exploration of NoC architectures for applications; scheduling and mapping algorithms; evaluation of switching, topology, or routing algorithms for efficient execution of applications; and optimizing communication costs, area, energy, and so forth. Solving the above problems calls for the development of a synthesizable, parameterizable NoC framework that can evaluate and implement these problems and algorithms with ease and flexibility [6].
2 NETWORK-ON-CHIP IMPLEMENTATION
The proposed NoC framework consists of five main modules [6]:
i) The Processing Architecture
ii) The Communication Infrastructure
iii) A Communication Paradigm
iv) The Monitor
v) The Traffic Generation Module

The Processing Architecture module consists of a Processing Element (PE) and a Network Adapter (Core Network Interface) module. The Communication Infrastructure consists of the network topology and a routing node. The Communication Paradigm describes the switching techniques and routing algorithms employed in the NoC Communication Infrastructure. The Monitor module includes two sub-modules: a) a Node monitor, which monitors the activities in a routing node, and b) a NoC monitor, which monitors the communication within the framework. Figure 1 shows the NoC framework model.

The traffic generator (TG) module injects packets (traffic) into the network. It can initiate either a request or a start of transmission from the top level. The TG also determines the type of traffic (uniform, hotspot, sporadic) as well as the source and destination nodes for each traffic flow. Different congestion scenarios and node failures can also be created through the TG. The design consists of a node monitor at each router and a NoC monitor at the network level (see Figure 1).

The transaction monitor at each router contains information about the buffer count in each virtual channel and sends this information to the top NoC monitor and traffic controller. It also keeps track of PE status.

JOURNAL OF COMPUTING, VOLUME 4, ISSUE 4, APRIL 2012, ISSN 2151-9617, https://sites.google.com/site/journalofcomputing, WWW.JOURNALOFCOMPUTING.ORG
2.1. Processing Architecture
The processing element (PE) in the framework can be a master PE or a slave PE. Only master PEs can initiate a message transfer. Slave PEs respond to requests from the master PE either by sending back the requested signals/data or by saving the received information. In our framework, UART, TIMER, Instruction/Data Memory, and slave processors are considered slave PEs, and the master PEs and slave processors are capable of performing computational operations.
2.2. Communication Infrastructure
The communication infrastructure consists of a routing node and the network topology. The routing node consists of a link controller and a router. The link controller (LC) provides an interface between the network adapter (NA) and the NoC. Its main function is to match the NA clock rate with that of the network topology. Routing nodes run at four times the frequency of PEs. Synchronization registers are used to match clock rates between the slow PEs and the fast routing nodes. First-in first-out (FIFO) buffers are also added in the LC to store data packets from the network before they are transmitted to adjacent PEs.
2.3. Communication Paradigm
In order to forward the message/packet, the implemented NoC framework can choose either the Store and Forward (SF) switching technique or the Wormhole (WH) switching technique. In SF switching, the message can be sent either as packets or in the form of flits. Each flit contains 25 bits.

When the message is transmitted as flits, each routing node waits until the entire message has been received before processing the HEADER. The end of the message/packet is determined by the TAIL flit. In Wormhole routing, the message is forwarded as soon as the HEADER is available. The path is determined from the HEADER as it moves through the network. The remaining flits follow the same path. The path is disconnected when the TAIL flit is received (see Figure 2).
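The difference between the two switching techniques can be illustrated with a simple zero-load latency model. This is only a sketch and not part of the framework described in the paper: it assumes that one flit crosses one link per cycle, that there is no contention, and that routers add no processing delay. The function names are hypothetical.

```python
def store_and_forward_cycles(hops: int, flits: int) -> int:
    """Cycles until the last flit arrives when every routing node
    receives the whole message before forwarding it (assumed model)."""
    if hops < 1 or flits < 1:
        raise ValueError("hops and flits must be positive")
    # Each hop transfers all flits before the next hop may start.
    return hops * flits


def wormhole_cycles(hops: int, flits: int) -> int:
    """Cycles until the TAIL flit arrives when flits are pipelined
    behind the HEADER along the same path (assumed model)."""
    if hops < 1 or flits < 1:
        raise ValueError("hops and flits must be positive")
    # The HEADER needs `hops` cycles; the other flits follow one per cycle.
    return hops + flits - 1
```

Under these assumptions, an 8-flit message crossing 4 links takes 32 cycles with store-and-forward and 11 cycles with wormhole switching; for a single hop the two models coincide.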
2.4. Monitor Module
Every routing node in the NoC is connected to a “Node monitor,” which connects to a top-level monitor called the “NoC monitor.” The main function of the NoC monitor is to collect traffic information from the individual Node monitors. Each Node monitor generates control information based on the buffer conditions of its router node. The Node monitor uses a few ON/OFF signals, such as FAIL, FULL, and ALMOST FULL, to communicate with the NoC monitor.

Fig. 1. NoC Framework
Fig. 2. Packet Format
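The ON/OFF signalling between Node monitors and the NoC monitor might look like the following sketch. The paper does not give buffer sizes or thresholds, so `capacity`, `almost_full_margin`, and the class and method names are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class NodeMonitor:
    capacity: int                # buffer slots in the router (assumed)
    almost_full_margin: int = 1  # hypothetical ALMOST FULL threshold
    failed: bool = False         # set when the node is considered failed

    def signals(self, occupied: int) -> dict:
        """Derive the ON/OFF signals from the current buffer occupancy."""
        if not 0 <= occupied <= self.capacity:
            raise ValueError("occupancy out of range")
        free = self.capacity - occupied
        return {
            "FAIL": self.failed,
            "FULL": free == 0,
            "ALMOST_FULL": 0 < free <= self.almost_full_margin,
        }


@dataclass
class NoCMonitor:
    reports: dict = field(default_factory=dict)

    def collect(self, node_id: int, signals: dict) -> None:
        """Store the latest signals reported by one Node monitor."""
        self.reports[node_id] = signals

    def congested_nodes(self) -> list:
        """Nodes whose buffers are full or almost full."""
        return sorted(n for n, s in self.reports.items()
                      if s["FULL"] or s["ALMOST_FULL"])
```

In this sketch the NoC monitor only aggregates the signals; how the traffic controller reacts to them is left open, as in the text.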
2.5. Traffic Generator Module
The Traffic Generator (TG) module is responsible for generating different traffic distributions in the network.
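A software model of such a generator could be sketched as follows. The paper does not specify how the TG chooses sources and destinations, so the injection probability, the hotspot probability, and all names below are illustrative assumptions; sporadic traffic is approximated here simply by a low `inject_prob`.

```python
import random


def pick_destination(rng, src, num_nodes, pattern, hotspot=0, hotspot_prob=0.5):
    """Choose a destination different from `src` for one packet."""
    others = [n for n in range(num_nodes) if n != src]
    if pattern == "uniform":
        return rng.choice(others)
    if pattern == "hotspot":
        # A fraction of the traffic is directed at one hotspot node.
        if src != hotspot and rng.random() < hotspot_prob:
            return hotspot
        return rng.choice(others)
    raise ValueError(f"unknown traffic pattern: {pattern}")


def generate_traffic(num_nodes, cycles, pattern="uniform",
                     inject_prob=0.1, seed=0, **pattern_args):
    """Return a list of (cycle, source, destination) packet injections."""
    if num_nodes < 2:
        raise ValueError("need at least two nodes")
    rng = random.Random(seed)  # seeded so that runs are repeatable
    packets = []
    for cycle in range(cycles):
        for src in range(num_nodes):
            if rng.random() < inject_prob:
                dst = pick_destination(rng, src, num_nodes, pattern,
                                       **pattern_args)
                packets.append((cycle, src, dst))
    return packets
```

For example, `generate_traffic(16, 100, pattern="hotspot", hotspot=5, hotspot_prob=0.3)` would produce a repeatable trace in which roughly 30% of the packets from other nodes target node 5.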
3 SYSTEM TOPOLOGY
Since the ability of the network to efficiently disseminate information depends largely on the topology, we focus especially on the following types of topologies:
3.1 Fat-tree (hyper-tree)
A typical fat-tree (see Fig. 3) assumes a 4-ary tree structure with link capacities doubling up the levels of the tree. The fat-tree is the first network proved to be universal [7].

The "fat-tree" (hyper-tree) architecture was proposed by Charles E. Leiserson in 1985. Processors are located at the leaves of the tree, while the internal nodes of the tree are grouped into an internal network. Sub-trees can communicate among themselves without involving higher levels of the network.

K-ary n-trees are implemented by using identical switches of a fixed radix. The number of stages is n, and k is the arity, i.e., the number of links of a switch that connect to the previous or to the next stage (so the switch radix is 2k). Notice that k-ary n-trees are bidirectional multistage interconnection networks (MINs). A k-ary n-tree connects $N = k^{n}$ cores using $nk^{n-1}$ switches and $2nk^{n} - k$ unidirectional links.
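As a quick check of the core and switch counts for a k-ary n-tree, the following sketch evaluates them for given k and n. The function name and the returned dictionary layout are assumptions for illustration; only the counts of cores, switches, stages, and the switch radix stated in the text are computed.

```python
def kary_ntree_size(k: int, n: int) -> dict:
    """Sizes of a k-ary n-tree built from identical radix-2k switches."""
    if k < 1 or n < 1:
        raise ValueError("k and n must be positive")
    return {
        "cores": k ** n,               # N = k^n
        "switches": n * k ** (n - 1),  # n stages of k^(n-1) switches
        "stages": n,
        "switch_radix": 2 * k,         # k links down, k links up
    }
```

For instance, a 4-ary 3-tree connects 64 cores using 48 switches of radix 8.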
3.2 Fat-pyramid
The fat-pyramid inherits the 4-ary tree framework of the fat-tree and adds a mesh on each level of the nodes up the tree. The fat-tree is universal only under the unit wire delay condition; its universality does not hold under nonunit wire delays, whereas the fat-pyramid has been proven to be universal under both unit and nonunit wire delay conditions [8].

The fat-tree has been used in the CM-5 parallel computer, whereas the fat-pyramid has not been adopted for any machine.

Another clear advantage of the fat-pyramid over the fat-tree is its better absolute efficiency due to its hierarchical meshes. But these same meshes reduce the scalability of the fat-pyramid, increase its wire usage considerably, and make it poorly suited to representing a distributed network [9].
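The structure described above, a 4-ary tree with a mesh added on every level, can be enumerated with a short sketch. It assumes a square base whose side is a power of two, a node naming scheme `(level, x, y)`, and a unit capacity on the lowest tree links that doubles at each level upward; these choices and the function name are illustrative, not taken from the paper.

```python
def build_fat_pyramid(base_side: int):
    """Return (nodes, mesh_edges, tree_edges) for a pyramid whose
    base is a base_side x base_side mesh (assumed construction)."""
    if base_side < 1 or base_side & (base_side - 1):
        raise ValueError("base_side must be a power of two")
    nodes, mesh_edges, tree_edges = [], [], []
    level, side = 0, base_side
    while True:
        for x in range(side):
            for y in range(side):
                node = (level, x, y)
                nodes.append(node)
                # Mesh links to the right and upper neighbours on this level.
                if x + 1 < side:
                    mesh_edges.append((node, (level, x + 1, y)))
                if y + 1 < side:
                    mesh_edges.append((node, (level, x, y + 1)))
                # 4-ary tree link to the parent; capacity doubles per level.
                if side > 1:
                    parent = (level + 1, x // 2, y // 2)
                    tree_edges.append((node, parent, 2 ** level))
        if side == 1:
            break
        level, side = level + 1, side // 2
    return nodes, mesh_edges, tree_edges
```

With a 4×4 base this yields three levels with 21 nodes, 28 mesh links, and 20 tree links, which makes the extra wiring contributed by the meshes easy to see.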
3.3 Fat-stack
The fat-stack is a hierarchical network consisting of tiers or levels. Each level has one or more sub-networks. Each sub-network is a ring of n nodes. A graphical representation of a GFS is shown in Figure 1. A fat-stack can have an arbitrary number of levels of rings.

Figure 5 shows a fat-stack topology that has three nodes in a subnetwork. Each subnetwork connects to its upper level via a node by a single link. Dashed lines represent tier boundaries. Link capacities double up-
Fig. 3. (a) "Fat-tree" cluster architecture; (b) "Fat-tree" top view (layout).
Fig. 4. Pyramid with three levels and a 4×4 base, along with its 2D layout.