
Three New Approaches to Command Handling in NetFPGA Hardware Simulators with Custom Internal Architecture
Anonymous Author, IEEE Member
Some Technical University

Abstract—Simulations run purely in software take a long time.
Hardware acceleration can make them faster. Some people use
graphics processors (GPUs), which are faster than a typical CPU
(Central Processing Unit). An Application-Specific Integrated
Circuit (ASIC) is much faster and more efficient still, but it costs
a lot of money and takes time to prepare. Nowadays the most practical
solution is to use FPGA (Field-Programmable Gate Array) chips, which
make it possible to prepare programmable hardware that is both
flexible and fast. An FPGA requires specific treatment and a tool
chain for programming and testing, but in the end it gives a powerful
platform for calculation and data processing. In this paper I show
how a new architecture handles commands. Commands are sent to the
hardware as IP packets, and the results have to be collected after
the calculation. There are at least three ways to do this.
Fig. 1. NetFPGA 10G Card

I. INTRODUCTION
For the internal structure of a NetFPGA card you can use your own
project, or you can modify one of the prepared demo projects
available on the website [1]. In my proposal I focus on efficient use
of the FPGA chip's performance. I assume a new internal code
structure that allows more than one calculation to run at the same
time. In hardware, you can dedicate a particular part, or several
parts, to strictly defined tasks. If the designer replicates the
hardware, the efficiency is multiplied as well. By simply replicating
the calculation part, we obtain a system of parallel calculating
engines. We have to control them by sending task parameters and
collecting the results of the calculations. In this paper I show
three approaches to sending commands and receiving results, both of
which are carried in IP packets. I use a NetFPGA card, which can be
programmed to serve Ethernet traffic. In the new architecture,
however, the programmable core is changed; only the outer view
remains as in the primary assumption, i.e. with Ethernet frames and
IP packets. Because of strict timing, commands, and above all
responses, must be processed in a proper way. The rest of this paper
is organized as follows: Section 2 describes NetFPGA cards, Section 3
presents the use of NetFPGA cards as a calculating engine with the
proposed internal structure, Section 4 describes the software tools,
and Section 5 defines the approaches to sending and receiving
commands via IP packets. The last section summarizes the paper and
outlines future work.

II. NETFPGA CARDS
NetFPGA cards (Figure 1) have been developed by the University of
Cambridge and Stanford University networking groups as an "open
hardware" project [1]–[3]. A NetFPGA card is an extension card for a
PC. It has four physical network interfaces (4 × 1 Gbps or
4 × 10 Gbps electrical or optical (SFP+) Ethernet ports) and an FPGA,
i.e. programmable hardware, as its main chip. More details about the
structure, architecture and usage of such a card can be found in the
literature. Importantly, there is a public framework that implements
basic network functionality (routers, switches and interface cards),
and it is relatively easy to add our own functionality. The
literature contains a lot of information about typical usage and
about projects prepared by the NetFPGA community [4]–[6].
In the primary assumption, presented in Figure 2a, the main chip of
the NetFPGA card receives packets from the physical ports (eth0-eth3)
as input traffic, analyzes and modifies them, and sends them out to
the physical ports as output traffic. The functionality of the main
chip is programmable and can be very flexible. The source and
destination of traffic can also be logical interfaces, visible in the
operating system as the nf0-nf3 interfaces. A very important fact is
that this functionality is realized in hardware; hence such an
appliance can serve all traffic at line rate, which for these
Ethernet ports is 1 Gbps or 10 Gbps.
Fig. 2. Schema of modules on the NetFPGA card and inside the FPGA chip (a), and the modified schema of the NetFPGA card (b); the modification is only in the FPGA chip

Fig. 3. Internal pipeline of modules on the NetFPGA card and inside the FPGA chip (a), and the modified pipeline (b); the modification is only in the FPGA chip

III. HARDWARE PART AS A SIMULATOR
In my approach, whose schematic is presented in Figure 2b, there are
no typical modules in the FPGA chip. Only communication with the
operating system through one interface is used; it is visible as one
active interface in the operating system. All traffic sent to this
interface is visible to all modules, and they identify the traffic
dedicated to them by a given IP/MAC address, or even by VLAN ID or
UDP port. The return path for data is realized analogously: each
module sends traffic with the given parameters. The control
application in the operating system serves this interface in
promiscuous mode and receives all frames. Figure 3a presents the
primary pipeline, where 8 incoming queues and 8 outgoing queues are
implemented, because they serve traffic from two representations of
4 interfaces in two directions. In the proposed architecture only one
buffer is realized, because the nature of the transferred frames is
different. In the primary version, the 8 incoming traffic streams
were fully served thanks to internal speedup. The main difference
between the primary and the proposed architecture is the use of
parallel custom modules (Figure 3b), each of which can calculate its
task at the same time. The speedup is not necessary: the maximal
speed of calculation is used by multiple modules at the same time,
hence the final performance of calculation is the product of the
performance of one module and the number of modules. Proper control
and timing mechanisms should be used to send requests to particular
modules, and an adequate way of receiving responses has to be
implemented (in the module named output module arbiter). This is
reflected in the functionality realized in the software part.
The same FPGA chip (Virtex 5) is used on the ML555 [7] and ML505 [8]
development boards (Figures 4 and 5), which are available in my
laboratory, so the same procedure and architecture can be used for
projects on them. Those boards have only 1 Gbps Ethernet SFP cages,
but this is not a problem, because the new architecture is focused
inside the FPGA chip, which is the same on all three analyzed boards.
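The address matching and the performance product described in this section can be illustrated with a small software model. This is only a sketch under stated assumptions: the real modules are written in VHDL or Verilog, and the names Frame, CustomModule, dispatch and aggregate_rate, as well as the example MAC and IP addresses, are hypothetical and do not come from the NetFPGA framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """Only the header fields a module matches on (simplified, hypothetical)."""
    dst_mac: str
    dst_ip: str
    payload: bytes


@dataclass
class CustomModule:
    """Software stand-in for one hardware module; addresses are made-up examples."""
    mac: str
    ip: str

    def accepts(self, frame: Frame) -> bool:
        # Every module sees every frame; only a MAC/IP match is consumed.
        return frame.dst_mac == self.mac and frame.dst_ip == self.ip


def dispatch(frame, modules):
    """Return the modules that would read this frame from the shared buffer."""
    return [module for module in modules if module.accepts(frame)]


def aggregate_rate(per_module_rate, module_count):
    """Ideal performance: one module's rate times the number of modules."""
    return per_module_rate * module_count


# Four modules with consecutive example addresses.
modules = [
    CustomModule(f"02:00:00:00:00:{i:02x}", f"10.0.0.{i}") for i in range(1, 5)
]
frame = Frame("02:00:00:00:00:03", "10.0.0.3", b"task parameters")
selected = dispatch(frame, modules)
```

With unique addresses, dispatch returns exactly one module per frame, which mirrors the statement that all modules see the traffic but only the addressed one reads it. The aggregate_rate helper is the ideal case only; it ignores any time lost on the shared input buffer and the output arbiter.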
Fig. 4. ML555 developers board from Xilinx

Fig. 5. ML505 developers board from Xilinx

IV. SOFTWARE PART OF THE SYSTEM
Basic usage of this idea assumes that every custom module has its own
MAC and IP address. All these addresses are in the network related to
the nf0 interface. When the operating system (based on a typical
routing table) sends an IP packet, the proper module receives the
Ethernet frame with this packet. The IP/MAC relation must be
configured properly in the ARP table of the host, but this is done
automatically by a script during address configuration. It is
possible to control more than one card from the same control
application (in my laboratory I have 5 NetFPGA 10G cards); in that
case IP addresses, IP networks and routing information have to be
configured properly. In my case, the first assumption was to use
Omnetpp software and prepare software modules related to their
hardware support, i.e. each module (a node from Omnetpp) sends
calculation commands to its own calculating module in the FPGA via
the IP protocol. This was possible, but the current version of my
application is not directly related to Omnetpp; instead, many
functionalities are copied and implemented in my own software
(Figure 3). Figure 4 shows an example of a network that can be
realized in one FPGA chip. Each switch is represented by one module.
In the general case, this system is designed for analysing bigger
topologies of IoT devices.

V. COMMANDS AND RESULTS IN IP PACKETS
To simulate a system, its model has to be prepared. The calculating
part is realized in hardware, which means that modules have to be
defined in VHDL or Verilog and implemented in the FPGA chip. The
software part should also reflect the analyzed system in its internal
structures, i.e. the model in hardware has to reflect the model in
software and vice versa. This process is complicated and
time-consuming; it costs a lot of work. When the simulator is
prepared and configured (i.e. the IP/MAC addresses used in the
software part are also implemented in the custom modules), tasks can
be sent to the hardware. When the simulator decides to run a task and
the module for this task is ready, the parameters of the task are
packed into an IP payload and sent to the NetFPGA card as a typical
IP packet in an Ethernet frame. This block of data, after passing DMA
on the PCI bus, is placed in the input buffer, where it is visible to
every custom module. Only one module has matching IP/MAC addresses,
so only one custom module reads this data. The content of the IP
payload is used as the parameters for calculations. When the
calculations are finished and the results are ready, they are packed
into an IP packet to be received by DMA and the software part.
However, the output of the custom modules has to be served in a
proper way to avoid problems. Different types of problems call for
different approaches to solving them; the realization of these
approaches is described below.

Approach 1: Ask one module and wait

The simplest idea assumes that we send a request and wait for the
response. One module in the FPGA receives tasks, and after completing
them it sends out the results as an IP packet. It works, but not
efficiently: only one module performs calculations at any time. This
approach is relatively easy to implement and can be used for projects
with no focus on parallel calculations.

Approach 2: Ask dedicated module and monitor

The second approach assumes that the simulator can send requests to
modules before they finish their tasks. They respond with information
about progress (for example "75% done") or with the final results if
they are ready. This approach is quite good: it allows calculations
to run on multiple modules at the same time. The software does not
wait for the end of the calculations; it just queries their current
state. In this case the control mechanisms are more complicated, but
efficiency is very good. All decisions and all control of
communication are realized in software. Figure 7 shows an example
view from the software part of the simulator, which presents results
gathered online.

Approach 3: Ask dedicated module and wait

We can also imagine a situation in which a module generates an IP
packet with results right after finishing its calculations. This
looks interesting, but more than one module may generate a packet at
the same time (or almost at the same time). To serve such cases
properly, a dedicated output module arbiter is necessary. In my
approaches, I prefer to use simple hardware and to realize more
sophisticated functionality in software rather than in hardware, for
two reasons: 1) it is easier and faster, and 2) I can save hardware
resources for a bigger number of modules. Moreover, in this case the
software part does not have full control of the whole process,
because some decisions are taken in hardware.
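The control loop of Approach 2 can be sketched in software as follows. This is a minimal model and not the implementation used in the simulator: the payload layouts REQUEST and RESPONSE, the opcodes, the FakeModule class and the toy calculation (doubling the parameter) are all assumptions made for illustration, since the paper does not define a wire format. In a real setup the packed bytes would travel as IP payload to the module's address instead of being passed to a Python object.

```python
import struct

# Hypothetical payload layouts; the paper does not define a wire format.
REQUEST = struct.Struct("!BI")    # opcode, task parameter
RESPONSE = struct.Struct("!BBI")  # module id, percent done, result

OP_START, OP_STATUS = 1, 2


class FakeModule:
    """Software stand-in for one custom hardware module."""

    def __init__(self, module_id, steps):
        self.module_id = module_id
        self.steps = steps      # ticks needed to finish a task (must be > 0)
        self.elapsed = 0
        self.param = 0

    def tick(self):
        """Advance the calculation by one step, as hardware would over time."""
        self.elapsed = min(self.elapsed + 1, self.steps)

    def handle(self, payload):
        """Answer a request with progress, plus the result once finished."""
        opcode, param = REQUEST.unpack(payload)
        if opcode == OP_START:
            self.param, self.elapsed = param, 0
        percent = 100 * self.elapsed // self.steps
        result = self.param * 2 if percent == 100 else 0  # toy calculation
        return RESPONSE.pack(self.module_id, percent, result)


def run_polling(modules, params):
    """Start every module, then poll until all results are collected."""
    for module, param in zip(modules, params):
        module.handle(REQUEST.pack(OP_START, param))
    results = {}
    polls = 0
    while len(results) < len(modules):
        for module in modules:
            module.tick()  # all modules calculate in parallel
        for module in modules:
            if module.module_id in results:
                continue  # this module has already delivered its result
            reply = module.handle(REQUEST.pack(OP_STATUS, 0))
            module_id, percent, result = RESPONSE.unpack(reply)
            polls += 1
            if percent == 100:
                results[module_id] = result
    return results, polls


modules = [FakeModule(1, steps=2), FakeModule(2, steps=4)]
results, polls = run_polling(modules, [10, 21])
```

In this model the software never blocks on a single module: the faster module is collected after two polling rounds while the slower one keeps calculating. Because every response is triggered by a request from the software, two modules never send at the same time, so no output arbiter is modelled here.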
Fig. 6. Example of GUI of control application for network simulation

Fig. 7. Example of result view from software when second approach is realized

VI. CONCLUSIONS AND FUTURE WORK
The main chip of a NetFPGA card can be programmed. Although the
designers of NetFPGA cards intended them to serve network traffic, a
card can be used as an almost general-purpose hardware accelerator.
Once you have some practice with the demo projects for network
devices (switch, interface card, router), which use MAC/IP
addressing, registers, buffers and PCI operations, it is easy to
implement your own functionality, which extends the already enormous
possibilities even further. As future work I plan to use other FPGA
chips and to prepare a set of tools for a convenient and
user-friendly simulator. An automated procedure for large numbers of
IoT devices organized in advanced topologies is also being
considered. Additional user-friendly functionality will be
implemented and added in software.

ACKNOWLEDGEMENTS
I would like to thank (in alphabetical order):
• NetFPGA Teams from Stanford University and University of
Cambridge [1] for their support and help in organizing NetFPGA
workshop at "my anonymous uni" and overall work with NetFPGA cards;
• Xilinx University Program [9] for donating to My University of
Technology five NetFPGA cards with 10G interfaces.

REFERENCES
[1] Website of community and NetFPGA project: http://www.netfpga.org.
[2] G. Gibb, J. W. Lockwood, J. Naous, P. Hartke, and N. McKeown,
“NetFPGA: An Open Platform for Teaching How to Build Gigabit-rate
Network Switches and Routers”, IEEE Transactions on Education,
vol. 51, no. 3, pp. 364-369, August 2008.
[3] N. Zilberman, Y. Audzevich, G. A. Covington, and A. W. Moore,
“NetFPGA SUME: Toward 100 Gbps as Research Commodity”, IEEE Micro,
vol. 34, no. 5, pp. 32-41, Sept.-Oct. 2014, doi:10.1109/MM.2014.61.
[4] NetFPGA reference projects: http://www.netfpga.org/project_table.html.
[5]
[6]
[7] Virtex-5 FPGA ML555 Development Kit for PCI and PCI Express
Designs: https://docs.xilinx.com/v/u/en-US/ug201.
[8] ML505/ML506/ML507 Evaluation Platform User Guide:
https://docs.xilinx.com/v/u/en-US/ug347.
[9] Xilinx University Program: http://xilinx.com/support/university.html.
