Efficient Mapping of Heuristic Packet Classifier On Network Processor Based Router To Enhance QoS For Multimedia Applications
HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/
WWW.JOURNALOFCOMPUTING.ORG 107
Abstract— Packet classification is an important function performed by network devices such as edge routers, firewalls and intrusion detection systems to provide QoS and network security. Given its complexity, the exponential growth of link speeds and the diversified services offered by the Internet, packet classification is becoming a major bottleneck in router performance. Traditional routers perform this classification using Application Specific Integrated Circuits (ASICs), which suffer from a lack of flexibility. Powerful embedded Network Processors (NPs), introduced by many companies as flexible and cost-efficient network appliances, are an alternative for implementing packet classification at nearly link speed. The objective of this paper is to design and implement a new low-complexity heuristic packet classification algorithm named Trie based Tuple Space Search (TTSS) and to map this packet classifier component efficiently onto a Network Processor based router. The performance is evaluated using Intel’s IXP2400 NP simulator. The results demonstrate that TTSS outperforms the other heuristic packet classification algorithms. Parallel mapping of TTSS on the Network Processor based router gives better performance than its pipelined mapping and is therefore more suitable for enhancing QoS for multimedia applications.
Index Terms — Multimedia, QoS, Packet Classification, TTSS, Network Processor, and IXP 2400.
1 INTRODUCTION
placed in routers to execute various network-related tasks at the packet level. The design and development of routers using Network Processors has gained significance due to their high performance.
Network Processors (NPs) have a high-performance parallel processing architecture on a single chip, which is well suited to detailed packet inspection, processing with complex algorithms, and forwarding at wire speed. Network Processors have a set of hierarchically distributed memory devices and a set of on-chip processors (Micro Engines) that carry out packet-level parallel processing through multitasking and multithreaded programming [16]. Each Micro Engine (ME) has multiple hardware thread contexts that enable thread context switches with zero or minimal overhead [3]. Micro Engines can examine and forward packets independently without using the host processor, bus, or memory. These features show that the Micro Engines in a Network Processor can be assigned different packet processing functions so that packets can be classified efficiently to provide QoS. Studies [16-19] have focused on implementing networking services using programmable network processors.
The work presented in this paper is based on the fully programmable Intel® IXP2400 processor, a member of Intel’s second-generation network processor family. The architecture of the integrated IXP2400 network processor [4], shown in Figure 1, has a single 32-bit XScale core processor, eight 32-bit MicroEngines (MEs) organized as two clusters, standard memory interfaces and high-speed bus interfaces. Each ME has eight hardware-assisted threads and 4 KB of local memory. Each microengine has 256 general purpose registers that are shared equally among the eight threads. Microengines exchange information by using either an on-chip scratchpad memory or 128 special purpose next neighbor registers. Data transfers between the MEs and locations external to the ME (e.g., DRAMs and SRAMs) are done using 512 Transfer Registers. The XScale core is responsible for initializing and managing the chip, handling control and management functions, and loading the ME instructions.
The IXP2400 chip has a pair of buffers, the Receive Buffer (RBUF) and the Transmit Buffer (TBUF), each of size 8 Kbytes, that are used to receive packets from and send packets to the network ports. Packets are injected into the Network Processor from the network through the Media Switch Fabric Interface (MSF) and then forwarded to the MicroEngines for processing. All threads in a particular ME execute program code, called a microblock, stored in the local memory of that ME. Finally, the processed packets are driven into the network by the MSF at the output port.
In this work, the heuristic packet classifier named Trie based Tuple Space Search (TTSS) is proposed and implemented on the IXP2400 processor with two supporting microblocks, namely Packet Receive and Packet Transmit. In a router, packet arrival and departure can occur simultaneously; hence, to utilize the parallel processing of the Network Processor effectively, the microblocks are arranged in a pipeline. For example, while the arrival of a new packet is being handled by the receive microengine, an existing packet can be classified and transmitted by the respective microblocks simultaneously, which entails interaction between microblocks in the form of a pipeline. The impact of different design mappings of the packet classification onto the Microengines, namely parallel and pipelined mapping, has also been examined, and the performance measures show that the parallel design mapping has a better packet processing rate than the pipelined mapping.
The remainder of this paper is organized as follows: Section 2 presents the background concepts of packet classification. Sections 3 and 4 describe the proposed packet classification algorithm and its implementation details. The performance analysis is presented in Section 5 and the conclusion in Section 6.
Figure 1. Architecture of IXP2400
2 RELATED WORK
This section presents a brief overview of packet classification algorithms. Surveys on packet classification algorithms [5], [6], [7] and [8] show that the packet classification problem is inherently hard. Linear search is the simplest method of packet classification; it uses a linked list to search through a set of rules. If N is the number of rules, its space and time complexity are both O(N). Tuple Space Search (TSS), a heuristic for packet classification [9], maps each rule to a tuple, and the rules of each tuple are stored in a hash table. As the number of distinct tuples is much smaller than the number of rules, even a simple linear search of the tuple space provides a significant increase in classification speed. Its time complexity is reduced to O(W^k), where W is the length of the IP prefix and k is the number of fields used in classification. Another search, namely Pruned Tuple Space Search (PTS) [10], reduces the time complexity further by searching only a subset of the tuples. Though any field or combination of fields may be used for pruning, pruning on the source and destination addresses strikes a favorable balance between the reduction in the number of tuples and the overhead of the pruning steps. The tuple-pruning algorithm is able to
JOURNAL OF COMPUTING, VOLUME 2, ISSUE 9, SEPTEMBER 2010, ISSN 2151-9617
achieve good performance in practical environments; however, its worst-case speed is not guaranteed. Trie search is another popular high-speed IP route lookup scheme for packet classification that uses a tree data structure with individual bit lookup [11].
Some trie-based algorithms follow the hierarchical approach to packet classification, which recursively performs a search on each field. In [12-15], trie-based classification algorithms suitable for high-speed two-dimensional packet classification are described. A one-dimensional 1-bit trie is a binary-tree-like structure in which each node has two element fields, le (the left element) and re (the right element), and each element field has the components child and data. Branching is done based on the bits in the search key. If the ith bit of the search key is 0, then the child branch followed at a level-i node is that of the left element (the root is at level 0); otherwise, it is that of the right element. In one-dimensional multibit tries, the branching at each node is based on a number of bits known as the “stride”. Multi-Dimensional Multibit Trie (MDMT) search [13-15] is a trie search in which packets are classified by searching the Destination Trie, Source Trie, Protocol Trie and Port Trie at different levels using strides. It has time complexity O(W/L) and space complexity O(N (W/L) 2(k-1)), where L is the average stride length.
Though there are several algorithms specialized for rules on two fields (e.g., source and destination IP address only), it is necessary to design a packet classification algorithm that uses a larger number of header fields to support diversified services while requiring both low memory space and low access overhead [21]. A new low-complexity Trie Based Tuple Space Search (TTSS) packet classification algorithm is proposed that enhances the existing trie-based and tuple-based algorithms to support QoS for multimedia applications.
3 TRIE BASED TUPLE SPACE SEARCH ALGORITHM
Matching is carried out at the first level for the protocol, and prefix matching is carried out at the next level for the IP address.
Taylor and Turner [22] give useful characteristics of filter sets, obtained by analyzing rule sets provided by ISPs, a network equipment vendor, and other researchers working in the field, in order to verify and compare the performance of proposed classification algorithms. From [22] it is observed that even though the transport-layer fields have a wide variety of specifications, the most common protocol is TCP (49%), followed by UDP (27%), the wildcard (13%) and ICMP (10%); other protocols such as OSPF and IGMP account for less than 1% of the filters. Because of the small number of protocol types, the node of the trie at level 1 splits the rule set based on the protocol field of the header, which reduces the number of rules to be searched at the next level of the trie to 50%. The sample rule table and the associated data structure of TTSS are shown in Figure 2.
Moreover, the speed and efficiency of several longest prefix matching and packet processing algorithms depend on the number of unique prefix lengths and the distribution of rules across those unique values. A majority of the rule sets specify fewer than 15 unique prefix lengths for either source or destination address prefixes [16]. The number of unique source/destination prefix pair lengths is generally less than 32, which is small compared to the filter set size, and 8% of the rules are redundant. With respect to the classic IP addressing structure, it has been shown in [11] that most of the rules ignore subnetting.
Based on these observations from [16] and [22], each node of the trie at level 2 is constructed with multiple elements that refer to hash tables of prefix pairs and prefix lengths, further reducing the lookup time by reducing the search space. A node with prefix length w has 2^w element fields. All matching candidate rules are identified by using the destination prefix length w as a search key, and those candidates are further filtered using the source prefix field. The leftmost element of the node has
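The two-level idea described in this section (exact match on the protocol first, then hash tables selected by prefix length) might be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' data structure: the class name `TwoLevelClassifier`, the `"*"` wildcard marker, the numeric `priority` field and the example class names are all hypothetical, and the level-2 trie node is replaced by a plain dictionary keyed by (destination prefix length, source prefix length).

```python
import ipaddress
from collections import defaultdict

WILDCARD = "*"  # hypothetical marker for a rule that matches any protocol


def prefix_key(addr: str, length: int) -> int:
    """Return the top `length` bits of an IPv4 address as an integer."""
    value = int(ipaddress.IPv4Address(addr))
    return value >> (32 - length) if length else 0


class TwoLevelClassifier:
    """Level 1: exact match on protocol. Level 2: one hash table per
    (destination prefix length, source prefix length) pair."""

    def __init__(self):
        self.levels = defaultdict(lambda: defaultdict(dict))

    def add_rule(self, proto, dst, dst_len, src, src_len, class_id, priority):
        table = self.levels[proto][(dst_len, src_len)]
        key = (prefix_key(dst, dst_len), prefix_key(src, src_len))
        # On a duplicate key keep the rule with the smaller priority number.
        if key not in table or priority < table[key][0]:
            table[key] = (priority, class_id)

    def classify(self, proto, dst, src, default="default"):
        best = None
        # Only the packet's own protocol and the wildcard branch are searched.
        for branch in (proto, WILDCARD):
            for (dst_len, src_len), table in self.levels.get(branch, {}).items():
                key = (prefix_key(dst, dst_len), prefix_key(src, src_len))
                hit = table.get(key)
                if hit is not None and (best is None or hit[0] < best[0]):
                    best = hit
        return best[1] if best else default


clf = TwoLevelClassifier()
clf.add_rule("TCP", "10.1.0.0", 16, "192.168.0.0", 16, "video", priority=1)
clf.add_rule(WILDCARD, "10.0.0.0", 8, "0.0.0.0", 0, "best_effort", priority=5)
print(clf.classify("TCP", "10.1.2.3", "192.168.5.5"))
print(clf.classify("UDP", "10.9.9.9", "1.2.3.4"))
```

Splitting on the protocol first means a packet only probes the prefix-length tables of its own protocol branch plus the wildcard branch, which is the search-space reduction the text attributes to the level-1 node.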
DRAM for performing packet classification. The packet classifier validates the IP header of each incoming packet based on RFC 1812 [5]. If the validity check fails, the packet is dropped. Otherwise, the packet is classified into different traffic flows based on the IP header and is enqueued in the respective queue using the proposed packet classification algorithm. For each valid packet, the microblock builds a hash input from the header and then compares the IP header fields with the rule stored in the hash entry. If a matching entry is found, the classifier writes selected dispatch loop variables with data stored in the hash entry and enqueues the packet in the queue given by the class_id field of the rule for further processing by the router. Otherwise, if matching fails, the algorithm loads the next hash entry in the chain, as indicated by next_entry_ptr, and repeats the entry matching procedure. If the classifier reaches the end of the chain without finding a matching entry, a default rule is applied.
The Packet Transmit microblock is then executed by microengine ME3, which moves packets into the TBUFs for transmission over the media interface through different ports. The Packet Transmit microblock monitors the MSF to check whether the TBUF threshold for a specific port has been exceeded; if so, it stops transmission on that port and queues the requests to transmit packets on that port in local memory. The Packet Transmit microblock periodically updates the classifier with information about how many packets have been transmitted.
5 PERFORMANCE EVALUATION
and destination addresses and exact match is used for protocol flag.
5.1 Design Mappings
The classification phase can be implemented on the IXP2400 in different ways, namely parallel mapping and pipelined mapping. In both cases, Packet Rx and Packet Tx are processed by microengine 0 (ME0) and microengine 3 (ME3), respectively, and microengines ME1 and ME2 are used for classification. In parallel mapping, all the classification steps for a single packet are processed by a single microengine, and hence ME1 and ME2 classify different packets simultaneously.
In pipelined mapping, the classification of a single packet is done by two different microengines, ME1 and ME2: one performs field extraction and the other performs header validation (according to RFC 1812 [5]) and table lookup.
[Figure residue: block diagram with RX and PC blocks, and a plot with axis ticks 0 to 20 and 20000 to 60000; captions not recovered]
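The hash-chain matching procedure of the classifier microblock and the two design mappings of Section 5.1 can be illustrated together with a small sketch. Only `class_id` and `next_entry_ptr` come from the text; every other name (`HashEntry`, `bucket_of`, `insert_rule`, the table size, the default class) is a hypothetical choice made for this example. Rules are matched exactly rather than by prefix, the RFC 1812 check is reduced to version and TTL, and the two microengines are simulated sequentially, so the sketch shows only how the work is divided, not any timing behaviour.

```python
import zlib
from dataclasses import dataclass
from typing import Optional

NUM_BUCKETS = 8    # hypothetical hash table size
DEFAULT_CLASS = 0  # hypothetical class_id of the default rule


@dataclass
class HashEntry:
    src: str
    dst: str
    proto: str
    class_id: int
    next_entry_ptr: Optional["HashEntry"] = None  # next entry in the chain


def extract_fields(packet):
    """Field extraction step."""
    return (packet["src"], packet["dst"], packet["proto"])


def header_is_valid(packet):
    """Stand-in for the RFC 1812 checks: version and TTL only."""
    return packet.get("version") == 4 and packet.get("ttl", 0) > 0


def bucket_of(fields):
    return zlib.crc32("|".join(fields).encode()) % NUM_BUCKETS


def insert_rule(table, src, dst, proto, class_id):
    index = bucket_of((src, dst, proto))
    table[index] = HashEntry(src, dst, proto, class_id, table.get(index))


def table_lookup(table, fields):
    """Walk the chain via next_entry_ptr; fall back to the default rule."""
    entry = table.get(bucket_of(fields))
    while entry is not None:
        if (entry.src, entry.dst, entry.proto) == fields:
            return entry.class_id
        entry = entry.next_entry_ptr
    return DEFAULT_CLASS


def parallel_mapping(table, packets):
    """Each engine runs every classification step on its own packets."""
    results = [None] * len(packets)
    for engine in (0, 1):  # stand-ins for ME1 and ME2
        for i in range(engine, len(packets), 2):
            pkt = packets[i]
            if header_is_valid(pkt):  # invalid packets stay None (dropped)
                results[i] = table_lookup(table, extract_fields(pkt))
    return results


def pipelined_mapping(table, packets):
    """One engine extracts fields, the other validates and looks up."""
    handoff = [(pkt, extract_fields(pkt)) for pkt in packets]    # first stage
    return [table_lookup(table, fields) if header_is_valid(pkt) else None
            for pkt, fields in handoff]                          # second stage
```

Both functions return the same class for every packet; the mappings differ only in which engine executes which step, which is why the paper compares them on processing rate rather than on classification results.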
packet classification algorithms and hence the performance of the different design mappings is analyzed with the TTSS classification algorithm and is shown in Figure 6.
[Chart: Throughput by algorithm (LS, TSS, PTS, TTSS); legend: Pipelined Mapping, Parallel Mapping; y-axis 0 to 250]
Figure 6. Throughput of packet classification algorithms
This is due to the fact that, in pipelined mapping, a new packet cannot be handled by microengines earlier in the pipeline until inter-microengine buffer entries become available. These entries become available only when the entire processing for that packet has been completed by all microengines.
[Chart: Packet sent/receive ratio by algorithm (LS, TSS, PTS, TTSS); y-axis 0 to 1.2]
[Chart: Idle time (ME2) by algorithm (LS, TSS, PTS, TTSS); y-axis 0 to 90]
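The buffer-entry constraint described in this section can be explored with a toy timing model. The sketch below is only an illustration under assumed numbers: the stage costs (40 and 60 cycles), the packet count and the number of buffer entries are hypothetical and are not taken from the paper's measurements. It assumes that an inter-microengine buffer entry is held from the moment the first stage accepts a packet until the second stage finishes it, and that in parallel mapping two engines each run both stages on alternate packets.

```python
def pipelined_finish_time(n_packets, t_stage1, t_stage2, entries):
    """Cycle at which the last packet leaves a two-stage pipeline.

    A buffer entry is occupied from the start of stage 1 until the end of
    stage 2, so packet i must wait for packet i - entries to finish.
    """
    stage1_free = 0
    stage2_free = 0
    done = []
    for i in range(n_packets):
        start1 = stage1_free
        if i >= entries:
            start1 = max(start1, done[i - entries])  # wait for a free entry
        end1 = start1 + t_stage1
        stage1_free = end1
        start2 = max(end1, stage2_free)
        end2 = start2 + t_stage2
        stage2_free = end2
        done.append(end2)
    return done[-1]


def parallel_finish_time(n_packets, t_stage1, t_stage2, engines=2):
    """Cycle at which the busiest engine finishes when each engine runs
    both stages on its own share of the packets."""
    packets_on_busiest = -(-n_packets // engines)  # ceiling division
    return packets_on_busiest * (t_stage1 + t_stage2)


for entries in (1, 4):
    print(entries, pipelined_finish_time(100, 40, 60, entries))
print("parallel", parallel_finish_time(100, 40, 60))
```

In this model the pipeline is bounded by its slower stage even with several buffer entries, and with a single entry the two stages cannot overlap at all, while the parallel mapping divides the total work evenly between the engines.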
rithm at line speeds to provide QoS and network security. This paper describes the design of the packet classifier component on a Network Processor based router and its performance enhancement. The proposed low-complexity heuristic Trie based Tuple Space Search (TTSS) packet classification has been implemented on Intel’s IXP2400 Network Processor to improve the performance of the router. By dividing the tuple space into multiple subspaces, TTSS achieves a space complexity of O(N) and a time complexity of O(log W). The implementation results show that the throughput of TTSS is almost 60% higher than that of TSS and PTS. In this work, the performance of TTSS is also evaluated using pipelined as well as parallel design mapping. The results show that the speedup factor of TTSS is 2.18 in parallel mapping and 1.52 in pipelined mapping, compared to TTSS classification using a single microengine. The classification rate of TTSS is 2788 KPPS (kilo packets per second) in parallel mapping and 1937 KPPS in pipelined mapping. Moreover, the pipelined design mapping has a packet processing rate 31.25% lower than that of the parallel mapping, primarily due to multiple memory reads per packet in the pipelined mapping. Compared with the pipelined mapping, the parallel mapping of the TTSS classifier provides higher throughput and classification rate. Thus, the work suggests that TTSS-based packet classification in parallel mapping is efficient for enhancing QoS of multimedia applications.
REFERENCES
[1] Michael Coss and Ron Sharp, “The Network Processor Decision,” Bell Labs Technical Journal, pp. 177-189, 2004.
[2] Douglas E. Comer, “Network Systems Design Using Network Processors,” Pearson Education, 2003.
[3] E.J. Johnson and A.R. Kunze, “IXP2400/2850 Programming,” Intel Press, 2004.
[4] Intel IXP2400/IXP2800 Network Processor “Hardware Reference Manual,” Intel Corporation, 2003.
[5] F. Baker, “Requirements for IP Version 4 Routers,” RFC 1812, Internet Engineering Task Force, June 1995. ftp://ftp.ietf.org/rfc/rfc1812.txt.
[6] David E. Taylor, “Survey & Taxonomy of Packet Classification Techniques,” ACM Comput. Surveys, Vol. 37, No. 5, pp. 238-275, Sep. 2005.
[7] Pankaj Gupta and Nick McKeown, “Algorithms for Packet Classification,” IEEE Network Magazine, Vol. 15, No. 2, pp. 24-32, April 2001.
[8] M.A. Ruiz-Sanchez, E.W. Biersack and W. Dabbous, “Survey and Taxonomy of IP Address Lookup Algorithms,” IEEE Network, Vol. 15, No. 2, pp. 8-23, April 2001.
[9] V. Srinivasan, S. Suri and G. Varghese, “Packet Classification Using Tuple Space Search,” ACM SIGCOMM, pp. 135-146, September 1999.
[10] Pi-Chung Wang, Chia-Tai Chan, Chun-Liang Lee, and Hung-Yi Chang, “Scalable Packet Classification for Enabling Internet Differentiated Services,” IEEE Trans. Multimedia, vol. 8, no. 8, pp. 1239-1249, Dec. 2006.
[11] Stefano Giordano, Gregorio Procissi, Federico Rossi, and Fabio Vitucci, “Design of a Multi-Dimensional Packet Classifier for Network Processors,” in Proc. IEEE ICC 2006, pp. 503-508.
[12] W. Lu and S. Sahni, “Efficient Construction of Pipelined Multibit-Trie Router Tables,” IEEE Trans. Computers, vol. 56, no. 1, pp. 32-43, Jan. 2007.
[13] W. Lu and S. Sahni, “Packet Classification Using Two-Dimensional Multibit Tries,” Proc. 10th IEEE Symp. Computers and Comm., 2005.
[14] W. Lu and S. Sahni, “Packet Classification Using Pipelined Two-Dimensional Multibit Tries,” http://www.cise.ufl.edu/~wlu/papers/p-2dtries.pdf, 2008.
[15] W. Lu and S. Sahni, “Packet Classification Using Space-Efficient Pipelined Multibit Tries,” IEEE Transactions on Computers, Vol. 57, No. 5, May 2008.
[16] Intel Corporation, Intel Network Processors Product Information. http://www.intel.com/design/network/products/npfamily.
[17] “Microengine Version 2 (MEv2) Assembly Language Coding Standards, Revision 1.01g,” Intel Corporation, June 2003.
[18] “Intel IXP2400/IXP2800 Network Processors - Development Tool User Guide,” Intel Corporation, March 2004.
[19] Intel Corporation, “Intel IXP2400 & IXP2800 Network Processors Programmer’s Reference Manual,” 2004.
[20] John L. Hennessy and David A. Patterson, “Computer Architecture: A Quantitative Approach,” 3/e, Morgan Kaufmann Publishers, 2003.
[21] A. Feldmann and S. Muthukrishnan, “Tradeoffs for Packet Classification,” in IEEE INFOCOM, pp. 1193-1202, March 2000.
[22] D.E. Taylor and J.S. Turner, “ClassBench: A Packet Classification Benchmark,” in IEEE INFOCOM, vol. 3, pp. 2068-2079, March 2005.
[23] Pankaj Gupta and N. McKeown, “Packet Classification on Multiple Fields,” ACM SIGCOMM, pp. 147-160, August 1999.
[24] Atsushi Yoshioka, Shariful Hasan Shakot, and Min Sik Kim, “Rule Hashing for Efficient Packet Classification in Network Intrusion Detection,” IEEE, 2008.
Mrs. R. Avudaiammal received her B.E. degree in Electronics and Communication Engineering from Madurai Kamaraj University, India in 1992 and her M.E. degree in Applied Electronics from Bharathiar University, India in 2000. She is an Associate Professor at St. Joseph’s College of Engineering, Chennai, India. She has 17 years of teaching experience. She has published books on Microprocessors with Dhanpatrai publication and on Information Coding Techniques with TMH Publishers. She is currently pursuing her research at Anna University Tiruchirappalli, India. Her research interests are in embedded systems, multimedia networks and network processors.
Dr. P. Seethalakshmi received her B.E. degree in Electronics and Communication Engineering in 1991 and her M.E. degree in Applied Electronics in 1995 from Bharathiar University, India. She obtained her doctoral degree from Anna University Chennai, India in 2004. She has 15 years of teaching experience. She is Director/CAE, Anna University Thiruchirappalli. Her areas of research include Multimedia Streaming, Wireless Networks, Network Processors and Web Services.