
A Summary of "Characterizing Processor Architectures for Programmable Network Interfaces"

Network nodes are the crossing points in communication, and the goal is to speed these nodes up with faster equipment.

Characterizing network processors involves two dimensions:

Application workloads
Current and emerging applications

Processor architectures
Out-of-order speculative superscalar processor
Fine-grain multithreaded processor
Single-chip multiprocessor
Simultaneous multithreaded processor (effective for the NI environment)
Current trend: programmable microprocessors on network interfaces (PNIs) that can be customized with domain-specific software.

The aim is to fill network interfaces with chip architectures designed specifically to match the network application workload of PNIs. Three questions follow:

What workloads must the processor architecture support?
What level of performance is required?
What type of architecture provides the required level of performance?

Answering them calls for a performance evaluation of network processor architectures.

Bottlenecks of the current trend in networking technology

The key metric is the number of messages processed per second (a rough cycle-budget calculation follows this list). Performance evaluation assumes an accurate single-cycle model of:

Instruction issue and execution
Cache accesses
Memory bandwidth and latency
Memory contention between the processor and DMA transfers caused by network send and receive operations

Workflow

Identify a set of applications that can be considered components of the workload
Measure the maximum sustainable link rate
The best performance is shown by architectures designed for a high degree of thread-level parallelism
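
As a back-of-the-envelope illustration of why messages per second is the key metric, the sketch below computes the per-packet cycle budget from an assumed link rate, packet size, and processor clock. The numbers are illustrative assumptions, not figures from the paper.

#include <stdio.h>

int main(void) {
    /* Illustrative assumptions: a 1 Gb/s link carrying 64-byte
     * minimum-size packets, and a 500 MHz network processor. */
    double link_bits_per_sec = 1e9;
    double packet_bits       = 64.0 * 8.0;
    double clock_hz          = 500e6;

    double packets_per_sec = link_bits_per_sec / packet_bits; /* ~1.95M */
    double cycles_per_pkt  = clock_hz / packets_per_sec;      /* ~256   */

    printf("packets/s: %.0f, cycle budget per packet: %.0f\n",
           packets_per_sec, cycles_per_pkt);
    return 0;
}

At these assumed rates the processor has only a few hundred cycles per packet, which is why per-message throughput, rather than raw clock speed, is the metric that matters.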
The conventional application workload of such communication devices consisted of simple packet forwarding and filtering algorithms based on the addresses found in layer-2 or layer-3 protocol packets.
Current application workloads (a minimal filter sketch follows this list):

Traffic shaping
Network firewalls
Network address and protocol translation (NAT)
High-level data transcoding (e.g., converting a data stream going from a high-speed link to a low-speed link)
Load balancing HTTP client requests over a set of WWW servers to increase service availability
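
The following is a minimal sketch of the kind of layer-3/4 match-and-decide rule that firewall and filtering workloads apply per packet; the rule layout and names are assumptions for illustration, not the paper's code.

#include <stdbool.h>
#include <stdint.h>

/* Minimal filter rule: match on destination address/mask and port. */
struct filter_rule {
    uint32_t dst_addr;  /* IPv4 destination address         */
    uint32_t dst_mask;  /* e.g. 0xFFFFFF00 for a /24 prefix */
    uint16_t dst_port;  /* 0 matches any port               */
    bool     accept;    /* verdict when the rule matches    */
};

/* First matching rule wins; the default policy is to drop. */
bool filter_packet(uint32_t dst_addr, uint16_t dst_port,
                   const struct filter_rule *rules, int nrules)
{
    for (int i = 0; i < nrules; i++) {
        bool addr_ok = (dst_addr & rules[i].dst_mask) ==
                       (rules[i].dst_addr & rules[i].dst_mask);
        bool port_ok = rules[i].dst_port == 0 ||
                       rules[i].dst_port == dst_port;
        if (addr_ok && port_ok)
            return rules[i].accept;
    }
    return false; /* default-deny */
}

Each packet is handled independently of the others, which is exactly the packet-level parallelism the architectures below try to exploit.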
Exploiting this packet-level parallelism at the architectural level is the key to achieving sustained high performance.
Application domains for programmable NIs

Server NI software
Web switching software
Active networking software
Application-specific packet processing routines
The packet processing routines fall into two classes. Some process only a limited amount of data in the protocol headers:

Packet classification/filtering
IP packet forwarding
Network address translation (NAT)
Flow management
TCP/IP
Web switching

Others process all of the data contained in the packet:

Virtual private networks / IP Security (IPSec)
Data transcoding
Duplicate data suppression

Packet processing is parallel in these applications.
Benchmarks drawn from the workloads: IPv4 (IP forwarding), MD5 (IP Security), and 3DES (IP Security). A sketch of the route lookup at the heart of IPv4 forwarding follows.
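
The IPv4 forwarding benchmark performs a longest-prefix-match route lookup per packet. Below is a minimal linear-scan sketch of that lookup; the table layout is an assumption, and a production implementation would use a radix trie rather than a linear scan.

#include <stdint.h>

/* One routing table entry: prefix, prefix length, egress port. */
struct route {
    uint32_t prefix;   /* IPv4 prefix, host byte order */
    uint8_t  len;      /* prefix length, 0..32         */
    int      out_port; /* egress interface             */
};

/* Longest-prefix match by linear scan over the table. */
int lookup_route(uint32_t dst, const struct route *tbl, int n)
{
    int best_port = -1; /* -1 means no matching route */
    int best_len  = -1;

    for (int i = 0; i < n; i++) {
        uint32_t mask = tbl[i].len == 0
                      ? 0 : 0xFFFFFFFFu << (32 - tbl[i].len);
        if ((dst & mask) == (tbl[i].prefix & mask) &&
            tbl[i].len > best_len) {
            best_len  = tbl[i].len;
            best_port = tbl[i].out_port;
        }
    }
    return best_port;
}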
Packets are delivered to the NI via the host controller (out-bound packets) or the network controller (in-bound packets).

Lower-level performance metrics, such as branch prediction rates, are also examined.

Execution environment of a PNI

A PNI follows a store-process-forward mechanism:

The store and forward stages transfer data into and out of the NI's buffer memory
The process stage invokes application-specific handlers based on matching criteria applied to the message

[Figure: generic programmable network interface architecture; a network processor, host controller, and network controller attached to a shared buffer memory.]

High throughput and low latency are achieved by pipelining messages through these stages, as in the threaded sketch below.
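
A minimal sketch of the store-process-forward pattern using POSIX threads, one per stage, connected by single-slot queues. The queue, stage, and handler names are assumptions for illustration; real NI firmware would use ring buffers in the interface's buffer memory and overlap the stages in hardware.

#include <pthread.h>
#include <stdint.h>

/* Illustrative message buffer; 1514 bytes covers an Ethernet frame. */
struct msg { uint8_t data[1514]; int len; };

/* Single-slot blocking queue connecting two pipeline stages. */
struct slot {
    pthread_mutex_t lock;
    pthread_cond_t  nonempty, nonfull;
    struct msg m;
    int full;
};

static void slot_put(struct slot *s, const struct msg *m) {
    pthread_mutex_lock(&s->lock);
    while (s->full) pthread_cond_wait(&s->nonfull, &s->lock);
    s->m = *m;
    s->full = 1;
    pthread_cond_signal(&s->nonempty);
    pthread_mutex_unlock(&s->lock);
}

static void slot_get(struct slot *s, struct msg *m) {
    pthread_mutex_lock(&s->lock);
    while (!s->full) pthread_cond_wait(&s->nonempty, &s->lock);
    *m = s->m;
    s->full = 0;
    pthread_cond_signal(&s->nonfull);
    pthread_mutex_unlock(&s->lock);
}

static struct slot stored    = { .lock = PTHREAD_MUTEX_INITIALIZER,
                                 .nonempty = PTHREAD_COND_INITIALIZER,
                                 .nonfull  = PTHREAD_COND_INITIALIZER };
static struct slot processed = { .lock = PTHREAD_MUTEX_INITIALIZER,
                                 .nonempty = PTHREAD_COND_INITIALIZER,
                                 .nonfull  = PTHREAD_COND_INITIALIZER };

/* Store: pull a message off the wire into buffer memory. */
static void *store_stage(void *arg) {
    static struct msg m;
    for (;;) {
        /* receive_from_network(&m); -- hypothetical driver call */
        slot_put(&stored, &m);
    }
    return NULL;
}

/* Process: run the matched application-specific handler. */
static void *process_stage(void *arg) {
    struct msg m;
    for (;;) {
        slot_get(&stored, &m);
        /* handler(&m); -- hypothetical application handler */
        slot_put(&processed, &m);
    }
    return NULL;
}

/* Forward: push the processed message out of buffer memory. */
static void *forward_stage(void *arg) {
    struct msg m;
    for (;;) {
        slot_get(&processed, &m);
        /* send_to_host_or_network(&m); -- hypothetical driver call */
    }
    return NULL;
}

int main(void) {
    pthread_t t[3];
    pthread_create(&t[0], NULL, store_stage,   NULL);
    pthread_create(&t[1], NULL, process_stage, NULL);
    pthread_create(&t[2], NULL, forward_stage, NULL);
    for (int i = 0; i < 3; i++)
        pthread_join(t[i], NULL);
    return 0;
}

With the stages decoupled this way, a new message can be stored while an earlier one is processed and an even earlier one is forwarded, which is where the pipeline's throughput comes from.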
Superscalar (SS)

Deep pipeline (7 stages)
Scoreboarding and register renaming to resolve dynamic dependencies
Issues multiple instructions each cycle

Fine-Grain Multithreaded (FGMT)

Multiple hardware thread contexts extending the out-of-order superscalar core
Exploits ILP within each thread of execution
Improves system throughput
Round-robin fetch and issue policy

Chip Multiprocessor (CMP)

Separate execution pipelines, separate register files, separate fetch units
Private L1 caches (instruction and data) and a shared L2 cache

Simultaneous Multithreaded (SMT)

Fetches and issues instructions from multiple threads each cycle
Exploits both ILP and TLP

Experiments were run with respect to:

Standalone application performance
Standalone operating system overhead
OS-governed application performance

The architectures take contrasting approaches: dynamic discovery of ILP (aggressive superscalar), tolerating blocked threads (FGMT), and simple replication (CMP).

Observations

SS and FGMT have essentially the same performance on these workloads; likewise, CMP and SMT have roughly equivalent performance that is 2 to 4 times greater.
The key is to scale both issue width and the number of hardware thread contexts.
Network processor workloads exhibit a high degree of parallelism at the packet level, which represents an opportunity for high performance.
SMT performs better than CMP, and more than a factor of two better than FGMT and SS, by dynamically exploiting both instruction- and thread-level parallelism.
Questions and Answers

1) What is a DMA controller?

A) Direct Memory Access (DMA) is one of several methods for coordinating the timing of data transfers between an input/output (I/O) device and the core processing unit or memory in a computer.

DMA saves core MIPS because the core can operate in parallel with the transfer.
DMA saves power because it requires less circuitry than the core to move data.
DMA saves pointers because core AGU pointer registers are not needed.
DMA has no modulo block-size restrictions, unlike the core AGU.

A schematic sketch of programming such a transfer follows.
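
The sketch below shows, schematically, how software might program a DMA transfer through memory-mapped descriptor registers. The register layout, bit assignments, and names are entirely hypothetical and only illustrate the idea of offloading the copy from the core.

#include <stdint.h>

/* Hypothetical memory-mapped DMA controller registers. */
struct dma_regs {
    volatile uint32_t src;  /* source physical address      */
    volatile uint32_t dst;  /* destination physical address */
    volatile uint32_t len;  /* transfer length in bytes     */
    volatile uint32_t ctrl; /* bit 0: start, bit 1: done    */
};

#define DMA_START 0x1u
#define DMA_DONE  0x2u

/* Program a transfer and wait for completion. While the DMA
 * engine moves the data, the core is free to do other work
 * instead of copying byte by byte. */
void dma_copy(struct dma_regs *dma,
              uint32_t src, uint32_t dst, uint32_t len)
{
    dma->src  = src;
    dma->dst  = dst;
    dma->len  = len;
    dma->ctrl = DMA_START;          /* kick off the transfer */
    while (!(dma->ctrl & DMA_DONE)) /* or take an interrupt  */
        ;                           /* core could work here  */
}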

2) Specify the OS used for the OS-overhead experiment and its details.

A) SPINE is the OS used. In the standalone operating-system-overhead experiment, basic packet delivery proceeds at an equal rate on the FGMT, SS, and SMT architectures, since SPINE runs a single thread of execution.

3) Which processor architectures exploit packet-level parallelism best (considering the first experiment set)?

A) CMP and SMT clearly demonstrate their superiority over SS and FGMT in exploiting the packet-level parallelism available within the workloads. The ability to issue from multiple threads simultaneously is the key to this scalable performance.
