© 1992 IEEE
Singapore ICCS/ISITA '92
[Fig. 1. NIU protocol stack configurations: monolithic, multi-thread and multi-tasking host stacks (Figs. 1d, 1e and 1f respectively), paired with monolithic and multi-thread NIU stacks.]
4.1 Overhead due to Protocol Partitioning

When all the protocol layers are running on a single processor, passing of control and data between them is easier. Data may be passed between layers using pointers in shared memory or through a system area. The underlying operating system provides suitable primitives for inter-layer communication and synchronization. When the protocol stack is partitioned, it executes on separate processors. This may involve an additional copy of data, as data is exchanged across the processors. The overhead of this copy is borne by the host processor or by the NIU processor. Synchronization between the two processors has to be done explicitly, through interrupts or through shared memory.

If the host or the NIU processing stream is slow, data packets will get queued in the interface buffer memory. If the difference in their processing times is large, more and more packets will get queued. At some point, inter-layer flow control will be enforced, or the faster stream will get blocked on overrunning the buffer space. If the buffer memory is sufficiently large, then blocking may not be observed, but the response time for a packet queued at the interface will still deteriorate due to longer queue lengths. The blocking will remove temporal parallelism between the host and the NIU processes. Balanced protocol processing by the host and the NIU will reduce the queuing time of data packets at the interface, and will increase the degree of parallelism between the host and the NIU processors.
OSINET [6] is an implementation of a subset of the ISO-OSI model for LANs. It has a network kernel which provides the basic networking support, and user applications that run over the kernel. The network kernel consists of the lower six layers of the OSI reference model. The MAC and the LLC type 1 layers form the data link layer. The network layer is null. Transport class 4 is implemented to provide reliable connection-oriented service. The session layer supports dialogue management and synchronization. The presentation layer provides ASN.1 encoding and decoding. The application layer consists of FTAM, CASE and DASE services. A single control loop schedules execution of each layer in round-robin fashion.

6.0 The Intelligent NIU

An intelligent NIU, the PC Link2 Network Interface Adapter from Intel Inc., was used in our work. The board is an ethernet interface NIU, providing an i82586 ethernet co-processor, an i80186 processor and 256k bytes of local memory. Any arbitrary protocol software can be downloaded and executed on the NIU. The board represents a general purpose intelligent NIU, and hence was selected for the experiments.

6.1 The Host-NIU Communication

The host machine can access the entire 256k bytes of NIU memory through an 8k window which is mapped into its address space. The NIU memory is shared by the host processor, the on-board 80186 and the 82586. Access to the memory by the host machine is through an 8-bit data path, while the NIU local processors access it through a 16-bit data path. Because of this, and due to the dual-ported implementation, access to the NIU memory is somewhat slower for the host processor, compared to access to its local memory. The host machine controls the windowing mechanism, and the operation of the NIA, through two control ports mapped in its IO space. The NIU communicates with the host machine through interrupt signals on the standard PC bus. Apart from this, it cannot interfere with the host operation.

We have performed our work on unitasking host machines running the MS-DOS operating system. The host-NIU operating environment looks like the one in figure 1a. Data and commands are exchanged between the host and the NIU through the shared memory on the NIU.

7.1 Host-NIU Data Exchange

In OSINET, data exchange across the protocol layers is done by reference, using a buffer passing mechanism. Two buffer pools are used. OSINET handles the application layer messages in chunks of 2000 bytes, which are segmented by the transport layer into segments of 1000 bytes each. One of the buffer pools, called the session buffer pool, thus has fixed-sized buffers of 2044 bytes each, enough to accommodate the largest application layer message. The other buffer pool, called the transport buffer pool, has buffers of 1518 bytes each, enough to accommodate the largest ethernet frame. The application layer treats large and small data differently, and copies it into a buffer from the appropriate buffer pool. This avoids an extra copy of data. The host and the NIU protocol partitions use a similar buffer structure in their own address spaces. The data exchange between the host and the NIU takes place through the buffer pools defined in the NIU memory space. The host protocol partition passes data to the NIU partition by writing it into the appropriate buffer pool in the NIU memory. Handshake variables are used, which indicate to the host protocol partition where the data meant for the NIU is to be written. This data is processed by the NIU, and handed over to the ethernet co-processor for transmission on the network. Data on the network is received by the co-processor into a transport buffer pool buffer. This data is processed by the NIU and reassembled into a session buffer pool buffer if necessary. The NIU passes this data to the host by posting the buffer number and the buffer pool identity into the handshake variables. Figure 3 shows the typical data flow between the host and the NIU. The NIU offers a shared memory option, but we still prefer a copy of data across the interface. This is because of the slower access to the NIU memory from the host machine, as explained in the earlier section.
[Figure 3. Host-NIU Data Movement: data flows from host memory into the buffer pools in NIU memory, which are linked to the co-processor data structures, and onto the network.]
7.3 Organization of the NIU Software

The software running in the NIU is organized into the following three modules.
1. Host Interface Module: This module handles interaction with the host machine. A corresponding module runs on the host machine.
2. OSI Protocol Software: This module consists of the protocol layers of the OSINET software.
3. LAN Interface Module: This module consists of software that handles the ethernet co-processor.

Each case of protocol partitioning represents a different load on the host and the NIU processors. To compute the processing load on each processor, we measured the typical processing time of each protocol layer on the host and on the NIU, and the overhead at the interface. With these timings, the partitioning of the protocol processing load was calculated. Throughput offered by the host-NIU system was measured for application message sizes of 16 to 1024 bytes.
8.3 Observations
[Figure 4. Comparative throughput against message size (10-1000 bytes) for six configurations: AT.SessNIU, XT.SessNIU, AT.TransNIU, XT.TransNIU, AT.MacNIU and XT.MacNIU, where AT is the 25 MHz host machine and XT the 8 MHz host machine.]
In these cases, the protocol partitions are well balanced in terms of their execution times. No blocking of the host process is observed at the interface. The higher degree of parallelism between the host and the NIU processors results in higher throughput.

In the 25 MHz host machine, the protocol layers are migrated from a faster host machine to a slower NIU. Case I of protocol partitioning represents a 46% load on the host, while cases II and III represent 23% and 21% loads on the host machine respectively. Thus, case I reflects balanced partitioning of the protocol processing load. For this case, no blocking was observed at the host-NIU interface. This results in higher throughput than the other two cases of partitioning. In cases II and III, the host process, having the smaller processing load, queues up packets at the interface at a faster rate than the NIU can process them. Hence, it runs out of buffer space in the NIU and gets blocked at the interface for every packet transferred to the NIU. Thus, the host-NIU communication reduces to a stop-and-wait type of protocol. The host process has to wait for a buffer, and the throughput of the system drops compared to that obtained for case I. These two cases emphasize the loss in performance caused by blocking of the host process. The blocking can be avoided by balancing the protocol processing load.

The performance gain observed due to balanced protocol load partitioning is available for all sizes of application message. Our experimental results show that, by appropriate partitioning of protocol layers, temporal parallelism between the host and the NIU processors can be exploited and the throughput of the network node can be improved. For the 8 MHz host machine, up to a 4 times increase in throughput is obtained due to balanced sharing of the protocol load of cases II and III. For the 25 MHz host machine, the blocking at the interface observed in cases II and III of layer partitioning reduces the throughput by a factor of 2 compared to the throughput obtained in case I. Balancing the protocol processing load suggests different protocol layer residency for different host-NIU pairs.

9.0 Summary

The analysis of the host-NIU system presented in this paper is applicable to any of the host-NIU systems of figure 1. No assumption is made about the host machine, the NIU or the protocol software. The advantages due to balanced protocol load sharing hold good for a typical host-NIU pair of a network node. Our implementation used a general purpose NIU which gave a performance gain by suitably off-loading the host machine. Up to four times improvement in throughput of a host-NIU system was obtained by exploiting temporal parallelism between the host and the NIU processors. Imbalance in partitioning of the protocol processing load deteriorated the performance by a factor of two in another host-NIU system.

In a high performance NIU, migrating the protocol functions to the efficient NIU may give better results, even in the absence of any parallelism. But the performance may be further improved if the host and the NIU processors exhibit a higher degree of parallelism. Balancing the host and the NIU protocol processing times will give different protocol layer residency solutions in different host-NIU pairs. This is being investigated further.

References

[1] G. Chesson, "XTP/PE Design Considerations", in Protocols for High Speed Networks, H. Rudin & R. Williamson (Eds), Elsevier Science Publ., 1989, pp 27-33.

[2] D. Clark et al, "Architectural Considerations for a New Generation of Protocols", in Proc. of ACM SIGCOMM '90, 1990, pp 200-208.

[3] D. Giarrizzo et al, "High Speed Parallel Protocol Implementation", in Protocols for High Speed Networks, H. Rudin & R. Williamson (Eds), Elsevier Science Publ., 1989, pp 27-33.

[4] A. Idoue et al, "Design and Implementation of OSI Communication Board for Personal Computers and Workstations", in Proc. of ICCC '90, New Delhi, India, 1990, pp 585-592.

[5] H. Kanakia et al, "The VMP Network Adapter Board (NAB): High Performance Network Communication for Multiprocessors", in Proc. of ACM SIGCOMM '88, 1988, pp 175-187.

[6] S.V. Raghavan et al, "OSI Protocol Suite Implementation - An Indian Experience", in Proc. of INFOCOM '91, 1991, pp 41-78.