. PVFS2 over Quadrics is implemented completely in user space over these libraries, compliant with the modular design of PVFS2. As shown in Fig. 1, its networking layer, Buffer Message Interface (BMI), is layered on top of the libelan and libelan4 libraries. More details on the design and initial performance evaluation of PVFS2/Quadrics can be found in .
Fig. 1. PVFS2 over Quadrics Elan4
3. Designing Zero-Copy Quadrics Scatter/Gather for PVFS2 List IO
Noncontiguous IO access is the main access pattern in scientific applications. Thakur et al. noted that it is important to achieve high-performance MPI-IO with native noncontiguous access support in file systems. PVFS2 provides a list IO interface to support such noncontiguous IO accesses. Fig. 2 shows an example of noncontiguous IO with PVFS2. In PVFS2 list IO, communication between clients and servers over noncontiguous memory regions is supported so long as the combined destination memory is larger than the combined source memory. List IO can be built on top of interconnects with native scatter/gather communication support; otherwise, it often resorts to memory packing and unpacking to convert noncontiguous memory fragments to contiguous memory. An alternative is to perform multiple send and receive operations. This can lead to more processing and more communication in small data chunks, resulting in performance degradation.
[Figure 2 labels: Client, Server, List IO, Trove, Disk]
Fig. 2. An Example of PVFS2 List IO
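The packing and unpacking fallback described above can be sketched in C as follows. This is a minimal illustration, not PVFS2 code: the type `mem_region` and the functions `regions_total`, `pack_regions`, `unpack_regions`, and `list_io_copy` are hypothetical names introduced here, and the network transfer of the packed buffer is omitted. The sketch also assumes that a destination exactly as large as the source is acceptable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One contiguous piece of a noncontiguous buffer (hypothetical type). */
struct mem_region {
    void  *addr;
    size_t len;
};

/* Sum of the lengths of all regions in a list. */
size_t regions_total(const struct mem_region *r, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += r[i].len;
    return total;
}

/* Gather: copy every source region, in order, into one contiguous buffer.
 * Returns the number of bytes packed. */
size_t pack_regions(const struct mem_region *src, size_t n, char *packed)
{
    size_t off = 0;
    for (size_t i = 0; i < n; i++) {
        memcpy(packed + off, src[i].addr, src[i].len);
        off += src[i].len;
    }
    return off;
}

/* Scatter: spread `len` contiguous bytes over the destination regions,
 * in order. Returns the number of bytes placed. */
size_t unpack_regions(const char *packed, size_t len,
                      const struct mem_region *dst, size_t n)
{
    size_t off = 0;
    for (size_t i = 0; i < n && off < len; i++) {
        size_t left  = len - off;
        size_t chunk = dst[i].len < left ? dst[i].len : left;
        memcpy(dst[i].addr, packed + off, chunk);
        off += chunk;
    }
    return off;
}

/* Pack-then-unpack transfer through a staging buffer.
 * Returns 0 on success, -1 if the combined destination or the staging
 * buffer is smaller than the combined source (assumption: equal is fine). */
int list_io_copy(const struct mem_region *src, size_t nsrc,
                 const struct mem_region *dst, size_t ndst,
                 char *staging, size_t staging_len)
{
    size_t need = regions_total(src, nsrc);
    if (regions_total(dst, ndst) < need || staging_len < need)
        return -1;
    size_t packed = pack_regions(src, nsrc, staging);
    size_t placed = unpack_regions(staging, packed, dst, ndst);
    assert(placed == packed);
    return 0;
}
```

Every byte passes through the staging buffer, so this path costs two extra memory copies per transfer. That overhead is what a native scatter/gather mechanism is meant to avoid.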
There is a unique chain DMA mechanism over Quadrics. In this mechanism, one or more DMA operations can be configured as chained operations with a single NIC-based event. When the event is fired, all the DMA operations are posted to the Quadrics DMA engine. Based on this mechanism, the default Quadrics software release provides noncontiguous communication operations in the form of
. However, these operations are specifically designed for the shared memory programming model (SHMEM) over Quadrics. The final placement of the data still requires a memory copy from the global memory to the application destination memory.
To support zero-copy PVFS2 list IO, we propose a software zero-copy scatter/gather mechanism with a single event chained to multiple RDMA operations. Fig. 3 shows a diagram
[Figure 3 labels: Source Memory, Destination Memory, destination NIC, Host Event, single event; operations numbered 1, 2, 3, 4, and N]
Fig. 3. Zero-Copy Noncontiguous Communication with RDMA and Chained Event
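The idea of a single event chained to multiple RDMA operations can be modeled with a small host-side simulation in C. This is only a sketch under stated assumptions: it does not use the Quadrics libelan or libelan4 API, all names (`rdma_desc`, `chained_event`, `event_chain`, `chain_regions`, `event_fire`, `dma_post`) are hypothetical, and a `memcpy` stands in for an RDMA write performed by the NIC. Splitting the transfer wherever source and destination region boundaries differ is also an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CHAIN 16

/* One contiguous piece of a noncontiguous buffer (hypothetical type). */
struct mem_region { void *addr; size_t len; };

/* Hypothetical descriptor for one RDMA write: source piece to destination piece. */
struct rdma_desc { const void *src; void *dst; size_t len; };

/* Hypothetical event carrying a chain of RDMA descriptors. */
struct chained_event {
    struct rdma_desc chain[MAX_CHAIN];
    size_t nchain;   /* descriptors attached to the event */
    size_t posted;   /* descriptors posted so far */
};

void event_init(struct chained_event *ev)
{
    ev->nchain = 0;
    ev->posted = 0;
}

/* Attach one RDMA descriptor to the event. Returns -1 if the chain is full. */
int event_chain(struct chained_event *ev, const void *src, void *dst, size_t len)
{
    if (ev->nchain == MAX_CHAIN)
        return -1;
    ev->chain[ev->nchain++] = (struct rdma_desc){ src, dst, len };
    return 0;
}

/* Walk both region lists and chain one descriptor per overlapping piece,
 * so data goes straight from source to destination with no staging buffer.
 * Returns 0 if all source bytes are covered, -1 otherwise. */
int chain_regions(struct chained_event *ev,
                  const struct mem_region *src, size_t nsrc,
                  const struct mem_region *dst, size_t ndst)
{
    size_t i = 0, j = 0, soff = 0, doff = 0;
    while (i < nsrc && j < ndst) {
        size_t s_left = src[i].len - soff;
        size_t d_left = dst[j].len - doff;
        size_t len = s_left < d_left ? s_left : d_left;
        if (len > 0 &&
            event_chain(ev, (const char *)src[i].addr + soff,
                        (char *)dst[j].addr + doff, len) != 0)
            return -1;
        soff += len;
        doff += len;
        if (soff == src[i].len) { i++; soff = 0; }
        if (doff == dst[j].len) { j++; doff = 0; }
    }
    return i == nsrc ? 0 : -1;
}

/* Stand-in for the DMA engine: memcpy simulates one RDMA write. */
static void dma_post(const struct rdma_desc *d)
{
    memcpy(d->dst, d->src, d->len);
}

/* Fire the event once: every chained descriptor is posted.
 * Returns the number of descriptors posted. */
size_t event_fire(struct chained_event *ev)
{
    assert(ev->nchain <= MAX_CHAIN);
    for (size_t i = 0; i < ev->nchain; i++) {
        dma_post(&ev->chain[i]);
        ev->posted++;
    }
    return ev->posted;
}
```

A caller would initialize the event, build the chain once with `chain_regions`, and then call `event_fire` a single time. Nothing is transferred until the event fires, and no intermediate buffer is involved.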