A TCP/IP transport layer for the DAQ of the CMS Experiment
Miklos Kozlovszky
for the CMS TriDAS collaboration
CERN, European Organization for Nuclear Research
ACAT03, December 2003

CMS & Data Acquisition

  Collision rate                  40 MHz
  Level-1 maximum trigger rate    100 kHz
  Average event size              ~1 MByte
  No. of In-Out units             1000
  Readout network bandwidth       ~1 Terabit/s
  Event filter computing power    ~5 x 10^6 MIPS
  Data production                 ~TByte/day

[Figure: CMS DAQ architecture. Labels: Detector Frontend, Level 1 Trigger, Readout Systems, Event Manager, Builder Networks, Filter Systems, Computing Services, Run Control.]

Building the events

- Event builder: physical system interconnecting data sources with data destinations. It has to move all the data fragments of each event to the same destination.
- Event fragments: event data fragments are stored in separate physical memory systems.
- Full events: the full event data are stored in one physical memory system associated with a processing unit.

[Figure: NxM event builder (EVB). 512 data sources for 1 MByte events feed ~1000s of HLT processing nodes.]

DAQ Framework

XDAQ: distributed DAQ framework developed within CMS.
- Construct homogeneous applications for heterogeneous processing clusters.
- Multi-threaded (important to take advantage of SMP efficiently).
- Zero-copy message passing for the event data.
- Peer-to-peer communication between the applications.
- I2O for data transport, and SOAP for configuration and control.
- Hardware and transport independence.

[Figure: XDAQ layering. Labels: Processing, Sensor readout, Util/DDM, XDAQ, HTTP, TCP, PC, Ethernet, Myrinet, OS and Device Drivers. The TCP peer transport is the subject of this presentation.]

TCP/IP Peer Transport Requirements

- Reuse old, "cheap" Ethernet for DAQ
- Transport layer requirements:
  - Reliable communication
  - Hide the complexity of TCP
  - Efficient implementation
  - Simplex communication via sockets
  - Configurable
  - Support of blocking and non-blocking I/O

Implementation of the non-blocking mode

- Pending Queues (PQ)
  - Thread-safe PQ management
  - One PQ for each destination
- Independent sending through sockets
- Only one "select" function call, used both to receive packets and to send the blocked data.
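The pending-queue scheme for the non-blocking mode can be illustrated with a minimal Python sketch. This is not the ptATCP code (XDAQ is a C++ framework); the class name PendingQueueTransport and the methods add_peer, frame_send and poll are hypothetical names chosen for illustration. The sketch assumes one pending queue per destination socket, a lock for thread-safe queue management, and a single select() call that serves both receiving and flushing of blocked data.

```python
import collections
import select
import socket
import threading


class PendingQueueTransport:
    """Illustrative only: one pending queue per destination, one select() call."""

    def __init__(self):
        self._lock = threading.Lock()   # makes pending-queue management thread safe
        self._pending = {}              # destination socket -> deque of unsent bytes
        self.received = collections.defaultdict(bytearray)  # socket -> bytes received

    def add_peer(self, sock):
        """Register a connected socket and give it its own pending queue."""
        sock.setblocking(False)
        with self._lock:
            self._pending[sock] = collections.deque()

    def frame_send(self, sock, data):
        """Queue a frame for one destination; the caller never blocks."""
        with self._lock:
            self._pending[sock].append(bytes(data))

    def poll(self, timeout=0.1):
        """A single select() call handles both receiving and sending blocked data."""
        with self._lock:
            readers = list(self._pending)
            writers = [s for s, q in self._pending.items() if q]
        if not readers:
            return
        readable, writable, _ = select.select(readers, writers, [], timeout)
        for sock in readable:
            try:
                chunk = sock.recv(65536)
            except BlockingIOError:
                continue
            if chunk:
                self.received[sock].extend(chunk)
            else:
                with self._lock:        # peer closed the connection
                    self._pending.pop(sock, None)
        for sock in writable:
            self._flush(sock)

    def _flush(self, sock):
        """Send as much queued data as the kernel accepts without blocking."""
        with self._lock:
            queue = self._pending.get(sock)
            while queue:
                try:
                    sent = sock.send(queue[0])
                except BlockingIOError:
                    return              # kernel buffer full; retry on next poll
                if sent < len(queue[0]):
                    queue[0] = queue[0][sent:]   # keep the unsent tail queued
                    return
                queue.popleft()
```

In this sketch each destination drains independently: a slow peer only grows its own queue, while the other sockets keep sending on the next poll() call.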
[Figure: Implementation of the non-blocking mode. The XDAQ application calls framesend; frames are placed in pending queues #1 to #n, one per destination, and are drained through a single select call.]

Communication via the transport layer

[Figure: layer diagram. Applications (XDAQ) and XDAQ Executive in the XDAQ Framework; Peer Transport Layer ptATCP with ptATCPPort(s), Sender Object(s), Receiver Object(s), Input SAP(s) and Output SAP(s); OS with Driver(s); NIC (FE), NIC (GE), NIC (10GE). Arrow legend: creation of object, sending, receiving, other communication.]

Throughput optimisation

- Operating system tuning (kernel options, buffers)
- Jumbo frames
- Transport protocol options
- Communication techniques:
  - Blocking vs. non-blocking I/O
  - Single/multi-rail
  - Single/multi-thread
  - TCP options (e.g. Nagle algorithm)
  - ...

[Figure: single-rail vs. multi-rail connections between App 1 and App 2.]

Test network

- Cluster size: 8x8
- CPU: 2x Intel Xeon (2.4 GHz), 512 KB cache
- I/O system: PCI-X: 4 buses (max 6)
- Memory: two-way interleaved DDR: 3.2 GB/s (512 MB)
- NICs:
  - 1 Intel 82540EM GE
  - 1 Broadcom NeXtreme BCM 5703x GE
  - 1 Intel Pro 2546EB GE (2 port)
- OS: Linux RedHat 2.4.18-27.7 (SMP)
- Switches:
  - 1 BATM T6 Multi Layer Gigabit Switch (medium range)
  - 2 Dell Power Connect 5224 (medium range)

Event building on the cluster

[Figure: throughput per node (MB/s, 0 to 140) vs. fragment size (Byte, 100 to 100000, log scale). Curves: link BW (1 Gbps); 8x8 EVB [P4, e1000, Powerconnect 5224]; 32x32 EVB [P3, AceNIC, FastIron8000].]

Conditions:
- XDAQ Event Builder
  - No Readout Unit inputs
  - No Builder Unit outputs
  - No Event Manager
- PC: dual P4 Xeon
- Linux 2.4.19
- NIC: e-1000
- Switch: Powerconnect 5224
- Standard MTU (1500 Bytes)
- Each BU builds 128 events
- Fixed fragment sizes

Result: for fragment size ~4 kB, throughput/node ~100 MB/s, i.e. 80% utilisation (working point).

Two Rail Event Builder measurements

Test case:
- Bare Event Builder (2x2)
- No RU inputs
- No BU outputs
- No Event Manager

Options:
- Non-blocking TCP
- Jumbo frames (MTU 8000)
- Two rail
- One thread

Working point (16 kB): throughput/node 240 MB/s, i.e. 95% bandwidth.

Conclusions

- Achieved 100 MB/s per node in the 8x8 configuration (1 rail).
- Improvements seen with the use of two rails, non-blocking I/O and Jumbo frames. In the 2x2 configuration over 230 MB/s was obtained.
- High CPU load.
- We are also studying other networking and traffic shaping options.
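As a rough cross-check of the utilisation figures quoted with the measurements, the per-node throughput can be compared with the raw link bandwidth. The helper below is a hypothetical illustration, not part of the presentation. It assumes 1 MB = 10^6 bytes, 1 Gbps per rail, and no allowance for Ethernet, IP or TCP header overhead.

```python
def link_utilisation(throughput_mb_s, rails=1, link_gbps=1.0):
    """Fraction of the raw link bandwidth used by one node.

    Assumes 1 MB = 10**6 bytes and ignores protocol header overhead.
    """
    link_bytes_per_s = rails * link_gbps * 1e9 / 8   # raw capacity in bytes/s
    return throughput_mb_s * 1e6 / link_bytes_per_s


if __name__ == "__main__":
    # 8x8 configuration, one rail, 100 MB/s per node
    print(f"8x8, one rail : {link_utilisation(100):.0%}")
    # 2x2 configuration, two rails, 240 MB/s per node
    print(f"2x2, two rails: {link_utilisation(240, rails=2):.0%}")
```

Under these assumptions the one-rail case gives 80% and the two-rail case gives 96%, in line with the 80% and 95% quoted on the slides.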